Intro

Background

Hamilton, chapters 15-16. A common decomposition of macroeconomic time series is into trend and cycle. #+latex:\\~\\ Suppose $y_t$ is the log of real per capita GDP $gdp_t$ of the United States. According to this components approach to time series, $y_t$ is expressed as $$ y_t = \ln gdp_t = trend_t + fluctuations_t. $$ We will examine regression techniques that decompose $y_t$ into a trend and a cyclical component.

An identification problem

What features of the time series do we regard as trend and what do we regard as fluctuations around the trend?

Let's guess a linear deterministic time trend: $$ y_t = \beta_1 + \beta_2 t + u_t $$

A decomposition of $y_t$ into trend and fluctuations can be obtained by estimating $\beta_1$ and $\beta_2$: \begin{eqnarray} y_t &=& \widehat{trend}_t + \widehat{fluctuations}_t \nonumber \\ &=& (\hat{\beta}_1 + \hat{\beta}_2 t) + ( y_t - \hat{\beta}_1 - \hat{\beta}_2t). \end{eqnarray}

When $y_t$ is logged, the coefficient $\beta_2$ has the interpretation of an average growth rate.
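As a concrete illustration, here is a minimal sketch of this decomposition in Python. The series is simulated (the intercept, growth rate, and noise level are invented stand-ins for log GDP data), so the numbers are purely illustrative.

#+begin_src python
import numpy as np
import statsmodels.api as sm

# Simulated stand-in for log real per capita GDP: linear trend growth plus noise
# (illustrative parameters, not actual U.S. data).
T = 200
t = np.arange(T)
y = 9.0 + 0.02 * t + np.random.normal(scale=0.05, size=T)

X = sm.add_constant(t)
fit = sm.OLS(y, X).fit()
beta1_hat, beta2_hat = fit.params

trend_hat = beta1_hat + beta2_hat * t     # estimated trend component
fluctuations_hat = y - trend_hat          # estimated cyclical component
print(beta2_hat)                          # roughly the average growth rate
#+end_src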

Deterministic Trend Model

Consider the deterministic trend model $$ y_t = \beta_1 + \beta_2 t + u_t $$ with $\mathbb E[u_t]=0$ and $var[u_t] = \sigma^2$. There are several difficulties associated with the large sample analysis of the OLS estimators $\hat{\beta}_{1,T}$ and $\hat{\beta}_{2,T}$. Taking $x_t = [1, t]'$,

  1. The matrix $\frac{1}{T} \sum x_t x_t'$ does not converge to a non-singular matrix $Q$.
  2. In a time series model, the disturbances $u_t$ are in general dependent. This will change the limiting distribution of quantities such as $\sqrt{T} \frac{1}{T} \sum x_t u_t$.
  3. If the $u_t$'s are serially correlated, then the OLS estimator will in general be inefficient.

Asymptotic Stuff

Rates of Convergence for OLS Estimator

Roughly speaking, convergence rates tell us how fast we can learn the ``true'' value of a parameter in a sampling experiment. #+latex:\\~\\ In ``standard'' OLS, the variance of $\hat\beta$ converges to zero at rate $1/T$. #+latex:\\~\\ This isn't true for models with deterministic trends. #+latex:\\~\\ Let's look at the distributions of $\sqrt{T}(\hat\beta_0 - \beta_0)$ and $\sqrt{T}(\hat\beta_1 - \beta_1)$.

A Monte Carlo

#+begin_src python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import statsmodels.api as sm

N = 1000
beta0, beta1 = 0.2, 0.5

def monte_carlo(T):
    # simulate y_t = beta0 + beta1*t + u_t and return the OLS estimates
    t = np.arange(T)
    y = beta0 + beta1 * t + np.random.normal(size=T)
    X = sm.add_constant(t)
    mod = sm.OLS(y, X).fit()
    return mod.params

res = {}
for T in [50, 500, 2000]:
    t_params = [monte_carlo(T) for _ in range(N)]
    res[T] = pd.DataFrame(t_params, columns=['beta0_hat', 'beta1_hat'])
    res[T].beta0_hat = np.sqrt(T) * (res[T].beta0_hat - beta0)
    res[T].beta1_hat = np.sqrt(T) * (res[T].beta1_hat - beta1)

fig, ax = plt.subplots(nrows=1, ncols=2)
fig.set_size_inches(12, 6)
ax[0].set_title(r'$\sqrt{T}(\hat\beta_0 - \beta_0)$', fontsize=20)
[res[T].beta0_hat.plot(kind='kde', ax=ax[0]) for T in [50, 500, 2000]]
ax[1].set_title(r'$\sqrt{T}(\hat\beta_1 - \beta_1)$', fontsize=20)
[res[T].beta1_hat.plot(kind='kde', ax=ax[1]) for T in [50, 500, 2000]]
#+end_src

asymptotics.png

Some math

Facts: \begin{eqnarray} \sum_{t=1}^T 1 = T,\quad \sum_{t=1}^T t = T(T+1)/2, \quad\sum_{t=1}^T t^2 = T(T+1)(2T+1)/6. \end{eqnarray}

(Assume the $u_t$'s are independently distributed.)

\[ \frac{1}{T} \sum x_t x_t' = \frac{1}{T} \left( \begin{array}{cc} \sum 1 & \sum t \\ \sum t & \sum t^2 \end{array} \right) \] does not converge!

On the other hand \[ \frac{1}{T^3} \sum x_t x_t' \longrightarrow \left( \begin{array}{cc} 0 & 0 \\ 0 & 1/3 \end{array} \right) \] which is singular and not invertible!
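A quick numerical check of both scalings (a sketch; the sample sizes are arbitrary):

#+begin_src python
import numpy as np

for T in [100, 1000, 10000]:
    t = np.arange(1, T + 1)
    X = np.column_stack([np.ones(T), t])    # rows are x_t' = [1, t]
    S = X.T @ X                             # sum of x_t x_t'
    print(T, S / T)       # entries blow up with T: no finite limit
    print(T, S / T**3)    # approaches [[0, 0], [0, 1/3]], which is singular
#+end_src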

*Message*: Trends change the rate of convergence of estimators!

More on Rates of Convergence

It turns out that $\hat{\beta}_{1,T}$ and $\hat{\beta}_{2,T}$ have different asymptotic rates of convergence. In particular, we will learn faster about the slope of the trend line than the intercept.

To analyze the asymptotic behavior of the estimators we define the matrix \[ G_T = \left( \begin{array}{cc} 1 & 0 \\ 0 & T \end{array} \right). \] Note that the matrix is symmetric, that is, $G_T = G_T'$.

Asymptotic Distributions

We will analyze the following quantity \[ G_T(\hat{\beta}_T - \beta) = \left( \frac{1}{T} \sum G_T^{-1} x_t x_t'G_T^{-1} \right)^{-1} \left( \frac{1}{T}\sum G_T^{-1} x_t u_t \right). \] It can be easily verified that \[ \frac{1}{T} \sum G_T^{-1} x_t x_t' G_T^{-1} = \frac{1}{T} \left( \begin{array}{cc} \sum 1 & \sum t/T \\ \sum t/T & \sum (t/T)^2 \end{array} \right) \longrightarrow Q, \] where \[ Q = \left( \begin{array}{cc} 1 & 1/2 \\ 1/2 & 1/3 \end{array} \right). \]
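A small numerical sketch (arbitrary sample sizes) suggesting the convergence to $Q$ under the $G_T$ scaling:

#+begin_src python
import numpy as np

Q = np.array([[1.0, 0.5], [0.5, 1/3]])

for T in [100, 1000, 10000]:
    t = np.arange(1, T + 1)
    X = np.column_stack([np.ones(T), t])
    G_inv = np.diag([1.0, 1.0 / T])          # G_T^{-1}
    M = G_inv @ (X.T @ X) @ G_inv / T        # (1/T) sum G^{-1} x_t x_t' G^{-1}
    print(T, np.abs(M - Q).max())            # maximal deviation shrinks toward zero
#+end_src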

Standardization

The term $\frac{1}{T} \sum G_T^{-1} x_t u_t$ has the components $\frac{1}{T} \sum u_t$ and $\frac{1}{T} \sum (t/T) u_t$, which converge in probability to zero by the weak law of large numbers for non-identically distributed random variables.

*Note*: Without the proper standardization, $\frac{1}{T} \sum t u_t$ will not converge to its expected value of zero. The variance of the random variable $t u_t$ grows with $t$, which prevents the sample mean from converging to its expectation. $\Box$

*Result*: Suppose \[ y_t = \beta_1 + \beta_2 t + u_t, \quad u_t \sim iid(0,\sigma^2). \] Let $\hat{\beta}_{i,T}$, $i=1,2$ be the OLS estimators of the intercept and slope coefficient, respectively. Then \begin{eqnarray} \hat{\beta}_{1,T} -\beta_1 & \stackrel{p}{\longrightarrow} & 0 \label{eq_rint}\\ T(\hat{\beta}_{2,T} - \beta_2) & \stackrel{p}{\longrightarrow} & 0 \label{eq_rslp}. \quad \Box \end{eqnarray}

CLT

I'm not going to show the details of the proof of the CLT, but:

  • We use a CLT for independently but not identically distributed random variables (Liapounov).
  • We also use the Cramér-Wold device, which deduces the convergence of a random vector from the convergence of arbitrary linear combinations of its elements.

Note

*Result*: Suppose \[ y_t = \beta_1 + \beta_2 t + u_t, \quad u_t \sim iid(0,\sigma^2). \] Let $\hat{\beta}_{i,T}$, $i=1,2$ be the OLS estimators of the intercept and slope coefficient, respectively. The sampling distribution of the OLS estimators has the following large sample behavior \[ \sqrt{T} G_T (\hat{\beta}_T - \beta) \Longrightarrow {\cal N}(0,\sigma^2Q^{-1}) \] This is equivalent to \[ \left[ \begin{array}{c} \sqrt{T}(\hat{\beta}_{1,T} - \beta_1) \\ T^{3/2}(\hat{\beta}_{2,T} - \beta_2) \end{array} \right] \Longrightarrow {\cal N} \left( \left[ \begin{array}{c} 0 \\ 0 \end{array} \right], \sigma^2 \left[ \begin{array}{cc} 4 & -6 \\ -6 & 12 \end{array} \right] \right). \quad \Box \]

Having Said All This

Now consider the case where the variance is unknown and must be estimated:

\[ \hat \sigma^2 = \frac{1}{T-2}\sum(y_t - \hat\beta_1 - \hat\beta_2 t)^2 \]

Despite the fact that $\hat\beta_{1,T}$ and $\hat\beta_{2,T}$ have different asymptotic rates of convergence, the $t$ statistics still have a $N(0,1)$ limiting distribution because the standard error estimates have offsetting behavior.
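A small Monte Carlo sketch of this claim, reusing the simulation design from above (parameter values and replication counts are arbitrary): the sample standard deviations of both $t$ ratios should be close to one.

#+begin_src python
import numpy as np
import statsmodels.api as sm

beta1, beta2 = 0.2, 0.5       # intercept and slope of the deterministic trend
T, N = 200, 2000

tstats = np.empty((N, 2))
for i in range(N):
    t = np.arange(T)
    y = beta1 + beta2 * t + np.random.normal(size=T)
    X = sm.add_constant(t)
    fit = sm.OLS(y, X).fit()
    tstats[i] = (fit.params - [beta1, beta2]) / fit.bse   # t ratios at the true values

print(tstats.std(axis=0))     # both close to 1, consistent with a N(0,1) limit
#+end_src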

OLS and Serial Dependence

Consider \[ y_t = \beta t + u_t \] where the $u_t$ are serially correlated, that is, $\mathbb E[u_{t}u_{t-h}] \not= 0$ for some $h$ $\implies$ OLS is not efficient.

Let's look at an example with $MA(1)$ errors: \[ u_t = \epsilon_t + \theta \epsilon_{t-1}, \quad \epsilon_{t} \sim iid(0,\sigma^2_\epsilon). \] One can verify that

\begin{eqnarray} \mathbb E[u_t^2] & = & \mathbb E[(\epsilon_t + \theta \epsilon_{t-1})^2] = (1+\theta^2)\sigma^2_\epsilon \\ \mathbb E[u_tu_{t-1}] & = & \mathbb E[ (\epsilon_t + \theta \epsilon_{t-1})(\epsilon_{t-1} + \theta \epsilon_{t-2})] = \theta \sigma^2_\epsilon \\ \mathbb E[u_tu_{t-h}] & = & 0 \quad h > 1. \end{eqnarray}
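These autocovariances can be checked by simulation; a minimal sketch with an illustrative $\theta$ and $\sigma_\epsilon$:

#+begin_src python
import numpy as np

theta, sigma_eps = 0.7, 1.0           # illustrative values
T = 200_000
eps = np.random.normal(scale=sigma_eps, size=T + 1)
u = eps[1:] + theta * eps[:-1]        # u_t = eps_t + theta * eps_{t-1}

print(np.mean(u * u))                 # approx (1 + theta^2) * sigma_eps^2
print(np.mean(u[1:] * u[:-1]))        # approx theta * sigma_eps^2
print(np.mean(u[2:] * u[:-2]))        # approx 0
#+end_src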

The OLS estimator

Denominator and Numerator

\[ \hat{\beta}_T - \beta = \frac{\sum tu_t}{\sum t^2}. \] To find the limiting distribution, note that the denominator satisfies \[ \frac{1}{T^3} \sum_{t=1}^T t^2 = \frac{T(T+1)(2T+1)}{6T^3} \longrightarrow \frac{1}{3}. \] The numerator can be manipulated as follows \begin{eqnarray} \sum t u_t & = & \sum t(\epsilon_t + \theta \epsilon_{t-1}) \nonumber \\ & = & (\epsilon_1 + 2\epsilon_2 + 3\epsilon_3 + \ldots + T\epsilon_T) + (\theta \epsilon_0 + 2 \theta \epsilon_1 + 3 \theta \epsilon_2 + 4 \theta \epsilon_3 + \ldots + T \theta \epsilon_{T-1}) \nonumber \\ & = & \sum_{t=1}^{T-1} (t + \theta(t+1)) \epsilon_t \; + \theta \epsilon_0 + T \epsilon_T \nonumber \\ & = & \sum_{t=1}^{T-1} (1+\theta) t \epsilon_t + \sum_{t=1}^{T-1} \theta \epsilon_t + \theta \epsilon_0 + T \epsilon_T \nonumber \\ & = & \sum_{t=1}^T (1+\theta) t \epsilon_t \underbrace{ - \theta T \epsilon_T + \theta \sum_{t=1}^T \epsilon_{t-1} }_{\mbox{asymp. negligible}}. \end{eqnarray}

OLS, Continued

After standardization by $T^{-3/2}$ we obtain

\begin{eqnarray*} T^{-3/2} \sum tu_t = \frac{1}{\sqrt{T}} (1+\theta) \sum_{t=1}^T (t/T) \epsilon_t - \frac{1}{\sqrt{T}} \theta \epsilon_T + \frac{\theta}{T} \frac{1}{\sqrt{T}} \sum_{t=1}^T \epsilon_{t-1}. \end{eqnarray*}

  • First term obeys a CLT
  • Second term goes to zero
  • Third term goes to zero

Remark

Consider the following model with $iid$ disturbances \[ y_t = \beta t + u_t, \quad u_t \sim iid(0,\sigma_\epsilon^2(1+\theta^2)). \] The unconditional variance of the disturbances is the same as in the model with moving average disturbances. It can be verified that

\[ T^{3/2}( \hat{\beta}_T - \beta) \Longrightarrow {\cal N}\big(0,3\sigma^2_\epsilon (1+\theta^2) \big). \]

If $\theta$ is positive then the limit variance of the OLS estimator in the model with $iid$ disturbances is smaller than in the trend model with moving average disturbances.

Positively serially correlated data are less informative than $iid$ data.
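A Monte Carlo sketch of this comparison (illustrative $\theta$, $\beta$, and sample sizes; in the no-intercept trend model the OLS estimator is just $\sum t y_t / \sum t^2$):

#+begin_src python
import numpy as np

theta, sigma_eps = 0.7, 1.0
beta = 0.5
T, N = 500, 2000

def ols_slope(u):
    # OLS in y_t = beta*t + u_t without intercept: sum(t*y) / sum(t^2)
    t = np.arange(1, T + 1)
    y = beta * t + u
    return (t @ y) / (t @ t)

scaled_ma, scaled_iid = [], []
for _ in range(N):
    eps = np.random.normal(scale=sigma_eps, size=T + 1)
    u_ma = eps[1:] + theta * eps[:-1]                                   # MA(1) errors
    u_iid = np.random.normal(scale=sigma_eps * np.sqrt(1 + theta**2), size=T)  # iid, same variance
    scaled_ma.append(T**1.5 * (ols_slope(u_ma) - beta))
    scaled_iid.append(T**1.5 * (ols_slope(u_iid) - beta))

print(np.var(scaled_ma))     # approx 3 * sigma_eps^2 * (1 + theta)^2
print(np.var(scaled_iid))    # approx 3 * sigma_eps^2 * (1 + theta^2), smaller when theta > 0
#+end_src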

Stochastic Trends

So far we have looked at stationary models and deterministic trend models. Now we will examine univariate models with a stochastic trend of the form

\[ y_t = \phi_0 + y_{t-1} + \epsilon_t \quad \epsilon_t \sim iid(0,\sigma^2) \]

This particular model is called a random walk with drift.

The variable $y_t$ is said to be integrated of order one.

Cointegration

Moreover, we will consider bivariate models with a common stochastic trend \begin{eqnarray} y_{1,t} & = & \gamma y_{2,t} + u_{1,t} \\ y_{2,t} & = & y_{2,t-1} + u_{2,t} \end{eqnarray} where $[u_{1,t}, u_{2,t}]' \sim iid(0,\Omega)$. Both $y_{1,t}$ and $y_{2,t}$ have a stochastic trend. However, there exists a linear combination of $y_{1,t}$ and $y_{2,t}$, namely, \[ y_{1,t} - \gamma y_{2,t} = u_{1,t} \] that is stationary. Therefore, $y_{1,t}$ and $y_{2,t}$ are called _cointegrated_.
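A simulated sketch of a cointegrated pair (with an illustrative $\gamma$ and independent standard normal disturbances): the levels wander, but the cointegrating combination does not.

#+begin_src python
import numpy as np

gamma = 2.0                        # illustrative cointegrating coefficient
T = 1000
u1 = np.random.normal(size=T)
u2 = np.random.normal(size=T)

y2 = np.cumsum(u2)                 # y_{2,t} = y_{2,t-1} + u_{2,t}: a stochastic trend
y1 = gamma * y2 + u1               # y_{1,t} inherits that trend

z = y1 - gamma * y2                # the cointegrating combination equals u_{1,t}
print(np.var(y1), np.var(y2))      # both grow with the sample size
print(np.var(z))                   # stays close to var(u_1) = 1
#+end_src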

Background

In the late 80s and early 90s, this was a super hot research area.

  • Dickey and Fuller (1979) examined the sampling distribution of estimators for autoregressive time series with a unit root and provided tables with critical values for unit root tests.
  • In 1986 and 1987, Phillips published two papers on spurious regression and time series regressions with a unit root that employ the mathematical theory of convergence of probability measures for metric spaces. This marks a ``technological breakthrough'' and the field started to grow at an exponential rate thereafter.

Three Choices

Consider the first order autoregressive model with mean zero: \[ y_t = \phi y_{t-1} + \epsilon_t, \quad \epsilon_t \sim iid{\cal N}(0,\sigma^2) \] Three cases:

  • $|\phi|<1$: stationarity! We talked about this last week.
  • $|\phi|>1$: explosive! We will not analyze explosive processes in this course.
  • $|\phi|=1$: This is the unit root case and will be the focus of this part of the lecture. If $\phi=1$ then the AR(1) model simplifies to

\[ y_t = y_{t-1} + \epsilon_t \]

With $\Delta=1-L$, the differences $\Delta y_t = \epsilon_t$ form a stationary process; the random walk is therefore called integrated of order one, denoted by $I(1)$.

Difference b/w Stationary AR and Unit Root

Suppose that the AR process is initialized by $y_0 \sim {\cal N}(0,1)$. Then $y_t$ can be expressed as

\[ y_t = \phi^{t} y_0 + \sum_{\tau=1}^t \phi^{\tau -1} \epsilon_{t+1-\tau} \]

  • The unconditional mean of $y_t$ is given by \[ \mathbb E[y_t] = \phi^{t} \mathbb E[y_0] + \sum_{\tau=1}^t \phi^{\tau -1} \mathbb E[\epsilon_\tau] = 0 \]

Differences, continued

The unconditional variance of $y_t$ is given by \begin{eqnarray} var[y_t] & = & \phi^{2t} var[y_0] + \sum_{\tau=1}^t \phi^{2(\tau-1)} var[\epsilon_\tau] \\ & = & \phi^{2t} var[y_0] + \sigma^2 \sum_{\tau=1}^t \phi^{2(\tau-1)} \nonumber \\ & = & \left\{ \begin{array}{lclcl} \phi^{2t} var[y_0] + \sigma^2 \frac{1 - \phi^{2t}}{1 - \phi^2} & \longrightarrow & \frac{\sigma^2}{1-\phi^2} & \mbox{if} & |\phi| < 1 \\ var[y_0] + \sigma^2 t & \longrightarrow & \infty & \mbox{if} & |\phi| =1 \end{array} \right. \nonumber \end{eqnarray} as $t \rightarrow \infty$.

Differences, continued

The conditional expectation of $y_{t}$ given $y_0$ is \begin{eqnarray*} \mathbb E[y_{t}|y_0] = \phi^{t} y_0 \longrightarrow \left\{ \begin{array}{lcl} 0 & \mbox{if} & |\phi| < 1 \\ y_0 & \mbox{if} & \phi = 1 \end{array} \right. \end{eqnarray*} as $t \rightarrow \infty$.

In the unit root case, the best prediction of future $y_{t}$ is the initial $y_0$ at all horizons, that is, ``no change''.

In the stationary case, the conditional expectation converges to the unconditional mean. For this reason, stationary processes are also called ``mean reverting''.

Result

Stationary and unit root processes differ in their behavior over long time horizons. Suppose that $\sigma^2=1$ and $y_0=1$. Then the conditional mean and variance of a process $y_t$ with $\phi = 0.995$ are given by

| Horizon $t$                |     1 |     2 |     5 |    10 |    20 |    50 |   100 |
| $\mathbb E[y_t \mid y_0]$  | 0.995 | 0.990 | 0.975 | 0.951 | 0.905 | 0.778 | 0.606 |
| $var[y_t \mid y_0]$        | 1.000 | 1.990 | 4.901 | 9.563 | 18.21 | 39.52 | 63.46 |
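These entries follow from the formulas above, $\mathbb E[y_t \mid y_0] = \phi^t y_0$ and $var[y_t \mid y_0] = \sigma^2 (1-\phi^{2t})/(1-\phi^2)$ (with $y_0$ held fixed); a few lines of Python reproduce the table:

#+begin_src python
import numpy as np

phi, sigma2, y0 = 0.995, 1.0, 1.0
for t in [1, 2, 5, 10, 20, 50, 100]:
    cond_mean = phi**t * y0
    cond_var = sigma2 * (1 - phi**(2 * t)) / (1 - phi**2)
    print(t, round(cond_mean, 3), round(cond_var, 3))
#+end_src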

If we are interested in long-run predictions, it is very important to distinguish these two cases.

But note: long-run predictions face a serious extrapolation problem.

Frequentist Approach

To get a unit root test of the null hypothesis $H_0: \phi = 1$, we have to find the sampling distribution of a suitable test statistic such as the $t$ ratio \[ \frac{\hat{\phi}_T -1}{\sqrt{ \sigma^2 / \sum y_{t-1}^2 }} \] under the generating mechanism \[ y_t = \phi_0 + y_{t-1} + \epsilon_t, \quad \epsilon_t \sim iid(0,\sigma^2). \] _For stationary processes we used a variety of WLLNs and CLTs_; unfortunately, these do not apply here.

Heuristic Overview of Asymptotics

Assume that $\phi_0 = 0$, $\sigma = 1$, and $y_0 = 0$. Thus, the process $y_t$ can be represented as \[ y_T = \sum_{t=1}^T \epsilon_t. \] Summations will range from $t=1$ to $T$ unless stated otherwise. The central limit theorem for $iid$ random variables implies \[ \frac{y_T}{\sqrt{T} } = \frac{1}{\sqrt{T}} \sum \epsilon_t \Longrightarrow {\cal N}(0,1). \] This suggests that \[ \frac{1}{T^{3/2}} \sum y_t = \frac{1}{T} \sum \left[ \sqrt{ \frac{t}{T} } \frac{1}{\sqrt{t}} \sum_{\tau =1}^t \epsilon_\tau \right] \] will not converge to a constant in probability but instead to a random variable.
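A brute-force sketch of this point (arbitrary sample sizes and replication counts): across replications, the standardized sum keeps a non-degenerate spread instead of collapsing to a constant.

#+begin_src python
import numpy as np

# For each T, draw many random walks and record T^{-3/2} * sum of y_t.
for T in [100, 1000, 10000]:
    draws = []
    for _ in range(2000):
        y = np.cumsum(np.random.normal(size=T))
        draws.append(y.sum() / T**1.5)
    print(T, np.std(draws))   # stays near sqrt(1/3) ~ 0.577 rather than shrinking
#+end_src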

Need a more elegant approach!

A Twist on our framework

So far, our time index set was $\{0, \pm 1, \pm 2, \ldots\}$.

Now consider $S = [0,1]$ and random elements $W(s)$ that correspond to functions on this interval.

We will place a probability measure $Q$ on these functions and show that $Q$ can be helpful in approximating the distribution of $\sum y_t$.

Defining probability distributions on function spaces is a pain.

Wiener Measure

Let ${\cal C}$ be the space of continuous functions on the interval $[0,1]$.

We will define a probability distribution for the function space ${\cal C}$.

This probability distribution is called ``Wiener measure''.

Whenever we draw an element from the probability space we obtain a function $W(s)$, $s \in [0,1]$. Let $Q[ \cdot ]$ denote the expectation operator under the Wiener measure.

Properties of $W(s)$

  • If we repeatedly draw functions under the Wiener measure and evaluate these functions at a particular value $s = s'$, then \[ Q[ \{ W(s') \le w \} ] = \frac{1}{\sqrt{2\pi s'}} \int_{-\infty}^{w} e^{- u^2/2s'} du, \] that is, \[ W(s') \sim {\cal N}(0,s'). \] If $s'=0$ then the equation is interpreted to mean $Q[\{W(0)=0 \}] = 1$. Thus $W(0) = 0$ with probability one.

Properties of $W(s)$

  • The random function $W(s)$ has independent increments. If \[ 0 \le s_1 \le s_2 \le \ldots \le s_k \le 1 \] then the random variables \[ W(s_2) - W(s_1), \; W(s_3) - W(s_2), \; \ldots , \; W(s_k) - W(s_{k-1}) \] are independent. #+latex:~\\
  • The random function $W(s)$ is continuous on $s \in [0,1]$ (otherwise, contradiction).

More of $W(s)$

It can be shown that there indeed exists a probability distribution on ${\cal C}$ with these properties.

Roughly speaking, the Wiener measure is to the theory of stochastic processes what the normal distribution is to the theory of real-valued random variables.

Note: $W(1) \sim {\cal N}(0,1)$.

Relating this back to our discrete processes

Define the partial sum process \[ Y_T(s) = \frac{1}{\sqrt{T}} \sum \{ t \le \lfloor Ts \rfloor \} \epsilon_t \] where $\lfloor x \rfloor$ denotes the integer part of $x$. Since we assumed that $\epsilon_t \sim iid(0,1)$, the partial sum process is a random step function.

Interpolation: \begin{eqnarray*} \bar{Y}_T(s) = \frac{1}{\sqrt{T}} \sum \{ t \le \lfloor Ts \rfloor \} \epsilon_t + (Ts - \lfloor Ts \rfloor ) \epsilon_{\lfloor Ts \rfloor + 1} / \sqrt{T}. \end{eqnarray*}

Two ways to randomly generate continuous functions

  • Draw a function $W(s)$ from the Wiener distribution. We did not examine how to do the sampling in practice, but since the Wiener distribution is well-defined, it is theoretically possible.
  • Generate a sequence $\epsilon_1, \ldots, \epsilon_T$, where $\epsilon_t \sim iid(0,1)$ and compute $\bar{Y}_T(s)$.

As $T\longrightarrow\infty$, these are basically the same.

\begin{eqnarray*} Y_T(s) = \frac{1}{\sigma \sqrt{T} } \sum_{t=1}^T \{ t \le \lfloor Ts \rfloor \} \epsilon_t \Longrightarrow W(s) \quad \Box \end{eqnarray*}
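A sketch of constructing the random step function $Y_T(s)$ from a single draw of the innovations (the sample size is arbitrary); plotting it alongside simulated Wiener paths such as the one below illustrates the convergence.

#+begin_src python
import numpy as np
import matplotlib.pyplot as plt

T = 500
eps = np.random.normal(size=T)            # iid(0,1) innovations
s = np.linspace(0, 1, 1000)

# Y_T(s) = T^{-1/2} * sum of the first floor(T*s) innovations: a random step function
Y_T = np.array([eps[:int(np.floor(T * si))].sum() / np.sqrt(T) for si in s])

plt.step(s, Y_T, where='post')
plt.xlabel('s')
plt.ylabel(r'$Y_T(s)$')
plt.title('Partial sum process')
plt.show()
#+end_src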

Simulation of Wiener Process

#+begin_src python
import numpy as np
import matplotlib.pyplot as plt

N = 100            # number of increments
T = 1.0            # length of the time interval
Delta = T / N      # step size

W = np.zeros(N + 1)
t = np.linspace(0, T, N + 1)
# cumulate independent N(0, Delta) increments to obtain a discretized Wiener path
W[1:] = np.cumsum(np.sqrt(Delta) * np.random.standard_normal(N))

# print("Simulation of the Wiener process:\n", W)

plt.plot(t, W)
plt.xlabel('t')
plt.ylabel('W')
plt.title('Wiener process')
#+end_src

wiener.png

The upshot

The sum \[ \frac{1}{T} \sum y_{t-1} \epsilon_t \] converges to a _stochastic integral_; i.e.,

Suppose that $y_t = y_{t-1} + \epsilon_t$, where $\epsilon_t \sim iid(0,\sigma^2)$ and $y_0=0$. Then \[ \frac{1}{\sigma^2 T} \sum y_{t-1} \epsilon_t \Longrightarrow \int W(s) dW(s) \] where $W(s)$ denotes a standard Wiener process.

We can use this to develop tests!

Theorem

Suppose that $y_t = \phi y_{t-1} + \epsilon_t$, where $\epsilon_t \sim iid(0,\sigma^2)$, $\phi=1$, and $y_0=0$. Let $\hat{\phi}_T$ be the OLS estimator of the autoregressive parameter $\phi=1$. The sampling distribution of the standardized coefficient $z(\hat{\phi}_T) = T(\hat{\phi}_T - 1)$ and the sampling distribution of the corresponding $t$-statistic have the following asymptotic approximations \begin{eqnarray} z(\hat{\phi}_T) & \Longrightarrow & \frac{\frac{1}{2} ( W(1)^2 -1 ) }{ \int_0^1 W(s)^2 ds } \\ t(\hat{\phi}_T) & \Longrightarrow & \frac{\frac{1}{2} ( W(1)^2 -1 ) }{ \left[ \int_0^1 W(s)^2 ds \right]^{1/2} } \end{eqnarray}

where $W(s)$ denotes a standard Wiener process. $\Box$
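These limiting distributions are nonstandard, but they can be tabulated by simulation. A minimal sketch (arbitrary $T$ and number of replications) that simulates random walks under the null and records both statistics:

#+begin_src python
import numpy as np

T, N = 500, 5000
z_stats, t_stats = np.empty(N), np.empty(N)

for i in range(N):
    eps = np.random.normal(size=T)
    y = np.cumsum(eps)                         # random walk with y_0 = 0, phi = 1
    ylag, ynow = y[:-1], y[1:]
    phi_hat = (ylag @ ynow) / (ylag @ ylag)    # OLS regression of y_t on y_{t-1}
    resid = ynow - phi_hat * ylag
    s2 = (resid @ resid) / (T - 2)             # residual variance estimate
    z_stats[i] = T * (phi_hat - 1)
    t_stats[i] = (phi_hat - 1) / np.sqrt(s2 / (ylag @ ylag))

# Approximate lower 5% critical values of the two Dickey-Fuller distributions
print(np.percentile(z_stats, 5), np.percentile(t_stats, 5))
#+end_src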