Hamilton, chapters 15-16. A common decomposition of macroeconomic time series is into trend and cycle. #+latex:\\~\\ Suppose $y_t$ is the log of real per capita GDP $gdp_t$ of the United States. According to this components approach to time series, $y_t$ is expressed as $$ y_t = \ln gdp_t = trend_t + fluctuations_t. $$ We will examine regression techniques that decompose $y_t$ into a trend and a cyclical component.
What features of the time series do we regard as trend, and what do we regard as fluctuations around the trend?
Let's guess a linear deterministic time trend: $$ y_t = \beta_1 + \beta_2 t + u_t $$
A decomposition of $y_t$ into trend and fluctuations can be obtained by estimating $\beta_1$ and $\beta_2$: \begin{eqnarray} y_t &=& \widehat{trend}_t + \widehat{fluctuations}_t \nonumber \\ &=& (\hat{\beta}_1 + \hat{\beta}_2 t) + ( y_t - \hat{\beta}_1 - \hat{\beta}_2t). \end{eqnarray}
When $y_t$ is logged, the coefficient $\beta_2$ has the interpretation of an average growth rate.
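The decomposition into fitted trend and residual fluctuations can be sketched in a few lines. This is only an illustration on simulated data: the sample size, the true coefficients, and the noise scale below are made-up assumptions, not values from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a log series growing 0.5% per period plus noise.
T = 200
t = np.arange(1, T + 1)
beta1_true, beta2_true = 10.0, 0.005
y = beta1_true + beta2_true * t + 0.02 * rng.standard_normal(T)

# OLS of y_t on a constant and t.
X = np.column_stack([np.ones(T), t])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

trend_hat = X @ beta_hat          # fitted linear trend
fluctuations_hat = y - trend_hat  # deviations from trend (OLS residuals)

print("estimated average growth rate per period:", beta_hat[1])
```

By construction the two pieces add up to $y_t$, and the estimated fluctuations sum to zero because the regression includes a constant.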
Consider the deterministic trend model $$ y_t = \beta_1 + \beta_2 t + u_t $$ with $\mathbb E[u_t]=0$ and $var[u_t] = \sigma^2$. There are several difficulties associated with the large sample analysis of the OLS estimators $\hat{\beta}_{1,T}$ and $\hat{\beta}_{2,T}$. Throughout, take $x_t = [1, t]'$ as the vector of regressors.
Roughly speaking, convergence rates tell us how fast we can learn the ``true'' value of a parameter in a sampling experiment. #+latex:\\~\\ Under "standard" OLS assumptions, the variance of $\hat\beta$ converges to zero at rate $1/T$. #+latex:\\~\\ This isn't true for models with deterministic trends. #+latex:\\~\\ Let's look at the distributions of $\sqrt{T}(\hat\beta_0 - \beta_0)$ and $\sqrt{T}(\hat\beta_1 - \beta_1)$, where $\beta_0$ is the intercept and $\beta_1$ is the slope (that is, $\beta_1$ and $\beta_2$ in the notation above).
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm

N = 1000                      # number of Monte Carlo replications
beta0, beta1 = 0.2, 0.5       # true intercept and slope

def monte_carlo(T):
    """Simulate one sample of size T and return the OLS estimates."""
    t = np.arange(T)
    y = beta0 + beta1 * t + np.random.normal(size=T)
    X = sm.add_constant(t)
    mod = sm.OLS(y, X).fit()
    return mod.params

res = {}
for T in [50, 500, 2000]:
    t_params = [monte_carlo(T) for _ in range(N)]
    res[T] = pd.DataFrame(t_params, columns=['beta0_hat', 'beta1_hat'])
    # Scale both estimation errors by sqrt(T)
    res[T].beta0_hat = np.sqrt(T) * (res[T].beta0_hat - beta0)
    res[T].beta1_hat = np.sqrt(T) * (res[T].beta1_hat - beta1)

fig, ax = plt.subplots(nrows=1, ncols=2)
fig.set_size_inches(12, 6)
ax[0].set_title(r'$\sqrt{T}(\hat\beta_0 - \beta_0)$', fontsize=20)
ax[1].set_title(r'$\sqrt{T}(\hat\beta_1 - \beta_1)$', fontsize=20)
for T in [50, 500, 2000]:
    res[T].beta0_hat.plot(kind='kde', ax=ax[0], label=f'T={T}')
    res[T].beta1_hat.plot(kind='kde', ax=ax[1], label=f'T={T}')
ax[0].legend()
```
Facts: \begin{eqnarray} \sum_{t=1}^T 1 = T,\quad \sum_{t=1}^T t = T(T+1)/2, \quad\sum_{t=1}^T t^2 = T(T+1)(2T+1)/6. \end{eqnarray}
(Assume the $u_t$'s are independently distributed.)
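The elements of \[ \frac{1}{T} \sum x_t x_t' = \frac{1}{T} \left( \begin{array}{cc} \sum 1 & \sum t \\ \sum t & \sum t^2 \end{array} \right) \] are not all convergent: $\frac{1}{T}\sum t$ and $\frac{1}{T}\sum t^2$ grow without bound as $T \longrightarrow \infty$!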
On the other hand \[ \frac{1}{T^3} \sum x_t x_t' \longrightarrow \left( \begin{array}{cc} 0 & 0 \\ 0 & 1/3 \end{array} \right) \] which is singular and not invertible!
*Message*: Trends change the rate of convergence of estimators!
It turns out that $\hat{\beta}_{1,T}$ and $\hat{\beta}_{2,T}$ have different asymptotic rates of convergence. In particular, we will learn faster about the slope of the trend line than the intercept.
To analyze the asymptotic behavior of the estimators we define the matrix \[ G_T = \left( \begin{array}{cc} 1 & 0 \\ 0 & T \end{array} \right). \] Note that the matrix is symmetric, that is, $G_T = G_T'$.
We will analyze the following quantity \[ G_T(\hat{\beta}_T - \beta) = \left( \frac{1}{T} \sum G_T^{-1} x_t x_t'G_T^{-1} \right)^{-1} \left( \frac{1}{T}\sum G_T^{-1} x_t u_t \right). \] It can be easily verified that \[ \frac{1}{T} \sum G_T^{-1} x_t x_t' G_T^{-1} = \frac{1}{T} \left( \begin{array}{cc} \sum 1 & \sum t/T \\ \sum t/T & \sum (t/T)^2 \end{array} \right) \longrightarrow Q, \] where \[ Q = \left( \begin{array}{cc} 1 & 1/2 \\ 1/2 & 1/3 \end{array} \right). \]
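The two scalings can be compared numerically. The sketch below (the function name `moment_matrices` is just an illustrative choice) builds $\sum x_t x_t'$ for a few sample sizes and prints both $\frac{1}{T^3}\sum x_t x_t'$ and the $G_T$-scaled version, which should approach $Q$.

```python
import numpy as np

def moment_matrices(T):
    t = np.arange(1, T + 1)
    X = np.column_stack([np.ones(T), t])      # rows are x_t' = [1, t]
    XX = X.T @ X                              # sum of x_t x_t'
    G_inv = np.diag([1.0, 1.0 / T])           # G_T^{-1}
    naive = XX / T**3                         # (1/T^3) sum x_t x_t'
    scaled = G_inv @ XX @ G_inv / T           # (1/T) sum G^{-1} x x' G^{-1}
    return naive, scaled

Q = np.array([[1.0, 1/2], [1/2, 1/3]])
for T in (10, 100, 10_000):
    naive, scaled = moment_matrices(T)
    print(T, np.round(naive, 4).tolist(), np.round(scaled, 4).tolist())
```

Only the $G_T$-scaled matrix has an invertible limit, which is what the asymptotic argument needs.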
The term $\frac{1}{T} \sum G_T^{-1} x_t u_t$ has the components $\frac{1}{T} \sum u_t$ and $\frac{1}{T} \sum (t/T) u_t$, which converge in probability to zero by the weak law of large numbers for non-identically distributed random variables.
*Note*: Without the proper standardization $\frac{1}{T} \sum t u_t$ will not converge to its expected value of zero. The variance of the random variable $T u_T$ is getting larger and larger with sample size which prohibits the convergence of the sample mean to its expectation. $\Box$
*Result*: Suppose \[ y_t = \beta_1 + \beta_2 t + u_t, \quad u_t \sim iid(0,\sigma^2). \] Let $\hat{\beta}_{i,T}$, $i=1,2$ be the OLS estimators of the intercept and slope coefficient, respectively. Then \begin{eqnarray} \hat{\beta}_{1,T} -\beta_1 & \stackrel{p}{\longrightarrow} & 0 \label{eq_rint}\\ T(\hat{\beta}_{2,T} - \beta_2) & \stackrel{p}{\longrightarrow} & 0 \label{eq_rslp}. \quad \Box \end{eqnarray}
I'm not going to show the details of the proof of the CLT, but the result is as follows.
*Result*: Suppose \[ y_t = \beta_1 + \beta_2 t + u_t, \quad u_t \sim iid(0,\sigma^2). \] Let $\hat{\beta}_{i,T}$, $i=1,2$ be the OLS estimators of the intercept and slope coefficient, respectively. The sampling distribution of the OLS estimators has the following large sample behavior \[ \sqrt{T} G_T (\hat{\beta}_T - \beta) \Longrightarrow {\cal N}(0,\sigma^2Q^{-1}). \] This is equivalent to \[ \left[ \begin{array}{c} \sqrt{T}(\hat{\beta}_{1,T} - \beta_1) \\ T^{3/2}(\hat{\beta}_{2,T} - \beta_2) \end{array} \right] \Longrightarrow {\cal N} \left( \left[ \begin{array}{c} 0 \\ 0 \end{array} \right], \sigma^2 \left[ \begin{array}{cc} 4 & -6 \\ -6 & 12 \end{array} \right] \right). \quad \Box \]
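A quick Monte Carlo check of this limit is sketched below, using numpy only. The sample size, number of replications, and normal errors are assumptions made for the illustration; the sample covariance of the rescaled estimation errors should be close to $\sigma^2 Q^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(1)
T, n_rep, sigma = 200, 20_000, 1.0

t = np.arange(1, T + 1)
X = np.column_stack([np.ones(T), t])
# beta_hat - beta = (X'X)^{-1} X'u, computed for all replications at once.
U = sigma * rng.standard_normal((n_rep, T))
errors = np.linalg.solve(X.T @ X, X.T @ U.T).T      # shape (n_rep, 2)

# Rescale: sqrt(T) for the intercept, T^{3/2} for the slope.
scaled = errors * np.array([np.sqrt(T), T**1.5])
cov_mc = np.cov(scaled, rowvar=False)

Q = np.array([[1.0, 1/2], [1/2, 1/3]])
cov_limit = sigma**2 * np.linalg.inv(Q)             # sigma^2 * Q^{-1}
print(np.round(cov_mc, 2))
print(np.round(cov_limit, 2))
```

The negative off-diagonal entry reflects that overestimating the slope tends to go together with underestimating the intercept.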
We now consider the case where the variance $\sigma^2$ is unknown and estimated by
\[ \hat \sigma^2 = \frac{1}{T-2}\sum(y_t - \hat\beta_1 - \hat\beta_2 t)^2 \]
Despite the fact that $\hat\beta_1$ and $\hat\beta_2$ have different asymptotic rates of convergence, the $t$ statistics still have a $N(0,1)$ limiting distribution because the standard error estimates have offsetting behaviour.
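The offsetting behaviour can be seen in a small simulation. This is a sketch under assumed values (normal errors, made-up true coefficients): it computes $\hat\sigma^2$, the usual OLS standard errors, and the $t$ statistics for the true parameter values, whose mean and standard deviation should be close to 0 and 1 for both coefficients.

```python
import numpy as np

rng = np.random.default_rng(2)
T, n_rep = 200, 5_000
beta = np.array([0.2, 0.5])           # hypothetical true intercept and slope

t = np.arange(1, T + 1)
X = np.column_stack([np.ones(T), t])
XtX_inv = np.linalg.inv(X.T @ X)

Y = X @ beta + rng.standard_normal((n_rep, T))     # each row is one sample
B = Y @ X @ XtX_inv                                 # OLS estimates, (n_rep, 2)
resid = Y - B @ X.T
sigma2_hat = (resid**2).sum(axis=1) / (T - 2)       # sigma^2 estimate per sample

# Standard errors and t statistics for H0: beta_i equals its true value.
se = np.sqrt(sigma2_hat[:, None] * np.diag(XtX_inv))
t_stats = (B - beta) / se
print(t_stats.mean(axis=0), t_stats.std(axis=0))
```

The standard error of the slope shrinks at rate $T^{-3/2}$, exactly matching the faster convergence of $\hat\beta_2$, so the ratio stays well behaved.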
Now consider the trend model without intercept \[ y_t = \beta t + u_t, \] where the $u_t$ are serially correlated, that is, $\mathbb E[u_{t}u_{t-h}] \not= 0$ for some $h \not= 0$. In this case OLS is not efficient.
Let's look at an example with $MA(1)$ errors, \[ u_t = \epsilon_t + \theta \epsilon_{t-1}, \quad \epsilon_{t} \sim iid(0,\sigma^2_\epsilon). \] One can verify that
\begin{eqnarray} \mathbb E[u_t^2] & = & \mathbb E[(\epsilon_t + \theta \epsilon_{t-1})^2] = (1+\theta^2)\sigma^2_\epsilon \\ \mathbb E[u_tu_{t-1}] & = & \mathbb E[ (\epsilon_t + \theta \epsilon_{t-1})(\epsilon_{t-1} + \theta \epsilon_{t-2})] = \theta \sigma^2_\epsilon \\ \mathbb E[u_tu_{t-h}] & = & 0 \quad h > 1. \end{eqnarray}
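These moments are easy to check by simulation. The sketch below uses made-up values for $\theta$ and $\sigma_\epsilon$ and a helper `autocov` that is introduced only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
theta, sigma_eps, n = 0.6, 1.0, 400_000   # hypothetical parameter values

eps = sigma_eps * rng.standard_normal(n + 1)
u = eps[1:] + theta * eps[:-1]            # u_t = eps_t + theta * eps_{t-1}

def autocov(x, h):
    """Sample estimate of E[x_t x_{t-h}] for a mean-zero series."""
    return np.mean(x[h:] * x[:len(x) - h])

for h, theory in [(0, (1 + theta**2) * sigma_eps**2),
                  (1, theta * sigma_eps**2),
                  (2, 0.0)]:
    print(h, round(autocov(u, h), 3), theory)
```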
\[
\hat{\beta}_T - \beta = \frac{\sum tu_t}{\sum t^2}.
\]
To find the limiting distribution, note that
\[
\frac{1}{T^3} \sum_{t=1}^T t^2 = \frac{T(T+1)(2T+1)}{6T^3} \longrightarrow \frac{1}{3}.
\]
The numerator can be manipulated as follows
\begin{eqnarray}
\sum t u_t
& = & \sum t(\epsilon_t + \theta \epsilon_{t-1}) \nonumber \\
& = & \begin{array}{ccccc}
0 & + \epsilon_1 & + 2\epsilon_2 & + 3\epsilon_3 & + \ldots \\
+ \theta \epsilon_0 & + 2 \theta \epsilon_1 & + 3 \theta \epsilon_2 & + 4 \theta \epsilon_3 & + \ldots
\end{array} \nonumber \\
& = & \sum_{t=1}^{T-1} (t + \theta(t+1)) \epsilon_t \; + \theta \epsilon_0 + T \epsilon_T \nonumber \\
& = & \sum_{t=1}^{T-1} (1+\theta) t \epsilon_t + \sum_{t=1}^{T-1} \theta \epsilon_t + \theta \epsilon_0 + T \epsilon_T \nonumber \\
& = & \sum_{t=1}^T (1+\theta) t \epsilon_t \underbrace{ - \theta T \epsilon_T + \theta \sum_{t=1}^T \epsilon_{t-1} }_{\mbox{asymp. negligible}}.
\end{eqnarray}
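After standardization by $T^{-3/2}$, the last two terms vanish in probability and a CLT applies to the leading term, so we obtain \[ T^{-3/2} \sum t u_t \Longrightarrow {\cal N}\big(0, \frac{1}{3}\sigma^2_\epsilon (1+\theta)^2 \big) \] and therefore \[ T^{3/2}( \hat{\beta}_T - \beta) \Longrightarrow {\cal N}\big(0,3\sigma^2_\epsilon (1+\theta)^2 \big). \]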
Consider the following model with $iid$ disturbances \[ y_t = \beta t + u_t, \quad u_t \sim iid(0,\sigma_\epsilon^2(1+\theta^2)). \] The unconditional variance of the disturbances is the same as in the model with moving average disturbances. It can be verified that
\[ T^{3/2}( \hat{\beta}_T - \beta) \Longrightarrow {\cal N}\big(0,3\sigma^2_\epsilon (1+\theta^2) \big). \]
If $\theta$ is positive then the limit variance of the OLS estimator in the model with $iid$ disturbances is smaller than in the trend model with moving average disturbances.
Positively serially correlated data are less informative than $iid$ data.
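The comparison can be made concrete without simulation. In the sketch below, `scaled_ols_variance` is a hypothetical helper that evaluates the exact finite-sample variance of $T^{3/2}(\hat\beta_T - \beta)$ under $MA(1)$ errors, using $var[\sum t u_t] = \gamma_0 \sum t^2 + 2\gamma_1 \sum t(t-1)$ with the autocovariances $\gamma_0 = (1+\theta^2)\sigma_\epsilon^2$ and $\gamma_1 = \theta\sigma_\epsilon^2$. As $T$ grows it approaches $3\sigma_\epsilon^2(1+\theta)^2$, which is what the decomposition of $\sum t u_t$ implies and which exceeds the $iid$ benchmark $3\sigma_\epsilon^2(1+\theta^2)$ when $\theta > 0$.

```python
import numpy as np

def scaled_ols_variance(T, theta, sigma_eps=1.0):
    """Exact variance of T^{3/2} (beta_hat - beta) in y_t = beta*t + u_t, MA(1) u_t."""
    t = np.arange(1, T + 1, dtype=float)
    gamma0 = (1 + theta**2) * sigma_eps**2      # E[u_t^2]
    gamma1 = theta * sigma_eps**2               # E[u_t u_{t-1}]
    # var(sum t u_t) = gamma0 * sum t^2 + 2 * gamma1 * sum t (t-1)
    var_num = gamma0 * np.sum(t**2) + 2 * gamma1 * np.sum(t[1:] * t[:-1])
    return T**3 * var_num / np.sum(t**2) ** 2

theta = 0.5
for T in (10, 100, 10_000):
    print(T, scaled_ols_variance(T, theta))
print("MA(1) limit:", 3 * (1 + theta) ** 2)
print("iid limit with the same unconditional variance:", 3 * (1 + theta**2))
```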
We have looked at stationary models and deterministic trend models so far. Now
we will examine univariate models with a stochastic trend of the form
\[ y_t = \phi_0 + y_{t-1} + \epsilon_t \quad \epsilon_t \sim iid(0,\sigma^2) \]
This particular model is called a random walk with drift.
The variable $y_t$ is said to be integrated of order one.
Moreover, we will consider bivariate models with a common stochastic trend \begin{eqnarray} y_{1,t} & = & \gamma y_{2,t} + u_{1,t} \\ y_{2,t} & = & y_{2,t-1} + u_{2,t} \end{eqnarray} where $[u_{1,t}, u_{2,t}]' \sim iid(0,\Omega)$. Both $y_{1,t}$ and $y_{2,t}$ have a stochastic trend. However, there exists a linear combination of $y_{1,t}$ and $y_{2,t}$, namely, \[ y_{1,t} - \gamma y_{2,t} = u_{1,t} \] that is stationary. Therefore, $y_{1,t}$ and $y_{2,t}$ are called _cointegrated_.
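A simulated example may help fix ideas. The sketch below assumes $\Omega = I$, normal shocks, and a made-up value of $\gamma$; it contrasts the wandering level of $y_{2,t}$ with the stable behavior of the cointegrating combination.

```python
import numpy as np

rng = np.random.default_rng(4)
T, gamma = 1_000, 2.0                     # hypothetical sample size and gamma

# Assumption for this sketch: Omega is the identity, so the shocks are uncorrelated.
u1 = rng.standard_normal(T)
u2 = rng.standard_normal(T)

y2 = np.cumsum(u2)                        # y_{2,t} = y_{2,t-1} + u_{2,t}, y_{2,0} = 0
y1 = gamma * y2 + u1                      # y_{1,t} = gamma * y_{2,t} + u_{1,t}

combo = y1 - gamma * y2                   # cointegrating combination
half = T // 2
print("var of y2, first vs second half:", y2[:half].var(), y2[half:].var())
print("var of combo, first vs second half:", combo[:half].var(), combo[half:].var())
```

The sample variance of the combination is about the same in both halves, while the sample moments of $y_{2,t}$ depend heavily on where the random walk happens to be.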
In the late 80s and early 90s, this was a super hot research area.
\[ y_t = y_{t-1} + \epsilon_t \]
With $\Delta=1-L$, we have $\Delta y_t = \epsilon_t$, which is a stationary process. The random walk is therefore called integrated of order one, denoted by $I(1)$.
Suppose that the AR(1) process $y_t = \phi y_{t-1} + \epsilon_t$ is initialized by $y_0 \sim {\cal N}(0,1)$. Then $y_t$ can be expressed as
\[ y_t = \phi^{t} y_0 + \sum_{\tau=1}^t \phi^{\tau -1} \epsilon_{t+1-\tau} \]
The unconditional variance of $y_t$ is given by \begin{eqnarray} var[y_t] & = & \phi^{2t} var[y_0] + \sum_{\tau=1}^t \phi^{2(\tau-1)} var[\epsilon_{t+1-\tau}] \\ & = & \phi^{2t} var[y_0] + \sigma^2 \sum_{\tau=1}^t \phi^{2(\tau-1)} \nonumber \\ & = & \left\{ \begin{array}{lclcl} \phi^{2t} var[y_0] + \sigma^2 \frac{1 - \phi^{2t}}{1 - \phi^2} & \longrightarrow & \frac{\sigma^2}{1-\phi^2} & \mbox{if} & |\phi| < 1 \\ var[y_0] + \sigma^2 t & \longrightarrow & \infty & \mbox{if} & |\phi| =1 \end{array} \right. \nonumber \end{eqnarray} as $t \rightarrow \infty$.
The conditional expectation of $y_{t}$ given $y_0$ is \begin{eqnarray*} \mathbb E[y_{t}|y_0] = \phi^{t} y_0 \longrightarrow \left\{ \begin{array}{lcl} 0 & \mbox{if} & |\phi| < 1 \\ y_0 & \mbox{if} & \phi = 1 \end{array} \right. \end{eqnarray*} as $t \rightarrow \infty$.
In the unit root case, the best prediction of future $y_{t}$ is the initial $y_0$ at all horizons, that is, ``no change''.
In the stationary case, the conditional expectation converges to the unconditional mean. For this reason, stationary processes are also called ``mean reverting''.
Stationary and unit root processes differ in their behavior over long time horizons. Suppose that $\sigma^2=1$ and $y_0=1$. Then the conditional mean and variance of a process $y_t$ with $\phi = 0.995$ are given by
Horizon $t$ | 1 | 2 | 5 | 10 | 20 | 50 | 100 |
---|---|---|---|---|---|---|---|
$\mathbb E[y_t | y_0]$ | 0.995 | 0.990 | 0.975 | 0.951 | 0.905 | 0.778 | 0.606 |
$var[y_t | y_0]$ | 1.000 | 1.990 | 4.901 | 9.563 | 18.21 | 39.52 | 63.46 |
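The table entries follow from $\mathbb E[y_t|y_0] = \phi^t y_0$ and $var[y_t|y_0] = \sigma^2 (1-\phi^{2t})/(1-\phi^2)$. The short sketch below recomputes them and, as an added comparison that is not part of the table, prints the unit root benchmark in which the mean stays at $y_0$ and the variance equals $\sigma^2 t$.

```python
import numpy as np

phi, sigma2, y0 = 0.995, 1.0, 1.0
horizons = np.array([1, 2, 5, 10, 20, 50, 100])

cond_mean = phi**horizons * y0                                   # E[y_t | y_0]
cond_var = sigma2 * (1 - phi**(2 * horizons)) / (1 - phi**2)     # var[y_t | y_0]

# Unit root benchmark (phi = 1): mean stays at y_0, variance grows linearly.
rw_mean = np.full(horizons.shape, y0)
rw_var = sigma2 * horizons

for h, m, v, rv in zip(horizons, cond_mean, cond_var, rw_var):
    print(f"t={h:3d}  mean={m:.3f}  var={v:.3f}  random-walk var={rv:.1f}")
```

At short horizons the two processes are nearly indistinguishable; the gap only opens up as the horizon grows.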
If we are interested in long run predictions, it is very important to distinguish these two cases.
But note: long run predictions face a serious extrapolation problem.
To get a unit root test of the null hypothesis $H_0: \phi = 1$, we have to find the sampling distribution of a suitable test statistic such as the $t$ ratio \[ \frac{\hat{\phi}_T -1}{\sqrt{ \sigma^2 / \sum y_{t-1}^2 }} \] under the generating mechanism \[ y_t = \phi_0 + y_{t-1} + \epsilon_t, \quad \epsilon_t \sim iid(0,\sigma^2). \] _For stationary processes we used a variety of WLLNs and CLTs_; unfortunately, these don't apply here.
Assume that $\phi_0 = 0$, $\sigma = 1$, and $y_0 = 0$. Thus, the process $y_t$ can be represented as \[ y_T = \sum_{t=1}^T \epsilon_t. \] Summations will range from $t=1$ to $T$ unless stated otherwise. The central limit theorem for $iid$ random variables implies \[ \frac{y_T}{\sqrt{T} } = \frac{1}{\sqrt{T}} \sum \epsilon_t \Longrightarrow {\cal N}(0,1). \] This suggests that \[ \frac{1}{T^{3/2}} \sum y_t = \frac{1}{T} \sum \left[ \sqrt{ \frac{t}{T} } \frac{1}{\sqrt{t}} \sum_{\tau =1}^t \epsilon_\tau \right] \] will not converge to a constant in probability but instead to a random variable.
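A small simulation illustrates the point. The sketch assumes standard normal innovations and looks at the scaled sum $T^{-3/2}\sum y_t$; its sampling variance does not shrink with $T$ but settles near $1/3$ (the exact value is $(T+1)(2T+1)/(6T^2)$), so the limit is a genuine random variable rather than a constant.

```python
import numpy as np

rng = np.random.default_rng(5)
n_rep = 5_000

def scaled_sum_of_levels(T):
    """Draw n_rep realizations of T^{-3/2} * sum_t y_t for a driftless random walk."""
    eps = rng.standard_normal((n_rep, T))
    y = np.cumsum(eps, axis=1)            # y_t = eps_1 + ... + eps_t
    return y.sum(axis=1) / T**1.5

for T in (50, 200, 800):
    draws = scaled_sum_of_levels(T)
    print(T, "mean:", round(draws.mean(), 3), "variance:", round(draws.var(), 3))
```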
Need a more elegant approach!
So far the index set of our stochastic processes was the discrete set $T = \{0, \pm 1, \pm 2, \ldots\}$.
Consider $S = [0,1]$. Consider random elements $W(t)$ that correspond to functions on this interval.
We will place some probability $Q$ on these functions and show that $Q$ can be helpful in the approximation of the distribution of $\sum y_t$.
Defining probability distributions on function spaces is a pain.
Let ${\cal C}$ be the space of continuous functions on the interval $[0,1]$.
We will define a probability distribution for the function space ${\cal C}$.
This probability distribution is called ``Wiener measure''.
Whenever we draw an element from the probability space we obtain a function $W(s)$, $s \in [0,1]$. Let $Q[ \cdot ]$ denote the expectation operator under the Wiener measure.
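For reference, the standard textbook characterization of the Wiener measure (stated here as the usual definition, not necessarily in the form used in the lecture) is that a draw $W$ satisfies $W(0) = 0$, has increments over non-overlapping intervals that are independent, and for $0 \le s < t \le 1$ \begin{eqnarray*} W(t) - W(s) & \sim & {\cal N}(0, t-s), \end{eqnarray*} which implies \begin{eqnarray*} Q[W(s)] & = & 0, \\ Q[W(s)W(t)] & = & \min(s,t). \end{eqnarray*}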
It can be shown that there indeed exists a probability distribution on ${\cal C}$ with these properties.
Roughly speaking, the Wiener measure is to the theory of stochastic processes what the normal distribution is to the theory of real valued random variables.
Note: $W(1) \sim {\cal N}(0,1)$.
Define the partial sum process \[ Y_T(s) = \frac{1}{\sqrt{T}} \sum \{ t \le \lfloor Ts \rfloor \} \epsilon_t \] where $\lfloor x \rfloor$ denotes the integer part of $x$. Since we assumed that $\epsilon_t \sim iid(0,1)$, the partial sum process is a random step function.
As $T\longrightarrow\infty$, these are basically the same.
\begin{eqnarray*} Y_T(s) = \frac{1}{\sigma \sqrt{T} } \sum_{t=1}^T \{ t \le \lfloor Ts \rfloor \} \epsilon_t \Longrightarrow W(s) \quad \Box \end{eqnarray*}
```python
import numpy as np
import matplotlib.pyplot as plt

N = 100            # number of increments
T = 1.0            # length of the time interval [0, T]
Delta = T / N      # step size

W = np.zeros(N + 1)
t = np.linspace(0, T, N + 1)
# Cumulate independent N(0, Delta) increments; W[0] = 0
W[1:] = np.cumsum(np.sqrt(Delta) * np.random.standard_normal(N))

# print("Simulation of the Wiener process:\n", W)
plt.plot(t, W)
plt.xlabel('t')
plt.ylabel('W')
plt.title('Wiener process')
```
The sum \[ \frac{1}{T} \sum y_{t-1} \epsilon_t \] converges to a _stochastic integral_; i.e.,
Suppose that $y_t = y_{t-1} + \epsilon_t$, where $\epsilon_t \sim iid(0,\sigma^2)$ and $y_0=0$. Then \[ \frac{1}{\sigma^2 T} \sum y_{t-1} \epsilon_t \Longrightarrow \int W(s) dW(s) \] where $W(s)$ denotes a standard Wiener process.
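The limit can be made plausible with a summation-by-parts identity: since $y_t^2 = y_{t-1}^2 + 2 y_{t-1}\epsilon_t + \epsilon_t^2$, we have $\sum y_{t-1}\epsilon_t = \frac{1}{2}(y_T^2 - \sum \epsilon_t^2)$. The sketch below checks the identity on one simulated path (normal innovations and the values of $T$ and $\sigma$ are assumptions). As $T$ grows, $y_T/(\sigma\sqrt{T})$ behaves like $W(1)$ and $\sum \epsilon_t^2/(\sigma^2 T)$ approaches 1, which gives $\frac{1}{2}(W(1)^2 - 1)$.

```python
import numpy as np

rng = np.random.default_rng(6)
T, sigma = 500, 2.0                       # hypothetical sample size and sigma

eps = sigma * rng.standard_normal(T)
y = np.cumsum(eps)                        # y_t with y_0 = 0
y_lag = np.concatenate(([0.0], y[:-1]))   # y_{t-1}

lhs = np.sum(y_lag * eps) / (sigma**2 * T)
# Summation by parts: sum y_{t-1} eps_t = (y_T^2 - sum eps_t^2) / 2
rhs = 0.5 * ((y[-1] / (sigma * np.sqrt(T)))**2 - np.sum(eps**2) / (sigma**2 * T))
print(lhs, rhs)
```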
We can use this to develop tests!
Suppose that $y_t = \phi y_{t-1} + \epsilon_t$, where $\epsilon_t \sim iid(0,\sigma^2)$, $\phi=1$, and $y_0=0$. The sampling distribution of the OLS estimator $\hat{\phi}_T$ of the autoregressive parameter $\phi=1$, standardized as $z(\hat{\phi}_T) = T(\hat{\phi}_T - 1)$, and the sampling distribution of the corresponding $t$-statistic have the following asymptotic approximations \begin{eqnarray} z(\hat{\phi}_T) & \Longrightarrow & \frac{\frac{1}{2} ( W(1)^2 -1 ) }{ \int_0^1 W(s)^2 ds } \\ t(\hat{\phi}_T) & \Longrightarrow & \frac{\frac{1}{2} ( W(1)^2 -1 ) }{ \left[ \int_0^1 W(s)^2 ds \right]^{1/2} } \end{eqnarray}
where $W(s)$ denotes a standard Wiener process. $\Box$
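These limit distributions are non-standard, so in practice their quantiles are obtained by simulation. The sketch below is one way to do that under assumed settings (standard normal innovations, no intercept in the regression, and taking $z(\hat{\phi}_T) = T(\hat{\phi}_T - 1)$, the usual definition). It simulates random walks, computes both statistics, and prints their 5% quantiles, which lie well to the left of the standard normal value of about $-1.645$ for the $t$ statistic.

```python
import numpy as np

rng = np.random.default_rng(7)
T, n_rep = 500, 5_000

eps = rng.standard_normal((n_rep, T))
y = np.cumsum(eps, axis=1)                                # random walks, y_0 = 0
y_lag = np.hstack([np.zeros((n_rep, 1)), y[:, :-1]])      # y_{t-1}

sum_lag_sq = np.sum(y_lag**2, axis=1)
phi_hat = np.sum(y_lag * y, axis=1) / sum_lag_sq          # OLS without intercept
resid = y - phi_hat[:, None] * y_lag
s2 = np.sum(resid**2, axis=1) / (T - 1)                   # error variance estimate

z_stat = T * (phi_hat - 1)
t_stat = (phi_hat - 1) / np.sqrt(s2 / sum_lag_sq)

print("5% quantile of z:", np.quantile(z_stat, 0.05))
print("5% quantile of t:", np.quantile(t_stat, 0.05))
print("share of t below the N(0,1) 5% point -1.645:", np.mean(t_stat < -1.645))
```

Using standard normal critical values would therefore reject the unit root null too often.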