ECON 616: Lecture Four: VARs

Introduction

Background

Overview: Chapter 10 from cite:Hamilton.
Technical Details: cite:Lutkepohl_1993
Other stuff: There is a new book by cite:kilian_lutkepohl_2017 – I haven’t read it yet.
Some surveys: cite:Stock2001, cite:Ramey_2016.

VARs

VARs have become an important tool for empirical macroeconomic research.

Reduced Form representations of the data that summarize regular features and are suitable to conduct forecasts.
Structural economic model can give some interpretation to a vector autoregression.

We’ll talk about both today.

Some Theoretical Properties of VARs

A vector autoregression is a generalization of the AR(p) model to the multivariate case:

\begin{eqnarray} y_t = \Phi_0 + \Phi_1 y_{t-1} + \ldots + \Phi_p y_{t-p} + u_t \end{eqnarray}

The random variable $y_t$ is now a $n \times 1$ random vector that takes values in $\mathbf{R}^n$.

For a theoretical analysis, it is often convenient to express the VAR(p) in the so-called companion form.

\begin{eqnarray} \hspace*{-0.5in} \left[ \begin{array}{c} y_t \\ y_{t-1} \\ \vdots \\ y_{t-p+1} \end{array} \right] = \left[ \begin{array}{c} \Phi_0 \\ 0 \\ \vdots \\ 0 \end{array} \right]

\left[ \begin{array}{ccccc} \Phi_1 & \Phi_2 & \cdots & \Phi_{p-1} & \Phi_p \\ I & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & 0 & 0 \\ 0 & 0 & \cdots & I & 0 \end{array} \right] \left[ \begin{array}{c} y_{t-1} \\ y_{t-2} \\ \vdots \\ y_{t-p} \end{array} \right] + \left[ \begin{array}{c} u_t \\ 0 \\ \vdots \\ 0 \end{array} \right] \label{e_varcomp} \nonumber \end{eqnarray}

Let $\xi_t = [y_t’, y_{t-1}’, \ldots, y_{t-p+1}’]’$. The VAR can be rewritten as

\begin{eqnarray} \xi_t = F_0 + F_1 \xi_{t-1} + \nu_t \end{eqnarray}

where the definitions of $F_0$, $F_1$, and $\nu_t$ can be deduced from the previous slide.

define the $n \times np$ matrix $M_n = [I,0]$ where $I$ is an $n \times n$ identity matrix.
It can be easily verified that $y_t = M_n \xi_t$.
The companion form is useful in two respects:

to define stationarity in the context of a VAR
to convince ourselves that without loss of much generality we can restrict econometric analyses to VAR(1) specifications.

**Result** For a vector autoregression to be covariance stationary it is necessary that all eigenvalues of the matrix \$F\_1\$ are less than one in absolute value. \$\Box\$

Example

Consider the univariate AR(2) process \[ y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + u_t \] The AR(2) process can be written in companion form as a VAR(1) where $\xi_t = [y_t , y_{t-1}]’$ and \[ F_1 = \left[ \begin{array}{cc} \phi_1 & \phi_2 \\ 1 & 0 \end{array} \right] \] The eigenvalues $\lambda$ of the matrix $F_1$ satisfy the condition \[ det( F_1 - \lambda I) = 0 \iff (\phi_1 - \lambda)(-\lambda) - \phi_2 = 0 \] Provided that $\lambda \not= 0$ the equation can be rewritten as \[ 0 = 1 - \phi_1 \frac{1}{\lambda} - \phi_2 \frac{1}{\lambda^2} \] Thus, the condition $|\lambda| < 1$ is, at least in this example, equivalent to the condition that all the roots of the polynomial $\phi(z)$ are greater than one in absolute value. A generalization of this example can be found in Hamilton (1994, Chapter 1). $\Box$

VAR(p)

Consider a VAR(p). The expected value of $y_t$ has to satisfy the vector difference equation

\begin{eqnarray} {\mathbb E}[y_t] = \Phi_0 + \Phi_1 {\mathbb E}[y_{t-1}] + \ldots \Phi_p {\mathbb E}[y_{t-p}] \quad \mbox{for all} \; t \end{eqnarray}

If the eigenvalues of $F_1$ are all less than one in absolute values and the VAR was initialized in the infinite past, then the expected value is given by

\begin{eqnarray} {\mathbb E}[y_t] = [ I - \Phi_1 - \ldots \Phi_t]^{-1} \Phi_0 \end{eqnarray}

To calculate the autocovariances we will assume that $\Phi_0 = 0$. Consider the companion form

\begin{eqnarray} \xi_t = F_1 \xi_{t-1} + \nu_t \end{eqnarray}

If the eigenvalues of $F_1$ are all less than one in absolute value and the VAR was initialized in the infinite past, than the autocovariance matrix of order zero has to satisfy the equation

\begin{eqnarray} \Gamma_{\xi \xi,0} = {\mathbb E}[\xi_t \xi_t’] = F_1 \Gamma_{\xi \xi,0} F_1’ + {\mathbb E}[ \nu_t \nu_t’] \end{eqnarray}

Obtaining a closed form solution for $\Gamma_{\xi \xi,0}$ is a bit more complicated than in the univariate AR(1) case.

Some Facts

\begin{definition} Let $A$ and $B$ be $2 \times 2$ matrices with the elements \[ A = \left[ \begin{array}{cc} a_{11} & a_{12} \\ a_{21} & a_{22} \end{array} \right], \quad B = \left[ \begin{array}{cc} b_{11} & b_{12} \\ b_{21} & b_{22} \end{array} \right] \] The $vec$ operator is defined as the operator that stacks the columns of a matrix, that is, \[ vec(A) = [ a_{11}, a_{21}, a_{12}, a_{22} ]' \] and the Kronecker product is defined as \[ A \otimes B = \left[ \begin{array}{cc} a_{11}B & a_{12}B \\ a_{21}B & a_{22}B \end{array} \right] \quad \Box \] \end{definition}

\begin{lemma} Let $A$, $B$, $C$ be matrices whose dimension are such that the product $ABC$ exists. Then $vec(ABC) = (C’ \otimes A)vec(B) \quad \Box$ \end{lemma}

VAR(p), continued

A closed form solution for the elements of the covariance matrix of $\xi_t$ can be obtained as follows

\begin{eqnarray} vec(\Gamma_{\xi \xi,0}) & = &(F_1 \otimes F_1) vec(\Gamma_{\xi\xi,0}) + vec( {\mathbb E}[\nu_t \nu_t’] ) \nonumber \\ & = & [ I - (F_1 \otimes F_1)]^{-1} vec( {\mathbb E}[\nu_t \nu_t’] ) \end{eqnarray}

Since

\begin{eqnarray} {\mathbb E}[ \xi_t \xi_{t-h}’ ] = F {\mathbb E}[\xi_{t-1} \xi_{t-h}’] + {\mathbb E}[\nu_t \xi_{t-h}’] \end{eqnarray}

we can deduce that

\begin{eqnarray} \Gamma_{\xi \xi,h} = F^h_1 \Gamma_{\xi \xi,0} \end{eqnarray}

To obtain the autocovariance $\Gamma_{\xi \xi,-h}$ we have to keep track of a transpose in the general matrix case:

\begin{eqnarray} \Gamma_{\xi \xi,-h} = {\mathbb E}[ \xi_{t-h} \xi_t’ ] = \bigg[ {\mathbb E}[ \xi_t \xi_{t-h}’] \bigg]’ = \Gamma_{\xi \xi,h}' \end{eqnarray}

VAR(p), continued

Once we have calculate that autocovariances for the companion form process $\xi_t$ it is straightforward to obtain the autocovariances for the $y_t$ process. Since $y_t = M_n \xi_t$ it follows that

\begin{eqnarray} \Gamma_{yy,h} = {\mathbb E}[y_t y_{t-h}’] = {\mathbb E}[ M_n \xi_t \xi_{t-h}’ M_n’] = M_n \Gamma_{\xi \xi,h} M_n' \end{eqnarray}

Result: Consider the vector autoregression \[ y_t = \Phi_0 + \Phi_1 y_{t-1} + \ldots + \Phi_p y_{t-p} + u_t \] where $u_t \sim iid {\cal N}(0,\Sigma_u)$ with companion form \[ \xi_t = F_0 + F_1 \xi_{t-1} + \nu_t \] Suppose that the eigenvalues of $F_1$ are all less than one in absolute values and that the vector autoregression was initialized in the infinite past. Under these assumptions the vector process $y_t$ is covariance stationary with the moments

\begin{eqnarray} {\mathbb E}[y_t] & = & [ I - \Phi_1 - \ldots \Phi_t]^{-1} \Phi_0 \\ \Gamma_{yy,h} & = & M_n \Gamma_{\xi \xi,h} M_n’ \quad \forall h \end{eqnarray}

where

\begin{eqnarray} vec(\Gamma_{\xi\xi,0}) & = & [ I - (F_1 \otimes F_1)]^{-1} vec( {\mathbb E}[\nu_t \nu_t’] ) \\ \Gamma_{\xi \xi,h} & = & F_1^h \Gamma_{\xi \xi,0} \quad h > 0 \quad \Box \end{eqnarray}

The Likelihood Function

We will now derive the likelihood function for a Gaussian VAR(p), conditional on initial observations $y_0, \ldots, y_{-p+1}$. The density of $y_t$ conditional on $y_{t-1}, y_{t-2}, \ldots$ and the coefficient matrices $\Phi_0, \Phi_1, \ldots, \Sigma$ is of the form

\begin{eqnarray} \hspace*{-0.5in} p(y_t|Y^{t-1}, \Phi_0, \ldots, \Sigma) &\propto& |\Sigma|^{-1/2} \exp \bigg\{ - \frac{1}{2} ( y_t - \Phi_0 - \Phi_1 y_{t-1} - \ldots - \Phi_p y_{t-p} )’ \nonumber \\ &~& \times \Sigma^{-1} ( y_t - \Phi_0 - \Phi_1 y_{t-1} - \ldots - \Phi_p y_{t-p} ) \bigg\} \end{eqnarray}

Define the $(np+1) \times 1$ vector $x_t$ as \[ x_t = [ 1, y_{t-1}’, \ldots, y_{t-p}’]' \] Moreover, define the matrixes \[ Y = \left[ \begin{array}{c} y_1’ \\ \vdots \\y_T’ \end{array} \right], \quad X = \left[ \begin{array}{c} x_1’ \\ \vdots \\x_T’ \end{array} \right], \quad \Phi = [ \Phi_0, \Phi_1, \ldots, \Phi_p]' \]

The conditional density of $y_t$ can be written in more compact notation as

\begin{eqnarray} p(y_t| Y^{t-1}, \Phi, \Sigma) \propto |\Sigma|^{-1/2} \exp \left\{ - \frac{1}{2} ( y_t’ - x_t’\Phi ) \Sigma^{-1} ( y_t’ - x_t’\Phi )’ \right\} \end{eqnarray}

To manipulate the density we will use some matrix algebra facts.
Facts:

Let $a$ be a $n \times 1$ vector, $B$ be a symmetric positive definite $n \times n$ matrix, and $tr$ the trace operator that sums the diagonal elements of a matrix. Then \[ a’Ba = tr[Baa’] \]
Let $A$ and $B$ be two $n \times n$ matrices, then \[ tr[A+B] = tr[A] + tr[B] \]

In a first step, we will replace the inner product in the expression for the conditional density by the trace of the outer product

\begin{eqnarray} p(y_t| Y^{t-1}, \Phi, \Sigma) \propto |\Sigma|^{-1/2} \exp \left\{ - \frac{1}{2} tr[ \Sigma^{-1}( y_t’ - x_t’\Phi )’( y_t’ - x_t’\Phi )] \right\} \end{eqnarray}

In the second step, we will take the product of the conditional densities of $y_1, \ldots, y_T$ to obtain the joint density. Let $Y_0$ be a vector with initial observations

\begin{eqnarray} p(Y|\Phi,\Sigma, Y_0) &=& \prod_{t=1}^T p(y_t|Y^{t-1}, Y_0, \Phi, \Sigma) \nonumber \\ &\propto& |\Sigma|^{-T/2} \exp \left\{ -\frac{1}{2} \sum_{t=1}^T tr[\Sigma^{-1}( y_t’ - x_t’\Phi )’( y_t’ - x_t’\Phi )] \right\} \nonumber \\ &\propto& |\Sigma|^{-T/2} \exp \left\{ -\frac{1}{2} tr\left[\Sigma^{-1}\sum_{t=1}^T( y_t’ - x_t’\Phi )’( y_t’ - x_t’\Phi )\right] \right\} \nonumber \\ &\propto& |\Sigma|^{-T/2} \exp \left\{ -\frac{1}{2} tr [ \Sigma^{-1} (Y-X\Phi)’(Y-X\Phi) ] \right\} \end{eqnarray}

Define the ``OLS’’ estimator

\begin{eqnarray} \hat{\Phi} = (X’X)^{-1} X’Y \end{eqnarray}

and the sum of squared OLS residual matrix

\begin{eqnarray} S = (Y - X \hat{\Phi})’(Y- X \hat{\Phi}) \end{eqnarray}

It can be verified that

\begin{eqnarray} (Y - X\Phi)’(Y-X\Phi) = S + (\Phi - \hat{\Phi})‘X’X(\Phi- \hat{\Phi}) \end{eqnarray}

This leads to the following representation of the likelihood function

\begin{eqnarray} p(Y|\Phi,\Sigma, Y_0) &\propto& |\Sigma|^{-T/2} \exp \left\{ -\frac{1}{2} tr[\Sigma^{-1} S] \right\}\nonumber \\ &~&\times \exp \left\{ -\frac{1}{2} tr[ \Sigma^{-1}(\Phi - \hat{\Phi})‘X’X(\Phi- \hat{\Phi})] \right\} \end{eqnarray}

Alternative Representation

Let $\beta = vec(\Phi)$ and $\hat{\beta} = vec(\hat{\Phi})$. It can be verified that

\begin{eqnarray} tr[ \Sigma^{-1}(\Phi - \hat{\Phi})‘X’X(\Phi- \hat{\Phi})] = (\beta - \hat{\beta})’[ \Sigma \otimes (X’X)^{-1} ]^{-1} (\beta - \hat{\beta}) \end{eqnarray}

and the likelihood function has the alternative representation

\begin{eqnarray} p(Y|\Phi,\Sigma, Y_0) &\propto& |\Sigma|^{-T/2} \exp \left\{ -\frac{1}{2} tr[\Sigma^{-1} S] \right\} \nonumber \\ &~&\times \exp \left\{ -\frac{1}{2} (\beta - \hat{\beta})’[ \Sigma \otimes (X’X)^{-1} ]^{-1} (\beta - \hat{\beta}) \right\} \nonumber \end{eqnarray}

Inference

Above suggests we could estimate $\Phi$ and $\Sigma$ via LS/MLE.
Consider a VAR(1) on the Output Gap, Inflation, and Interest Rate: [1959:Q1-2004:Q4]

\includegraphics[width=4in]{../lecture-four-vars/data}

MLE

\[ \hat\Phi_0 = \left[\begin{array}{c} 0.44 \\ 0.24 \\ 0.24\end{array}\right], \quad \hat\Phi_1 = \left[\begin{array}{ccc} 0.93 & -0.01 & -0.07 \\ 0.07 & 0.84 & 0.07 \\ 0.08 & 0.09 & 0.91\end{array}\right] \]
\[ \hat\Sigma = \left[\begin{array}{ccc} 0.62 & -0.04 & 0.24 \\ -0.04 & 1.27 & 0.16 \\ 0.24 & 0.16 & 0.87 \end{array}\right] \]
$|eig(\hat\Phi_1)| = [0.95, 0.95, 0.78] \implies$ stationary
Unconditional Mean \[ (I - \hat\Phi_1)^{-1}\hat\Phi_0 = [-0.54, 3.75, 6.09] \]

Impulse Response Function

Let’s that $u_{1,t}$ equals 1 in some period $t$. What does that mean for $t+1, t+2, \ldots$?

\includegraphics[width=4in]{../lecture-four-vars/irf}

Formally

IRF(h) = $\frac{dy{t+h}}{du_t}$
Can get via $MA(\infty)$ representation
$y_t = \Phi_0 + u_t + \Phi_1 u_{t-1} + \Phi^2 u_{t-2} + \ldots$
Identifies consequences of a 1 unit increase in innovation $u_{1,t}$ on observables holding all innovations fixed.
Causality? Be careful. We’re in reduced-form…
Still, get a sense of dynamics of system.

How to pick lags of VAR?
Could use information criteria Akaike, Bayesian, Schwarz, … just like OLS.
OTOH, hand isn’t it nice to use more lags = more complex dynamics, MP ``long and variable’’

\begin{center} \begin{tabular}{lcc} \hline & $p=1$ & $p=6$ \\ \hline $std(\hat u_1)$ & 0.62 & 0.45 \\ $std(\hat u_2)$ & 1.27 & 1.01 \\ $std(\hat u_3)$ & 0.87 & 0.61 \\ \hline \end{tabular} \end{center}

How to use more lags without overfitting?

Granger Causality

Economists often use regression results to make statements about causal relationships between variables.

Suppose we would like to examine the monetarist hypothesis that a contraction of the money supply causes a decrease in aggregate output.
It is tempting to regress output on a measure of lagged money supply and interpret a non-zero coefficient as ``causal’’ relationship.
Since this concept of causality is somewhat different from the usual notion of causality it gets a new name.

Bivariate Granger Causality The random variable $y_{2,t}$ fails to Granger cause the random variable $y_{1,t}$ if for all $s>0$ the mean squared error of a forecast of $y_{1,t+s}$ based on $y_{1,t}, y_{1,t-1},\ldots$ is the same as the mean squared error of a forecast that uses both $y_{1,t}, y_{1,t-1},\ldots$ and $y_{2,t}, y_{2,t-1},\ldots$. $\Box$

Example

Consider the bivariate VAR(2)

\begin{eqnarray} \left[ \begin{array}{c} y_{1,t} \\ y_{2,t} \end{array} \right]

\left[ \begin{array}{c} \phi^{(0)}_1 \\ \phi^{(0)}_2 \end{array} \right] + \left[ \begin{array}{cc} \phi^{(1)}_{11} & \phi^{(1)}_{12} \\ \phi^{(1)}_{21} & \phi^{(1)}_{22} \end{array} \right] \left[ \begin{array}{c} y_{1,t-1} \\ y_{2,t-1} \end{array} \right]

\ldots + \left[ \begin{array}{cc} \phi^{(p)}_{11} & \phi^{(p)}_{12} \\ \phi^{(p)}_{21} & \phi^{(p)}_{22} \end{array} \right] \left[ \begin{array}{c} y_{1,t-p} \\ y_{2,t-p} \end{array} \right]

\left[ \begin{array}{c} u_{1,t} \\ u_{2,t} \end{array} \right] \end{eqnarray}

If $y_{2,t}$ fails to Granger cause $y_{1,t}$, then it must be true that

\begin{eqnarray} \phi_{12}^{(1)} = \phi_{12}^{(2)} = \ldots = \phi_{12}^{(p)} =0 \label{e_restr} \end{eqnarray}

A discussion of Granger causality in the context of a VAR with more than two variables can be found in Hamilton (1994). We will now examine Granger causality in the context of forward looking behavior. Roughly speaking:

\begin{quote} The weather forecast Granger causes the weather, but shooting the weatherman will not produce a sunny weekend. (Cochrane, 1994). \end{quote}

Example

Consider a investor who has the choice between a riskless bond that yields a return $r$, and a risky asset that has a price $p_t$ and will pay dividends $d_{t+1}$ in the next period. In equilibrium under the absence of arbitrage

\begin{eqnarray} 1 + r = {\mathbb E}_t \left[ \frac{ p_{t+1} + d_{t+1} }{p_{t}} \right] \end{eqnarray}

The forward solution of this difference equation implies that the price of the risky asset is

\begin{eqnarray} p_t = {\mathbb E}_t \left[ \sum_{\tau=1}^\infty \left( \frac{1}{1+r} \right)^\tau d_{t+\tau} \right] \end{eqnarray}

Thus, according to the model, the stock price incorporates the market’s best forecast of the present value of future dividends. If this forecast is based on more information than past dividends alone, then stock prices will Granger cause dividends, as investors try to anticipate movements in dividends.

Example

Suppose that

\begin{eqnarray} d_t = d + u_t + \delta u_{t-1} + \nu_t \end{eqnarray}

where $u_t$ and $\nu_t$ are independent Gaussian $iid$ series. Suppose that the investor at time $t$ knows values of current and past $u_t$ and $\nu_t$’s. The forecast of $d_{t+\tau}$ based on this information is given by

\begin{eqnarray} {\mathbb E}_t [d_{t+\tau}] = \left\{ \begin{array}{ccc} d + \delta u_t & & \mbox{for}\; \tau = 1 \\ d & & \mbox{for}\; \tau = 2,3,\ldots \end{array} \right. \end{eqnarray}

Thus, the stock price is given by

\begin{eqnarray} p_t = \frac{d}{r} + \frac{\delta u_t}{1+r} \end{eqnarray}

which implies that

\begin{eqnarray} \delta u_{t-1} = (1+r) p_{t-1} - (1+r) d/r \end{eqnarray}

The system can be written as a bivariate VAR

\begin{eqnarray} \left[ \begin{array}{c} p_{t} \\ d_{t} \end{array} \right]

\left[ \begin{array}{c} d/r \\ -d/r \end{array} \right] + \left[ \begin{array}{cc} 0 & 0 \\ 1+r & 0 \end{array} \right] \left[ \begin{array}{c} p_{t-1} \\ d_{t-1} \end{array} \right] + \left[ \begin{array}{c} \delta u_t /(1+r) \\ u_t + \nu_t \end{array} \right] \end{eqnarray}

Upshot

In this model, Granger causation runs the opposite direction from the true causation.
Dividends fail to ``Granger-cause’’ prices, even though investors’ perceptions of dividends are the sole determinant of stock prices.
On the other hand, prices do ``Granger-cause’’ dividends, even though the market’s evaluation of the stock in reality has no effect on the dividend process. (Hamilton, 1994, Chapter 11).

How to think about causation?

Cointegration

A last word about cointegration

We will now analyze a simple bivariate system of cointegrated processes. Consider the model

\begin{eqnarray} y_{1,t} & = & \gamma y_{2,t} + u_{1,t} \\ y_{2,t} & = & y_{2,t-1} + u_{2,t} \end{eqnarray}

where $[u_{1,t},u_{2,t}]’ \sim iid(0,\Omega)$.

Clearly, $y_{2,t}$ is a random walk. Moreover, it can be easily verified that $y_{1,t}$ follows a unit root process.

\begin{eqnarray} y_{1,t} - y_{1,t-1} = \gamma (y_{2,t} - y_{2,t-1}) + u_{1,t} - u_{1,t-1} \end{eqnarray}

Therefore,

\begin{eqnarray} y_{1,t} = y_{1,t-1} + \gamma u_{2,t} + u_{1,t} - u_{1,t-1} \end{eqnarray}

Thus, both $y_{1,t}$ and $y_{2,t}$ are integrated processes.

Model Continued

However, the linear combination

\begin{eqnarray} [1,\; -\gamma] \left[ \begin{array}{c} y_{1,t} \\ y_{2,t} \end{array} \right] = y_{1,t} - \gamma y_{2,t} = u_{1,t} \end{eqnarray}

is stationary. Therefore, $y_{1,t}$ and $y_{2,t}$ are cointegrated.
The vector $[1,-\gamma]’$ is called the cointegrating vector.
Note that the cointegrating vector is only unique up to normalization.

Rewriting the Model

The model can be rewritten as a VAR(1)

\begin{eqnarray} y_t = \Phi_1 y_{t-1} + \epsilon_t \end{eqnarray}

The elements of the matrix $\Phi_1$ and the definition of $\epsilon_t$ is given by

\begin{eqnarray} \left[ \begin{array}{c} y_{1,t} \\ y_{2,t} \end{array} \right] = \left[ \begin{array}{cc} 0 & \gamma \\ 0 & 1 \end{array} \right] \left[ \begin{array}{c} y_{1,t-1} \\ y_{2,t-1} \end{array} \right] + \left[ \begin{array}{c} u_{1,t} + \gamma u_{2,t} \\ u_{2,t} \end{array} \right] \end{eqnarray}

The matrix $\Phi_1$ is of reduced rank in this example of cointegration. More generally cointegrated system can be casted in the form of a vector autoregression in levels of $y_t$.
Although both $y_{1,t}$ and $y_{2,t}$ follow univariate random walks, the cointegrated system cannot be expressed as a vector autoregression in differences $[ \Delta y_{1,t}, \Delta y_{2,t} ]’$. Consider

\begin{eqnarray} \left[ \begin{array}{c} \Delta y_{1,t} \\ \Delta y_{2,t} \end{array} \right] = \left[ \begin{array}{cc} 1-L & \gamma L \\ 0 & 1 \end{array} \right] \left[ \begin{array}{c} u_{1,t} \\ u_{2,t} \end{array} \right] = \Theta(L) u_t \end{eqnarray}

Since $|\Theta(1)|=0$ the moving average polynomial is not invertible and no finite order VAR could describe $\Delta y_t$.

VECM

The cointegrated model can be written in the so-called vector error correction model (VECM) form:

\begin{eqnarray} \left[ \begin{array}{c} \Delta y_{1,t} \\ \Delta y_{2,t} \end{array} \right] = \left[ \begin{array}{c} -1 \\ 0 \end{array} \right] \left( \left[ \begin{array}{cc} 1 & - \gamma \end{array} \right] \left[ \begin{array}{c} y_{1,t-1} \\ y_{2,t-1} \end{array} \right] \right) + \left[ \begin{array}{c} u_{1,t} + \gamma u_{2,t} \\ u_{2,t} \end{array} \right] \end{eqnarray}

The term

\begin{eqnarray} \left( \left[ \begin{array}{cc} 1 & - \gamma \end{array} \right] \left[ \begin{array}{c} y_{1,t-1} \\ y_{2,t-1} \end{array} \right] \right) = y_{1,t-1} - \gamma y_{2,t-1} \end{eqnarray}

is called error correction term. In economic models it often reflects a long-run equilibrium relationship such as a constant ratio of consumption and output. If the economy is out of equilibrium in period $t-1$, that is, $y_{1,t-1} - \gamma y_{2,t-1} \not= 0$, then the economy adjusts toward its long-run equilibrium and $\EE_{t-1}[\Delta y_{t}] \not= 0$. If the ``true’’ cointegrating vector is known, then both the left-hand-side variables and the error correction term are stationary.

Upshot

In practice, if one would like to model a bivariate vector process $y_t$, it has to be determined whether to fit

An unrestricted vector autoregression of the form
a vector autoregression in differences
or a vector error correction model (reduced rank regression)

How to pick:

A likelihood ratio test or a Bayesian model selection criterion could be used
if the processes $y_{1,t}$ and $y_{2,t}$ are integrated the analysis of the sampling distribution of the likelihood ratio test statistics is complicated
Johansen (1995) provides a nice summary of the relevant asymptotic distribution theory.

SVARs

So far, we considered reduced form VARs, say,

\begin{eqnarray} y_t = \Phi_1 y_{t-1} + u_t, \quad {\mathbb E}[u_t u_t’] = \Sigma_u \label{eq_varrf} \end{eqnarray}

in which the error terms $u_t$ have the interpretation of one-step ahead forecast errors. If the eigenvalues of $\Phi_1$ are inside the unit-circle then $y_t$ has the following moving-average (MA) representation in terms of $u_t$:

\begin{eqnarray} y_t = (I - \Phi_1 L)^{-1} u_t = \sum_{j=0}^\infty \Phi_1^j u_{t-j} = \sum_{j=0}^\infty C_j u_{t-j} \end{eqnarray}

Modern dynamic macro models suggest that the one-step ahead forecast errors are functions of some fundamental shocks, such as technology shocks, preference shocks, or monetary policy shocks.

Let $\epsilon_t$ a vector of such fundamental shocks and assume that ${\mathbb E}[\epsilon_t \epsilon_t’] = {\cal I}$. Moreover, assume that

\begin{eqnarray} u_t = \Phi_\epsilon \epsilon_t. \end{eqnarray}

Then we can express the VAR in structural form as follows

\begin{eqnarray} y_t &=& \Phi_1 y_{t-1} + \Phi_\epsilon \epsilon_t \label{eq_varsf} \\ \Phi_\epsilon^{-1} y_t &=& \Phi_\epsilon^{-1} \Phi_1 y_{t-1} + \epsilon_t \nonumber \end{eqnarray}

The moving-average representation of $y_t$ in terms of the structural shocks is given by

\begin{eqnarray} y_t = \sum_{j=0}^\infty \Phi_1^j \Phi_\epsilon \epsilon_{t-j} = \sum_{j=0}^\infty C_j \Phi_\epsilon \epsilon_{t-j}. \end{eqnarray}

The moving-average representation of $y_t$ in terms of the structural shocks is given by

\begin{eqnarray} y_t = \sum_{j=0}^\infty \Phi_1^j \Phi_\epsilon \epsilon_{t-j} = \sum_{j=0}^\infty C_j \Phi_\epsilon \epsilon_{t-j}. \end{eqnarray}

For~(\ref{eq_varrf}) and~(\ref{eq_varsf}) the matrix $\Phi_\epsilon$ has to satisfy the restriction

\begin{eqnarray} \Phi_\epsilon \Phi_\epsilon’ = \Sigma_u. \end{eqnarray}

Notice that the matrix $\Phi_\epsilon$ has $n^2$ elements. The covariance relationship, unfortunately, generates only $n(n+1)/2$ restrictions and does not uniquely determine $\Phi_\epsilon$. This creates an identification problem since all we can estimate from the data is $\Phi_1$ and $\Sigma_u$.

In order to make statements about the propagation of structural shocks $\epsilon_t$ we have to make further assumptions. The papers by Cochrane (1994), Christiano and Eichenbaum (1999), and Stock and Watson (2001) survey such identifying assumptions. A cynical view of this literature is the following:

Propose an identification scheme, that determines all elements of $\Phi_\epsilon$.
Compute impulse response functions.
If impulse response functions are plausible, then stop; else, declare a ``puzzle’’ and return to 1.

Here are some famous ``puzzles:’'

``Liquidity Puzzle:’’ When identifying monetary policy shocks as surprise changes in the stock of money one often finds that interest rates fall when the money stock is lowered.
``Price Puzzle:’’ When identifying monetary policy shocks as surprise changes in the Federal Funds Rate, one often finds that prices fall after a drop in interest rates.

These ``puzzles’’ are typically resolved by considering more elaborate identification schemes.

Impulse Response Functions and Variance Decompositions

Impulse responses are defined as

\begin{eqnarray} \frac{\partial y_{t+h}}{\partial \epsilon_t’} = C_h \Phi_\epsilon \end{eqnarray}

and correspond to the MA coefficient matrices in the moving average representation of $y_t$ in terms of structural shocks.

The covariance matrix of $y_t$ is given by

\begin{eqnarray} \Gamma_{yy,0} = \sum_{j=0}^\infty C_j \Phi_\epsilon {\cal I} \Phi_\epsilon’ C_j' \end{eqnarray}

Let ${\cal I}^{i}$ be matrix for which element $i,i$ is equal to one and all other elements are equal to zero. Then we can define the contribution of the $i$’th structural shock to the variance of $y_t$ as

\begin{eqnarray} \Gamma_{yy,0}^{(i)} = \sum_{j=0}^\infty C_j \Phi_\epsilon {\cal I}^{(i)} \Phi_\epsilon’ C_j' \end{eqnarray}

Thus the fraction of the variance of $y_{l,t}$ explained by shock $i$ is \[ [ \Gamma_{yy,0}^{(i)} ]_{ll} / [\Gamma_{yy,0}]_{ll} . \]

We begin by decomposing the covariance matrix into the product of lower triangular matrices (Cholesky Decomposition):

\begin{eqnarray} \Sigma_u = A A’, \end{eqnarray}

where $A$ is lower triangular. If $\Sigma_u$ is non-singular the decomposition is unique. Let $\Omega$ be an orthonormal matrix, meaning that $\Omega \Omega’ = \Omega’ \Omega = {\cal I}$. We can characterize the relationship between the reduced form and the structural shocks as follows

\begin{eqnarray} u_t = A \Omega \epsilon_t \end{eqnarray}

Notice that

\begin{eqnarray} {\mathbb E}[u_t u_t’] = {\mathbb E}[ A \Omega \epsilon_t \epsilon_t’ \Omega’ A’] = A \Omega {\mathbb E}[\epsilon_t \epsilon_t’] \Omega’ A' = A \Omega \Omega’ A’ = A A’ = \Sigma_u. \end{eqnarray}

In general, it is quite tedious to characterize the space of orthonormal matrices. Let’s try for $n=2$:

\begin{eqnarray} \Omega(\varphi) = \left[ \begin{array}{cc} \cos \varphi & - \sin \varphi \\ \sin \varphi & \cos \varphi \end{array} \right] \end{eqnarray}

where $\varphi \in (-\pi,\pi]$. Notice that, for instance,

\begin{eqnarray} \Omega( \pi/2 ) = - \Omega (-\pi/2) \end{eqnarray}

which means that only the signs of the impulse responses change but not the shape.
Let’s look at some famous identification schemes

Sims (1980)

Suppose that \[ y_t = \left[ \begin{array}{c} \mbox{Fed Funds Rate} \\ \mbox{Output Growth} \end{array} \right], \quad \epsilon_t = \left[ \begin{array}{c} \epsilon_{R,t} \\ \epsilon_{z,t} \end{array} \right] = \left[ \begin{array}{c} \mbox{Monetary Policy Shock} \\ \mbox{Technology Shock} \end{array} \right]. \] Moreover, we assume that the central bank does not react contemporaneously to technology shocks because data on aggregate output only become available with a one-quarter lag. This assumption can be formalized through $\varphi = 0$. Then

\begin{eqnarray} u_t = \left[ \begin{array}{cc} a_{11} & 0 \\ a_{21} & a_{22} \end{array} \right] \left[ \begin{array}{c} \epsilon_{R,t} \\ \epsilon_{z,t} \end{array} \right]. \end{eqnarray}

Blanchard and Quah (1989)

Now suppose that \[ y_t = \left[ \begin{array}{c} \mbox{Inflation} \\ \mbox{Output Growth} \end{array} \right], \quad \epsilon_t = \left[ \begin{array}{c} \epsilon_{R,t} \\ \epsilon_{z,t} \end{array} \right] = \left[ \begin{array}{c} \mbox{Monetary Policy Shock} \\ \mbox{Technology Shock} \end{array} \right] \] Moreover,

\begin{eqnarray} y_t = ( \sum_{j=0}^\infty C_j L^j ) u_t = C(L) u_t. \end{eqnarray}

Consider the following assumption: monetary policy shocks do not raise output in the long-run. Let’s examine the moving average representation of $y_t$ in terms of the structural shocks

\begin{eqnarray*} y_t &=& \left[ \begin{array}{cc} c_{11}(L) & c_{12}(L) \\ c_{21}(L) & c_{22}(L) \end{array} \right] \left[ \begin{array}{cc} a_{11} & 0 \\ a_{21} & a_{22} \end{array} \right] \left[ \begin{array}{cc} \cos \varphi & - \sin \varphi \\ \sin \varphi & \cos \varphi \end{array} \right] \left[ \begin{array}{c} \epsilon_{R,t} \\ \epsilon_{z,t} \end{array} \right] \\ &=& \left[ \begin{array}{cc} \cdot & \cdot \\ a_{11} \cos \varphi c_{21}(L) + (a_{21} \cos \varphi + a_{22} \sin \varphi ) c_{22}(L) & \cdot \end{array} \right] \left[ \begin{array}{c} \epsilon_{R,t} \\ \epsilon_{z,t} \end{array} \right] \\ &=& \left[ \begin{array}{cc} d_{11}(L) & d_{12}(L) \\ d_{21}(L) & d_{22}(L) \end{array} \right] \left[ \begin{array}{c} \epsilon_{R,t} \\ \epsilon_{z,t} \end{array} \right] \end{eqnarray*}

Suppose that in period $t=0$ log output and log prices are equal to zero. Then the log-level of output and prices in period $t = T > 0$ is given by

\begin{eqnarray} y_T^c = \sum_{t=1}^T y_t = \sum_{t = 1}^T \sum_{j=0}^\infty D_j \epsilon_{t - j} \end{eqnarray}

Now consider the derivative

\begin{eqnarray} \frac{ \partial y_T^c }{\partial \epsilon_1’} = \sum_{j=0}^{T-1} D_j \end{eqnarray}

Letting $T \longrightarrow \infty$ gives us the long-run response of the level of prices and output to the shock $\epsilon_1$:

\begin{eqnarray} \frac{ \partial y_\infty^c }{ \partial \epsilon_1’} = \sum_{j=0}^\infty D_j = D(1) \end{eqnarray}

Here, we want to restrict the long-run effect of monetary policy shocks on output:

\begin{eqnarray} d_{21}(1) = 0 \end{eqnarray}

This leads us to the equation

\begin{eqnarray} [ a_{11}c_{21}(1) + a_{21} c_{22}(1) ] \cos \varphi + a_{22} c_{22}(1) \sin \varphi = 0. \end{eqnarray}

Notice that the equation has two solutions for $\varphi \in ( - \pi, \pi]$. Under one solution a positive monetary policy shock is contractionary, under the other solution it is expansionary. The shape of the responses is, of course, the same.

\includegraphics[width=4in]{../lecture-four-vars/bq1}

\includegraphics[width=4in]{../lecture-four-vars/bq2}

Sign Restrictions

Again consider \[ y_t = \left[ \begin{array}{c} \mbox{Inflation} \\ \mbox{Output Growth} \end{array} \right], \quad \epsilon_t = \left[ \begin{array}{c} \epsilon_{R,t} \\ \epsilon_{z,t} \end{array} \right] = \left[ \begin{array}{c} \mbox{Monetary Policy Shock} \\ \mbox{Technology Shock} \end{array} \right] \] and our identification assumption is: upon impact, a monetary policy shock raises both prices and output. It can be verified that

\begin{eqnarray} \frac{ \partial y_t }{\partial \epsilon_{R,t} } = \left[ \begin{array}{c} a_{11} \cos \varphi c_{11,1} + (a_{21} \cos \varphi + a_{22} \sin \varphi ) c_{12,1} \\ a_{11} \cos \varphi c_{21,1} + (a_{21} \cos \varphi + a_{22} \sin \varphi ) c_{22,1} \end{array} \right]. \end{eqnarray}

Thus, we obtain the sign restrictions

\begin{eqnarray*} 0 &<& a_{11} \cos \varphi c_{11,1} + (a_{21} \cos \varphi + a_{22} \sin \varphi ) c_{12,1} \\ 0 &<& a_{11} \cos \varphi c_{21,1} + (a_{21} \cos \varphi + a_{22} \sin \varphi ) c_{22,1} \end{eqnarray*}

which restrict $\varphi$ to be in a certain subset of $(-\pi,\pi]$ and will generate a range of responses.

Further readings: cite:Canova2002, cite:Faust1998, cite:Uhlig2005.

Uhlig, 2005

What is the effect of MP on Output?
Let’s assume that after an MP shock $R_{t+k}$ for $k = 1, \ldots K$.
How does it compare to the standard ordering?

Result: Monetary Policy does not effect output!

Sign Restrictions

\includegraphics[width=4in]{../lecture-four-vars/uhlig2}

Sign Restrictions

\includegraphics[width=4in]{../lecture-four-vars/uhliq1}

Cholesky

\includegraphics[width=4in]{../lecture-four-vars/uhlig3}

Is that the last word?

Can you verify sign restrictions?
It’s hard to get (all) the right Omegas [cite:Arias_2014]
Other ``reasonable’’ sign restrictions give different results?
Are we back to ``puzzle’'?

ECON 616: Lecture Four: VARs

Introduction

Background

VARs

Some Theoretical Properties of VARs

Example

VAR(p)

Some Facts

VAR(p), continued

VAR(p), continued

The Likelihood Function

The Likelihood Function

Alternative Representation

Inference

Inference

MLE

Impulse Response Function

Formally

Granger Causality

Granger Causality

Example

\begin{eqnarray} \left[ \begin{array}{c} y_{1,t} \\ y_{2,t} \end{array} \right]

Example

Example

\begin{eqnarray} \left[ \begin{array}{c} p_{t} \\ d_{t} \end{array} \right]

Upshot

Cointegration

A last word about cointegration

Model Continued

Rewriting the Model

VECM

Upshot

SVARs

Impulse Response Functions and Variance Decompositions

Sims (1980)

Blanchard and Quah (1989)

Sign Restrictions

Uhlig, 2005

Sign Restrictions

Sign Restrictions

Cholesky

Is that the last word?

Bibliography

References