We previously spoke about how much the efficacy of MCMC algorithms
depended on the shape of the posterior.
We’re going to talk about posterior simulators that make that idea explicit.
In particular, the Hamiltonian Monte Carlo sampler described in
\cite{Neal_2011}.
The evolution of \(\theta\) and \(p\) is governed by a set of differential equations. These differential equations are determined by the Hamiltonian, \(H(\theta,p)\):
\begin{eqnarray} H(\theta,p) = \mbox{Kinetic Energy } + \mbox{Potential Energy}. \end{eqnarray}
In physical systems, potential and kinetic energy are obviously
intrinsic features of the environment.
In our world:
\begin{eqnarray} H(\theta,p) &=& -\log\pi(p|\theta) - \log\pi(\theta) \nonumber \\ &=& K(p,\theta) + V(\theta). \nonumber \end{eqnarray}
Then the joint density of \((\theta,p)\) is given by \(e^{-H(\theta,p)}\).
As mentioned above, the \(\theta,p\) evolve over time according to the Hamiltonian equations:
\begin{eqnarray} \frac{d\theta}{dt} &=& \frac{\partial H}{\partial p} = \frac{\partial K}{\partial p} \\ \frac{dp}{dt} &=& -\frac{\partial H}{\partial \theta} = -\frac{\partial K}{\partial \theta} -\frac{\partial V}{\partial \theta} \end{eqnarray}
Note that \(-\partial V/\partial \theta = \nabla \log \pi(\theta)\), the gradient of the log of the target density.
Consider now the mapping \(Q_{\bar t}(p,\theta) = (p(\bar t), \theta(\bar t))\), which takes
the initial state \((p, \theta)\) to the solution of the Hamiltonian equations at time \(t = \bar t\).
The basic idea is to use this as a proposal distribution in a
Metropolis-Hastings chain which targets \(e^{-H(p, \theta)}\).
The Hamiltonian Monte Carlo algorithm generates a Markov chain for the pair \((\theta^i,p^i)\), \(i=1,\ldots,N\).
Start with the case in which \(\pi(p|\theta)\) does not depend on \(\theta\).
Thus,
\[
H(\theta,p) = - \log \pi(p) - \log \pi(\theta).
\]
The Hamiltonian equations describe the evolution of \(\theta\) and \(p\) subject to the constraint
that \(H(\theta,p)\) (the sum of kinetic and potential energy) stays constant.
Because \(H\) stays constant, if \(\pi(p^i) > \pi(p^{i-1})\), it has to be the case that \(\pi(\theta^i) < \pi(\theta^{i-1})\).
One way of thinking about this relationship is that \(p^i\) determines the level of the posterior contour from which we are sampling and \(\theta^i\) is the location on the contour.
As the algorithm iterates over \(i\) we are first changing the level of the contour and then we adjust \(\theta^i\).
Suppose that \(\pi(\theta)\) corresponds to a \(N(\mu,\sigma^2)\) distribution. Thus, \[ \log \pi(\theta) = - \frac{1}{2} \ln (2 \pi \sigma^2) - \frac{1}{2 \sigma^2} (\theta - \mu)^2. \] Suppose we were to generate the sequence \(\theta^i\) by direct sampling. Then the level of the log posterior density would vary according to \[ \log \pi(\theta^i) = - \frac{1}{2} \ln (2 \pi \sigma^2) - \frac{1}{2 \sigma^2} (\theta^i - \mu)^2 \] with
\begin{eqnarray*} \mathbb{E}[\log \pi(\theta^i) ] &=& - \frac{1}{2} \ln (2 \pi \sigma^2) - \frac{1}{2} \\ \mathbb{V}[\log \pi(\theta^i) ] &=& \frac{1}{2} \end{eqnarray*}
Thus, if \(\theta^i\) is generated by direct sampling, we should
observe that the draws are \(iid\) and that the variance of the log
density is \(1/2\).
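As a sanity check, here is a minimal Python sketch (the values of \(\mu\) and \(\sigma^2\) are arbitrary illustrations) verifying by direct sampling that the variance of the log density is indeed about \(1/2\):

```python
import math, random

def logpdf_normal(theta, mu, sigma2):
    # log of the N(mu, sigma2) density
    return -0.5 * math.log(2 * math.pi * sigma2) - (theta - mu) ** 2 / (2 * sigma2)

random.seed(0)
mu, sigma2 = 1.0, 4.0   # illustrative values
draws = [random.gauss(mu, math.sqrt(sigma2)) for _ in range(200_000)]
logpost = [logpdf_normal(th, mu, sigma2) for th in draws]

m = sum(logpost) / len(logpost)
v = sum((l - m) ** 2 for l in logpost) / len(logpost)
# Theory: E[log pi] = -0.5*ln(2*pi*sigma2) - 0.5 and V[log pi] = 0.5
print(m, v)
```

The sample variance of the log density should settle near \(0.5\) regardless of \(\mu\) and \(\sigma^2\).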
In the algorithm below, we will generate \(p^i\) by direct sampling from \(\pi(p)\), which we set to a \(N(0,M)\),
where \(M = \mathbb{V}_\pi[\theta]^{-1} = \sigma^{-2}\).
Note that in this simplified setting, the choice of \(M\) only affects
the level
\[
\mathbb{E}[ \log \pi(p^i)] = -\frac{1}{2} \ln (2 \pi M) - \frac{1}{2},
\]
but not the variance of the fluctuations around this level.
We are now in a position to solve the Hamiltonian equations, which take the form:
\begin{eqnarray*} \dot{\theta} &=& \frac{1}{M} p \\ \dot{p} &=& -\frac{1}{\sigma^2}(\theta - \mu) \end{eqnarray*}
It can be verified that the solution to this system of differential equations takes the form
\begin{eqnarray*} \theta(t) &=& \mu + \rho \cos \big(\alpha + t/\sqrt{\sigma^2 M} \big) \\ p(t) &=& - \rho \frac{\sqrt{M}}{\sigma} \sin \big(\alpha + t/\sqrt{\sigma^2 M} \big). \end{eqnarray*}
Here the parameters \((\rho,\alpha)\) are determined by the initial conditions: \[ \theta(0) = \mu + \rho \cos(\alpha), \quad p(0) = -\rho \frac{\sqrt{M}}{\sigma} \sin(\alpha), \] which leads to \[ \rho = -p(0) \frac{\sigma}{\sqrt{M}} \frac{1}{\sin \alpha}, \quad \alpha = \mbox{arctan} \left[ - \frac{p(0)/\sqrt{M}}{(\theta(0)-\mu)/\sigma} \right]. \]
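A quick numerical check of this closed-form solution (a sketch with illustrative values of \(\mu\), \(\sigma^2\), and \(M\)): along the exact flow, the Hamiltonian should stay constant.

```python
import math

mu, sigma2, M = 1.0, 2.0, 0.5   # illustrative parameter values (assumptions)

def H(theta, p):
    # H = -log N(p; 0, M) - log N(theta; mu, sigma2)
    return (0.5 * math.log(2 * math.pi * M) + p ** 2 / (2 * M)
            + 0.5 * math.log(2 * math.pi * sigma2) + (theta - mu) ** 2 / (2 * sigma2))

def flow(theta0, p0, t):
    # exact solution of the Hamiltonian equations for the Gaussian target
    w = t / math.sqrt(sigma2 * M)
    c, s = math.cos(w), math.sin(w)
    theta_t = mu + (theta0 - mu) * c + p0 * math.sqrt(sigma2 / M) * s
    p_t = p0 * c - (theta0 - mu) * math.sqrt(M / sigma2) * s
    return theta_t, p_t

theta0, p0 = 2.5, -0.7
energies = [H(*flow(theta0, p0, t)) for t in (0.0, 0.3, 1.0, 5.0)]
print(max(energies) - min(energies))   # ~0: H is conserved along the exact flow
```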
We can now specify the function \(Q_{\bar t}(p, \theta)\). Using the fact that
\begin{eqnarray*} \cos(x+y) &=& \cos x \cos y - \sin x \sin y \\ \sin(x+y) &=& \sin x \cos y + \cos x \sin y, \end{eqnarray*}
we obtain
\begin{eqnarray*} \theta(t) - \mu &=& \rho \cos(\alpha) \cos(t/\sqrt{\sigma^2 M}) - \rho \sin (\alpha) \sin (t/\sqrt{\sigma^2 M}) \\ &=& (\theta(0) - \mu) \cos(t/\sqrt{\sigma^2 M}) + p(0) \frac{\sigma}{\sqrt{M}} \sin (t/\sqrt{\sigma^2 M}) \\ p(t) &=& -\rho \frac{\sqrt{M}}{\sigma} \sin(\alpha) \cos(t/\sqrt{\sigma^2 M}) -\rho \frac{\sqrt{M}}{\sigma} \cos(\alpha) \sin(t/\sqrt{\sigma^2 M}) \\ &=& p(0) \cos(t/\sqrt{\sigma^2 M}) - (\theta(0)-\mu) \frac{\sqrt{M}}{\sigma} \sin(t/\sqrt{\sigma^2 M}). \end{eqnarray*}
In matrix form, the equations can be written as \[ \hspace{-0.2in} \left[ \begin{array}{c} \theta(t) - \mu \\ p(t) \end{array} \right] = \left[ \begin{array}{cc} \cos(t/\sqrt{\sigma^2 M}) & \frac{\sigma}{\sqrt{M}} \sin (t/\sqrt{\sigma^2 M}) \\ -\frac{\sqrt{M}}{\sigma} \sin(t/\sqrt{\sigma^2 M}) & \cos(t/\sqrt{\sigma^2 M}) \end{array}\right] \left[ \begin{array}{c} \theta(0) - \mu \\ p(0) \end{array} \right] \]
It is now easy to see that the Jacobian determinant of the transformation \(Q_{\bar t}(p, \theta)\) equals 1. Moreover,
\small
\begin{eqnarray*} \lefteqn{\left[ \begin{array}{cc} \cos(t/\sqrt{\sigma^2 M}) & \frac{\sigma}{\sqrt{M}} \sin (t/\sqrt{\sigma^2 M}) \\ -\frac{\sqrt{M}}{\sigma} \sin(t/\sqrt{\sigma^2 M}) & \cos(t/\sqrt{\sigma^2 M}) \end{array}\right] \left[ \begin{array}{c} \theta(t) - \mu \\ -p(t) \end{array} \right] } \\ &=& \left[ \begin{array}{cc} \cos(t/\sqrt{\sigma^2 M}) & \frac{\sigma}{\sqrt{M}} \sin (t/\sqrt{\sigma^2 M}) \\ -\frac{\sqrt{M}}{\sigma} \sin(t/\sqrt{\sigma^2 M}) & \cos(t/\sqrt{\sigma^2 M}) \end{array}\right] \\ && \times \left[ \begin{array}{cc} \cos(t/\sqrt{\sigma^2 M}) & \frac{\sigma}{\sqrt{M}} \sin (t/\sqrt{\sigma^2 M}) \\ \frac{\sqrt{M}}{\sigma} \sin(t/\sqrt{\sigma^2 M}) & -\cos(t/\sqrt{\sigma^2 M}) \end{array}\right] \left[ \begin{array}{c} \theta(0) - \mu \\ p(0) \end{array} \right] \\ &=& \left[ \begin{array}{cc} 1 & 0 \\ 0 & -1 \end{array} \right] \left[ \begin{array}{c} \theta(0) - \mu \\ p(0) \end{array} \right]. \end{eqnarray*}
Given the solution to the Hamiltonian equation, we can verify that \(H(\theta,p)\) stays indeed constant over time:
\begin{eqnarray*} H\big( \theta(t),p(t) \big) &=& \frac{1}{2} \ln (2 \pi M) + \frac{1}{2 M} \rho^2 \frac{M}{\sigma^2} \sin^2(\cdot) + \frac{1}{2} \ln (2 \pi \sigma^2) + \frac{1}{2 \sigma^2} \rho^2 \cos^2(\cdot) \\ &=& \frac{1}{2} \ln (2 \pi M) + \frac{1}{2} \ln (2 \pi \sigma^2) + \frac{\rho^2}{2 \sigma^2}. \end{eqnarray*}
Note that the initial values enter the level of ``energy'' through the parameter \(\rho\).
The subsequent algorithm generates the proposal distribution using the Hamiltonian equations. In particular, it will start from \[ \theta(0) = \theta^{i-1}, \quad p(0) \sim N(0,M). \] Using the Hamiltonian equations over a time interval \(t\), we obtain (in slight abuse of notation)
\begin{eqnarray*} \hspace{-0.2in} \theta^* - \mu &=& \cos(t/\sqrt{\sigma^2 M}) (\theta^{i-1} - \mu) + \sigma \sin (t/\sqrt{\sigma^2 M}) N(0,1)\\ p^* &=& -\frac{\sqrt{M}}{\sigma} \sin(t/\sqrt{\sigma^2 M}) (\theta^{i-1} - \mu) + \sqrt{M} \cos(t/\sqrt{\sigma^2 M}) \cdot N(0,1) \end{eqnarray*}
The acceptance probability for the proposed draw depends on the Hamiltonian. But by construction, the dynamics of \(\theta\) and \(p\) are such that the Hamiltonian is constant, meaning the draw is always accepted. So, we obtain the following autoregressive law of motion: \[ \theta^{i} - \mu = \cos(t/\sqrt{\sigma^2 M}) (\theta^{i-1} - \mu) + \sigma \sin (t/\sqrt{\sigma^2 M}) \epsilon^i, \quad \epsilon^i \sim N(0,1) \] Because \(\sin^2 + \cos^2 = 1\), it is easy to see that the stationary distribution of this AR(1) process is the target posterior \(N(\mu,\sigma^2)\). The persistence of the chain depends on the time period \(t\). When \(t\) is set to \[ t^{iid} = \frac{\pi}{2} \sqrt{\sigma^2 M}, \] the chain produces uncorrelated draws of \(\theta\).
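The AR(1) law of motion above is easy to check by simulation. Here is a minimal Python sketch (with illustrative \(\mu\), \(\sigma^2\), and the tuning \(M = 1/\sigma^2\)): iterating the recursion at \(t = t^{iid}\) should produce essentially uncorrelated draws from \(N(\mu,\sigma^2)\).

```python
import math, random

random.seed(42)
mu, sigma2 = 1.0, 4.0
M = 1.0 / sigma2                               # assumed tuning: M = 1/sigma^2
t_iid = 0.5 * math.pi * math.sqrt(sigma2 * M)  # t = (pi/2) sqrt(sigma^2 M)
c = math.cos(t_iid / math.sqrt(sigma2 * M))    # = 0: chain becomes iid
s = math.sin(t_iid / math.sqrt(sigma2 * M))    # = 1

theta, draws = 10.0, []                        # deliberately bad starting value
for _ in range(50_000):
    eps = random.gauss(0.0, 1.0)
    theta = mu + c * (theta - mu) + math.sqrt(sigma2) * s * eps  # AR(1) update
    draws.append(theta)

mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(mean, var)
```

The sample mean and variance should be close to \(\mu = 1\) and \(\sigma^2 = 4\).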
To actually obtain (an approximation of) \(\hat Q_{\bar t}(p, \theta)\), Hamilton's equations are discretized in time.
To discretize, pick a step size \(\epsilon = \Delta t\). Then applying \(\bar t / \epsilon = L\) steps (assumed to be an integer) of the
discretized Hamiltonian obtains a draw from \(\hat Q_{\bar t}(p, \theta)\).
There are many approaches to discretizing the ODEs!
We're going to use the leapfrog method because it has good
numerical properties and the resulting proposal retains the
statistical properties (reversibility and a unit Jacobian) needed for
the Metropolis-Hastings correction.
Under the assumption that \(p|\theta\) is normally distributed with mean \(0\) and variance \(M\):
\begin{eqnarray} \log\pi(p|\theta) \propto - \frac{p'M^{-1}p}{2}, \end{eqnarray}
\(\partial K/\partial \theta = 0\) and the calculations are considerably simplified. (But it's worth reiterating that \(\pi(p|\theta)\) is a choice.) Given \(\theta(t)\) and \(p(t)\), a step of size \(\epsilon\) is taken using the following three steps.
\begin{eqnarray*} p(t+\epsilon/2) &=& p(t) + \frac{\epsilon}{2}\nabla \log \pi(\theta(t)) \\ \theta(t+\epsilon) &=& \theta(t) + \epsilon M^{-1}p(t+\epsilon/2) \\ p(t+\epsilon) &=& p(t+\epsilon/2) + \frac{\epsilon}{2}\nabla \log \pi(\theta(t+\epsilon)). \end{eqnarray*}
This leads to the following algorithm (the user needs to specify \(\epsilon\) and \(L\)):
\begin{algo}{Modified Leapfrog Simulator, \label{algo:leapfrog} $\hat Q_{L\epsilon}(\cdot,\cdot)$.} This algorithm takes as inputs $(\theta, p)$ and returns $(\theta^*, p^*)$, after approximating the Hamiltonian dynamics using step size $\epsilon$ for $L$ steps. \begin{enumerate} \item For $l = 1, \ldots, L$. Set \begin{eqnarray*} &1.& p \leftarrow p + \epsilon \nabla \log \pi(\theta)/2 \\ &2.& \theta \leftarrow \theta + \epsilon M^{-1} p \\ &3.& p \leftarrow p + \epsilon \nabla \log \pi(\theta) / 2 \end{eqnarray*} \item Return $\theta^* = \theta$ and $p^* = -p$. (The negation is not a typo!) \end{enumerate} \end{algo}
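The algorithm above can be sketched in Python (a minimal implementation; the Gaussian example used to exercise it is an illustration, not part of the algorithm). Applying the map twice should return the initial point, since the momentum flip makes it an involution.

```python
def leapfrog(theta, p, grad_log_pi, eps, L, M=1.0):
    """L leapfrog steps of size eps; returns (theta*, -p*) as in the algorithm."""
    for _ in range(L):
        p = p + 0.5 * eps * grad_log_pi(theta)   # half step for momentum
        theta = theta + eps * p / M              # full step for position
        p = p + 0.5 * eps * grad_log_pi(theta)   # half step for momentum
    return theta, -p                             # momentum flip (not a typo!)

# Gaussian illustration: grad log pi(theta) = -(theta - mu) / sigma2
mu, sigma2 = 1.0, 2.0
grad = lambda th: -(th - mu) / sigma2
th1, p1 = leapfrog(0.0, 1.0, grad, eps=0.01, L=100, M=1.0 / sigma2)
# Applying the map twice returns the initial point (reversibility):
th2, p2 = leapfrog(th1, p1, grad, eps=0.01, L=100, M=1.0 / sigma2)
print(th2, p2)
```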
\includegraphics[width=4in]{static/leapfrog_neal}
For the example given above, we have
\begin{eqnarray*} \left[ \begin{array}{c} \theta^* - \mu \\ p^* \end{array} \right] &=& \left[ \begin{array}{cc} 1 & 0 \\ 0 & -1 \end{array} \right] \left[\begin{matrix} 1 - \frac{\epsilon^{2}}{2 M \sigma^{2}} & \frac{\epsilon}{M} \\ - \frac{\epsilon}{\sigma^{2}} + \frac{\epsilon^{3}}{4 M \sigma^{4}} & 1 - \frac{\epsilon^{2}}{2 M \sigma^{2}} \end{matrix}\right]^L \left[ \begin{array}{c} \theta^{i-1} - \mu \\ \hat p \end{array} \right] \end{eqnarray*}
The stability of this system is determined by the eigenvalues \(\lambda_1\) and \(\lambda_2\), where
\begin{eqnarray} \lambda_1 &=& \frac{2 M \sigma^{2} - \epsilon^{2}}{2 M \sigma^{2}} + \frac{\epsilon}{2 M \sigma^{2}} \sqrt{\epsilon^{2} - 4 M \sigma^{2}}, \nonumber \\ \lambda_2 &=& \frac{2 M \sigma^{2} - \epsilon^{2}}{2 M \sigma^{2}} - \frac{\epsilon}{2 M \sigma^{2}} \sqrt{\epsilon^{2} - 4 M \sigma^{2}}. \nonumber \end{eqnarray}
Case 1: \(\epsilon > \sqrt{4M\sigma^2}\). Then
\begin{eqnarray*} \lambda_2 &=& 1 - \frac{\epsilon^2}{2M\sigma^2} - \frac{\epsilon}{2M\sigma^2}\sqrt{\epsilon^{2} - 4 M \sigma^{2}} \\ &<& 1 - \frac{\epsilon^2}{2M\sigma^2} \\ &<& 1 - \frac{4M\sigma^2}{2M\sigma^2} \\ &=& -1, \end{eqnarray*}
so the discretized dynamics are explosive. In Case 2, \(\epsilon < \sqrt{4M\sigma^2}\), the eigenvalues are complex conjugates with modulus one (the map has unit determinant), so the discretization is stable.
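The stability threshold is easy to verify numerically. A minimal Python sketch (with \(M = \sigma^2 = 1\), chosen for illustration, so the threshold is \(\epsilon = 2\)): below the threshold the eigenvalues lie on the unit circle; above it one eigenvalue falls below \(-1\).

```python
import cmath

def leapfrog_matrix(eps, M, sigma2):
    # one-step leapfrog map for the Gaussian target, acting on (theta - mu, p)
    a = 1 - eps ** 2 / (2 * M * sigma2)
    return [[a, eps / M],
            [-eps / sigma2 + eps ** 3 / (4 * M * sigma2 ** 2), a]]

def eigenvalues(m):
    # eigenvalues of a 2x2 matrix via the characteristic polynomial
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = cmath.sqrt(tr ** 2 - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

M, sigma2 = 1.0, 1.0        # threshold: eps = sqrt(4 M sigma^2) = 2
stable = eigenvalues(leapfrog_matrix(1.9, M, sigma2))
explosive = eigenvalues(leapfrog_matrix(2.1, M, sigma2))
print([abs(l) for l in stable])            # both on the unit circle
print(min(l.real for l in explosive))      # one eigenvalue below -1
```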
For any Metropolis-style algorithm targeting \(x \sim \pi\) with proposal density \(q(x^* | x)\), note that the expected value of the (unbounded) acceptance ratio is given by:
\begin{eqnarray} E\left[\frac{\pi(x^*)q(x|x^*)}{\pi(x)q(x^*|x)}\right] &=& \int\int\frac{\pi(x^*)q(x|x^*)}{\pi(x)q(x^*|x)} q(x^*|x) \pi(x) dx dx^* \nonumber \\ &=& \int \pi(x^*)\left[\int q(x|x^*) dx\right] dx^* \nonumber \\ &=& \int \pi(x^*)dx^* \nonumber \\ &=& 1. \end{eqnarray}
Note that for both the Random Walk Metropolis-Hastings (RWMH) algorithm \((x = \theta)\) and the Hamiltonian Monte Carlo algorithm \((x = (\theta,p))\), we have \(q(x|x^*) = q(x^*|x)\). This means we can deduce that
\begin{eqnarray} \label{eq:pratio} E\left[\frac{\pi(x^*)}{\pi(x)}\right] = 1. \end{eqnarray}
If we write the posterior in the form \(\pi(x) = \exp(-f(x))/Z\), then using (\ref{eq:pratio}), we deduce that
\begin{eqnarray} \label{eq:edelta} E\left[\exp\left\{-(f(x^*) - f(x))\right\}\right] = 1. \end{eqnarray}
Defining \(\Delta = f(x^*) - f(x)\) and using Jensen’s inequality, we have:
\begin{eqnarray} \label{eq:delta} E\left[\Delta\right] \ge 0. \end{eqnarray}
In most interesting cases, the inequality is strict. Next, note also that \(\alpha(x^*|x)\), the Metropolis-Hastings acceptance probability is given by:
\begin{eqnarray} \label{eq:alpha} \alpha(x^*|x) = \min\left\{1,\exp(-\Delta)\right\}. \end{eqnarray}
The expected acceptance rate for an MH algorithm can be written as \tiny
\begin{eqnarray} \hspace{-2.4in} \label{eq:avgacpt} \bar \alpha &=& \int \alpha(x^* | x) q(x^*|x) \pi(x) dx^* dx \nonumber \\ &=& \int_{\Delta(x^*,x) < 0} \alpha(x^* | x) q(x^*|x) \pi(x) dx^* dx + \int_{\Delta(x^*,x) > 0} \alpha(x^* | x) q(x^*|x) \pi(x) dx^* dx\nonumber \\ &=& \int_{\Delta(x^*,x) < 0} \alpha(x^*| x) q(x^*|x) \pi(x) dx^* dx + \int_{\Delta(x^*,x) > 0} \alpha(x | x^*) q(x|x^*) \pi(x^*) dx^* dx\nonumber \\ &=& \int_{\Delta(x^*,x) < 0} \alpha(x^*| x) q(x^*|x) \pi(x) dx^* dx + \int_{\Delta(x,x^*) < 0} \alpha(x | x^*) q(x|x^*) \pi(x^*) dx^* dx\nonumber \\ &=& \int_{\Delta(x^*,x) < 0} q(x^*|x) \pi(x) dx^* dx + \int_{\Delta(x,x^*) < 0} q(x|x^*) \pi(x^*) dx^* dx\nonumber \\ &=& 2 \times {\mathbb P}(\Delta < 0). \end{eqnarray}
In what follows, we first derive a restriction on the relationship between the mean and variance of \(\Delta\). If we assume the elements \(x_d\) of \(x\) are independent, we can write,
\begin{eqnarray} \label{eq:f} f(x) = \sum_{d=1}^D f_d(x_d) \mbox{ and } \Delta = \sum_{d=1}^D \Delta_d, \end{eqnarray}
where \(D\) is the dimension of \(x\). Taking a second-order expansion of \(\exp(-\Delta_d)\) around \(\Delta_d = 0\), we have:
\begin{eqnarray} \exp(-\Delta_d) \approx 1 - \Delta_d + \Delta_d^2/2. \end{eqnarray}
Under independence, (\ref{eq:pratio}) holds component by component, so taking expectations gives \(E[\Delta_d] \approx E[\Delta_d^2]/2\). Since \(E[\Delta_d]\) is small, \(E[\Delta_d^2] \approx \mathbb{V}[\Delta_d]\): the mean of \(\Delta_d\) is about half its variance. Summing over the independent components, \(E[\Delta] \approx \mathbb{V}[\Delta]/2\).
Writing \(\mu = E[\Delta]\) and appealing to a central limit theorem across the \(D\) components, \[ \Delta \sim N(\mu, 2\mu) \] approximately. Using this, we can write:
\begin{eqnarray} \label{eq:abar} \bar \alpha &=& 2 \times {\mathbb P}(\Delta < 0) \nonumber \\\ &=& 2 \Phi\left(\frac{0 - \mu}{\sqrt{2\mu}}\right) \nonumber \\\ &=& 2 \Phi(-\sqrt{\mu/2}). \end{eqnarray}
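The approximation \(\bar\alpha \approx 2\Phi(-\sqrt{\mu/2})\) can be checked by simulation for the Gaussian RWMH example below. This is a rough Python sketch; the dimension \(D\) and proposal scale \(c\) are illustrative assumptions.

```python
import math, random

def Phi(z):
    # standard normal CDF via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

random.seed(1)
D, c = 50, 0.2              # illustrative dimension and proposal scale
mu = D * c ** 2 / 2         # E[Delta] = D c^2 / 2 for the Gaussian target

n, acc = 20_000, 0.0
for _ in range(n):
    delta = 0.0
    for _ in range(D):
        x = random.gauss(0.0, 1.0)           # current state component, x ~ N(0,1)
        xs = x + c * random.gauss(0.0, 1.0)  # proposal component
        delta += 0.5 * (xs ** 2 - x ** 2)    # contribution to f(x*) - f(x)
    acc += min(1.0, math.exp(-delta))        # MH acceptance probability
sim = acc / n
approx = 2 * Phi(-math.sqrt(mu / 2))
print(sim, approx)   # the two should agree closely
```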
Finally, we use this to construct a (heuristic) cost of an algorithm:
\begin{eqnarray}
\label{eq:cost}
\mbox{Cost of Alg.} &\propto& \mbox{Avg.\ Number of Proposals Before Acceptance} \nonumber \\
&& \times \mbox{ Proposal Steps to ``Independent'' Point} \nonumber \\
&=& \frac{1}{\bar \alpha} \times \mbox{ Proposal Steps to ``Independent'' Point}
\end{eqnarray}
Consider the RWMH algorithm applied
to an independent multivariate normal distribution of dimension \(D\). So
\(x \sim N(0,I_D)\) and \(q(x^*|x) \sim N(x,c^2 I_D)\). Then:
\[
E[\Delta] = E\left[\sum_{d=1}^D \Delta_d\right] = \sum_{d=1}^D E[\Delta_d] = \frac{c^2}{2}D.
\]
Informally, the acceptance probability decreases as \(D\) increases, since
\(E[\Delta] \propto D\) for fixed \(c\).
This means that to maintain a given acceptance rate as \(D\) increases, \(c\) must shrink
at rate \(D^{-1/2}\).
Finally, for random walk processes, it is well known that the number of steps needed to reach a ``nearly independent'' point will be proportional to \(c^{-2}\). We can then write the cost, \(C_{RWMH}\), as:
\begin{eqnarray} \label{eq:cost-rwmh} C_{RWMH} = \frac{1}{2\Phi(-\sqrt{\mu/2})}\times \frac{1}{\mu}. \end{eqnarray}
This cost is minimized when \(\mu = 2.83\), which implies \(\bar\alpha = 0.234\), in accordance with well-known results.
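This minimization is easy to reproduce numerically. A minimal Python sketch using a grid search (the grid bounds are arbitrary):

```python
import math

def Phi(z):
    # standard normal CDF via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def cost_rwmh(mu):
    # C_RWMH = 1 / (2 Phi(-sqrt(mu/2))) * (1 / mu)
    return 1.0 / (2 * Phi(-math.sqrt(mu / 2)) * mu)

mus = [i / 100 for i in range(1, 1001)]        # grid on (0, 10]
mu_star = min(mus, key=cost_rwmh)
alpha_star = 2 * Phi(-math.sqrt(mu_star / 2))
print(mu_star, alpha_star)                     # roughly 2.83 and 0.234
```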
Now \(f(\theta,p)\) can be written as:
\[
f(\theta, p) = \frac{\theta^2}{2} + \frac{p^2}{2}
\]
Applying \(L\) leapfrog steps leads to:
\[
E[\Delta] \approx D\epsilon^4.
\]
This means that as \(D\) increases, \(\epsilon\) must shrink at rate
\(D^{-1/4}\) in order to maintain a reasonable acceptance rate.
The number of leapfrog updates needed to reach a nearly independent point
grows at rate \(\epsilon^{-1}\).
Using the relationship above, \(\epsilon \propto \mu^{1/4}\) for fixed \(D\). Thus:
\begin{eqnarray} \label{eq:cost-HMC} C_{HMC} = \frac{1}{2\Phi(-\sqrt{\mu/2})}\times \frac{1}{\mu^{1/4}}. \end{eqnarray}
This cost is minimized when \(\mu = 0.41\), which implies \(\bar\alpha = 0.65\). Hence, one should target an acceptance rate of about 65%.
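The HMC cost can be minimized the same way (a Python sketch with an arbitrary grid):

```python
import math

def Phi(z):
    # standard normal CDF via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def cost_hmc(mu):
    # C_HMC = 1 / (2 Phi(-sqrt(mu/2))) * mu^{-1/4}
    return 1.0 / (2 * Phi(-math.sqrt(mu / 2)) * mu ** 0.25)

mus = [i / 1000 for i in range(1, 5001)]       # grid on (0, 5]
mu_star = min(mus, key=cost_hmc)
alpha_star = 2 * Phi(-math.sqrt(mu_star / 2))
print(round(mu_star, 2), round(alpha_star, 2))   # → 0.41 0.65
```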
We now combine the components to obtain a Hamiltonian Monte Carlo algorithm. Our goal is to obtain draws from \(\pi(\theta)\). To do so, we target the augmented distribution: \[ \pi(\theta,p) = \pi(\theta)\underbrace{\pi(p|\theta)}_{\mathcal N(0, M)} = \exp\{-H(\theta,p)\}. \] Given \((\theta^{i-1}, p^{i-1})\), we apply two Markov transition kernels in succession to get \((\theta^i, p^i)\). The two-step process first operates on \(p\) and then \((\theta, p)\) jointly.
Of course, in practice this is implemented in a one-step procedure. In fact, there is no need to store draws of \(p\).
In our applications, we should try to have \(M\) approximate \(\mathbb V_{\pi}[\theta]^{-1}\). And, of course, we also need an initial \(\theta\).
An important property of this proposal is that
\[
\hat Q_{L\epsilon}(\hat Q_{L\epsilon}(\theta^{i-1}, \hat p)) = (\theta^{i-1}, \hat p).
\]
That is, applying the map twice returns the initial point: the momentum flip makes \(\hat Q_{L\epsilon}\) an involution, so the proposal is symmetric.
Moreover, the Jacobian
associated with \(Q_{L\epsilon}\) is 1. The upshot of this is
that we don’t need to account for \(Q_{L\epsilon}\) in the
Metropolis-Hastings acceptance probability, which is given by:
\begin{eqnarray} \label{eq:hmc-alpha} \alpha = \min\left\{1, \exp\left( H(\theta^{i-1}, \hat p)- H(\theta^*, p^*)\right)\right\} = \min\left\{1, \frac{\pi(\theta^*, p^*)}{\pi(\theta^{i-1}, \hat p)}\right\}. \end{eqnarray}
Note that the Hamiltonian dynamics keep \(H\) constant, so with an exact solution the ratio \(\pi(\theta^*, p^*)/\pi(\theta^{i-1},\hat p)\), and therefore the acceptance probability, would equal one. Because of the discretization, in the practical implementation the ratio is only close to one.
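Putting the pieces together, here is a minimal Python sketch of one HMC update (the Gaussian target and the tuning values \(\epsilon\), \(L\), \(M\) are illustrative assumptions, not recommendations):

```python
import math, random

def hmc_step(theta, log_pi, grad_log_pi, eps, L, M):
    """One HMC update: refresh momentum, run leapfrog, MH accept/reject."""
    p = random.gauss(0.0, math.sqrt(M))                 # p ~ N(0, M)
    H0 = 0.5 * p ** 2 / M - log_pi(theta)               # kinetic + potential
    th, pp = theta, p
    for _ in range(L):                                  # leapfrog steps
        pp += 0.5 * eps * grad_log_pi(th)
        th += eps * pp / M
        pp += 0.5 * eps * grad_log_pi(th)
    H1 = 0.5 * pp ** 2 / M - log_pi(th)
    if random.random() < min(1.0, math.exp(H0 - H1)):   # accept w.p. min{1, e^{H0-H1}}
        return th
    return theta

# Illustration on a N(1, 2) target
random.seed(7)
mu, sigma2 = 1.0, 2.0
log_pi = lambda th: -0.5 * (th - mu) ** 2 / sigma2      # up to a constant
grad = lambda th: -(th - mu) / sigma2
theta, draws = 0.0, []
for _ in range(20_000):
    theta = hmc_step(theta, log_pi, grad, eps=0.2, L=10, M=1.0 / sigma2)
    draws.append(theta)
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)
print(mean, var)
```

The chain's sample mean and variance should be close to \(\mu = 1\) and \(\sigma^2 = 2\).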
Consider the circular posterior density \[ \pi(\theta) \propto \exp \bigg \{ -\frac{1}{2}\psi_n \big((\theta_1-1)^2 + (\theta_2-1)^2 -1 \big) ^2 \bigg \}{\bf 1}\left\{\theta\in[-1,3]^2\right\} \] This means that:
\begin{multline}
\nabla \log \pi(\theta) = \bigg[- 2 \psi_n \left(\theta_{1} - 1\right) \left(\left(\theta_{1} - 1\right)^{2} + \left(\theta_{2} - 1\right)^{2} - 1\right), \\
- 2 \psi_n \left(\theta_{2} - 1\right) \left(\left(\theta_{1} - 1\right)^{2} + \left(\theta_{2} - 1\right)^{2} - 1\right) \bigg].
\end{multline}
With \(\psi_n = 1000\), the ``posterior'' covariance of this distribution is:
\begin{eqnarray*} \mathbb V_{\pi}[\theta] = \left[\begin{array}{cc} 0.5 & 0 \\ 0 & 0.5 \end{array}\right]. \end{eqnarray*}
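The gradient above is cheap to code and check. A minimal Python sketch (the evaluation point is arbitrary) validating the analytic gradient against a central finite difference:

```python
import math

psi = 1000.0   # psi_n in the notes

def log_pi(theta1, theta2):
    # unnormalized log density of the circular posterior
    return -0.5 * psi * ((theta1 - 1) ** 2 + (theta2 - 1) ** 2 - 1) ** 2

def grad_log_pi(theta1, theta2):
    # analytic gradient from the expression above
    r = (theta1 - 1) ** 2 + (theta2 - 1) ** 2 - 1
    return (-2 * psi * (theta1 - 1) * r, -2 * psi * (theta2 - 1) * r)

# Finite-difference check at an arbitrary point inside [-1, 3]^2
t1, t2, h = 0.3, 1.7, 1e-6
fd1 = (log_pi(t1 + h, t2) - log_pi(t1 - h, t2)) / (2 * h)
fd2 = (log_pi(t1, t2 + h) - log_pi(t1, t2 - h)) / (2 * h)
g1, g2 = grad_log_pi(t1, t2)
print(abs(fd1 - g1), abs(fd2 - g2))   # both differences should be tiny
```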
\begin{center} \includegraphics[width=4.5in]{static/results} \end{center}
It seems like this should be great for DSGE models, right?
Well, we need to compute \(\nabla \log \pi(\theta)\). Not so easy for a DSGE model!
Also, multimodality is still present! One can combine SMC with HMC!