Derivation of the Kalman Filtering Equations for the Time-varying Intercept Simple Linear Regression Model

22 Apr 2023

Some of the most useful and interesting applications of the Kalman filter in economics and finance arise in the context of multivariate models. However, the method is best understood when introduced in a simple, univariate setting. The goal of this post is therefore to introduce the reader to the Kalman filter using a very simple, univariate regression model.

Suppose we have the following simple regression model for GDP growth: \(\begin{align} y_t = \beta_t + e_t \tag{1} \end{align}\) where $y_{t}$ is the growth rate of real US GDP, \(e_{t}\sim i.i.d.\text{ } N(0,R) \tag{2}\) and $\beta_t$ is the time-varying mean GDP growth that follows an AR(1) process: \(\begin{equation} \beta_{t}=\mu +F\beta_{t-1}+v_{t} \tag{3} \end{equation}\) where \(v_{t}\sim i.i.d.\text{ } N(0,Q) \tag{4}\)
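To make the setup concrete, here is a minimal R sketch that simulates data from the model (1)-(4). The parameter values for $\mu$, $F$, $Q$ and $R$ are purely illustrative assumptions on my part, not estimates from GDP data.

```r
# Simulate the state-space model (1)-(4); parameter values are illustrative.
set.seed(123)
n_obs <- 200
mu <- 0.8
F  <- 0.9    # AR(1) coefficient in (3); this overwrites R's shorthand for FALSE
Q  <- 0.1    # Var(v_t) in (4)
R  <- 0.5    # Var(e_t) in (2)

beta <- numeric(n_obs)
beta[1] <- mu / (1 - F)   # start the state at its unconditional mean
for (t in 2:n_obs) {
  beta[t] <- mu + F * beta[t - 1] + rnorm(1, sd = sqrt(Q))   # equation (3)
}
y <- beta + rnorm(n_obs, sd = sqrt(R))                       # equation (1)
```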

The idea of the Kalman filter is pretty straightforward and consists of two steps:

Prediction: At the beginning of time $t$, we may want to form an optimal predictor of $y_{t}$ based on all the available information up to time $t-1$: $y_{t|t-1}$. To do this, we need to calculate $\beta_{t|t-1}$.

Updating: Once $y_{t}$ is realized at the end of time $t$, the prediction error can be calculated: $\eta_{t|t-1}=y_{t}-y_{t|t-1}$. This prediction error contains new information about $\beta_{t}$ beyond that contained in $\beta_{t|t-1}$. Thus, after observing $y_{t}$, a more accurate inference can be made of $\beta_{t}$: $\beta_{t|t}$. An inference of $\beta_t$ based on information up to time $t$ takes the following form: $\beta_{t|t}=\beta_{t|t-1}+K_{t}\eta_{t|t-1}$, where $K_t$, the Kalman gain, is the weight assigned to the new information about $\beta_{t}$ contained in the prediction error. To be more specific, the basic filter is described by the following two steps:

1) Prediction \(\begin{align} \beta_{t|t-1}=\mu +F\beta_{t-1|t-1} \tag{5}\\ P_{t|t-1}=F^{2}P_{t-1|t-1}+Q \tag{6}\\ \eta_{t|t-1}=y_{t}-y_{t|t-1}=y_{t}-\beta_{t|t-1} \tag{7}\\ f_{t|t-1}=P_{t|t-1}+R \tag{8}\\ \end{align}\)

where
$\beta_{t|t-1}=E(\beta_{t}|\psi_{t-1})$ - estimate of $\beta_{t}$ conditional on information up to $t-1$
$\beta_{t|t}=E(\beta_{t}|\psi_{t})$ - estimate of $\beta_{t}$ conditional on information up to $t$
$P_{t|t-1}=E(\beta_{t}-\beta_{t|t-1})^{2}$ - variance of $\beta_{t}$ conditional on information up to $t-1$
$P_{t|t}=E(\beta_{t}-\beta_{t|t})^{2}$ - variance of $\beta_{t}$ conditional on information up to $t$
$\eta_{t|t-1}=y_{t}-y_{t|t-1}$ - prediction error
$f_{t|t-1}=E(\eta_{t|t-1}^{2})$ - conditional variance of prediction error

2) Updating \(\begin{align} K_{t}=\frac{P_{t|t-1}}{f_{t|t-1}} \tag{9} \\ \beta_{t|t}=\beta_{t|t-1}+K_{t}\eta_{t|t-1} \tag{10} \\ P_{t|t}=P_{t|t-1}-K_{t}P_{t|t-1} \tag{11} \end{align}\)
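Before deriving these equations, it may help to see how the recursions (5)-(11) map into code. Below is a minimal R sketch of the filter, assuming the parameters $\mu$, $F$, $Q$, $R$ are known and the initial values $\beta_{0|0}$ and $P_{0|0}$ are supplied (their choice is discussed at the end of the post).

```r
# A minimal sketch of the filter recursions (5)-(11) for this scalar model.
kalman_filter <- function(y, mu, F, Q, R, beta0, P0) {
  n_obs   <- length(y)
  beta_tt <- numeric(n_obs)   # beta_{t|t}
  P_tt    <- numeric(n_obs)   # P_{t|t}
  b <- beta0
  P <- P0
  for (t in 1:n_obs) {
    # Prediction, equations (5)-(8)
    b_pred <- mu + F * b        # beta_{t|t-1}
    P_pred <- F^2 * P + Q       # P_{t|t-1}
    eta    <- y[t] - b_pred     # prediction error eta_{t|t-1}
    f      <- P_pred + R        # its variance f_{t|t-1}
    # Updating, equations (9)-(11)
    K <- P_pred / f             # Kalman gain
    b <- b_pred + K * eta       # beta_{t|t}
    P <- P_pred - K * P_pred    # P_{t|t}
    beta_tt[t] <- b
    P_tt[t]    <- P
  }
  list(beta_tt = beta_tt, P_tt = P_tt)
}
```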

Provided that $F$ lies inside the unit circle (i.e., $|F|<1$), the process for $\beta_{t}$ in equation (3) is covariance-stationary, and thus the derivation of equation (5) is straightforward. Equation (6) can be derived as follows:

\[\begin{align} P_{t|t-1} &=E_{t-1}(\beta_{t}-\beta_{t|t-1})^{2}\\ &=E_{t-1}(\mu +F\beta_{t-1}+v_{t}-\mu -F\beta_{t-1|t-1})^{2} \\ &=E_{t-1}(F\beta_{t-1}+v_{t}-F\beta_{t-1|t-1})^{2} \\ &=E_{t-1}(F(\beta_{t-1}-\beta_{t-1|t-1})+v_{t})^{2} \\ &=F^{2}E_{t-1}(\beta_{t-1}-\beta_{t-1|t-1})^{2}+Q \\ &=F^{2}P_{t-1|t-1}+Q \end{align}\]

where the cross term drops out because $v_{t}$ is uncorrelated with the estimation error $\beta_{t-1}-\beta_{t-1|t-1}$.

Since $E_{t-1}[\eta_{t|t-1}]=0$, equation (8) can be derived in the following way: \(\begin{align} f_{t|t-1} &=E_{t-1}(\eta_{t|t-1}^{2})=E_{t-1}(y_{t}-y_{t|t-1})^{2} \\ &=E_{t-1}(\beta_{t}+e_{t}-\beta_{t|t-1})^{2} \\ &=E_{t-1}((\beta_{t}-\beta_{t|t-1})+e_{t})^{2} \\ &=E_{t-1}(\beta_{t}-\beta_{t|t-1})^{2}+E_{t-1}(e_{t})^{2} \\ &=P_{t|t-1}+R \end{align}\) where the cross term again vanishes, since $e_{t}$ is independent of $\beta_{t}-\beta_{t|t-1}$. The derivation of the updating equations relies on the following well-known result in probability theory:

Theorem 1 (proof in the appendix). If two random variables $A$ and $B$ are jointly normally distributed, then, conditional on $B=b$, $A$ is normally distributed with mean \(\mu_{A|B}=\mu_A+\frac{\sigma_{AB}}{\sigma_B^2}(b-\mu_B) \tag{12}\) and variance \(\sigma_{A|B}^{2}=\sigma_{A}^{2}-\frac{\sigma_{AB}^{2}}{\sigma_{B}^{2}} \tag{13}\)
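Formulas (12) and (13) are easy to check numerically. The R sketch below is a quick Monte Carlo sanity check with arbitrary illustrative moments: it constructs a jointly normal pair $(A,B)$, conditions on $B$ falling in a narrow band around a value $b$, and compares the empirical mean and variance of $A$ with the theorem's formulas.

```r
# Monte Carlo check of (12) and (13); all moment values are arbitrary.
set.seed(1)
mu_A <- 1;    mu_B <- 2
s2_A <- 2.25; s2_B <- 4
s_AB <- 1.2
n <- 1e6
B <- rnorm(n, mu_B, sqrt(s2_B))
# This construction yields a jointly normal (A, B) with the moments above.
A <- mu_A + (s_AB / s2_B) * (B - mu_B) +
     rnorm(n, 0, sqrt(s2_A - s_AB^2 / s2_B))
b <- 3
keep <- abs(B - b) < 0.05
mean(A[keep])   # approx. mu_A + s_AB / s2_B * (b - mu_B) = 1.3
var(A[keep])    # approx. s2_A - s_AB^2 / s2_B = 1.89
```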

To derive equations (10) and (11), note that $\beta_{t}$ and $\eta_{t|t-1}=y_{t}-y_{t|t-1}$ are jointly normally distributed. By denoting $A=\beta_{t}$ and $B=y_{t}-y_{t|t-1}=\eta_{t|t-1}$ we can obtain: \(\begin{align} \mu_{A}&=\beta_{t|t-1} \\ \mu_{B}&=E_{t-1}(\beta_{t}+e_{t}-\beta_{t|t-1})=E_{t-1}(\beta _{t}-\beta_{t|t-1})=0\\ \sigma_{A}^{2}&=P_{t|t-1} \\ \sigma_{B}^{2}&=f_{t|t-1} \\ \sigma_{AB} &=E_{t-1}((\beta_{t}-\beta_{t|t-1})(\eta_{t|t-1})) \\ &=E_{t-1}((\beta_{t}-\beta_{t|t-1})(y_{t}-y_{t|t-1})) \\ &=E_{t-1}((\beta_{t}-\beta_{t|t-1})(\beta_{t}+e_{t}-\beta _{t|t-1})) \\ &=E_{t-1}((\beta_{t}-\beta_{t|t-1})((\beta_{t}-\beta _{t|t-1})+e_{t})) \\ &=E_{t-1}(\beta_{t}-\beta_{t|t-1})^{2}+E_{t-1}(e_{t}(\beta _{t}-\beta_{t|t-1})) \\ &=P_{t|t-1} \end{align}\)

Then

\[\begin{align} \beta_{t|t} &=\beta_{t|t-1}+\frac{P_{t|t-1}}{f_{t|t-1}}\eta_{t|t-1} \tag{14}\\ &=\beta_{t|t-1}+K_{t}\eta_{t|t-1} \end{align}\]

Now, due to (13): \(\begin{align} P_{t|t} &=P_{t|t-1}-\frac{P_{t|t-1}^{2}}{f_{t|t-1}} \tag{15}\\ &=P_{t|t-1}-\frac{P_{t|t-1}}{f_{t|t-1}}P_{t|t-1}\\ &=P_{t|t-1}-K_{t}P_{t|t-1} \end{align}\)

One important question that must be answered is what the initial values $\beta_{0|0}$ and $P_{0|0}$ should be. Since $\beta_{t}$ is stationary, Kim and Nelson (1999) suggest setting them to the unconditional mean and variance of $\beta_{t}$, which can be derived as follows.

From (5) we have: \(\begin{align} \beta_{0|0} &=\mu +F\beta_{0|0} \\ \Downarrow & \nonumber\\ \beta_{0|0} &=\frac{\mu }{1-F} \nonumber \end{align}\)

and from (6) we have: \(\begin{align} P_{0|0} &=F^{2}P_{0|0}+Q \\ \Downarrow & \nonumber\\ P_{0|0} &=\frac{Q}{1-F^{2}} \end{align}\)
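Putting the pieces together, the sketch below initializes the filter at these unconditional moments and runs it on the series simulated at the beginning of the post (it reuses `kalman_filter` and the parameter values from the earlier snippets).

```r
# Initialize at the unconditional moments derived above and run the filter.
beta0 <- mu / (1 - F)    # beta_{0|0}
P0    <- Q / (1 - F^2)   # P_{0|0}
out <- kalman_filter(y, mu, F, Q, R, beta0, P0)
head(out$beta_tt)        # filtered estimates of the time-varying mean
```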

In the future, I will show how to estimate this simple model using MLE and R.

Appendix

Proof of Theorem 1. If $A$ and $B$ are jointly normally distributed, their joint density is: \(f(A,B) = \frac{1}{2 \pi \sigma_A \sigma_B \sqrt{1 - \rho^2}} \exp\left(-\frac{1}{2(1 - \rho^2)} \left(\frac{(a - \mu_A)^2}{\sigma_A^2} - 2 \rho \frac{(a - \mu_A)(b - \mu_B)}{\sigma_A \sigma_B}+ \frac{(b - \mu_B)^2}{\sigma_B^2} \right)\right)\)

and the marginal density of $B$ is: \(g(B)=\frac{1}{\sqrt{2\pi}\sigma_B}\exp \left(-\frac{1}{2} \left(\frac{(b - \mu_B)^2}{\sigma_B^2} \right)\right)\) To simplify notation, let \(v=\frac{a-\mu_{A}}{\sigma_{A}}\) and \(u=\frac{b-\mu_{B}}{\sigma_{B}}\) Rewriting the joint density using this notation gives: \(f(A,B) = \frac{1}{2 \pi \sigma_A \sigma_B \sqrt{1 - \rho^2}} \exp\left(-\frac{1}{2(1 - \rho^2)} \left(v^2 - 2 \rho v u + u^2 \right)\right)\)

and the marginal density of $B$ as \(g(B)=\frac{1}{\sqrt{2\pi}\sigma_B}\exp \left(-\frac{1}{2} \left(u^2 \right)\right)\) Now, for any two random variables $A$ and $B$, the conditional density is $w(A|B)=\dfrac{f(A,B)}{g(B)}$. Using the two formulas above, the conditional density can be rewritten as:

\(\begin{align} w(A|B) &=\frac{\dfrac{1}{2\pi \sigma_{B}\sigma_{A}\sqrt{1-\rho ^{2}}}e^{- \dfrac{1}{2(1-\rho ^{2})}[u^{2}-2\rho uv+v^{2}]}}{\dfrac{1}{\sqrt{2\pi } \sigma_{B}}e^{-\dfrac{1}{2}u^{2}}} \\ w(A|B) &=\dfrac{1}{\sqrt{2\pi }\sigma_{A}\sqrt{1-\rho ^{2}}}e^{-\dfrac{ [u^{2}-2\rho uv+v^{2}]}{2(1-\rho ^{2})}+\dfrac{u^{2}}{2}} \\ w(A|B) &=\dfrac{1}{\sqrt{2\pi }\sigma_{A}\sqrt{1-\rho ^{2}}}e^{-\dfrac{ [u^{2}-2\rho uv+v^{2}]}{2(1-\rho ^{2})}+\dfrac{u^{2}-u^{2}\rho ^{2}}{2(1-\rho ^{2})}} \\ w(A|B) &=\dfrac{1}{\sqrt{2\pi }\sigma_{A}\sqrt{1-\rho ^{2}}}e^{-\dfrac{[u^{2}-2\rho uv+v^{2}]-u^{2}+u^{2}\rho ^{2}}{2(1-\rho ^{2})}} \\ w(A|B) &=\dfrac{1}{\sqrt{2\pi }\sigma_{A}\sqrt{1-\rho ^{2}}}e^{-\dfrac{[v^{2}-2\rho uv+u^{2}\rho ^{2}]}{2(1-\rho ^{2})}} \\ w(A|B) &=\dfrac{1}{\sqrt{2\pi }\sigma_{A}\sqrt{1-\rho ^{2}}}e^{-\dfrac{[v-\rho u]^{2}}{2(1-\rho ^{2})}} \\ w(A|B) &=\dfrac{1}{\sqrt{2\pi }\sigma_{A}\sqrt{1-\rho ^{2}}}e^{-\dfrac{1}{2}\left[ \dfrac{v-\rho u}{\sqrt{(1-\rho ^{2})}}\right] ^{2}} \end{align}\) Now express the last equation in terms of the original variables $A$ and $B$:

\(\begin{align} w(A|B) &=\dfrac{1}{\sqrt{2\pi }\sigma_{A}\sqrt{1-\rho ^{2}}}e^{-\dfrac{1}{2}\left[ \dfrac{\frac{a-\mu_{A}}{\sigma_{A}}-\rho \frac{b-\mu_{B}}{\sigma _{B}}}{\sqrt{(1-\rho ^{2})}}\right] ^{2}} \\ w(A|B) &=\dfrac{1}{\sqrt{2\pi }\sigma_{A}\sqrt{1-\rho ^{2}}}e^{-\dfrac{1}{2}\left[ \dfrac{a-\{\mu_{A}+\rho \dfrac{\sigma_{A}}{\sigma_{B}}(b-\mu _{B})\}}{\sigma_{A}\sqrt{(1-\rho ^{2})}}\right] ^{2}} \end{align}\) From the above, it must be clear that conditional on B, A is normally distributed with mean \(\begin{align} \mu_{A|B} &=\mu_{A}+\rho \frac{\sigma_{A}}{\sigma_{B}}(b-\mu_{B})\\ &\Downarrow & \nonumber \\ \mu_{A|B} &=\mu_{A}+\frac{\sigma_{AB}}{\sigma_{B}\sigma_{A}}\frac{\sigma_{A}}{\sigma_{B}}(b-\mu_{B}) \nonumber \\ &\Downarrow & \nonumber \\ \mu_{A|B} &=\mu_{A}+\frac{\sigma_{AB}}{\sigma_{B}^{2}}(b-\mu_{B}) \nonumber \end{align}\) and variance \(\begin{align} \sigma_{A|B}^{2} &=\sigma_{A}^{2}(1-\rho ^{2}) \\ &\Downarrow \\ \sigma_{A|B}^{2} &=\sigma_{A}^{2}-\rho ^{2}\sigma_{A}^{2} \nonumber \\ &\Downarrow \\ \sigma_{A|B}^{2} &=\sigma_{A}^{2}-\frac{\sigma_{AB}^{2}}{\sigma _{B}^{2}\sigma_{A}^{2}}\sigma_{A}^{2} \nonumber \\ &\Downarrow \\ \sigma_{A|B}^{2} &=\sigma_{A}^{2}-\frac{\sigma_{AB}^{2}}{\sigma_{B}^{2}} \nonumber \end{align}\)

Q.E.D.

References

Kim, C. J., & Nelson, C. R. (1999). State-space models with regime switching: Classical and Gibbs-sampling approaches with applications. MIT Press.