Article Information

Title: Expectations and the term structure of interest rates: evidence and implications.
Authors: King, Robert G.; Kurmann, Andre
Journal: Economic Quarterly
Print ISSN: 1069-7225
Year of publication: 2002
Issue: September
Language: English
Publisher: Federal Reserve Bank of Richmond
Abstract: Interest rates on long-term bonds are widely viewed as important for many economic decisions, notably business plant and equipment investment expenditures and household purchases of homes and automobiles. Consequently, macroeconomists have extensively studied the term structure of interest rates. For monetary policy analysis this is a crucial topic, as it concerns the link between short-term interest rates, which are heavily affected by central bank decisions, and long-term rates.
Keywords: Interest rates; Monetary policy

Interest rates on long-term bonds are widely viewed as important for many economic decisions, notably business plant and equipment investment expenditures and household purchases of homes and automobiles. Consequently, macroeconomists have extensively studied the term structure of interest rates. For monetary policy analysis this is a crucial topic, as it concerns the link between short-term interest rates, which are heavily affected by central bank decisions, and long-term rates.

The dominant explanation of the relationship between short- and long-term interest rates is the expectations theory, which suggests that long rates are entirely governed by the expected future path of short-term interest rates. While this theory has strong implications that have been rejected in many studies, it nonetheless seems to contain important elements of truth. Therefore, many central bankers and other practitioners of monetary policy continue to apply it as an admittedly imperfect yet useful benchmark. In this article, we work to quantify both the dimensions along which the expectations theory succeeds in describing the link between expectations and the term structure and those along which it does not, thus providing a better sense of the utility of this benchmark.

Following Sargent (1979) and Campbell and Shiller (1987), we focus on linear versions of the expectations theory and linear forecasting models of future interest rate expectations. In this context, we reach five notable conclusions for the period since the Federal Reserve-Treasury Accord of March 1951. (1)

First, cointegration tests confirm that the levels of both long and short interest rates are driven by a common stochastic trend. In other words, there is a permanent component that affects long and short rates equally, which accords with one of the basic predictions of the expectations theory.

Second, while changes in this stochastic trend dominate the month-to-month changes in long-term interest rates, the same changes affect the short-term rate to a much lesser degree. We summarize our detailed econometric analysis with a useful rule of thumb for applied researchers: it is optimal to infer that the stochastic trend in interest rates has varied by 97 percent of any change in the long-term interest rate. (2) In this sense, the long-term interest rate is a good indicator of the stochastic trend in interest rates in general. (3)

Third, according to cointegration tests, the spread between long and short rates is not affected by the stochastic trend, which is consistent with the expectations theory. Rather, the spread is a reasonably good indicator of changes in the temporary component of short-term interest rates. Developing a similar rule of thumb, we compute that on average, a 1 percent increase in the spread indicates a 0.71 percent decrease in the temporary component of the short rate, i.e., in the difference between the current short rate and the stochastic trend.

Fourth, the expectations theory imposes important rational expectations restrictions on linear time series models in the spread and short-rate changes. Like Campbell and Shiller (1987), who pioneered testing of the expectations theory in a cointegration framework, we find that these restrictions are decisively rejected by the data. But our work strengthens this conclusion by using a longer sample period and a better testing methodology. (4) We interpret the rejection as arising from predictable time-variations in term premia. Under the strongest form of the expectations theory, term premia should be constant and fluctuations in the spread should be entirely determined by expectations about future short-rate changes. However, our calculations indicate that--as another rule of thumb--a 1 percent deviation of the spread from its mean signals a 0.69 percent fluctuation of the expectations component with the remainder viewed as arising from shifts in the term premia.

Fifth, based on the work by Sargent (1979), we show how to adapt the restrictions implied by the expectations theory to a situation where term premia are time-varying but unpredictable over some forecasting horizons. Our tests indicate that these modified restrictions continue to be rejected with forecasting horizons of up to a year. Thus, departures from the expectations theory in the form of time-varying term premia are not simply of a high frequency form, although the cointegration results indicate that the term premia are stationary.

Our empirical findings should provide some guidance for macroeconomic modeling, including work on small-scale econometric models and on monetary policy rules. In particular, our results suggest that the presence of a common stochastic trend in short and long nominal rates is a feature of post-Accord history that deserves greater attention. Furthermore, the detailed empirical results and the summary rules-of-thumb can be considered as a useful guide for monetary policy discussions. As an example, we ask whether the general patterns in the 50-year sample hold up over the period 1986-2001. Interestingly, we find a reduced variability in the interest rate stochastic trend: it is only about half as volatile as during the entire sample period. Nevertheless, the appropriate rule of thumb is still to view 85 percent of any change in the long rate as reflecting a shift in the stochastic trend. Our analysis also indicates that the expectations component of the spread (the discounted sum of expected short-rate changes) is of larger importance in the more recent sample, justifying an increase of the relevant rule-of-thumb coefficient from 69 percent to 77 percent. One interpretation of these different results is that they indicate increased credibility of the Federal Reserve System over the last decade and a half, which Goodfriend (1993) describes as the Golden Age of monetary policy because of enhanced credibility.

1. HISTORICAL BEHAVIOR OF INTEREST RATES

The historical behavior of short-term and long-term interest rates during the period April 1951 to November 2001 is shown in Figure 1. The two specific series that we employ have been compiled by Ibbotson (2002) and pertain to the 30-day T-bill yield for the short rate and the long-term yield on a bond of roughly twenty years to maturity for the long rate. One motivation for our use of this sample period is that the research of Mankiw and Miron (1986) suggests that the expectations theory encounters particular difficulties after the founding of the Federal Reserve System, particularly during the post-Accord period, because of the nonstationarity of short-term interest rates.

In this section, we start by discussing some key stylized facts that have previously attracted the attention of many researchers. We then conduct some basic statistical tests on these series that provide important background to our subsequent analysis.

Basic Stylized Facts

We begin by discussing three important facts about the levels and comovement of short-term and long-term interest rates and then discuss two additional important facts about the predictability of these series.

Wandering levels: The levels of short-term and long-term interest rates vary substantially through time, as shown in Figure 1. Table 1 reports the very different average values over subsamples: in the 1950s, the short rate averaged 1.85 percent and the long rate averaged 3.02 percent; in the 1970s, the short rate averaged 6.13 percent and the long rate averaged 7.57 percent; and in the 1990s, the short rate averaged 4.80 percent and the long rate averaged 7.10 percent. These varying averages suggest that there are highly persistent factors that affect interest rates.

Comovement: While the levels of interest rates wander through time, subperiods of high average short rates are also periods of high average long rates. Symmetrically, short-term and long-term interest rates have a tendency to simultaneously display low average values within subperiods. This suggests that there may be common factors affecting long and short rates.

Relative stability of the spread: The spread between long- and short-term interest rates is much more stable over time, with average values of 1.17 percent, 1.45 percent, and 2.30 percent over the three decades discussed above. This again suggests that there is a common source of persistent variation in the two rates.

Predictability of the spread: While apparently returning to a more or less constant value, the spread between long and short rates appears relatively forecastable, even from its own past, because it displays substantial autocorrelation. This predictability has made the spread the focus of many empirical investigations of interest rates.

Changes in short-term and long-term interest rates: Figure 2 shows that changes in short and long rates are much less autocorrelated. The two plots also highlight the changing volatility of short-term and long-term interest rates, which has been the subject of a number of recent investigations, including that of Watson (1999).

Basic Statistical Tests

The behavior of short-term and long-term interest rates displayed in Figures 1 and 2 has led many researchers to model the two series as stationary in first differences rather than in levels.

Unit root tests for interest rates: Accordingly, we begin by investigating whether there is evidence against the assumption that each series is stationary in differences rather than in levels. For this purpose, the first two columns of Table 2 report regressions of the augmented Dickey-Fuller (ADF) form. Specifically, the regression for the short rate $R_t$ takes the form

$$\Delta R_t = a_0 + a_1 \Delta R_{t-1} + a_2 \Delta R_{t-2} + \dots + a_p \Delta R_{t-p} + f R_{t-1} + e_{Rt}.$$

Our null hypothesis is that the short-term interest rate is difference stationary and that there is no deterministic trend in the level of the rate. In particular, stationarity in first differences implies that $f = 0$; if a deterministic trend is also absent, then $a_0 = 0$ as well. The alternative hypothesis is that the interest rate is stationary in levels ($f < 0$); in this case, the constant term is not generally zero because the level of the interest rate has a non-zero mean. The relevant test is reported in Table 2 for a lag length of $p = 4$. (5) It involves a comparison of fit of the constrained regression in the first column and the unconstrained regression in the second column, with the former appropriate under the null hypothesis of a unit root and the latter appropriate under the alternative of stationarity. There is no strong evidence against the null, since the Dickey-Fuller F-statistic of 2.94 is less than the 10 percent critical value of 3.78. (6) Looking at comparable results for the long rate $R^L_t$, we find even less evidence against the null hypothesis. (7) The value of the Dickey-Fuller F-statistic is even smaller. (8) We therefore model both interest rates as first difference stationary throughout our analysis.
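As an illustration of the comparison of constrained and unconstrained regressions just described, the following is a minimal sketch in Python. It is not the authors' code: the function names, the simulated series, and the seed are hypothetical, and the F-statistic is computed in the textbook way from the two sums of squared residuals. The resulting statistic must be compared with Dickey-Fuller critical values (such as the 3.78 and 4.59 quoted in this section), not with standard F tables.

```python
import random


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols_ssr(y, rows):
    """Sum of squared OLS residuals of y on the regressors in `rows`."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = solve_linear(xtx, xty)
    return sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2
               for r, yi in zip(rows, y))


def dickey_fuller_f(series, p=4):
    """F-statistic for the joint null a_0 = 0 and f = 0 in the ADF regression."""
    d = [series[t] - series[t - 1] for t in range(1, len(series))]
    y, lags_only, full = [], [], []
    for t in range(p, len(d)):          # d[t] is the change between t and t + 1
        lags = [d[t - i] for i in range(1, p + 1)]
        y.append(d[t])
        lags_only.append(lags)                   # constrained: lagged changes only
        full.append([1.0] + lags + [series[t]])  # adds constant and lagged level
    ssr_c, ssr_u = ols_ssr(y, lags_only), ols_ssr(y, full)
    dof = len(y) - (p + 2)
    return ((ssr_c - ssr_u) / 2) / (ssr_u / dof)


# Hypothetical simulated series: a random walk and a stationary AR(1).
rng = random.Random(0)
random_walk, ar1 = [5.0], [5.0]
for _ in range(600):
    random_walk.append(random_walk[-1] + rng.gauss(0, 0.5))
    ar1.append(5.0 + 0.5 * (ar1[-1] - 5.0) + rng.gauss(0, 0.5))

print(dickey_fuller_f(random_walk), dickey_fuller_f(ar1))
```

In this sketch the stationary series should produce a statistic far above the critical values, while the random walk typically does not; the exact numbers depend on the simulated draws.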

In these regressions, we also find the first evidence of different predictability of short-term and long-term interest rates, a topic that will be a focus of much discussion below. Foreshadowing this discussion, we will find in every case that long-rate changes are less predictable than short-rate changes. In Table 2, the unconstrained regression for changes in the long rate accounts for about 3.5 percent of its variance, and the unconstrained regression for changes in the short rate accounts for about 8 percent of its variance. (9)

A simple cointegration test: Since we take the long-term and short-term rate as containing unit roots, the spread $S_t = R^L_t - R_t$ may either be nonstationary or stationary. If the spread is stationary, then the long-term and short-term interest rates are cointegrated in the terminology of Engle and Granger (1987), since a linear combination of the variables is stationary. One simple test for cointegration when the cointegrating vector is known, discussed for example in Hamilton (1994, 582-86), is based on a Dickey-Fuller regression. In our context, we run the regression

$$\Delta S_t = a_0 + a_1 \Delta S_{t-1} + a_2 \Delta S_{t-2} + \dots + a_p \Delta S_{t-p} + f S_{t-1} + e_{St}.$$

As above, we take the null hypothesis to be that the spread is nonstationary, but that there is no deterministic trend in the level of the spread. The alternative of stationarity (cointegration) is a negative value of $f$; the value of $a_0$ then captures the non-zero mean of the spread. The results in Table 2 show that we can reject the null at a high critical level: the value of the Dickey-Fuller F-statistic is 9.67, which exceeds the 5 percent critical level of 4.59.

Thus, we tentatively take the short-term and long-term interest rate to be cointegrated, but we will later conduct a more powerful test of cointegration. The regression results in Table 2 also highlight the fact that the spread is more predictable from its own past than are either of its components. In the unconstrained regression, 16 percent of month-to-month changes in the spread can be forecast from past values.

Cointegration of short-term and long-term interest rates is a formal version of the second stylized fact above: there is comovement of short and long rates despite their shifting levels. It is based on the third stylized fact: the spread appears relatively stationary although it is variable through time.

2. THE EXPECTATIONS THEORY

The dominant economic theory of the term structure of interest rates is called the expectations theory, as it stresses the role of expectations of future short-term interest rates in the determination of the prices and yields on longer-term bonds. There are a variety of statements of this theory in the literature that differ in terms of the nature of the bond which is priced and the factors that enter into pricing. We make use of a basic version of the theory developed in Shiller (1972) and used in many subsequent studies. (10) This version is suitable for empirical analyses of yields on long-term coupon bonds such as those that we study, since it delivers a simple linear formula for long-term yields. The derivation of this formula, which is reviewed in Appendix A, is based on the assumption that investors equate the expected holding period yield on long-term bonds to the short-term interest rate $R_t$, plus a time-varying excess holding period return $k_t$, which is not described or restricted by the model but could represent variation in risk premia, liquidity premia and so forth. It is based on a linear approximation to this expected holding period condition that neglects higher order terms. More specifically, the theory indicates that

$$R^L_t = \beta E_t R^L_{t+1} + (1 - \beta)(R_t + k_t) \tag{1}$$

where $\beta = 1/(1 + R^L)$ is a parameter based on the mean of the long-term interest rate around which the approximation is taken. (11)

This expectational difference equation can be solved forward to relate the current long-term interest rate to a discounted value of current and future $R$ and $k$:

$$R^L_t = (1 - \beta)\sum_{j=0}^{\infty}\beta^j\left[E_t R_{t+j} + E_t k_{t+j}\right]. \tag{2}$$

Various popular term-structure theories can be accommodated within this framework. The pure expectations theory occurs when there are no $k$ terms, so that $R^L_t = (1 - \beta)\sum_{j=0}^{\infty}\beta^j E_t R_{t+j}$. This is a useful form for discussing various propositions about long-term and short-term interest rates that also arise in richer theories.

Implication for permanent changes in interest rates: Notably, the pure expectations theory predicts that if interest rates increase at date $t$ in a manner which agents expect to be permanent, then there is a one-for-one effect of such a permanent increase on the level of the long rate because the weights sum to one, i.e., $(1 - \beta)\sum_{j=0}^{\infty}\beta^j = (1 - \beta)/(1 - \beta) = 1$. This is a basic and important implication of the expectations theory that has long been stressed by analysts of the term structure and that appears capable of explaining the comovement of short-term and long-term interest rates that we discussed above.

Implications for temporary changes in interest rates: Temporary changes in interest rates have a smaller effect under the pure expectations theory, with the extent of this effect depending on how sustained the temporary changes are assumed to be. Supposing that the short-term interest rate is governed by the simple autoregressive process $R_t = \rho R_{t-1} + e_{Rt}$ with the error term being unforecastable, it is easy to see that $E_t R_{t+j} = \rho^j R_t$. It follows that a rational expectations solution for the long-term rate is

$$R^L_t = (1 - \beta)\sum_{j=0}^{\infty}\beta^j E_t R_{t+j} = (1 - \beta)\sum_{j=0}^{\infty}\beta^j\rho^j R_t = \frac{1 - \beta}{1 - \beta\rho}R_t = \theta R_t.$$

This solution can be used to derive implications for temporary changes in short rates. If these are completely transitory, so that $\rho = 0$, there is a minimal effect on the long rate, since $\theta = 1 - \beta \approx 0.005$. On the other hand, as the changes become more permanent ($\rho$ approaches one) the $\theta$ coefficient approaches the one-for-one response previously discussed as the implication for fully permanent changes in the level of rates. Accordingly, the response of the long rate under the expectations theory depends on the degree of persistence that agents perceive in short-term interest rates, a property that Mankiw and Miron (1986) and Watson (1999) have exploited to derive interesting implications of the term structure theory that accord with various changes in the patterns of short-term and long-term interest rates in different periods of U.S. history.
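To make the dependence of the long-rate response on perceived persistence concrete, the short Python sketch below evaluates the coefficient (1 - beta)/(1 - beta * rho) for several values of rho. The function name and the grid of rho values are illustrative choices; beta = 0.995 is simply the value implied by 1 - beta being approximately 0.005 in the text.

```python
def long_rate_response(beta, rho, horizon=None):
    """theta = (1 - beta) * sum over j of (beta * rho)**j.

    With horizon=None the closed form (1 - beta) / (1 - beta * rho) is used;
    otherwise the sum is truncated after `horizon` terms.
    """
    if horizon is None:
        return (1 - beta) / (1 - beta * rho)
    return (1 - beta) * sum((beta * rho) ** j for j in range(horizon))


beta = 0.995  # implied by 1 - beta being roughly 0.005 in the text
for rho in (0.0, 0.5, 0.9, 0.99, 1.0):
    print(rho, round(long_rate_response(beta, rho), 4))
```

The response stays small until rho is very close to one (it is still only about one third at rho = 0.99), which illustrates why the perceived degree of persistence matters so much for the long rate.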

The spread as an indicator of future changes: There has been much interest in the idea that the expectations theory implies that the long-short spread is an indicator of future changes in short-term interest rates. With a little bit of algebra, as in Campbell and Shiller (1987), we can rewrite (2) as

$$R^L_t - R_t = (1 - \beta)\sum_{j=0}^{\infty}\beta^j\left(E_t R_{t+j} - R_t\right) = \sum_{j=1}^{\infty}\beta^j E_t \Delta R_{t+j},$$

when there are no term premia. (12) Hence, the spread is high when short-term interest rates are expected to increase in the future, and it is low when they are expected to decrease. Further, permanent changes in the level of short-term interest rates, such as those considered above, have no effect on the spread because they do not imply any expected future changes in interest rates.
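The equality of the two expressions for the spread can be checked numerically. The sketch below does so under the illustrative assumption that the short rate follows a zero-mean AR(1) process with coefficient rho, so that the expected rate j periods ahead is rho**j times the current rate; the function names and parameter values are hypothetical, and the infinite sums are truncated at a long horizon.

```python
def spread_from_levels(beta, rho, r, horizon=20000):
    """(1 - beta) * sum_{j>=0} beta**j * (E_t R_{t+j} - R_t), E_t R_{t+j} = rho**j * r."""
    return (1 - beta) * sum(beta ** j * (rho ** j * r - r) for j in range(horizon))


def spread_from_changes(beta, rho, r, horizon=20000):
    """sum_{j>=1} beta**j * E_t dR_{t+j}, with E_t dR_{t+j} = (rho**j - rho**(j-1)) * r."""
    return sum(beta ** j * (rho ** j - rho ** (j - 1)) * r for j in range(1, horizon))


beta, rho, r = 0.995, 0.9, 2.0      # short rate 2 points above its long-run level
print(spread_from_levels(beta, rho, r), spread_from_changes(beta, rho, r))
print(spread_from_changes(beta, 1.0, r))   # permanent change: no expected changes
```

With a short rate that is temporarily high, both expressions give the same negative spread, because the short rate is expected to fall. With rho = 1 the change is permanent and the spread is zero.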

While these three implications can easily be derived under the pure expectations theory, they carry over to other more general theories so long as the changes in interest rates do not affect $(1 - \beta)\sum_{j=0}^{\infty}\beta^j E_t k_{t+j}$ in (2). Further, while the pure expectations theory is a useful expository device, it is simply rejected: one of the stylized facts is that long rates are generally higher than short rates (there is a positive average value to the term spread). For this reason, all empirical studies of the effects of expectations on the long rate minimally use a modified form

$$R^L_t = (1 - \beta)\sum_{j=0}^{\infty}\beta^j E_t R_{t+j} + K,$$

where $K$ is a parameter capturing the average value of the term spread that comes from assuming that $k_t$ is constant. (13)

The Efficient Markets Test

As exemplified by the work of Roll (1969), one strategy is to derive testable implications of the expectations theory that (i) do not require making assumptions about the nature of the information set that market participants use to forecast future interest rates and that (ii) impose restrictions on a single linear equation. In the current setting, such an efficient markets test is based on manipulating (1) so as to isolate a pure expectations error, $R^L_t = \frac{1}{\beta}R^L_{t-1} - \frac{1 - \beta}{\beta}(R_{t-1} + K) + \xi_t$, where $\xi_t = R^L_t - E_{t-1}R^L_t$. As in Campbell and Shiller (1987, 1991), this condition may be usefully reorganized to indicate that the long-short spread (and only the spread) should forecast long-rate changes,

$$R^L_t - R^L_{t-1} = \left(\frac{1}{\beta} - 1\right)\left(R^L_{t-1} - R_{t-1} - K\right) + \xi_t,$$

which is a form that is robust to nonstationarity in the interest rate.

The essence of efficient markets tests is to determine whether any variables that are plausibly in the information set of agents at time $t - 1$ can be used to predict $\xi_t = R^L_t - R^L_{t-1} - (1/\beta - 1)(R^L_{t-1} - R_{t-1} - K)$. The forecasting relevance of any stationary variable can be tested with a standard t-statistic and the relevance of any group of $p$ stationary variables can be tested by a likelihood ratio test, which has an asymptotic $\chi^2_p$ distribution. Table 3 reports a battery of such efficient markets tests. The first regression simply is a benchmark, relating $R^L_t - R^L_{t-1}$ to a constant and to $(1/\beta - 1)S_{t-1}$ in the manner suggested by the efficient markets theory. The second regression frees up the coefficient on $S_{t-1}$ and finds its estimated value to be negative rather than positive. The t-statistic for testing the hypothesis that the coefficient equals $(1/\beta - 1) = 0.005$ takes on a value of 2.345, which exceeds the standard 95 percent critical level. This finding has been much discussed in the context of long-term bonds and some other financial assets, in that financial markets spreads have a "wrong-way" influence on future changes relative to the predictions of basic theory. (14) At the same time, the low $R^2$ of 0.0051 indicates that the prediction performance of the regression is very modest.

Additional evidence against the efficient markets view comes when lags of short-rate changes and lags of long-rate changes or both are added to the above equation. As regressions 3 through 5 in Table 3 show, the estimated coefficient on $S_{t-1}$ remains significantly different from its predicted theoretical value. Furthermore, the prediction performance remains small (the $R^2$ is less than 10 percent for all the cases) and the F-tests reported at the bottom of the table indicate that adding lagged variables does not significantly increase the explanatory power compared to the original efficient markets regression. (15)

The efficient markets regression again highlights that there is a substantial amount of unpredictable variation in changes in long bond yields, which makes it difficult to draw strong conclusions about the nature of predictable variations in these returns. (16) One measure of the degree of this unpredictable variation is presented in panel B of Figure 2, where there is a very smooth and apparently quite flat line that is labelled as the "predicted changes in long rates." Those predicted changes are $(1/\beta - 1)(R^L_{t-1} - R_{t-1})$ with a value of $\beta$ suggested by the average level of long rates over our sample period. Panel B of Figure 2 highlights the fact that the expectations theory would explain only a tiny portion of interest rate variation if it were exactly true. Sargent (1979) refers to this as the "near-martingale property of long-term rates" under the expectations hypothesis. But it would not look very different if the fitted values of the other specifications in Table 3 were employed. Changes in the long rate are quite hard to predict and their predictable components are inconsistent with the efficient markets hypothesis.
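The second regression in this battery, a regression of long-rate changes on a constant and the lagged spread with a t-test of the slope against 1/beta - 1, can be sketched as follows. This is an illustration only and not the computation behind Table 3: the function name is hypothetical, and the data are simulated so that the pure expectations theory holds exactly (an AR(1) short rate, a long rate equal to theta times the short rate, and K = 0), which is an assumption made purely for the example.

```python
import math
import random


def slope_t_test(y, x, slope_null):
    """OLS of y on a constant and x; returns the slope and the t-statistic
    for the hypothesis that the slope equals slope_null."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(ssr / (n - 2) / sxx)
    return slope, (slope - slope_null) / se


# Hypothetical data for which the theory is true by construction.
beta, rho = 0.995, 0.95
theta = (1 - beta) / (1 - beta * rho)
rng = random.Random(1)
short = [0.0]
for _ in range(600):
    short.append(rho * short[-1] + rng.gauss(0, 0.5))
long_ = [theta * r for r in short]

d_long = [long_[t] - long_[t - 1] for t in range(1, len(long_))]
lag_spread = [long_[t - 1] - short[t - 1] for t in range(1, len(long_))]
slope, t_stat = slope_t_test(d_long, lag_spread, 1 / beta - 1)
print(slope, t_stat)
```

On data generated this way the estimated slope should lie close to 1/beta - 1, in contrast to the negative estimate that the article reports for actual U.S. data.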

Where Do We Go from Here?

Given that the efficient markets restriction is rejected, some academics simply conclude we know nothing about the term structure. (17) However, central bankers and other practitioners actually do seem to employ the expectations theory as a useful yet admittedly imperfect device to interpret current and historical events (examples in this review are Dotsey [1998], Goodfriend [1993], and Owens and Webb [2001]). In the remainder of this analysis, we recognize that the expectations theory is not true but instead of simply rejecting it, we use modern time series methods to understand the dimensions along which it appears to succeed and those along which it does not. Section 3 develops and tests the common stochastic trend/cointegration restrictions that the expectations theory imposes. Consistent with earlier studies, we find that U.S. data do not allow us to reject these restrictions and, thus, that the theory appears to contain an important element of truth as far as the common stochastic trend implication is concerned. Section 4 then follows Sargent (1979) in developing and testing a variety of cross-equation restrictions that the expectations theory implies. These restrictions are rejected in the data. Finally, in Section 5, we build on the approach by Campbell and Shiller (1987) to extract estimates of changes in market expectations, which also allows us to extract estimates of time-variation in term premia.

3. COINTEGRATION AND COMMON TRENDS

A basic implication of the expectations theory is that an unexpected and permanent change in the level of short rates should have a one-for-one effect on the long rate. In other words, the theory implies that there is a common trend for the short and the long rate. This idea can be developed further using the concept of cointegration, and related methods can be used to estimate the common trend.

The starting point is Campbell and Shiller's (1987) observation that present value models have cointegration implications, if the underlying series are nonstationary in levels, and that these implications survive the introduction of stationary deviations from the pure expectations theory such as time-varying term premia. In the context of the term structure, we can rewrite the long-rate equation (2) as

$$R^L_t - R_t = (1 - \beta)\sum_{j=0}^{\infty}\beta^j\left[\left(E_t R_{t+j} - R_t\right) + E_t k_{t+j}\right] \tag{3}$$

$$= \sum_{j=1}^{\infty}\beta^j E_t \Delta R_{t+j} + (1 - \beta)\sum_{j=0}^{\infty}\beta^j E_t k_{t+j} \tag{4}$$

so that the expectations theory stipulates that the spread is stationary so long as (i) first differences of short rates are stationary and (ii) the expected deviations from the pure expectations theory are stationary. Thus, cointegration tests are one way of assessing this implication of the theory.

In Section 1, we found evidence against the hypothesis that the spread contains a unit root and suggested that a stationary spread was a better description of the U.S. data. That is, we found some initial evidence consistent with modeling the short rate and the long rate as cointegrated. Here, in Section 3, we confirm that the spread also passes a more rigorous cointegration test. Given this result, we then define and estimate the common stochastic trend for the short rate and the long rate. We also present an easy-to-use rule of thumb that decomposes fluctuations of the short and the long rate into fluctuations in the common trend and fluctuations in the temporary components.

Testing for Cointegration

To develop the intuition behind the more rigorous cointegration tests, consider a vector autoregression (VAR) in the first difference of the short rate and the first difference of the long rate:

$$\Delta R_t = \sum_{i=1}^{p} a_i \Delta R^L_{t-i} + \sum_{i=1}^{p} b_i \Delta R_{t-i} + e_{Rt}, \tag{5}$$

$$\Delta R^L_t = \sum_{i=1}^{p} c_i \Delta R^L_{t-i} + \sum_{i=1}^{p} d_i \Delta R_{t-i} + e_{Lt}. \tag{6}$$

By virtue of the Wold decomposition theorem, we may be tempted to believe that such a VAR in first differences can approximate the dynamics of short- and long-rate changes arbitrarily well, so long as the vector $\Delta x_t = [\Delta R_t \;\; \Delta R^L_t]'$ is a stationary stochastic process (this last condition being asserted by the Dickey-Fuller tests of the last section). However, if the two variables $R_t$ and $R^L_t$ are also cointegrated, then this argument breaks down. The above VAR represents a poor approximation in such circumstances because the short and long rate only contain one common stochastic trend and first differencing both variables thus deletes useful information. (18)

However, as Engle and Granger (1987) demonstrate, if first differences of $x_t$ are stationary and there is cointegration among the variables of the form $\alpha x_t$, then there always exists an empirical specification relating $\Delta x_t$, its lags $\Delta x_{t-p}$, and $\alpha x_{t-1}$ that describes the dynamics of $\Delta x_t$ arbitrarily well. Such a system of equations is called a vector error correction model (VECM). In our context, if $R_t$ and $R^L_t$ are cointegrated, as under the weak form of the expectations theory discussed above, then the following VECM should provide a better description of the dynamics of $\Delta x_t$ than the VAR in (5) and (6):

$$\Delta R_t = \sum_{i=1}^{p} a_i \Delta R^L_{t-i} + \sum_{i=1}^{p} b_i \Delta R_{t-i} + f[S_{t-1} - K] + e_{Rt}, \tag{7}$$

$$\Delta R^L_t = \sum_{i=1}^{p} c_i \Delta R^L_{t-i} + \sum_{i=1}^{p} d_i \Delta R_{t-i} + g[S_{t-1} - K] + e_{Lt}. \tag{8}$$

In these equations, f and g capture the effects of the lagged spread on forecastable variations in the short and long rates; K is the mean value of the spread.

To test for cointegration, we estimate both the VAR and the VECM and compare their respective fit. A substantial increase in the log likelihood of the VECM over the VAR signals that the cointegration terms aid in the prediction of interest rate changes. More specifically, a large likelihood ratio results in a rejection of the null hypothesis in favor of the alternative of cointegration. In particular, we follow the testing procedure by Horvath and Watson (1995) and assume a priori that the cointegrating relationship is given by the spread $S_t = R^L_t - R_t$ rather than estimating the cointegrating vector. (19) Table 4 reports estimates of the VAR and VECM models for the lag length of $p = 4$, which we choose as the reference lag length throughout. Before discussing the cointegration test results in detail, it is worthwhile looking at a few elements that the VAR and VECM regressions have in common. First, changes in short rates are somewhat predictable from past changes in short rates, as was previously found with the Dickey-Fuller regression in Table 2. In addition, past changes in long rates are important for predicting changes in short rates in both the VAR and the VECM. (20) Finally, changes in short rates are predicted by the lagged spread: if the long rate is above the short rate, then short rates are predicted to rise. Second, changes in long rates are still fairly hard to predict with either the VAR or the VECM.

Moving to the cointegration test, the likelihood ratio between the VECM and the VAR equals $2(L_{VECM} - L_{VAR}) = 27.67$, which exceeds the 5 percent critical level of 6.28 calculated by the methods of Horvath and Watson (1995). (21) In other words, we can comfortably reject the hypothesis of no cointegration between $R_t$ and $R^L_t$, which is consistent with earlier studies and reinforces the statistical support for the common trend implication of the expectations theory. Therefore, the data are consistent with the basic cointegration implication of the expectations theory and we thus view the VECM as the preferred specification and assume cointegration for the remainder of our analysis. (22)

Uncovering the Common Stochastic Trend

A key implication of cointegration in our context is that the short and long rates share a common stochastic trend, which we will now work to uncover. (23) Following Beveridge and Nelson (1981), the stochastic trend of a single series such as the short-term interest rate is defined as the limit forecast [R.sup.-.sub.t] = [lim.sub.k[right arrow][infinity]] [E.sub.t][R.sub.t+k], or equivalently

[R.sup.-.sub.t] = [R.sub.t-1] + [lim.sub.k[right arrow][infinity]] [summation over (k/j=0)][E.sub.t][DELTA][R.sub.t+j]. (9)

However, in order to obtain a series of [R.sup.-.sub.t], we need to take a stand on how to compute the [E.sub.t][DELTA][R.sub.t+j] terms. The VECM suggests a straightforward way to do so. Specifically, suppose that the system expressed by equations (7) and (8) is written in the form

[z.sub.t] = [FORMULA NOT REPRODUCIBLE IN ASCII] = H[x.sub.t]

[x.sub.t] = M[x.sub.t-1] + G[e.sub.t],

where [e.sub.t] is the vector of one-step-ahead forecast errors [e.sub.t] = [[e.sub.Rt] [e.sub.Lt]] and [x.sub.t] = [[DELTA][R.sub.t] [DELTA][R.sup.L.sub.t] [DELTA][R.sub.t-1] [DELTA][R.sup.L.sub.t-1] ... [DELTA][R.sub.t-(p-1)] [DELTA][R.sup.L.sub.t-(p-1)] [S.sub.t]] is the vector of information that the VECM identifies as useful for forecasting future spreads and interest rate changes. The matrix H simply selects the elements of [x.sub.t], and the elements of M and G depend on the parameter estimates {a, b, c, d, f, g} in a manner spelled out in Appendix B.

Given this setup, forecasts of [DELTA][R.sub.t+k] conditional on the information in [x.sub.t] are easily computed as

E[[DELTA][R.sub.t+k]/[x.sub.t]] = [h.sub.R]E[[z.sub.t+k]/[x.sub.t]] = [h.sub.R]H E[[x.sub.t+k]/[x.sub.t]] = [h.sub.R]H [M.sup.k][x.sub.t],

where [h.sub.R] = [1 0 0] such that [DELTA][R.sub.t] = [h.sub.R][z.sub.t]. Mapping these forecasts of [DELTA][R.sub.t+k] into (9), we obtain a closed-form solution for the stochastic trend of the short rate:

[R.sup.-.sub.t] = [R.sub.t-1] + [summation over ([infinity]/k=0)][h.sub.R]H [M.sup.k][x.sub.t] = [R.sub.t-1] + [h.sub.R]H[[I - M].sup.-1][x.sub.t].

The same procedure for computing multiperiod forecasts also provides a recipe for computing the stochastic trend in the long rate, that is,

[R.sup.-L.sub.t] = [R.sup.L.sub.t-1] + [lim.sub.k[right arrow][infinity]][summation over (k/j=0)][E.sub.t][DELTA][R.sup.L.sub.t+j]

= [R.sup.L.sub.t-1] + [summation over ([infinity]/k=0)] [h.sub.L]H [M.sup.k] [x.sub.t] = [R.sup.L.sub.t-1] + [h.sub.L]H[[I - M].sup.-1] [x.sub.t],

where [h.sub.L] = [0 1 0] such that [DELTA][R.sup.L.sub.t] = [h.sub.L][z.sub.t]. Finally, the difference between [R.sup.-L.sub.t] and [R.sup.-.sub.t] is the limit forecast of the spread. By definition of cointegration, the spread is stationary and therefore its limit forecast must be a constant: (24)

K = [lim.sub.k[right arrow][infinity]] [E.sub.t][S.sub.t+k] = [lim.sub.k[right arrow][infinity]] [E.sub.t][R.sup.L.sub.t+k] - [lim.sub.k[right arrow][infinity]] [E.sub.t][R.sub.t+k] = [R.sup.-L.sub.t] - [R.sup.-.sub.t].

Thus, the trends for the long rate and the short rate differ only by the constant K: in other words, the long rate and the short rate have a common stochastic trend component. Since this is sometimes termed the permanent component, deviations from it are described as temporary components. Using this language, the temporary component of the short rate is [R.sub.t] - [R.sup.-.sub.t] and that of the long rate is [R.sup.L.sub.t] - [R.sup.-L.sub.t].
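As an illustration only, the trend formulas can be checked numerically with a small Python sketch. It assumes a stripped-down VECM with no lagged interest rate changes, so that the state is simply the vector of the short-rate change, the long-rate change, and the spread (with K = 0). The coefficients are borrowed from a stylized case in which the short rate is a random walk plus an AR(1) term and the long rate obeys the expectations theory, as in the simple reference model of Section 4. All function names and parameter values are hypothetical choices made for this sketch, not estimates from the tables, and the infinite sum is approximated term by term rather than by inverting I - M.

```python
def mat_vec(M, v):
    """Multiply a square matrix (a list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def cumulated_forecasts(M, x, tol=1e-13, max_terms=100_000):
    """Approximate sum_{k>=0} M^k x, the series behind [I - M]^{-1} x."""
    total, term = list(x), list(x)
    for _ in range(max_terms):
        term = mat_vec(M, term)
        total = [t + u for t, u in zip(total, term)]
        if max(abs(u) for u in term) < tol:
            break
    return total


def reference_model_M(beta, rho):
    """Companion matrix for the state [dR_t, dRL_t, S_t] (assumed, stylized)."""
    f = (1.0 - beta * rho) / beta  # response of dR_t to S_{t-1}
    g = (1.0 - beta) / beta        # response of dRL_t to S_{t-1}
    # Third row uses the identity S_t = S_{t-1} + dRL_t - dR_t.
    return [[0.0, 0.0, f],
            [0.0, 0.0, g],
            [0.0, 0.0, 1.0 + g - f]]


def stochastic_trends(R_prev, RL_prev, x_t, M):
    """Return the stochastic trend of the short rate and of the long rate."""
    sums = cumulated_forecasts(M, x_t)
    return R_prev + sums[0], RL_prev + sums[1]


# Hypothetical values: permanent part tau and temporary part x at t-1 and t.
beta, rho = 0.99, 0.9
theta = (1.0 - beta) / (1.0 - beta * rho)
tau_prev, x_prev, tau_now, x_now = 4.8, 0.5, 5.0, 1.0
R_prev, R_now = tau_prev + x_prev, tau_now + x_now
RL_prev, RL_now = tau_prev + theta * x_prev, tau_now + theta * x_now
state = [R_now - R_prev, RL_now - RL_prev, RL_now - R_now]

trend_R, trend_RL = stochastic_trends(R_prev, RL_prev, state,
                                      reference_model_M(beta, rho))
print(trend_R, trend_RL)
```

Under these assumptions both trends should coincide, up to the truncation tolerance, with the permanent part of the short rate (5.0 here), which is the common-trend property in a case where K = 0.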

A Stochastic Trend Estimate: 1951-2001

Figure 3 shows the common stochastic trend in long and short rates based on the VECM from Table 3, constructed using the method that we just discussed. In line with the expectations theory, we interpret this stochastic trend as describing permanent changes in the level of the short rate, which are reflected one-for-one in the long rate.

Short rates and the stochastic trend: In panel A, we see that the short rate fluctuates around its stochastic trend. There are some lengthy periods, such as the mid-1960s, where the short rate is above the stochastic trend, and others, such as the mid-1990s, where it is below. The vertical distance is a measure of the temporary component of short rates, which we will discuss in greater detail further below.

Long rates and the stochastic trend: In panel B, we see that the long rate and the stochastic trend correspond considerably more closely. This result accords with a very basic implication of the expectations theory: long rates should be highly responsive to permanent variations in the short-term interest rate. (25)

Variance Decompositions

It is useful to consider a decomposition of the variance of short-rate and long-rate changes into contributions from changes in the temporary and permanent components. For the short-rate changes, since var([DELTA][R.sub.t]) = var([DELTA][R.sup.-.sub.t] + [DELTA]([R.sub.t] - [R.sup.-.sub.t])), this decomposition takes the form

var([DELTA][R.sub.t]) = var([DELTA][R.sup.-.sub.t]) + var([DELTA]([R.sub.t] - [R.sup.-.sub.t]))

+2 * cov([DELTA][R.sup.-.sub.t], [DELTA]([R.sub.t] - [R.sup.-.sub.t]))

0.656 = 0.105 + 0.544 + 2 * (0.004)

with the last line drawn from the first panel of Table 5. (26)

The variance of month-to-month changes in the short rate is 0.66. Changes in the temporary component account for the great bulk (82.9 percent) of this variance, while the variance of changes in the permanent component contributes 15.9 percent and the covariance between the two components contributes only about 1.2 percent.
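The shares can be reproduced, approximately, from the rounded figures in the decomposition. The Python sketch below is only a worked arithmetic check; the function name is hypothetical, and because the inputs are rounded to three decimals the computed shares can differ from the reported ones in the last digit.

```python
def variance_shares(var_permanent, var_temporary, cov):
    """Split var(a + b) = var(a) + var(b) + 2 cov(a, b) into shares of the total."""
    total = var_permanent + var_temporary + 2.0 * cov
    return {
        "total": total,
        "permanent": var_permanent / total,
        "temporary": var_temporary / total,
        "covariance": 2.0 * cov / total,
    }


# Rounded short-rate figures quoted in the text (first panel of Table 5).
short = variance_shares(var_permanent=0.105, var_temporary=0.544, cov=0.004)
for name, value in short.items():
    print(f"{name}: {value:.3f}")
```

The same function applies to the long-rate decomposition, where a negative covariance makes the permanent share exceed 100 percent of the total.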

For the long rate, the decomposition takes conceptually the same form, but we find a very different result in terms of relative contributions:

var([DELTA][R.sup.L.sub.t]) = var([DELTA][R.sup.-L.sub.t]) + var([DELTA]([R.sup.L.sub.t] - [R.sup.-L.sub.t]))

+2 * cov([DELTA][R.sup.-L.sub.t], [DELTA]([R.sup.L.sub.t] - [R.sup.-L.sub.t]))

0.083 = 0.104 + 0.027 + 2 * (-0.024)

First, the overall variance of month-to-month changes in the long rate is much smaller. In contrast to the short rate, this variance is dominated by the variance of its permanent component, which is actually somewhat larger than the total (0.104 versus 0.083) because there is a negative covariance between the permanent and the transitory components.

The permanent-temporary decomposition also permits us to undertake a decomposition of the long-short spread, which is displayed in Figure 4. The spread and the two temporary components are connected via the identity

[S.sub.t] - S = [R.sup.L.sub.t] - [R.sub.t] - S = ([R.sup.L.sub.t] - [R.sup.-L.sub.t]) - ([R.sub.t] - [R.sup.-.sub.t]).

Hence, there is a mechanical inverse relationship between the spread and the temporary component of the short rate, which is clearly evident in panel A of Figure 4: everything else equal, whenever the short-term rate is high relative to its permanent component, the spread is low on this account. We can undertake a similar decomposition of the variance of the spread to those used above,

var([S.sub.t]) = var([R.sup.L.sub.t] - [R.sup.-L.sub.t]) + var([R.sub.t] - [R.sup.-.sub.t]) - 2 * cov(([R.sup.L.sub.t] - [R.sup.-L.sub.t]), ([R.sub.t] - [R.sup.-.sub.t]))

1.93 = 0.18 + 0.98 - 2 * (-0.38).

According to this expression, the spread has a variance of 1.93. Of this, 51 percent is attributable to the variability of the temporary component of the short rate, 9 percent is attributable to the temporary component of the long rate, and a substantial amount (39 percent) is attributable to the covariance between these two components. (27)

Simple Rules of Thumb

Suppose that we observe just the change in the long rate and want to know how much of a change has taken place in the permanent component. Our variance decompositions let us provide an answer to this and related questions below. Specifically, we derive a simple rule of thumb as follows. First define the change in the permanent component as an unobserved zero-mean variable [Y.sub.t]. This variable is known to be connected to the observed zero-mean variable [DELTA][R.sup.L.sub.t] according to the identity [Y.sub.t] = [DELTA][R.sup.L.sub.t] + [U.sub.t], where [U.sub.t] is an error. Then we can ask the question: What is the optimal linear estimate of [Y.sub.t] given the observed series [DELTA][R.sup.L.sub.t]? To calculate this estimate, [Y.sup.^.sub.t] = b[DELTA][R.sup.L.sub.t], we minimize the expected squared error, var([Y.sub.t] - [Y.sup.^.sub.t]) = var([Y.sub.t]) + [b.sup.2]var([DELTA][R.sup.L.sub.t]) - 2bcov([Y.sub.t], [DELTA][R.sup.L.sub.t]). The optimal value of b is the familiar OLS regression coefficient

b = cov([Y.sub.t], [DELTA][R.sup.L.sub.t])/var([DELTA][R.sup.L.sub.t]).

Using our estimates of the common stochastic trend, we compute that the variance of long-rate changes is 0.0826 and that the covariance of long-rate and permanent component changes is 0.0802 (see second panel of Table 5). Thus, the coefficient b takes on a value of 0.97, which leads to the following rule of thumb.

Long-rate rule of thumb: If a 1 percent rise (fall) in the long rate occurs, then our calculations suggest that an observer should increase (decrease) his or her estimate of the permanent component by 97 percent of this rise (fall). (28)

A similar rule of thumb can be derived by linking changes in the unobserved temporary component of the short rate ([R.sub.t] - [R.sup.-.sub.t]) to the spread. (29)

Spread rule of thumb #1: If the spread exceeds its mean by 1 percent, then our estimates suggest that the temporary component of short-term interest rates is low by 0.71 percent (the coefficient is -0.71 = (-1.37)/(1.93)).

Our two rules of thumb indicate that changes in the long rate are dominated by changes in the permanent component and the level of the spread (relative to its mean) is substantially influenced by the temporary component.
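Both coefficients follow from the same projection formula, b = cov/var. The Python sketch below reproduces them from the rounded moments quoted above and checks that the coefficient does minimize the expected squared error. The function names are hypothetical, and the value 0.104 used for the variance of permanent-component changes is taken from the long-rate variance decomposition.

```python
def rule_of_thumb_coefficient(cov_yx, var_x):
    """Optimal linear projection coefficient of y on x (both zero mean)."""
    return cov_yx / var_x


def expected_squared_error(b, var_y, var_x, cov_yx):
    """var(y - b x) = var(y) + b^2 var(x) - 2 b cov(y, x)."""
    return var_y + b * b * var_x - 2.0 * b * cov_yx


# Long-rate rule: y = change in permanent component, x = change in long rate.
b_long = rule_of_thumb_coefficient(cov_yx=0.0802, var_x=0.0826)

# Spread rule #1: y = temporary component of the short rate, x = spread.
b_spread = rule_of_thumb_coefficient(cov_yx=-1.37, var_x=1.93)

# The projection coefficient should beat nearby alternatives.
mse_at_b = expected_squared_error(b_long, 0.104, 0.0826, 0.0802)
mse_nearby = [expected_squared_error(b_long + step, 0.104, 0.0826, 0.0802)
              for step in (-0.05, 0.05)]

print(round(b_long, 2), round(b_spread, 2))
```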

4. RATIONAL EXPECTATIONS TESTS

A hallmark of rational expectations models of the term structure, stressed by Sargent (1979), is that they impose testable cross-equation restrictions on linear time series models. In this section, we describe the strategy behind rational expectations tests along the lines of Sargent (1979) and Campbell and Shiller (1987); we also discuss how to extend the tests to accommodate time-varying term premia. We then implement these tests and find that there is a broad rejection of the rational expectations restrictions that we trace to divergent forecastability of the spread and changes in short-term interest rates.

A Simple Reference Model

To illustrate the nature of the cross-equation restrictions that the expectations theory imposes and to motivate the ensuing discussion of rational expectations tests, consider the following simple model. Suppose that the short-term interest rate is governed by

[R.sub.t] = [[tau].sub.t] + [x.sub.t],

where [[tau].sub.t] is a relatively persistent permanent component that we model as a unit root process and [x.sub.t] is a relatively less persistent temporary component. In addition, suppose that agents observe [[tau].sub.t] and [x.sub.t] separately and also understand that these evolve according to

[[tau].sub.t] = [[tau].sub.t-1] + [e.sub.[tau],t]

[x.sub.t] = [rho][x.sub.t-1] + [e.sub.x,t],

with -1 < [rho] < 1 and with [e.sub.[tau],t], [e.sub.x,t] being white noise processes. Suppose also that the expectations theory holds true. Using equation (2) and setting (1 - [beta]) [summation over ([infinity]/j=0)] [[beta].sup.j] [E.sub.t][k.sub.t+j] = K = 0 for all t, the dynamics of the long rate can thus be described as (30)

[R.sup.L.sub.t] = (1 - [beta]) [summation over ([infinity]/j=0)] [[beta].sup.j] [E.sub.t][R.sub.t+j] (10)

= (1 - [beta]) [summation over ([infinity]/j=0)] [[beta].sup.j] [E.sub.t] [[[tau].sub.t+j] + [x.sub.t+j]] = [[tau].sub.t] + [theta][x.sub.t],

where [theta] = (1 - [beta])/(1 - [beta][rho]) < 1 since [rho] < 1 as in Section 2 above. Finally, notice that the spread by definition takes the form

[S.sub.t] = [R.sup.L.sub.t] - [R.sub.t] = ([theta] - 1) [x.sub.t],

which implies that under the expectations theory, the spread is a perfect negative indicator of the temporary component of short-term interest rates.

Cross-equation restrictions on a stationary VAR system: By assuming a unit root component [[tau].sub.t] in the short rate and the expectations theory being true, we determined above that both the short rate and the long rate in our reference model are stationary in first differences rather than levels. We therefore follow Campbell and Shiller (1987) and study the bivariate system in short-rate changes,

[DELTA][R.sub.t] = [DELTA][[tau].sub.t] + [DELTA][x.sub.t] = [e.sub.[tau],t] + [e.sub.x,t] + ([rho] - 1) [x.sub.t-1]

= [e.sub.[tau],t] + [e.sub.x,t] + [([rho] - 1)/([theta] - 1)] [S.sub.t-1] = [e.sub.[tau],t] + [e.sub.x,t] + [(1 - [beta][rho])/[beta]] [S.sub.t-1],

and in the spread,

[S.sub.t] = ([theta] - 1)[x.sub.t] = [rho][S.sub.t-1] + ([theta] - 1) [e.sub.x,t].

Both of these variables are stationary, which has the advantage that testable restrictions are easier to develop in the presence of time-varying, but stationary, term premia. (31)

As stressed by Sargent (1979), the expectations theory imposes cross-equation restrictions. In the case of [DELTA][R.sub.t] and [S.sub.t], these restrictions become immediately apparent when we compare the two model equations above to an unrestricted bivariate, first order vector autoregression:

[DELTA][R.sub.t] = a[DELTA][R.sub.t-1] + b[S.sub.t-1] + [e.sub.[DELTA]R,t].

[S.sub.t] = c[DELTA][R.sub.t-1] + d[S.sub.t-1] + [e.sub.S,t].

In particular, we see that the expectations theory imposes a = c = 0, b = (1 - [beta][rho])/[beta], d = [rho], and [e.sub.[DELTA]R,t] = [e.sub.[tau],t] + [e.sub.x,t], [e.sub.S,t] = ([theta] - 1)[e.sub.x,t]. (32) In our econometric analysis below, we will focus on deriving and testing similar restrictions for a more general rational expectations framework that contains the assumption of agents having more information than the econometrician. (33)
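The algebra of the reference model can be verified numerically. The following Python sketch, offered only as an illustration with hypothetical parameter values and function names, evaluates the discounted sum in (10) by truncation and compares it with the closed form; it also checks that the two expressions for the coefficient on the lagged spread agree.

```python
def long_rate_truncated(tau, x, beta, rho, horizon=5000):
    """Evaluate (1 - beta) * sum_j beta^j E_t R_{t+j} with a finite horizon.

    In the reference model E_t tau_{t+j} = tau_t and E_t x_{t+j} = rho^j x_t.
    """
    total = 0.0
    for j in range(horizon):
        expected_short = tau + (rho ** j) * x
        total += (beta ** j) * expected_short
    return (1.0 - beta) * total


# Hypothetical parameter values and state, chosen for illustration.
beta, rho = 0.99, 0.9
tau, x = 5.0, 1.0

theta = (1.0 - beta) / (1.0 - beta * rho)
long_closed_form = tau + theta * x
long_numeric = long_rate_truncated(tau, x, beta, rho)

spread = long_closed_form - (tau + x)          # equals (theta - 1) * x
b_from_beta = (1.0 - beta * rho) / beta        # restriction on b in the VAR
b_from_theta = (rho - 1.0) / (theta - 1.0)     # same coefficient via theta

print(long_numeric, long_closed_form, spread, b_from_beta, b_from_theta)
```

Because theta is below one, the spread is negative whenever the temporary component x is positive, which is the sense in which the spread is a negative indicator of that component.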

Restrictions on a VAR Model

For the purpose of testing the cross-equation restrictions in the data, we adopt a general strategy initially put forth by Sargent (1979). Following Campbell and Shiller (1987), we consider a bivariate VAR in the short-rate change and the spread: (34)

[DELTA][R.sub.t] = [summation over (p/i=1)][a.sub.i][DELTA][R.sub.t-i] + [summation over (p/i=1)][b.sub.i][S.sub.t-i] + [e.sub.[DELTA]R,t]. (11)

[S.sub.t] = [summation over (p/i=1)][c.sub.i][DELTA][R.sub.t-i] + [summation over (p/i=1)][d.sub.i][S.sub.t-i] + [e.sub.S,t]. (12)

In this section, we work under the assumption that the expectations theory is exactly true, which we relax later. Under this condition, term premia are constant and the expression for the spread in (4) reduces to (35)

[S.sub.t] = [summation over ([infinity]/j=1)] [[beta].sup.j] [E.sub.t] [DELTA] [R.sub.t+j], (13)

as we saw in Section 3 above. This expression is important for two reasons. First, it says that according to the expectations theory the spread is simply the discounted sum of future expected short-rate changes. Second, in terms of econometrics, it reveals that as long as short rates are stationary in first differences, the spread must be stationary as well.

The derivation of testable restrictions that (13) imposes on (11) and (12) has four key ingredients. First, the law of iterated expectations implies that for any information set [[omega].sub.t] which is a subset of the market's information set [[OMEGA].sub.t],

E[[E.sub.t][DELTA][R.sub.t+j]\[[omega].sub.t]] = E[E[[DELTA][R.sub.t+j]\[[OMEGA].sub.t]]\[[omega].sub.t]] = E[[DELTA][R.sub.t+j]\[[omega].sub.t]].

Practically, this says that an econometrician's best estimate of market expectations of future short-rate changes, given a data set [[omega].sub.t], is equal to the econometrician's forecast of these short-rate changes given his or her data. Thus, under the assumption that the expectations theory is exactly true and using the fact that the current spread is in the information set, we can rewrite (13) as

[S.sub.t] = [summation over ([infinity]/j=1)] [[beta].sup.j] E[[DELTA] [R.sub.t+j]\[[omega].sub.t]]

so that the spread formula is unchanged when the information set is reduced. (36)

Second, the Wold decomposition theorem guarantees that if [DELTA] [R.sub.t] is stationary, it can be well described by a vector autoregression (possibly of infinite order p) where the explanatory variables are composed of information [[OMEGA].sub.t-1] available to the market at date t - 1.

Third, since we want to derive restrictions on the bivariate system composed of (11) and (12), we define the data set [[omega].sub.t] as p lags of [DELTA]R and S each. (37) The econometrician's best linear one-period forecast of short-rate changes thus becomes E[[DELTA][R.sub.t+1]\[[omega].sub.t]] = [h.sub.[DELTA]R]E[[[omega].sub.t+1]\[[omega].sub.t]] = [h.sub.[DELTA]R]M[[omega].sub.t], where [h.sub.[DELTA]R] is a selection vector equaling [1 0 ... 0] and where M is the companion matrix corresponding to (11) and (12), written in first order form as [[omega].sub.t] = M[[omega].sub.t-1] + [e.sub.t]; i.e.:

[FORMULA NOT REPRODUCIBLE IN ASCII] (14)

Fourth, given [[omega].sub.t] = M[[omega].sub.t-1] + [e.sub.t], multiperiod linear predictions of short-rate changes are easy to form:

E[[DELTA][R.sub.t+j]\[[omega].sub.t]] = [h.sub.[DELTA]R][M.sup.j][[omega].sub.t].

Mapping these forecasts into [S.sub.t] = [summation over ([infinity]/j=1)][[beta].sup.j]E[[DELTA][R.sub.t+j]\[[omega].sub.t]] and expressing [S.sub.t] = [h.sub.S][[omega].sub.t] where [h.sub.S] is a selection vector with a one in the position corresponding to the spread and zeros elsewhere, we finally derive:

[h.sub.S][[omega].sub.t] = [summation over ([infinity]/j=1)][[beta].sup.j][h.sub.[DELTA]R][M.sup.j][[omega].sub.t] = [h.sub.[DELTA]R][beta]M[[I - [beta]M].sup.-1][[omega].sub.t],

or equivalently:

[h.sub.S] = [h.sub.[DELTA]R][beta]M[[I - [beta]M].sup.-1]. (15)

Expression (15) represents a set of 2p cross-equation restrictions that the expectations theory imposes on the bivariate VAR system and that are sometimes called the hallmark of rational expectations models. Specifically, (11) and (12) contain 4p parameters [{[a.sub.i]}.sup.p.sub.i=1], [{[b.sub.i]}.sup.p.sub.i=1], [{[c.sub.i]}.sup.p.sub.i=1] and [{[d.sub.i]}.sup.p.sub.i=1]. However, under the null that the expectations theory holds true, only 2p of these parameters are free while the remaining half is constrained by the cross-equation restrictions in (15). (38)
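To see what (15) demands of the VAR coefficients, it can help to evaluate its right-hand side for a small example. The Python sketch below assumes p = 1 and the ordering of the state as the short-rate change followed by the spread, so that the spread selection vector is [0 1]. It sums the series of powers of [beta]M term by term instead of inverting I - [beta]M. The coefficient values and function names are hypothetical: the first matrix uses the restricted coefficients of the simple reference model, the second perturbs one coefficient so that the restriction fails.

```python
def vec_mat(v, M):
    """Multiply a row vector by a square matrix (a list of rows)."""
    n = len(M)
    return [sum(v[i] * M[i][j] for i in range(n)) for j in range(n)]


def implied_spread_weights(M, beta, tol=1e-13, max_terms=100_000):
    """Approximate h_dR * sum_{j>=1} (beta M)^j = h_dR beta M [I - beta M]^{-1}."""
    term = [1.0] + [0.0] * (len(M) - 1)  # h_dR selects the short-rate change
    total = [0.0] * len(M)
    for _ in range(max_terms):
        term = [beta * w for w in vec_mat(term, M)]
        total = [t + u for t, u in zip(total, term)]
        if max(abs(u) for u in term) < tol:
            break
    return total


beta, rho = 0.99, 0.9
b = (1.0 - beta * rho) / beta

# State ordering [dR_t, S_t]; rows hold the coefficients (a, b) and (c, d).
M_restricted = [[0.0, b], [0.0, rho]]
M_perturbed = [[0.2, b], [0.0, rho]]   # a = 0.2 violates a = 0

h_S = [0.0, 1.0]
weights_restricted = implied_spread_weights(M_restricted, beta)
weights_perturbed = implied_spread_weights(M_perturbed, beta)
print(weights_restricted, weights_perturbed)
```

When the restrictions hold, the computed weights reproduce the selection vector for the spread; in the perturbed case the first weight is clearly away from zero, so the spread implied by forecasts of short-rate changes would no longer equal the actual spread.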

Working with the same vector autoregression in short-rate changes and the spread, Campbell and Shiller (1987) test such rational expectations restrictions on U.S. data between 1959 and 1983 by means of a Wald test and conclude that the expectations theory is strongly rejected. Alternatively, Sargent (1979) advocates assessing the expectations theory by means of a likelihood ratio test with an asymptotic chi-square distribution, which is the approach that we follow here. The likelihood ratio is 2[[L.sub.UVAR] - [L.sub.ETVAR]], that is, twice the difference between the log likelihood values of the unrestricted VAR and the VAR subject to the restriction in (15), respectively. For a given significance level, the restrictions are then rejected if the likelihood ratio is larger than the critical chi-square value for 2p degrees of freedom.

Table 6 reports the unrestricted and the restricted VAR estimates for our 1951-2001 sample using our reference lag length of p = 4. (39) Remarkably, none of the restricted point estimates differ by more than two standard errors from their unrestricted counterparts. (40) However, the computed likelihood ratio of 35.71 is larger than the critical 0.1 percent chi-square value of 26.1. Our data set thus comfortably rejects the restrictions imposed by the expectations theory, confirming Campbell and Shiller's result over a substantially longer time period and using a more appropriate testing procedure. (41)
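The comparison with the chi-square critical value can be sketched in a few lines of Python. Because the number of restrictions, 2p, is always even, the chi-square tail probability has an exact finite-sum form, so no statistical library is needed. The function names are hypothetical and the sketch takes the reported likelihood ratio as given rather than re-estimating the models.

```python
import math


def likelihood_ratio(loglik_unrestricted, loglik_restricted):
    """LR statistic: twice the gap between the two log likelihoods."""
    return 2.0 * (loglik_unrestricted - loglik_restricted)


def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{i < df/2} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i)
                                 for i in range(df // 2))


def lr_test(lr_statistic, n_restrictions, level):
    """Return the p-value and whether the restrictions are rejected."""
    p_value = chi2_sf_even_df(lr_statistic, n_restrictions)
    return p_value, p_value < level


p = 4                                   # reference lag length
p_value, reject = lr_test(35.71, 2 * p, level=0.001)
tail_at_critical = chi2_sf_even_df(26.1, 2 * p)
print(p_value, reject, tail_at_critical)
```

The tail probability at 26.1 should be very close to 0.001, consistent with the critical value quoted in the text, while the reported statistic of 35.71 lies well beyond it.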

Time-Varying Term Premia

The restrictions in (15) are derived from the strong assumption that the expectations theory is exactly true up to term premia that are constant through time, which precludes even measurement error in the spread. Alternatively, we can adapt the testing approach discussed above and derive testable restrictions that allow for certain forms of time-variation in the term premia. To this end, reconsider the general formula (4) that links the long rate to the present value of future expected short rates and the expected term premia. Without imposing any restrictions, the spread can thus be expressed as the sum of two unobserved components:

[S.sub.t] = [F.sub.t] + [K.sub.t], (16)

where [F.sub.t] = [summation over ([infinity]/j=1)] [[beta].sup.j] E[[DELTA][R.sub.t+j]\[[OMEGA].sub.t]] and [K.sub.t] = (1 - [beta]) [summation over ([infinity]/j=0)] [[beta].sup.j] E[[k.sub.t+j]\[[OMEGA].sub.t]] denote the present value of the market's expectations about future short-rate changes and term premia, respectively. Combining this expression with the VAR framework [[omega].sub.t] = M [[omega].sub.t-1] + [e.sub.t], we can rewrite (16) as

[S.sub.t] = E[[F.sub.t]\[[omega].sub.t]] + [K.sub.t] + [[xi].sub.t],

where [[xi].sub.t] = [F.sub.t] - E[[F.sub.t]\[[omega].sub.t]] = [summation over ([infinity]/j=1)] [[beta].sup.j] {E[[DELTA][R.sub.t+j]\[[OMEGA].sub.t]] - E[[DELTA][R.sub.t+j]\[[omega].sub.t]]} is the error arising from the fact that the econometrician is using a smaller data set than the market to forecast future short-rate changes. (42) Equivalently, we can form expectations conditional on data [[omega].sub.t-l]:

E[[S.sub.t]\[[omega].sub.t-l]] = E[[F.sub.t]\[[omega].sub.t-l]] + E[[K.sub.t]\[[omega].sub.t-l]], (17)

where we recognize that E[[[xi].sub.t]\[[omega].sub.t-l]] = 0 since [[xi].sub.t] is uncorrelated by construction with any information in [[omega].sub.t-l].

Finally, we impose that the term premium [K.sub.t] is unforecastable from information [[omega].sub.t-l], that is, E[[K.sub.t]\[[omega].sub.t-l]] = 0. Under this assumption, which is weaker than the assumption [K.sub.t] = 0 employed in the tests of the expectations theory discussed earlier, we obtain the following testable restrictions:

[h.sub.S][M.sup.l] = [h.sub.[DELTA]R][beta] M[[I - [beta] M].sup.-1] [M.sup.l], (18)

where we used the same arguments as above to rewrite E[[S.sub.t]\[[omega].sub.t-l]] = [h.sub.S][M.sup.l][[omega].sub.t-l] and E[[F.sub.t]\[[omega].sub.t-l]] = [h.sub.[DELTA]R][beta] M[[I - [beta] M].sup.-1] [M.sup.l][[omega].sub.t-l]. (43) This strategy is suggested by the fact that Sargent (1979) actually tests the expectations theory by considering such a relaxed form of the cross-equation restrictions with l = 1 (i.e., a one-period lag in the information set).

The restrictions in (18) can be evaluated using a likelihood ratio test similar to that used above, which compares the fit of the constrained and unconstrained vector autoregressions. Because of the assumed stationarity of the joint process for spreads and short-rate changes, the eigenvalues of the companion matrix M are all smaller than one in absolute value. It must be the case, then, that the restrictions are satisfied as l becomes very large, since both sides of the equation contain only zeros in the limit. However, restrictions of the form of (18) are valid and interesting so long as the researcher is willing to assume that term premia are unforecastable at some intermediate horizon.

Table 7 reports likelihood ratios of the unrestricted VAR against the VAR subject to the restrictions in (18) for the forecasting horizons l = 1, 3, 6, and 12. (44) Notably, the restrictions are rejected for all of these lags. Thus, while the cointegration tests of Section 3 indicate that variations in the term premia are stationary, the results of Table 7 show that departures from the expectations theory are not only due to high-frequency deviations but also occur at intermediate, business cycle frequencies.

5. EXPECTATIONS AND THE SPREAD

The preceding section illustrates that the cross-equation restrictions implied by the expectations theory are soundly rejected, even when we allow for some limited time-variation in the term premia. However, as Campbell and Shiller (1987) argue, statistical tests of the cross-equation restrictions may be "highly sensitive to deviations from the expectations theory--so sensitive, in fact, that they may obscure some of the merits." (45) In other words, even if the theory is not strictly true, it may contain important elements of the truth. This section builds on the ingenious approach of Campbell and Shiller (1987) in computing an estimate of the expectations component of the spread--which they call a "theoretical spread"--in order to shed more light on this issue. This approach also permits us to (i) extract an estimate of the term premium and (ii) to derive a rule of thumb linking the observed spread to unobserved expectations concerning temporary variations in the short-term interest rate.

Decomposing the Spread in Theory

Our discussion above stresses that the observed spread is the sum of two unobserved components, [S.sub.t] = [F.sub.t] + [K.sub.t], which we call the expectations and term premium components. From (17) above, we know that the spread conditional on the econometrician's information set [[omega].sub.t-l] can be written as:

E[[S.sub.t]\[[omega].sub.t-l]] = E[[F.sub.t]\[[omega].sub.t-l]] + E[[K.sub.t]\[[omega].sub.t-l]].

Under the expectations theory, we assumed that E[[K.sub.t]\[[omega].sub.t-l]] is constant (or zero in deviations from the mean). In this section, we alternatively calculate an estimate of the expectations component given an information set and compare it to the prediction of the spread conditional on that same information set. From our results above, we know that the expectations component can be formed as E[[F.sub.t]\[[omega].sub.t-l]] = [summation over ([infinity]/j=1)] [[beta].sup.j] E[[DELTA][R.sub.t+j]\[[omega].sub.t-l]] = [h.sub.[DELTA]R][beta] M[[I - [beta]M].sup.-1] [M.sup.l][[omega].sub.t-l], and we also know that the predicted spread can be calculated as E[[S.sub.t]\[[omega].sub.t-l]] = [h.sub.S][M.sup.l][[omega].sub.t-l]. In these formulas, the coefficients from an unrestricted VAR are used to provide the elements of the matrix M that are relevant to forecasting. The difference between the two expressions, E[[K.sub.t]\[[omega].sub.t-l]] = E[[S.sub.t]\[[omega].sub.t-l]] - E[[F.sub.t]\[[omega].sub.t-l]], is an implied variation in the term premium.

Decomposing the Spread in Practice

In view of the results from the prior section, we calculate two decompositions of the spread, based on different information sets.

Current information: We begin by calculating an estimate of the expectations component and the residual term premium using current information [[omega].sub.t]. In this setting, which corresponds to the analysis of Campbell and Shiller (1987), E[[S.sub.t]\[[omega].sub.t]] simply equals the actual spread and E[[F.sub.t]\[[omega].sub.t]] = [h.sub.[DELTA]R][beta]M[[I - [beta]M].sup.-1][[omega].sub.t].

Panel A of Figure 5 shows that the expectations component (the spread under the expectations theory) is strongly positively correlated with the actual spread (correlation coefficient = 0.99) and displays substantial variability. Panel B of Figure 5 shows the spread and the term premium (the gap between the spread and the expectations component). The residual term premium is much less variable.

It is useful to consider a decomposition of variance for the spread, similar to that which we used for permanent and temporary components in Section 3:

var([S.sub.t]) = var([F.sub.t]\[[omega].sub.t]) + var([K.sub.t]\[[omega].sub.t]) + 2 * cov([F.sub.t]\[[omega].sub.t], [K.sub.t]\[[omega].sub.t])

1.93 = 0.94 + 0.20 + 2 * (0.40)

Panel A of Table 8 reports second moments of the spread, the expectations component and the term premia. The variance of the spread is 1.93 (as noted in the derivation of the first spread rule of thumb), while the variance of the expectations component is 0.94. Since their respective standard deviations are not too different (1.39 and 0.97, respectively) and since they are virtually perfectly correlated, it is not surprising that a glance at the first panel of Figure 5 leads one to think that the expectations component explains most of the spread. By contrast, the standard deviation of the estimated term premium is much smaller (0.45), so it is natural to downplay its contribution after glancing at the second panel. But as panel A of Table 8 indicates, there is a very high estimated correlation of changes in the term premium and changes in the expectations component (0.94), so there is a substantial contribution to variability in the spread that arises from the covariance term (0.80 of a total of 1.93).

Economically, the spread appears excessively volatile relative to the estimated expectations component because there is a tendency for periods of high expectations components to occur when the term premium is also high. (46) Looking back to the first test of rational expectations restrictions, Figure 5 provides insight into why the cross-equation restrictions are rejected, since it highlights the distinct behavior of the spread and the expectations component. The spread contains information about the temporary component of interest rates highlighted by the expectations theory, but there are important departures as well.

Results based on lagged information: Figure 6 and panel B of Table 8 use forecasts from the vector autoregression, using information six months previous. In panel A of Figure 6, the actual spread [S.sub.t] and the forecast E[[S.sub.t]\[[omega].sub.t-6]] are plotted. While these series move together, the forecasted spread is much less volatile than the actual spread (the variance of the forecasted spread is 0.65, which is about one-third of the actual spread's variance of 1.95). In panel B, the forecasted spread E[[S.sub.t]\[[omega].sub.t-6]] and the forecasted expectations component E[[F.sub.t]\[[omega].sub.t-6]] are plotted. While the forecasted expectations component is highly correlated with the forecasted spread, it is clearly less volatile as well. In panel C, the forecasted spread E[[S.sub.t]\[[omega].sub.t-6]] and the forecasted term premium component E[[K.sub.t]\[[omega].sub.t-6]] = E[[S.sub.t]\[[omega].sub.t-6]] - E[[F.sub.t]\[[omega].sub.t-6]] are plotted. This residual is positively associated with E[[S.sub.t]\[[omega].sub.t-6]] with a near-perfect correlation. Its variance (0.076) is also somewhat more than one-third of the variance of the term premium measure E[[K.sub.t]\[[omega].sub.t]] that is shown in Figure 5.

This figure illustrates, we conjecture, why the rational expectations restrictions are rejected when the information set is lagged, as reported previously in Table 7 and discussed in detail above. The deviations of the forecastable part of the spread E[[S.sub.t]\[[omega].sub.t-6]] from the forecastable part of the expectations component E[[F.sub.t]\[[omega].sub.t-6]] appear important. Indeed, there is some evidence that E[[K.sub.t]\[[omega].sub.t-6]] is more serially correlated than either E[[S.sub.t]\[[omega].sub.t-6]] or E[[F.sub.t]\[[omega].sub.t-6]], as opposed to being unforecastable in the manner required for the rational expectations restrictions to be satisfied.

A Second Rule of Thumb for the Spread

If the spread rises by 1 percent, then how great a rise in the expectations component should an observer infer has occurred? This is a natural question, analogous to one earlier posed for the temporary component of the nominal interest rate, identified via the VECM. Since the variance of the spread is 1.93 and the covariance between the spread and the expectations component is 1.33, the rule of thumb coefficient is b = 0.69 = 1.33/1.93. Hence, we have the following.

Spread rule of thumb #2: If the spread exceeds its mean by 1 percent, then our estimates suggest that the expectations component is high by 0.69 percent.

Earlier, we derived a very similar implication--a coefficient of 0.71 but with the opposite sign--for the link between the temporary component of the short-term interest rate and the spread. It is not an accident that these two measures are very closely associated. The temporary component of the short-term rate is defined as $R_t - \bar{R}_t$, with $\bar{R}_t = R_{t-1} + \lim_{k \to \infty} \sum_{j=0}^{k} E_t \Delta R_{t+j}$. It is accordingly given by $R_t - \bar{R}_t = -\lim_{k \to \infty} \sum_{j=1}^{k} E_t \Delta R_{t+j}$. The expectations component studied in this section is $E[F_t \mid \omega_{t-1}] = \sum_{j=1}^{\infty} \beta^j E_t \Delta R_{t+j}$. In each case, the expectations terms are made operational by use of very similar linear forecasting models; there are small differences because $\beta$ is slightly smaller than one, but the essential theoretical and empirical properties are very similar except for the change in sign.

6. FOCUSING ON RECENT HISTORY

Many studies of recent macroeconomic history document changes in the pace and pattern of macroeconomic activity that have occurred over the past two decades. (47) Other studies suggest that a major reason for these changes is that the Federal Reserve System has altered its behavior in important ways. For example, Goodfriend (1993) argues that U.S. monetary policy decision-making came of age--gaining important recognition and credibility--during this period, after having earlier traveled on a wide-ranging odyssey. Accordingly, in this section, we explore how some key features of our previous analysis change if we restrict attention to 1986.7-2001.11. The start date of this period was selected as descriptive of recent U.S. monetary policy with increased credibility, following the narrative history of Goodfriend (2002): it includes the last few years of the Volcker period and the bulk of the Greenspan period. We focus our attention on two sets of issues. First, how did the estimated variability in the permanent component of interest rates change during this period? Second, how did the estimated importance of the expectations effects on the long-short spread change during this period?

The Stochastic Trend in Interest Rates

One important conclusion from our earlier analysis is that there is a common stochastic trend in interest rates, which is closely associated with the long rate. To conduct the analysis for the recent period, we start by reestimating the VECM discussed in Section 3 and reported in Table 4. Then, we calculate the permanent component suggested by this specification, producing the results reported in Figure 7 and Table 9.

We focus on two main results. First, as Figure 7 shows, the stochastic trend continues to be an important contributor to the behavior of both the long-term and short-term interest rates. As in the full sample period, it is closely associated with the long rate. Further, it is much less closely associated with the short rate.

Panel B of Table 9 provides more detail. It shows that changes in the common stochastic trend (permanent component) have a variance of 0.048, which is less than one-half of the comparable variance reported in Table 5. Thus, there is evidence that the stochastic trend is less important for both short-term and long-term interest rates. We can measure this reduced influence on our rule of thumb. Based on the full sample, we calculate that a 1 percent rise in the long rate should bring about a 0.97 percent rise in the predicted permanent component. On the recent sample, this rule-of-thumb coefficient is smaller: a 1 percent rise should bring about only a 0.84 percent increase in the predicted permanent component. (48) Yet, while the effect is smaller, changes in the long rate still strongly signal changes in the stochastic trend.

Expectations and the Spread

Another important conclusion of our analysis above is that the spread is an indicator of forecastable temporary variation in short-term interest rates and, in particular, of market expectations of these variations. Figure 8 and panel C of Table 9 show that this relationship has been maintained and, indeed, has apparently gained strength during the recent period. In particular, if we look at rule of thumb #2 for the spread, which indicates the extent to which a high spread should be interpreted as reflecting a high expectations component, then the rule-of-thumb coefficient is 0.77 = 1.46/1.89 for the recent period, whereas it was only 0.69 for the entire sample period. (49)
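The rule-of-thumb coefficients quoted for the two samples can be recomputed directly from the reported moments. The following is a minimal sketch, assuming each coefficient is the slope of a linear projection (covariance divided by the variance of the observed signal); the moments are copied from Tables 5, 8, and 9, while the dictionary layout and the function name `rule_of_thumb` are illustrative choices, not part of the original analysis.

```python
# Moments copied from Tables 5, 8 and 9: (variance of the observed signal,
# covariance between the signal and the component being inferred).
MOMENTS = {
    "long rate -> permanent component": {
        "full sample": (0.0826, 0.0802),      # Table 5, panel B
        "1986.7-2001.11": (0.0636, 0.0536),   # Table 9, panel B
    },
    "spread -> expectations component": {
        "full sample": (1.9318, 1.3339),      # Table 8, panel A
        "1986.7-2001.11": (1.8917, 1.4574),   # Table 9, panel C
    },
}


def rule_of_thumb(variance, covariance):
    """Slope b of the linear projection of the target on the signal."""
    return covariance / variance


for rule, samples in MOMENTS.items():
    for sample, (var, cov) in samples.items():
        b = rule_of_thumb(var, cov)
        print(f"{rule:34s} {sample:15s} b = {b:.2f}")
```

Under these inputs the long-rate coefficient falls between the two samples while the spread coefficient rises, which is the pattern described in the text.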

In sum, the two reported differences for this more recent period are intriguing, and it is natural to think about possible sources of the change in stochastic properties of the term structure. For example, we might conjecture that the reduced importance of the permanent component is the result of a more credible, inflation-stabilizing monetary policy. Given the lack of structure in our present analysis, however, it is impossible to support such a claim with statistical evidence or to quantify its importance compared to other potential explanations. Rather, we consider that these findings highlight a topic that warrants further investigation.

7. SUMMARY AND CONCLUSIONS

We conclude that expectations about the level of interest rates are very important for the behavior of long-term interest rates on two dimensions. First, changes in the long-term interest rate substantially reflect changes in the permanent component (stochastic trend) in the level of the short-term rate. Second, the spread between long-term and short-term rates depends heavily on a temporary component (deviations from stochastic trend) of the level of short-term rates. Although the strong form of the expectations theory is rejected by a battery of statistical tests, it remains a workable approximation for many applied purposes. Changes in the long rate are largely a signal that the common trend in rates has shifted; a high spread is an important signal that future short rates will rise. More specifically, we provide rules of thumb for interpreting the expectations component of changes in long rates and the level of the long-short spread.

While the expectations theory is rejected, our rational expectations statistical approach is constructive in highlighting the ways in which the linear expectations theory of the term structure fails. The nature of predictable departures from the expectations theory, which we interpreted as time-varying term premia, suggests to us the importance of studying linkages between these factors and the business cycle, since our analysis indicates that these were not simply high frequency deviations.

Finally, the econometric methods that we use are nonstructural, in that they do not take a stand on the specific economic model that determines short-term rates. Nevertheless, the results of our investigation do make some suggestions about the shape that structural models must take, since they indicate the presence of a stochastic trend in the level of the interest rate. Recent research on monetary policy rules, as exemplified by Clarida, Gali, and Gertler (1999), almost invariably assumes that the short-term interest rate is governed by a stable behavioral rule of the central bank, linking it simply to the level of inflation and the level of the output gap, a specification which would preclude such shifts in trend interest rates when incorporated into most macroeconomic models. Our results suggest that a crucial next step in the analysis of monetary policy rules must be the exploration of specifications that can give rise to a stochastic trend in interest rates. In addition, most current macroeconomic models would generally ascribe such shifts in interest rate trends to shifts in inflation trends. Our results thus suggest the importance of an analysis of the interplay between trend inflation, the long-term rate, and monetary policy.

APPENDIX A: THE SHILLER APPROXIMATION

The purpose of this appendix is to derive and exposit Shiller's approximate equation for the yield on a long-term bond. For a coupon bond of arbitrary maturity, N, the yield-to-maturity is the interest rate that makes the price equal to the present discounted value of its future cash flows {[C.sub.t+j]}, which may include both coupons and face value:

$$P^L_t = \sum_{j=1}^{N} \frac{C_{t+j}}{(1 + R^L_t)^j}.$$

In the particular case of a bond with infinite term, which is commonly called a consol, the relationship is

$$P^L_t = \sum_{j=1}^{\infty} \frac{C}{(1 + R^L_t)^j} = \frac{C}{R^L_t}.$$

Between t and t + 1, the holding period yield on any coupon bond is given by

$$H_{t+1} = \frac{P_{t+1} + C - P_t}{P_t}.$$

Accordingly, the holding-period yield on a consol is given by

$$H_{t+1} = \frac{(C/R^L_{t+1}) + C - (C/R^L_t)}{C/R^L_t} = \frac{(1/R^L_{t+1}) + 1}{1/R^L_t} - 1.$$

The ratio $R^L_t/R^L_{t+1}$ is approximately $1 + \theta(R^L_t - R^L_{t+1})$ via a first-order Taylor series approximation about the point $R^L_t = R^L_{t+1} = R^L$, with $\theta = 1/R^L$. It then follows that the holding-period yield is approximately

$$H_{t+1} = \theta(R^L_t - R^L_{t+1}) + R^L_t.$$

Notice that small changes in the yield $R^L_t - R^L_{t+1}$ have large implications for the holding-period yield $H_{t+1}$ because $\theta$ is a large number. For example, if the annual interest rate is 6 percent and the observation period is one month, then $\theta = 1/(0.005) = 200$. Defining $\beta = 1/(1 + R^L)$, this expression can be written as $H_{t+1} = \frac{1}{1-\beta} R^L_t - \frac{\beta}{1-\beta} R^L_{t+1}$, which is convenient for the discussion below.
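To get a feel for the quality of the linearization, the exact consol holding-period yield can be compared with its approximation at the 6 percent example. This is a minimal sketch: the function names and the size of the yield change (0.0001 at a monthly rate) are illustrative assumptions, not values taken from the article.

```python
def exact_consol_holding_yield(r_now, r_next):
    """Exact one-period holding yield on a consol: R_t / R_{t+1} + R_t - 1."""
    return r_now / r_next + r_now - 1.0


def approx_holding_yield(r_now, r_next, r_bar):
    """Linearized yield: theta * (R_t - R_{t+1}) + R_t, with theta = 1 / r_bar."""
    theta = 1.0 / r_bar
    return theta * (r_now - r_next) + r_now


r_bar = 0.06 / 12          # 6 percent per annum, expressed as a monthly fraction
theta = 1.0 / r_bar        # the large multiplier discussed in the text
r_now = r_bar
r_next = r_bar + 0.0001    # assumed small rise in the monthly long yield

exact = exact_consol_holding_yield(r_now, r_next)
approx = approx_holding_yield(r_now, r_next, r_bar)
print(f"theta = {theta:.1f}")
print(f"exact holding yield  = {exact:.6f}")
print(f"approx holding yield = {approx:.6f}")
```

Even this small rise in the yield produces a negative one-month holding return in both calculations, which illustrates why the multiplier matters; the gap between the exact and approximate values widens as the yield moves further from the expansion point.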

Suppose next that this approximate holding-period yield is equated (in expected value) to the short-term interest rate [R.sub.t] and a term premium [k.sub.t]. Then, it follows that

$$E_t H_{t+1} = E_t\left[\frac{1}{1-\beta} R^L_t - \frac{\beta}{1-\beta} R^L_{t+1}\right] = R_t + k_t$$

or

[R.sup.L.sub.t] = [beta][E.sub.t][R.sup.L.sub.t+1] + (1 - [beta])([R.sub.t] + [k.sub.t]),

which is the form used in the main text. This derivation highlights the fact that the linear coefficient [beta] may "drift" over time if the average level of the long rate is very different. It also highlights the fact that this term structure formula is an approximation suitable for very long-term bonds.
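The algebra behind the last step, and the forward solution it implies, can be spelled out as follows. This is a sketch under one added assumption that is not stated in the appendix: the discounted expected long rate vanishes in the limit, $\lim_{J \to \infty} \beta^J E_t R^L_{t+J} = 0$.

```latex
% Multiply the expected holding-yield condition by (1 - beta) and rearrange:
\frac{1}{1-\beta} R^L_t - \frac{\beta}{1-\beta} E_t R^L_{t+1} = R_t + k_t
\quad\Longrightarrow\quad
R^L_t = \beta E_t R^L_{t+1} + (1-\beta)(R_t + k_t).

% Substitute forward J times, using the law of iterated expectations:
R^L_t = (1-\beta) \sum_{j=0}^{J-1} \beta^j E_t (R_{t+j} + k_{t+j})
        + \beta^J E_t R^L_{t+J}.

% Let J go to infinity under the assumed limit condition:
R^L_t = (1-\beta) \sum_{j=0}^{\infty} \beta^j E_t (R_{t+j} + k_{t+j}).
```

The last line shows the long rate as a discounted average of expected future short rates and term premia, with weights that sum to one.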

APPENDIX B: FORECASTING WITH THE VECM

We estimate a VECM of the form

$$\Delta R_t = a(B) \Delta R^L_{t-1} + b(B) \Delta R_{t-1} + f S_{t-1} + e_{Rt}$$

$$\Delta R^L_t = c(B) \Delta R^L_{t-1} + d(B) \Delta R_{t-1} + g S_{t-1} + e_{Lt},$$

where B is the backshift (lag) operator. We note that the difference between these two equations is

$$\Delta S_t = [c(B) - a(B)] \Delta R^L_{t-1} + [d(B) - b(B)] \Delta R_{t-1} + (g - f) S_{t-1} + e_{Lt} - e_{Rt},$$

so that we can write

$$S_t = [c(B) - a(B)] \Delta R^L_{t-1} + [d(B) - b(B)] \Delta R_{t-1} + (1 + g - f) S_{t-1} + (e_{Lt} - e_{Rt}),$$

so that it is easy to write the system in state space form by defining $x_{t-1} = [\Delta R_{t-1}\ \Delta R_{t-2}\ \ldots\ \Delta R_{t-p}\ \Delta R^L_{t-1}\ \Delta R^L_{t-2}\ \ldots\ \Delta R^L_{t-p}\ S_{t-1}]$, which captures all of the predictor variables in these three equations. The main state equation is of the form $x_t = M x_{t-1} + G e_t$, with the elements being

[x.sub.t] = [FORMULA NOT REPRODUCIBLE IN ASCII]

M = [FORMULA NOT REPRODUCIBLE IN ASCII]

G[e.sub.t] = [FORMULA NOT REPRODUCIBLE IN ASCII]
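Since the matrices are not reproduced here, the following is a minimal sketch of how a state matrix of this kind can be assembled and used for forecasting. It assumes p = 4, the VECM point estimates from Table 4, a spread measured as a deviation from its mean, and the ordering of the state vector given above; the names `companion_matrix` and `forecast` and the row layout are illustrative choices and may differ from the authors' own construction.

```python
import numpy as np

P = 4  # lag length used throughout the article

# VECM point estimates from Table 4 (standard errors omitted).
SHORT_EQ = {"dR": [-0.2382, -0.1393, 0.0426, -0.0466],
            "dRL": [0.8209, 0.5954, 0.1698, 0.0520],
            "S": 0.1101}
LONG_EQ = {"dR": [-0.0147, -0.0164, -0.0250, 0.0301],
           "dRL": [0.1048, -0.0335, -0.1328, 0.0507],
           "S": -0.0229}


def equation_row(eq):
    """Coefficients on [dR lags 1..P, dRL lags 1..P, S_{t-1}]."""
    return np.array(eq["dR"] + eq["dRL"] + [eq["S"]])


def companion_matrix():
    """Build M in x_t = M x_{t-1} + G e_t for the state vector
    x_t = [dR_t .. dR_{t-P+1}, dRL_t .. dRL_{t-P+1}, S_t]."""
    n = 2 * P + 1
    M = np.zeros((n, n))
    row_short = equation_row(SHORT_EQ)
    row_long = equation_row(LONG_EQ)
    M[0] = row_short                  # equation for dR_t
    M[P] = row_long                   # equation for dRL_t
    for k in range(1, P):             # identities that shift older lags down
        M[k, k - 1] = 1.0
        M[P + k, P + k - 1] = 1.0
    M[2 * P] = row_long - row_short   # S_t = S_{t-1} + dRL_t - dR_t
    M[2 * P, 2 * P] += 1.0
    return M


def forecast(M, x, horizon):
    """Linear forecast E[x_{t+h} | x_t] = M^h x_t."""
    return np.linalg.matrix_power(M, horizon) @ x


M = companion_matrix()
x = np.zeros(2 * P + 1)
x[-1] = 1.0                           # spread one percent above its mean
print("six-month-ahead forecast of the spread:", forecast(M, x, 6)[-1])
```

The last row of M carries the coefficient 1 + g - f on the lagged spread, which matches the spread equation derived above.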

APPENDIX C: VARIOUS COINTEGRATED MODELS

In this appendix, we demonstrate that the vector autoregression system estimated by Campbell and Shiller (1987) implies a vector error correction model with the cointegrating vector [1 -1]. The discussion is a specific case of the existence of a Phillips triangular form for a cointegrated system (see Hamilton [1994, 576-78]).

We write the vector error correction model as

$$\Delta R^L_t = a(B) \Delta R^L_{t-1} + b(B) \Delta R_{t-1} + f (R^L_{t-1} - R_{t-1}) + e_{Lt}$$

$$\Delta R_t = c(B) \Delta R^L_{t-1} + d(B) \Delta R_{t-1} + g (R^L_{t-1} - R_{t-1}) + e_{Rt},$$

where B is the back-shift (lag) operator.

We write the VAR system of the CS form as

$$S_t = g(B) S_{t-1} + h(B) \Delta R_{t-1} + e_{St}$$

$$\Delta R_t = i(B) S_{t-1} + j(B) \Delta R_{t-1} + e_{Rt}.$$

Finding the first equation in the VECM: Add the second equation of the VAR to the first, resulting in

$$R^L_t - R_{t-1} = [g(B) + i(B)] S_{t-1} + [h(B) + j(B)] \Delta R_{t-1} + (e_{St} + e_{Rt}).$$

Reorganize this as

$$R^L_t - R^L_{t-1} = [g(1) + i(1) - 1] S_{t-1} + [g(B) - g(1) + i(B) - i(1)] S_{t-1} + [h(B) + j(B)] \Delta R_{t-1} + (e_{St} + e_{Rt}),$$

where g(1) is the sum of coefficients in the g polynomial (and similarly for i). Since the coefficients in [g(B) - g(1)] sum to zero by construction, it is always possible to factor [g(B) - g(1)] = [gamma](B)(1 - B) with [gamma](B) having one less lag than g(B). Further, we can similarly write i(B) - i(1) = [phi](B)(1 - B). Hence, we can write the above equation as

$$R^L_t - R^L_{t-1} = [g(1) + i(1) - 1] S_{t-1} + [\gamma(B) + \phi(B)] (\Delta R^L_{t-1} - \Delta R_{t-1}) + [h(B) + j(B)] \Delta R_{t-1} + (e_{St} + e_{Rt}),$$

which takes the general form of the VECM equation with suitable definitions of a(B) and b(B).

Finding the second equation in the VECM: Similarly, we can rearrange the second equation above as

$$\Delta R_t = [i(B) - i(1)] S_{t-1} + j(B) \Delta R_{t-1} + i(1) S_{t-1} + e_{Rt}.$$

Hence,

$$\Delta R_t = \phi(B) \Delta R^L_{t-1} + [j(B) - \phi(B)] \Delta R_{t-1} + i(1) S_{t-1} + e_{Rt},$$

which is the same form as the second equation of the VECM system. Thus, the Campbell-Shiller VAR implies a VECM.
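The factorization step used in this appendix, g(B) - g(1) = [gamma](B)(1 - B), can be checked numerically. The sketch below assumes polynomial coefficients listed in ascending powers of B; the coefficient values are hypothetical and chosen only for illustration, and the function name `difference_factor` is not from the article.

```python
import numpy as np


def difference_factor(g):
    """Return gamma such that g(B) - g(1) = gamma(B) * (1 - B).

    g lists coefficients in ascending powers of B: g[0] + g[1] B + ...
    The result has one coefficient fewer than g.
    """
    g = np.asarray(g, dtype=float)
    tail_sums = np.cumsum(g[::-1])[::-1]   # tail_sums[k] = g[k] + g[k+1] + ...
    return -tail_sums[1:]                  # gamma[k] = -(g[k+1] + g[k+2] + ...)


g = np.array([0.9, -0.2, -0.4, -0.1])      # hypothetical lag polynomial
gamma = difference_factor(g)

# Left side: g(B) - g(1), i.e. subtract the coefficient sum from the constant.
lhs = g.copy()
lhs[0] -= g.sum()

# Right side: gamma(B) * (1 - B), computed as a polynomial product.
rhs = np.convolve(gamma, [1.0, -1.0])

print("gamma coefficients:", gamma)
print("factorization holds:", np.allclose(lhs, rhs))
```

Because the coefficients of g(B) - g(1) sum to zero, the polynomial has a root at B = 1, which is what allows the factor (1 - B) to be pulled out and the lagged spread levels to be rewritten as lagged changes.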

[FIGURE 1 OMITTED]

[FIGURE 2 OMITTED]

[FIGURE 3 OMITTED]

[FIGURE 4 OMITTED]

[FIGURE 5 OMITTED]

[FIGURE 6 OMITTED]

[FIGURE 7 OMITTED]

[FIGURE 8 OMITTED]

Table 1

Decade Averages

 Short Rate Long Rate Spread

1950s 1.85 3.02 1.17
1960s 3.81 4.63 0.82
1970s 6.13 7.57 1.45
1980s 8.54 10.69 2.15
1990s 4.80 7.10 2.30
Full Sample 5.13 6.67 1.57

Notes: All values are in percent per annum.

Table 2

Unit Root Tests

Full Sample Estimates (1951.4--2001.11)

               [DELTA][R.sub.t]        [DELTA][R.sup.L.sub.t]  [S.sub.t-1]
               con.        uncon.      con.        uncon.      con.        uncon.

constant       0           0.0123      0           0.0043      0           0.0154
                           (0.0057)                (0.0025)                (0.0043)
lagged level   0           -0.0283     0           -0.0068     0           -0.1149
                           (0.0116)                (0.0042)                (0.0261)
lag 1          -0.2151     -0.0198     0.0896      0.0918      -0.3256     -0.2471
               (0.0406)    (0.0411)    (0.0409)    (0.0409)    (0.0404)    (0.0437)
lag 2          -0.1649     -0.1499     -0.0441     -0.0418     -0.2610     -0.1954
               (0.0415)    (0.0419)    (0.0407)    (0.0407)    (0.0425)    (0.0444)
lag 3          -0.0082     0.0037      -0.1390     -0.1369     -0.0759     -0.0268
               (0.0416)    (0.0417)    (0.0407)    (0.0407)    (0.0425)    (0.0433)
lag 4          -0.1193     -0.1094     0.0384      0.0398      -0.1521     -0.1157
               (0.0406)    (0.0407)    (0.0409)    (0.0409)    (0.0404)    (0.0407)

R-square       0.0721      0.0811      0.0301      0.0348      0.1322      0.1594

F-value        2.9352                  1.4415                  9.6688

Notes: Numbers in parentheses represent standard errors. The critical 5
percent (10 percent) value for the Adjusted Dickey-Fuller F-test is 4.59
(3.78).

Table 3

Efficient Markets Tests

Full Sample Estimates (1951.4--2001.11)

 [DELTA]
 [R.sup.L.sub.t]
 test 1 test 2 test 3 test 4

constant 0.0002 0.0023 0.0033 0.0032
 (0.0010) (0.0015) (0.0015) (0.0016)
[S.sub.t-1] 0.0050 -0.0147 -0.0215 -0.0214
 (0.0084) (0.0085) (0.0095)
[DELTA][R.sub.t-1] -0.0284
 (0.0159)
[DELTA][R.sub.t-2] -0.0321
 (0.0160)
[DELTA][R.sub.t-3] -0.0258
 (0.0157)
[DELTA][R.sub.t-4] 0.0295
 (0.0152)
[DELTA][R.sup.L.sub.t-1] 0.1002
 (0.0410)
[DELTA][R.sup.L.sub.t-2] -0.0496
 (0.0406)
[DELTA][R.sup.L.sub.t-3] -0.1523
 (0.0409)
[DELTA][R.sup.L.sub.t-4] 0.0248
 (0.0411)
R-square -0.0040 0.0051 0.0408 0.0272



 test 5

constant 0.0034
 (0.0016)
[S.sub.t-1] -0.0229
 (0.0094)
[DELTA][R.sub.t-1] -0.0147
 (0.0171)
[DELTA][R.sub.t-2] -0.0164
 (0.0175)
[DELTA][R.sub.t-3] -0.0250
 (0.0168)
[DELTA][R.sub.t-4] 0.0301
 (0.0152)
[DELTA][R.sup.L.sub.t-1] 0.1048
 (0.0409)
[DELTA][R.sup.L.sub.t-2] -0.0335
 (0.0429)
[DELTA][R.sup.L.sub.t-3] -0.1328
 (0.0440)
[DELTA][R.sup.L.sub.t-4] 0.0507
 (0.0443)
R-square 0.0550

Notes: Numbers in parentheses represent standard errors. F-stat
(Regression 3 vs. Regression 2) = 5.598. F-stat (Regression 4 vs.
Regression 2) = 3.436. F-stat (Regression 5 vs. Regression 3) = 2.280.
F-stat (Regression 5 vs. Regression 4) = 4.441. The critical 5 percent
(1 percent) F(4,400) value is 2.39 (3.36).

Table 4

VAR/VECM Estimate

Full Sample Estimates (1951.4--2001.11)

 [DELTA][R.sub.t] [DELTA][R.sup.L.sub.t]
 VAR VECM VAR VECM

[S.sub.t-1] 0.1101 -0.0229
 (0.0237) (0.0094)
[DELTA][R.sub.t-1] -0.3095 -0.2382 0.0001 -0.0147
 (0.0408) (0.0430) (0.0160) (0.0171)
[DELTA][R.sub.t-2] -0.1997 -0.1393 -0.0038 -0.0164
 (0.0427) (0.0440) (0.0168) (0.0175)
[DELTA][R.sub.t-3] -0.0051 0.0426 -0.0151 -0.0250
 (0.0417) (0.0423) (0.0164) (0.0168)
[DELTA][R.sub.t-4] -0.0879 -0.0466 0.0387 0.0301
 (0.0377) (0.0382) (0.0148) (0.0152)
[DELTA][R.sup.L.sub.t-1] 0.8712 0.8209 0.0943 0.1048
 (0.1038) (0.1026) (0.0408) (0.0409)
[DELTA][R.sup.L.sub.t-2] 0.6250 0.5954 -0.0397 -0.0335
 (0.1095) (0.1078) (0.0430) (0.0429)
[DELTA][R.sup.L.sub.t-3] 0.1791 0.1698 -0.1347 -0.1328
 (0.1123) (0.1104) (0.0441) (0.0440)
[DELTA][R.sup.L.sub.t-4] 0.0430 0.0520 0.0526 0.0507
 (0.1133) (0.1114) (0.0445) (0.0443)

R-square 0.2220 0.2492 0.0459 0.0553
F-statistic 21.2172 21.9030 3.5785 3.8610

Notes: Numbers in parentheses represent standard errors. The likelihood
ratio statistic of the VECM against the VAR is 27.6704. Comparing this
value to the corresponding critical value in Horvath and Watson's tables
leads to strong rejection of the null of two unit roots (p-value lower
than 0.01).

Table 5

Summary Statistics for Permanent-Temporary Decomposition

Full Sample Estimates (1951.4--2001.11)

A. Short-Rate Changes

 Total Permanent Temporary

 0.6559 0.1083 0.5477
 0.4133 0.1046 0.0036
 0.9168 0.0152 0.5440

B. Long-Rate changes

Total Permanent Temporary

 0.0826 0.0802 0.0023
 0.8631 0.1046 -0.0244
 0.0499 -0.4614 0.0268

C. Long-Short Spread

 Total Temporary Long Rate Temporary Short Rate

 1.9318 0.5649 -1.3668
 0.9559 0.1808 -0.3841
-0.9920 -0.9114 0.9827

Notes: Table 5 is based on the VECM estimates in Table 4. Each panel
contains a 3 by 3 matrix. On the diagonal, variances are reported (e.g.,
the variance of changes in long rates is 0.0826). Above the diagonal,
covariances are listed (e.g., the covariance between changes in the long
rate and changes in its permanent component is 0.0802). Below the
diagonal, the corresponding correlation is reported (e.g., the
correlation between changes in the long rate and changes in its
permanent component is 0.8631).

Table 6

VAR Tests of the Expectations Hypothesis

Full Sample Estimates (1951.4--2001.11)

 [DELTA][R.sub.t] [S.sub.t]
 VAR VAR
 unconstrained consistent unconstrained consistent
 VAR with ET VAR with ET

[DELTA][R.sub.t-1] 0.5782 0.5739 -0.4927 -0.5739
 (0.1095) (0.1088) (0.1171) (0.1088)
[DELTA][R.sub.t-2] 0.4580 0.4604 -0.5059 -0.4604
 (0.1124) (0.1116) (0.1201) (0.1116)
[DELTA][R.sub.t-3] 0.2192 0.2268 -0.3701 -0.2268
 (0.1125) (0.1117) (0.1202) (0.1117)
[DELTA][R.sub.t-4] -0.0447 -0.0464 0.0767 0.0464
 (0.0379) (0.0377) (0.0405) (0.0377)
[S.sub.t-1] 0.9254 0.9218 0.1507 0.0838
 (0.1021) (0.1014) (0.1091) (0.1014)
[S.sub.t-2] -0.2228 -0.2159 0.0875 0.2159
 (0.1542) (0.1532) (0.1649) (0.1532)
[S.sub.t-3] -0.4233 -0.4184 0.3263 0.4184
 (0.1552) (0.1541) (0.1659) (0.1541)
[S.sub.t-4] -0.1693 -0.1761 0.3023 0.1761
 (0.1104) (0.1096) (0.1180) (0.1096)

Notes: All variables represent deviations from their respective means.
Numbers in parentheses represent standard errors. The likelihood ratio
test of the unconstrained VAR against the VAR consistent with the
expectations theory (ET) is 35.7131. Since the corresponding critical
0.1 percent [chi square] value for 8 degrees of freedom is only 26.1,
the restrictions imposed by the ET are strongly rejected.

Table 7

VAR Tests Based on Lagged Information

Full Sample Estimates (1951.4--2001.11)

Information Likelihood Ratio
 Lag (between unconstrained
 and constrained VAR)

 0 35.7131
 1 32.8594
 3 33.6881
 6 33.6300
 12 35.6203

Table 8

Summary Statistics for Expectations Component/Term Premium Decomposition

Full Sample Estimates (1951.4--2001.11)

Spread Expectations Term Premium

A. Based on Current Information

1.9318 1.3339 0.5979
0.9923 0.9355 0.3984
0.9633 0.9225 0.1994

B. Based on 6-months Forecasts

0.6495 0.4264 0.2231
0.9998 0.2800 0.1464
0.9995 0.9987 0.0767

Notes: Statistics correspond to Figures 5 and 6. Each panel contains a 3
by 3 matrix. On the diagonal, variances are reported (e.g., the variance
of 6-months forecasts of the spread is 0.6495). Above the diagonal,
covariances are listed (e.g., the covariance between the spread and
expectations in the current information case is 1.3339). Below the
diagonal, the corresponding correlation is reported (e.g., the
correlation between the spread and expectations in the current
information case is 0.9923).

Table 9

Summary Statistics for Two Decompositions

Subsample Estimates (1986.7--2001.11)

A. Short-Rate Changes

 Total Permanent Temporary

 0.4409 -0.0003 0.4411
-0.0019 0.0476 -0.0478
 0.9501 -0.3137 0.4890

B. Long-Rate Changes

 Total Permanent Temporary

 0.0636 0.0536 0.0100
 0.9747 0.0476 0.0061
 0.6314 0.4422 0.0039

C. Spread

Spread Expectations Term Premium

 1.8917 1.4574 0.4343
 0.9950 1.1341 0.3232
 0.9475 0.9108 0.1111

Notes: Each panel contains a 3 by 3 matrix in a manner similar to Tables
5 and 8. On the diagonal, variances are reported. Above the diagonal,
covariances are listed. Below the diagonal, the corresponding
correlation is reported.

(1.) See Hetzel and Leach (2001) for an interesting recent account of the events surrounding the Accord.

(2.) The sense in which this measure is optimal is discussed in more detail below, but it is based on minimizing the variance of prediction errors over our sample period of 1951 to 2001.

(3.) By contrast, a similar calculation indicates that changes in short-term interest rates are a much less strong indicator of changes in the stochastic trend: the comparable adjustment coefficient is 0.17 rather than 0.97. This finding is consistent with other evidence of important temporary variations in short-term interest rates, presented in this article and other studies.

(4.) We impose the cross-equation restrictions on the VAR and calculate a likelihood ratio test that compares the fit of the constrained and unconstrained VAR, while Campbell and Shiller (1987) use a Wald-type test of the restrictions on an estimated unrestricted VAR. It is now understood that Wald tests of nonlinear restrictions are sensitive to the details of how such tests are set up and suffer from much more severe small-sample bias than the method we employ here (see Bekaert and Hodrick [2001]).

(5.) For the sake of simplicity, we use the same lag length of four months throughout the article. However, we also performed the different econometric tests with a higher lag length of p = 6 (as used for example by Watson [1999]) and found our results to be robust to this change.

(6.) See Dickey and Fuller (1981) for a discussion of the nonstandard distribution of this test statistic and a table of critical values.

(7.) A weaker null hypothesis, advocated for example by Hamilton (1994, 511-12), does not require [a.sub.0] = 0. This allows there to be a deterministic trend in the level of nominal rates, which seems implausible to us. But the second column of Table 2 also shows that there is no strong evidence against this null hypothesis, since f = -0.0283 with a standard error of 0.0116. More specifically, the value of the Dickey-Fuller t-statistic is -2.43, which is smaller in absolute value than the 10 percent critical level of -2.57.

(8.) The estimated level coefficient is also smaller and the associated Dickey-Fuller t-statistic takes on a value of -1.62.

(9.) The constrained regressions display a similar pattern, although there are the familiar difficulties with interpreting [R.sup.2] when no constant term is present (see, for example, Judge et al. [1985, 30-31]).

(10.) See, for example, Campbell, Shiller, and Schoenholtz (1983) or Campbell and Shiller (1987).

(11.) For our full sample, the average of the long rate equals 6.67 percent, or expressed as a monthly fraction: [R.sup.L] = 0.0667/12 = 0.00556.

(12.) To undertake this derivation, note that $R_{t+j} - R_t = (R_{t+j} - R_{t+j-1}) + \ldots + (R_{t+1} - R_t)$. Hence, each expected change enters many times in the sum, with a total effect of $\sum_{h=j}^{\infty} \beta^h E_t(R_{t+j} - R_{t+j-1}) = \frac{\beta^j}{1-\beta} E_t(R_{t+j} - R_{t+j-1})$.

(13.) Below, we use the notation $K_t = (1 - \beta) \sum_{j=0}^{\infty} \beta^j E_t k_{t+j}$. But if $k_t = k$, then $K = k$.

(14.) See, for example, Campbell and Shiller (1991) for the term structure of interest rates or Bekaert and Hodrick (2001) for foreign exchange rates.

(15.) One potential explanation for the failure of the efficient markets tests--highlighted in Fama (1977)--is that there may be time-variation [k.sub.t] in the equilibrium returns, which investors require to hold an asset. Then the theory predicts that

$$R^L_t - R^L_{t-1} = (1/\beta - 1)(R^L_{t-1} - R_{t-1} - k_{t-1}) + \xi_t.$$

But the researcher conducting the test does not observe time variation in k, which may give rise to a biased estimate on the spread. Fama stresses that efficient markets tests involve a joint hypothesis about the efficient use of information and a model of equilibrium returns, so that a rejection of the theory may arise from either element.

(16.) See the discussion of Nelson and Schwert (1977) on testing for a constant real rate.

(17.) For example, at a recent macroeconomics conference, one prominent monetary economist argued that the expectations theory of the term structure has been rejected so many times that it should never be built into any model.

(18.) In more technical terms, when [R.sub.t] and [R.sup.L.sub.t] are cointegrated, then the vector moving average representation of [DELTA][x.sub.t] = [[DELTA][R.sub.t] [DELTA][R.sup.L.sub.t]] (which exists by definition of the Wold decomposition theorem) is noninvertible. As a result, no corresponding finite-order VAR approximation can exist. See Hamilton (1994, 574-75) for details.

(19.) This type of test is somewhat more powerful than the unit root test on the spread reported in Table 2, which may be revealed by taking the difference between the two VECM equations and reorganizing the results slightly to obtain

[FORMULA NOT REPRODUCIBLE IN ASCII]

which can be further rewritten as

[FORMULA NOT REPRODUCIBLE IN ASCII]

That is, the Horvath-Watson test essentially introduces some additional stationary regressors to the forecasting equation for changes in the spread that was used in the DF test. Adding these regressors can improve the explanatory power of the regression, resulting in a more powerful test.

(20.) That is, long rates Granger-cause short rates.

(21.) As Horvath and Watson (1995) stress, the relevant critical values for the likelihood ratio must take into account that the spread is nonstationary under the null. Thus, we cannot refer to a standard chi-square table. We estimate the VAR and VECM without constant terms, since we are assuming no deterministic trends in interest rates. However, we allow for a mean value of the spread, which is not zero as shown in (7) and (8). Unfortunately, this combination of assumptions means that we cannot use the tables in Horvath and Watson (1995), but must conduct the Monte Carlo simulations their method suggests to calculate the critical values reported in the text. Details are contained in replication materials available at http://people.bu.edu/rking.

(22.) An alternative approach in this section would be to estimate the cointegrating vector and use the well-known testing method of Johansen (1988). Horvath and Watson (1995) establish that their procedure is more powerful if the cointegrating vector is known.

(23.) The idea that cointegration implies common stochastic trends is developed in Stock and Watson (1988) and King, Plosser, Stock, and Watson (1991).

(24.) Under the expectations theory with a constant term premium, the average value of the spread must be the term premium K. So, to avoid proliferation of symbols, we use that notation here.

(25.) To understand the sensitivity of the trend to the form of the estimated equation for the long rate, we compared three alternative measures of the trend. The first was the test measure based on the estimated VECM (i.e., the one reported in this section); the second was based on replacing the long-rate equation with the result of a simple regression of long-rate changes on the spread (i.e., the specification that we used for testing the efficient markets restriction above) so that there was a small negative weight on the spread in the long-rate equation; and the third was based on the efficient markets restriction (i.e., placed a small positive weight on the lagged spread). While there were some differences in these trend estimates on a period-by-period basis, they tell the same basic story in terms of the general pattern of rise and fall in the stochastic trend.

(26.) Here and below, our estimate of the stochastic trend allows us to calculate the variance decomposition, including the variance of changes in the trend and the covariance term. Note that due to rounding errors, the variance decompositions do not add up exactly.

(27.) There is also substantial serial correlation in the spread, as well as in the temporary components of the short rate and the long rate. The first order autocorrelations of these series are, respectively, 0.81, 0.72, and 0.93.

(28.) Of course, we could have devised a similar rule of thumb for the short rate by replacing [DELTA][R.sup.L.sub.t] by [DELTA][R.sub.t] in the formula for the coefficient b. The result would have been a much more modest rule of thumb coefficient (0.1651 = 0.1083/0.6559). This smaller coefficient reflects the fact that temporary variations are much more important for the short rate.

(29.) For this purpose, we interpret [Y.sub.t] as the change in the temporary component of the short rate and replace [DELTA][R.sup.L.sub.t] with the spread (less its mean) in the above formula for b. Based on the third panel of Table 5, the covariance between changes in the temporary component of the short rate and the spread equals -1.37 and the variance for changes in the spread is 1.93.

(30.) According to the expectations theory, K does not have to equal zero. For the sake of convenience, we set K = 0, which can be reconciled with the data if we consider all variables as deviations from their respective means.

(31.) Such a stationary system is sometimes called a VECM in Phillips's triangular form. See Hamilton (1994, 576-78) and Appendix C.

(32.) VECM regressions like (7) and (8) in the previous section are also restricted by the expectations theory. According to our simple model, the dynamics of short- and long-rate changes take the form

$$\Delta R_t = e_{Pt} + e_{Tt} + \frac{1 - \beta\rho}{\beta} S_{t-1},$$

$$\Delta R^L_t = \Delta\tau_t + \theta \Delta x_t = e_{\tau,t} + \theta e_{x,t} + \frac{1 - \beta}{\beta} S_{t-1}.$$

The second equation for the long-rate change is simply the efficient markets restriction.
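The spread coefficients in the two equations of footnote 32 can be traced step by step. The following is a sketch under stated assumptions rather than the authors' own derivation: the short rate is taken to be $R_t = \tau_t + x_t$ with a random-walk trend $\tau_t = \tau_{t-1} + e_{\tau,t}$ and a temporary component $x_t = \rho x_{t-1} + e_{x,t}$; the long rate is $R^L_t = \tau_t + \theta x_t$ with $\theta = (1-\beta)/(1-\beta\rho)$; the constant K is set to zero; and the innovations written $e_{Pt}$ and $e_{Tt}$ in the short-rate equation are assumed to be the same permanent and temporary shocks $e_{\tau,t}$ and $e_{x,t}$.

```latex
\begin{align*}
S_t &= R^L_t - R_t = (\theta - 1)\,x_t = -\frac{\beta(1-\rho)}{1-\beta\rho}\,x_t, \\
x_{t-1} &= -\frac{1-\beta\rho}{\beta(1-\rho)}\,S_{t-1}, \\
\Delta R_t &= e_{\tau,t} + e_{x,t} - (1-\rho)\,x_{t-1}
            = e_{\tau,t} + e_{x,t} + \frac{1-\beta\rho}{\beta}\,S_{t-1}, \\
\Delta R^L_t &= e_{\tau,t} + \theta e_{x,t} - \theta(1-\rho)\,x_{t-1}
            = e_{\tau,t} + \theta e_{x,t} + \frac{1-\beta}{\beta}\,S_{t-1}.
\end{align*}
```

The last line uses $\theta(1-\beta\rho) = 1-\beta$, which is why the long-rate coefficient on the lagged spread does not depend on $\rho$.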

(33.) In our simple model, the VECM approach (discussed in the previous footnote) helps to correctly uncover some features of the data that are not known a priori by the econometrician. First, the temporary component [x.sub.t] of the short rate is reflected in a temporary component of the long rate, but with a much dampened magnitude for plausible values of [beta] and [rho]. For example, if 1/[beta] = 1.005 and [rho] = 0.8, then the composite coefficient [theta] takes on a value of 0.005/0.025 = 0.2. Second, the spread is predicted to be a significant predictive variable for interest rates in the VECM, but especially for the temporary component of interest rates. These features of the model appear broadly in accord with the estimated VECM and its outputs, particularly in terms of the implication that there is a much smaller volatility of the temporary component of the long rate than the temporary component of the short rate. In addition, the generally poor predictive performance for changes in the long rate seems consistent with the importance of permanent shocks in that equation, relative to the small effect of the spread. Finally, the spread and the temporary component of the short-term interest rate are negatively associated in the example as in the outputs of the VECM. But other features of the model are at variance with the results obtained via estimating a VECM. In particular, the temporary component of the long rate has a strong positive association with the temporary component of the short rate in the model, while there is a negative correlation in the estimates discussed in the preceding section.

(34.) The example we discussed above used one lag for analytical convenience, but in this empirical context we use multiple lags to capture the dynamic interactions between the variables more completely.

(35.) Note that we have dropped the constant K from the equation for the sake of notational simplicity. In econometric terms, this simply means that, without a loss of generality, we have to test the expectations theory with demeaned data.

(36.) As Campbell and Shiller (1987) stress, the explanation for this result is subtle: the expectations theory says that the spread is simply the discounted sum of future expected short-rate changes. Under the null that the theory is true, all the relevant information that market participants use to forecast future short-rate changes must by definition be embodied in the actual spread. As long as $S_t$ is part of the econometrician's information set $\omega_t$, it must thus be the case that $E[\sum_{j=1}^{\infty} \beta^j \Delta R_{t+j} \mid \Omega_t] = E[\sum_{j=1}^{\infty} \beta^j \Delta R_{t+j} \mid \omega_t]$. It is important to note that this result is conditional on the expectations theory holding exactly. If we relax the null to allow for time-varying term premia or even a simple error term, $S_t$ no longer embodies all necessary information about expected future short-rate changes.

(37.) This restriction to the past history of interest rates follows Sargent (1979) and Campbell and Shiller (1987). It would be of some interest to explore the implications of adding other macroeconomic variables.

(38.) As Campbell and Shiller (1987) note, the cross-equation restrictions (15) can be simplified to a linear set of restrictions. Specifically, we can rewrite them as [h.sub.S][I - [beta]M] = [h.sub.[DELTA]R][beta]M, which implies that [a.sub.i] = -[c.sub.i] for i = 1,..., p; [d.sub.1] = 1/[beta] - [b.sub.1]; and [b.sub.i] = -[d.sub.i] for i = 2,..., p.
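The equivalence described in footnote 38 can be checked numerically with a minimal sketch. Several things here are assumptions made for illustration, not taken from the article: the state vector is ordered as current and lagged short-rate changes followed by current and lagged spreads; a and b are the coefficients on lagged short-rate changes and lagged spreads in the short-rate equation, with c and d playing the same roles in the spread equation; the helper names are hypothetical; and the numerical values of a and b are made up.

```python
import numpy as np

def companion(a, b, c, d):
    """Companion matrix M for the state [dR_t..dR_{t-p+1}, S_t..S_{t-p+1}]."""
    p = len(a)
    M = np.zeros((2 * p, 2 * p))
    M[0, :p], M[0, p:] = a, b      # short-rate-change equation
    M[p, :p], M[p, p:] = c, d      # spread equation
    for i in range(1, p):          # identities that shift the lags down
        M[i, i - 1] = 1.0
        M[p + i, p + i - 1] = 1.0
    return M

def restricted_spread_coeffs(a, b, beta):
    """Spread-equation coefficients implied by the linear restrictions."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    c = -a                         # a_i = -c_i for all i
    d = -b                         # b_i = -d_i for i >= 2
    d[0] = 1.0 / beta - b[0]       # d_1 = 1/beta - b_1
    return c, d

def restriction_gap(M, beta, p):
    """h_S (I - beta M) - h_dR (beta M); zero when the restrictions hold."""
    n = M.shape[0]
    h_dR = np.zeros(n)
    h_dR[0] = 1.0                  # selects the current short-rate change
    h_S = np.zeros(n)
    h_S[p] = 1.0                   # selects the current spread
    return h_S @ (np.eye(n) - beta * M) - h_dR @ (beta * M)

beta = 1 / 1.005                   # discount factor used as an example in the text
a = [0.10, -0.05]                  # made-up illustrative values
b = [0.30, 0.10]                   # made-up illustrative values
c, d = restricted_spread_coeffs(a, b, beta)
gap = restriction_gap(companion(a, b, c, d), beta, p=2)
print(np.allclose(gap, 0.0))
```

Perturbing any element of c or d away from its restricted value makes the gap nonzero, which is the sense in which the matrix condition and the coefficient-by-coefficient restrictions are the same statement.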

(39.) The reported results hold true for alternative lag lengths as well.

(40.) Because of the specific linear nature of the cross-equation restrictions noted above, the constrained estimates and the standard errors for different pairs of VAR coefficients are identical.

(41.) Bekaert and Hodrick (2001) show that in the context of cross-equation restrictions tests of present-value models such as the expectations theory, Wald tests suffer from substantially larger sample biases than likelihood ratio tests or Lagrangean multiplier tests.

(42.) As noted in a previous footnote, under the null that the expectations theory holds, $S_t$ embodies all necessary information about future short-rate changes, and thus $E[\Delta R_{t+j} \mid \Omega_t] = E[\Delta R_{t+j} \mid \omega_t]$ as long as $S_t$ is part of $\omega_t$. However, since now we have relaxed the assumption of constant term premia (i.e., the expectations theory does not hold), we can no longer assume that $S_t$ contains all necessary information about future short-rate changes. This means that replacing the market's information set $\Omega_t$ with the econometrician's information set $\omega_t \subset \Omega_t$ (potentially) introduces a forecasting error.

(43.) It might appear that one could "divide out" the terms $M^l$ from both sides of (18), restoring the restrictions (15). However, the matrix M can be shown to be singular if $E[K_t \mid \omega_{t-l}] = 0$ is true (Kurmann [2002a]).

(44.) The variables in the information set [[omega].sub.t-l] remain the same as for the cross-equation restriction tests above (i.e., [omega] consists of lags of [DELTA]R and S). However, it would be interesting to assess the robustness of the reported results if we included additional variables that are likely to help forecast changes in the short rate.

(45.) Campbell and Shiller (1987, 1080).

(46.) While these are point estimates and do not take into account the uncertainty implied by the fact that the unrestricted VAR is estimated rather than known, preliminary results in Kurmann (2002b) suggest that there may not be too much uncertainty in our context.

(47.) For example, see Blanchard and Simon (2001) or Stock and Watson (2002).

(48.) In terms of elements of Table 9, the rule-of-thumb coefficient is calculated as b = 0.0536/0.0636 = 0.84.

(49.) We think that a natural next stage of research involves a more systematic inquiry into the evolving nature of the links between short-term rates and long-term rates. For example, Watson (1999) argues that increased persistence in short-term interest rates--which in our case would involve evolving VAR coefficients--helps explain the increased variability of long-term rates from the 1965-1978 period to the 1985-1998 period. This section, by contrast, argues that the changes in the persistent component in interest rates (the stochastic trend) were less important during 1986-2001 than over the 1951-2001 sample that includes the volatile 1979-1984 period not studied by Watson. A recent attempt to take into account time variations in the VAR parameters is Favero (2001), who computes the long rate under the expectations theory using a rolling regression VAR approach.

REFERENCES

Bekaert, Geert, and Robert J. Hodrick. 2001. "Expectations Hypothesis Tests." Journal of Finance 56 (August): 1357-94.

Beveridge, Stephen, and Charles R. Nelson. 1981. "A New Approach to Decomposition of Economic Time Series into Permanent and Transitory Components with Particular Attention to Measurement of the 'Business Cycle'." Journal of Monetary Economics 7 (March): 151-74.

Blanchard, Olivier, and J. Simon. 2001. "The Long and Large Decline in U.S. Output Volatility." Brookings Papers on Economic Activity 1 (March): 135-64.

Campbell, John Y., and Robert J. Shiller. 1987. "Cointegration and Tests of Present Value Models." Journal of Political Economy 95 (October): 1062-88.

_____. 1991. "Yield Spreads and Interest Rate Movements: A Bird's Eye View." The Review of Economic Studies 58 (May): 495-514.

_____, and Kermit L. Schoenholtz. 1983. "Forward Rates and Future Policy: Interpreting the Term Structure of Interest Rates." Brookings Papers on Economic Activity 1 (March): 173-217.

Clarida, Richard, Jordi Gali, and Mark Gertler. 1999. "The Science of Monetary Policy: A New Keynesian Perspective." Journal of Economic Literature 37 (December): 1661-707.

Dickey, David A., and Wayne A. Fuller. 1981. "Likelihood Ratio Statistics for Autoregressive Time Series with a Unit Root." Econometrica 49 (June): 1057-72.

Dotsey, Michael. 1998. "The Predictive Content of the Interest Rate Term Spread for Future Economic Growth." Federal Reserve Bank of Richmond Economic Quarterly 84 (Summer): 31-51.

Engle, Robert F., and Clive W. J. Granger. 1987. "Cointegration and Error Correction: Representation, Estimation and Testing." Econometrica 55 (March): 251-76. Reprinted in Long-Run Economic Relations: Readings in Cointegration, ed. Robert F. Engle and Clive W. J. Granger. New York: Oxford University Press, 1991.

Fama, Eugene F. 1977. Foundations of Finance: Portfolio Decisions and Securities Prices. Blackwell.

Favero, Carlo A. 2001. "Taylor Rules and the Term Structure." Working paper, IGIER Universita L. Bocconi (December).

Goodfriend, Marvin. 1993. "Monetary Policy Comes of Age: A 20th Century Odyssey." Federal Reserve Bank of Richmond Economic Quarterly 79 (Winter): 1-22.

_____. 2002. "The Phases of U.S. Monetary Policy: 1987 to 2001." Manuscript. Prepared for the Charles Goodhart Festschrift, Bank of England, November 2001, revised January 2002.

Hamilton, James D. 1994. Time Series Analysis. Princeton: Princeton University Press.

Hetzel, Robert L., and Ralph F. Leach. 2001. "The Treasury-Fed Accord: A New Narrative Account." Federal Reserve Bank of Richmond Economic Quarterly 87 (Winter): 57-64.

Horvath, Michael T. K., and Mark W. Watson. 1995. "Testing for Cointegration When Some of the Cointegrating Vectors are Known." Econometric Theory 11 (December): 952-84.

Ibbotson Associates. 2002. Stocks, Bonds, Bills, and Inflation 2002 Yearbook. Chicago: Ibbotson Associates.

Johansen, Soren. 1988. "Statistical Analysis of Cointegration Vectors." Journal of Economic Dynamics and Control 12 (June/Sept.): 231-54. Reprinted in Long-Run Economic Relations: Readings in Cointegration, ed. Robert F. Engle and Clive W. J. Granger. New York: Oxford University Press, 1991.

Judge, George G., W. E. Griffiths, R. Carter Hill, Helmut Lutkepohl, and Tsoung-Chao Lee. 1985. The Theory and Practice of Econometrics. New York: Wiley.

King, Robert G., Charles I. Plosser, James H. Stock, and Mark W. Watson. 1991. "Stochastic Trends and Economic Fluctuations." American Economic Review 81 (September): 819-40.

Kurmann, Andre. 2002a. "Maximum Likelihood Estimation of Dynamic Stochastic Theories with An Application to New Keynesian Pricing." Chapter 2, New Keynesian Price and Cost Dynamics: Theory and Evidence, Ph.D. diss., University of Virginia.

_____. 2002b. "Quantifying the Uncertainty about Theoretical Interest Rate Spreads." Working paper, University of Virginia.

Mankiw, N. Gregory, and Jeffrey A. Miron. 1986. "The Changing Behavior of the Term Structure of Interest Rates." Quarterly Journal of Economics 101 (May): 211-28.

Nelson, Charles R., and G. William Schwert. 1977. "Short-Term Interest Rates as Predictors of Inflation: On Testing the Hypothesis That the Real Rate of Interest Is Constant." American Economic Review 67 (June): 478-86.

Owens, Raymond E., and Roy H. Webb. 2001. "Using the Federal Funds Futures Market to Predict Monetary Policy Actions." Federal Reserve Bank of Richmond Economic Quarterly 87 (Spring): 69-77.

Roll, Richard. 1969. The Behavior of Interest Rates: An Application of the Efficient Market Model to U.S. Treasury Bills. New York: Basic Books.

Sargent, Thomas J. 1979. "A Note on Maximum Likelihood Estimation of the Rational Expectations Model of the Term Structure." Journal of Monetary Economics 5 (January): 133-43.

Shiller, Robert J. 1972. Rational Expectations and the Term Structure of Interest Rates. Ph.D. diss., M.I.T.

Stock, James H., and Mark W. Watson. 1988. "Testing for Common Trends." Journal of the American Statistical Association 83 (December): 1097-107.

_____. 2002. "Has the Business Cycle Changed and Why?" Manuscript. Prepared for NBER Macroeconomics Annual (April).

Watson, Mark W. 1999. "Explaining the Increased Variability in Long-Term Interest Rates." Federal Reserve Bank of Richmond Economic Quarterly 85 (Fall): 71-96.

The authors would like to thank Michael Dotsey, Huberto Ennis, Pierre-Daniel Sarte, and Mark Watson for helpful comments. The views expressed in this article are those of the authors and do not necessarily reflect those of the Federal Reserve Bank of Richmond or the Federal Reserve System. Robert G. King: Professor of Economics, Boston University, and consultant to the Research Department of the Federal Reserve Bank of Richmond. Andre Kurmann: Department of Economics, University of Quebec at Montreal.