Article information

Title: Should oil prices receive so much attention? An evaluation of the predictive power of oil prices for the U.S. economy.
Authors: Bachmeier, Lance; Li, Qi; Liu, Dandan, et al.
Journal: Economic Inquiry
Print ISSN: 0095-2583
Publication year: 2008
Issue: October
Language: English
Publisher: Western Economic Association International
Keywords: Inflation (Economics); Inflation (Finance); Macroeconomics; Petroleum industry; Recessions; United States economic conditions

I. INTRODUCTION

Oil prices are monitored by consumers, firms, financial market traders, and government officials and are the subject of much media coverage. This in large part reflects the view that higher oil prices tend to be followed by inflation and recessions. (1) Early work by Hamilton (1983) as well as recent papers by Hamilton (2003) and Lee and Ni (2002) have provided convincing evidence of a relationship between oil prices and future economic activity. (2) Much of the literature has focused on whether oil prices improve the fit of a benchmark model via examination of impulse response functions as done by Lee and Ni (2002) or hypothesis testing as done by Hamilton (2003). Guo and Kliesen (2005) discussed a number of theoretical explanations for the oil price-macroeconomy relationship. The textbook explanation is that higher oil prices represent an increase in production costs, causing a fall in output and higher inflation. Another strand of literature has argued that large oil price movements cause uncertainty about future oil prices or costly reallocation of labor, (3) causing large oil price movements to have asymmetric effects on the economy, even to the point that both positive and negative oil price movements hurt output. Along these lines, Hamilton (1996) motivated the construction of a "net oil price increase" (NOPI) variable by arguing that it is the behavior of oil prices relative to recent experience that matters. This implies that the only time oil shocks have an impact on gross domestic product (GDP) is when the price of oil is above the high of the previous year.

For many purposes, though, the relevant question is whether the information in oil prices can be used to improve forecasts of macroeconomic variables. Examples include Federal Reserve policymaking (using the price of oil as an "indicator variable"), private sector planning, and portfolio management. For these applications, it should be of interest to evaluate the historical out-of-sample forecast performance of models that include oil prices. Moreover, impulse response functions and hypothesis tests are not able to provide information about how much out-of-sample forecasts can be improved. (4)

This paper evaluates forecasts of different measures of output, inflation, and monetary policy, all of which are important macroeconomic variables that can plausibly be expected to respond to oil shocks, based on both economic theory and media reports. We apply standard out-of-sample predictive ability tests to data covering the period January 1986 to December 2004. This time period has seen substantial variation in the price of oil, from a low of $11 per barrel in late 1998 to a high of $53 per barrel in October 2004, and includes two Gulf Wars, as well as multiple boom-and-bust cycles. If oil prices have value as an indicator variable, it should clearly be present in our sample, as oil price fluctuations of this magnitude will explain a large proportion of the variance of the macroeconomic variables.

Our results suggest that there are few cases where models with oil prices are even able to improve upon the forecasts of simple autoregressive (AR) models. This finding is robust to changes in specification that allow for nonlinearities, the use of rolling estimation windows to account for possible parameter instability, the use of industry-level data, and changes in the forecast horizon. To ensure that our results are not due to low power of the predictive ability tests that we employ, we show that the potential forecasting gains from including oil prices are in most cases close to 0. We conclude that oil prices provide little information about the future direction of the economy.

The rest of the paper proceeds as follows. The next section describes our methodology, Section III describes the data, Section IV presents and discusses the results of tests for predictive ability of oil prices for the macroeconomic variables, and Section V concludes the paper.

II. METHODOLOGY

Denote by [x.sub.t] the value of a macroeconomic variable that we want to forecast and by [DELTA][o.sub.t] the percentage change in the nominal price of crude oil over period t. The null hypothesis to be tested is that [DELTA]o does not help to predict x, so that the mean-squared prediction error (MSE) for the model [x.sub.t] = g([x.sub.t-1], [x.sub.t-2] ..., [x.sub.t-p], [DELTA][o.sub.t-1], [DELTA][o.sub.t-2],..., [DELTA][o.sub.t-q]) is not less than the MSE for the model [x.sub.t] = f([x.sub.t-1], [x.sub.t-2], ..., [x.sub.t-p]). Let [u.sub.1t] = [x.sub.t] - f([x.sub.t-1], [x.sub.t-2], ..., [x.sub.t-p]) be the time t forecast error for the smaller model and [u.sub.2t] = [x.sub.t] - g([x.sub.t-1], [x.sub.t-2], ..., [x.sub.t-p], [DELTA][o.sub.t-1], [DELTA][o.sub.t-2], ..., [DELTA][o.sub.t-q]) be the time t forecast error for the model that includes [DELTA]o. The null hypothesis can then be stated as

[H.sub.0] : E([u.sup.2.sub.1t]) = E([u.sup.2.sub.2t]).

The alternative hypothesis is [H.sub.a] : E([u.sup.2.sub.1t]) > E([u.sup.2.sub.2t]).

There are a total of T observations in the data set, P out-of-sample predictions are made, and the estimation sample is R = T - P. (5) Let [[??].sub.1t] and [[??].sub.2t] denote some consistent estimators for [u.sub.1t] and [u.sub.2t]. Then, we expect that

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] under [H.sub.0]

and

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] under [H.sub.a]

Several tests for equal predictive ability of nested models have been proposed, including Clark and McCracken (2001), McCracken (2007), and Clark and West (2007); see Corradi and Swanson (2006) and West (2006) for reviews of the literature and references on predictive ability testing. The first test we use is the Chao, Corradi, and Swanson (2001) (hereafter CCS) test. If both f(x) and g(x) are linear functions so that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], then [H.sub.0] being true implies that the expected value of the statistic

(1) [m.sub.p] = (1 / [square root of P]) [T.summation over (t = R+1)] [[??].sub.1t][W.sub.t-1]

is 0 (asymptotically), where [W.sub.t-1] = ([DELTA][o.sub.t-1], [DELTA][o.sub.t-2], ..., [DELTA][o.sub.t-q])'. (6)
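To make the indexing in Equation (1) concrete, the following is a minimal sketch of the moment vector [m.sub.p] only. It does not compute the variance estimate or critical values needed for the full CCS test. The function name `ccs_moment` and the list-based layout (Python index t-1 holds period t) are illustrative assumptions, not part of the original analysis.

```python
import math

def ccs_moment(u1_hat, d_oil, R, q):
    """Moment vector m_P = P^(-1/2) * sum_{t=R+1}^{T} u1_hat_t * W_{t-1}.

    u1_hat : forecast errors of the benchmark model, one per period 1..T
             (only periods R+1..T are used)
    d_oil  : oil price changes for periods 1..T
    R      : size of the initial estimation sample (requires R >= q)
    q      : number of oil lags in W_{t-1} = (do_{t-1}, ..., do_{t-q})
    """
    T = len(u1_hat)
    P = T - R                                  # number of out-of-sample forecasts
    m = [0.0] * q
    for t in range(R + 1, T + 1):              # periods R+1, ..., T
        for j in range(1, q + 1):              # j-th lag of the oil price change
            m[j - 1] += u1_hat[t - 1] * d_oil[t - 1 - j]
    return [value / math.sqrt(P) for value in m]
```

Under the null hypothesis each element of the returned vector should be close to 0 relative to its sampling variability.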

A wealth of evidence has accumulated that, at least for output, it is important to relax the assumption that g(x) is a linear function. Two commonly used transformations of the oil price series are the NOPI introduced by Hamilton (1996) and the volatility of oil prices. The motivation for NOPI is that it is the price of oil relative to recent experience that matters most to consumers and producers. The motivation for volatility is that uncertainty about oil prices can cause consumers to postpone major purchases and firms to postpone investment. (7) Our second test therefore takes as its null hypothesis that neither the NOPI variable nor oil price volatility has marginal predictive content for any of the macroeconomic variables. NOPI is constructed as in Bernanke, Gertler, and Watson (1997) to be equal to the maximum of 0 and the percentage change over the highest price observed in the previous 12 months. The availability of daily market prices for crude oil and the fact that oil prices exhibit substantial day-to-day fluctuations make the realized volatility of oil prices a natural measure of volatility (see, e.g., Andersen et al. 2003). (8) Because neither of these transformations requires estimation, the CCS test statistic given by Equation (1) can be used directly, where [W.sub.t-1] represents either lags of NOPI or realized volatility as opposed to lagged oil price changes.
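The two transformations can be sketched as follows. The NOPI function follows the verbal definition above (maximum of 0 and the percentage change over the previous 12-month high). The realized volatility function assumes the common definition as the sum of squared daily log returns within each month; the text does not spell out the exact formula, so that choice, like the function names, is an assumption.

```python
import math

def nopi(prices, window=12):
    """Net oil price increase: max(0, % change over the high of the previous `window` months)."""
    out = []
    for t in range(window, len(prices)):
        previous_high = max(prices[t - window:t])   # high of the preceding `window` months
        pct_change = 100.0 * (prices[t] - previous_high) / previous_high
        out.append(max(0.0, pct_change))
    return out

def realized_volatility(daily_prices_by_month):
    """Sum of squared daily log returns within each month (assumed definition)."""
    result = []
    for days in daily_prices_by_month:
        returns = [math.log(b / a) for a, b in zip(days, days[1:])]
        result.append(sum(r * r for r in returns))
    return result
```

By construction the NOPI series is 0 whenever the current price does not exceed the recent high, which is why most of its observations are 0.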

The above CCS test relies on the assumption of a linear forecasting model under both [H.sub.0] and [H.sub.a]. To remedy this, we consider the integrated conditional moment test of Corradi and Swanson (2002, 2007), which we denote as CS. The version of the CS test that we use here maintains a linear forecasting model under [H.sub.0]. However, it allows for general nonlinear forecasting models under [H.sub.a]. For the case of a quadratic loss function, the CS test statistic is

(2) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

[Z.sub.t-1] = ([x.sub.t-1], [x.sub.t-2], ..., [x.sub.t-p], [DELTA][o.sub.t-1], [DELTA][o.sub.t-2], ..., [DELTA][o.sub.t-q])', [phi]([gamma]) is the density function for the uniform distribution, and [gamma] is a vector with p + q elements (the total number of variables in [Z.sub.t-1]). Under [H.sub.0] : E([u.sup.2.sub.1t]) = E([u.sup.2.sub.2t]), we have E[[u.sub.1t]w([Z.sub.t-1], [gamma])] = 0 for all [gamma]. (9)

An important advantage of the CS test is that it is consistent against all nonlinear alternatives, without requiring the specification and estimation of an alternative nonlinear model. A different but more popular approach for testing whether a variable has predictive power is to compare the out-of-sample forecasts of a specific nonlinear model to those of a smaller (nested) linear model. Naturally, that approach can be more powerful if the researcher has the information about the form of the nonlinearity or if the focus is on evaluating the fit of a particular nonlinear model. The drawback is that the test results will depend on the nonlinear model that is chosen as the alternative. It is difficult in that framework to interpret a failure to reject the null hypothesis, because it could mean that [H.sub.0] is true but it could also mean that there is a problem with the estimation methodology and/or the specific choice of the (alternative) nonlinear model. Moreover, if a priori the researcher believes that there is more than one plausible nonlinear alternative, the test statistics need to be modified in order to control the size of the test. In contrast, the CS test is easy to implement, has been shown in Monte Carlo experiments by Corradi and Swanson (2007) to have good power in samples of the size that we have here, and is consistent against generic nonlinear alternatives. Therefore, we choose to use the CS test to deal with the case of a generic nonlinear forecasting model (under the alternative hypothesis).

For our last comparison, we estimate flexible nonlinear models using a fully nonparametric estimation approach. The functions f(x) and g(x) are both estimated by nonparametric kernel methods, based on the Nadaraya-Watson kernel estimator. In this case,

[u.sub.1t] = [x.sub.t] - [??]([x.sub.t-1], [x.sub.t-2], ..., [x.sub.t-p])

for the AR model and

[u.sub.2t] = [x.sub.t] - [??]([x.sub.t-1], [x.sub.t-2], ..., [x.sub.t-p], [DELTA][o.sub.t-1], ..., [DELTA][o.sub.t-q])

for the vector autoregressive (VAR) model with oil. Nonparametric models relax any assumption about functional forms, and as a result, they encompass all nonlinear models including threshold, smooth transition, and Markov switching models.

The one-step-ahead forecast of [x.sub.t+1] at time t using the nonparametric AR model is given by

(3) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

where [W.sub.s-1] = ([x.sub.s-1], [x.sub.s-2], ..., [x.sub.s-p]), h = ([h.sub.1], ..., [h.sub.p]) are the smoothing parameters and K(x) is the product Gaussian kernel function, that is, K(([W.sub.s-1] - [W.sub.t])/h) = [[PI].sup.p.sub.j = 1] k(([x.sub.s-j] - [x.sub.t+1-j])/[h.sub.j]). Similarly, the forecast of [x.sub.t+1] at time t using the nonparametric VAR model is given by

(4) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

where [Z.sub.s-1] = ([x.sub.s-1], [x.sub.s-2], ..., [x.sub.s-p], [DELTA][o.sub.s-1], ..., [DELTA][o.sub.s-q]), h denotes the p + q smoothing parameters (associated with the p + q components of [Z.sub.t-1]), and K(x) is again the product Gaussian kernel function. Given that the Schwarz information criterion (SIC) selects either one or two lags for the parametric models, we use a similar number of lags for estimation of nonparametric models by setting p = 2 and q = 1 in the nonparametric estimation.
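Because Equations (3) and (4) are not reproduced here, the following sketch shows one standard form of a Nadaraya-Watson one-step-ahead forecast with a product Gaussian kernel, which matches the verbal description above. The function names are illustrative, and the fallback to the sample mean when all kernel weights underflow to 0 is an added assumption. Setting q = 0 gives the nonparametric AR forecast; q > 0 adds lagged oil price changes.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used as the univariate kernel k(.)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def nw_predict(rows, targets, point, h):
    """Kernel-weighted average of targets, with product-kernel weights around `point`."""
    num = den = 0.0
    for row, y in zip(rows, targets):
        weight = 1.0
        for value, centre, bandwidth in zip(row, point, h):
            weight *= gaussian_kernel((value - centre) / bandwidth)
        num += weight * y
        den += weight
    if den == 0.0:                       # all weights underflowed (assumed fallback)
        return sum(targets) / len(targets)
    return num / den

def nw_var_forecast(x, d_oil, p, q, h):
    """Forecast the next value of x from p own lags and q lags of d_oil."""
    T = len(x)
    start = max(p, q)
    rows = [[x[s - j] for j in range(1, p + 1)] +
            [d_oil[s - j] for j in range(1, q + 1)] for s in range(start, T)]
    targets = x[start:T]
    # Evaluation point: the most recent p values of x and q values of d_oil.
    point = ([x[T - j] for j in range(1, p + 1)] +
             [d_oil[T - j] for j in range(1, q + 1)])
    return nw_predict(rows, targets, point, h)
```

With p = 2 and q = 1, as in the text, `h` is a list of three bandwidths, one per regressor.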

A practical difficulty associated with kernel estimation is the choice of bandwidth (the vector h in Equations 3 and 4). The recursive nature of our analysis makes matters more difficult because h must be chosen many times. The bandwidth is chosen in this paper by a grid search over a set of plausible values of h and then setting h equal to the vector that minimizes the out-of-sample forecast MSE for the 50 observations prior to the time a forecast is made. For example, to make a one-step-ahead forecast of the consumer price index (CPI) inflation rate in January 1986, for which the models are estimated with data available through December 1985, we use the value of h that would have resulted in the lowest out-of-sample MSE over the period from January 1981 to December 1985. To forecast the CPI inflation rate in February 1986, we use the value of h that produced the lowest MSE over the period from February 1981 to January 1986. The bandwidth is thus chosen each time a forecast is made.
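The bandwidth selection rule described above can be sketched as a grid search that scores each candidate vector h by the MSE of one-step-ahead forecasts over a trailing evaluation window. The sketch covers only the nonparametric AR case; the function names, the candidate grid, and the fallback to the sample mean when all kernel weights underflow are assumptions made for illustration.

```python
import itertools
import math

def nw_ar_forecast(history, p, h):
    """Nadaraya-Watson forecast of the next value from the last p observations."""
    T = len(history)
    point = [history[T - j] for j in range(1, p + 1)]
    num = den = 0.0
    for s in range(p, T):
        weight = 1.0
        for j in range(1, p + 1):
            u = (history[s - j] - point[j - 1]) / h[j - 1]
            weight *= math.exp(-0.5 * u * u)   # normalizing constant cancels in the ratio
        num += weight * history[s]
        den += weight
    if den == 0.0:                             # assumed fallback if all weights underflow
        return sum(history) / T
    return num / den

def select_bandwidth(x, p, grid, n_eval):
    """Return (h, mse) minimizing the forecast MSE over the last n_eval observations of x."""
    T = len(x)
    best_h, best_mse = None, float("inf")
    for h in itertools.product(grid, repeat=p):    # every combination of grid values
        sse = 0.0
        for t in range(T - n_eval, T):
            forecast = nw_ar_forecast(x[:t], p, h)  # uses data before period t only
            sse += (x[t] - forecast) ** 2
        mse = sse / n_eval
        if mse < best_mse:
            best_h, best_mse = h, mse
    return best_h, best_mse
```

In a recursive exercise, `select_bandwidth` would be called again before each forecast with `x` truncated at the forecast origin, so that h never depends on data unavailable at that date.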

The above-cited predictive ability tests (such as CCS and CS) do not apply to nonparametric models. Fan and Li (1996) proposed a test comparing two nonparametric regression models but they only considered the case of in-sample testing. We report the ratio of the MSE for the two nonparametric regression models:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

where [[??].sub.1t] and [[??].sub.2t] are the nonparametric residuals from the null and the alternative models, respectively. Note that [lambda] < 1 corresponds to cases in which the out-of-sample forecasting MSE of the nonparametric VAR model (with lagged oil price changes as regressors) is smaller than that of the nonparametric AR model (without using oil prices).
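The expression for the ratio is not reproduced above. A form consistent with the description, in which [lambda] < 1 means the model with oil has the smaller out-of-sample MSE, would be the following; it is a reconstruction from the surrounding text and may differ from the original in notation.

```latex
\lambda \;=\; \frac{P^{-1}\sum_{t=R+1}^{T} \hat{u}_{2t}^{\,2}}
                   {P^{-1}\sum_{t=R+1}^{T} \hat{u}_{1t}^{\,2}}
```

Here the numerator is the out-of-sample MSE of the nonparametric VAR model with oil and the denominator is that of the nonparametric AR model.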

All of the above tests use an estimated AR model as the benchmark. The one-step-ahead AR model forecasts are produced for each of the macroeconomic series for the period 1986-2004, so that P = 228 when x is observed monthly and P = 76 when x is observed quarterly. Following the recommendation of Inoue and Kilian (2006), in each case we select the lag length by the SIC before each forecast is made. Thus, consider as an example the producer price index (PPI) series, for which we have monthly data from January 1948 to December 2004. To make a forecast for January 1986, 12 AR models (i.e., AR(1), AR(2), ..., AR(12)) of PPI inflation were estimated using data from January 1948 to December 1985, the SIC was calculated for each model, and the model with the lowest associated SIC value was used to forecast PPI inflation for January 1986. To then make the forecast for February 1986, observations for January 1986 were added to the data set, the 12 AR models were reestimated, and the model with the lowest SIC was used to forecast PPI inflation for February 1986. This process was repeated until 228 one-step-ahead out-of-sample forecasts of PPI inflation were produced for the period January 1986 to December 2004. Given the importance of lag selection for our exercise, with the SIC selecting an average of one lag for GDP up to an average of seven lags for the 1-month Treasury bill rate, we have also fixed the lag length to 12 months for monthly data and four quarters for quarterly data, with no effect on our results. The error series [[??].sub.1t] is simply the difference between actual and forecast values of PPI inflation at each point in time.
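The recursive procedure just described can be sketched with NumPy as follows. The function names are illustrative. Two details are assumptions rather than statements about the original implementation: the SIC is computed as log(sigma^2) + k log(n)/n, and all candidate lag lengths are compared on a common estimation sample.

```python
import numpy as np

def fit_ar(x, p, max_p):
    """OLS fit of an AR(p) with intercept on observations max_p..T-1; returns (beta, sic)."""
    T = len(x)
    y = x[max_p:]
    X = np.column_stack([np.ones(T - max_p)] +
                        [x[max_p - j:T - j] for j in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n = len(y)
    sigma2 = max(resid @ resid / n, np.finfo(float).tiny)   # guard against log(0)
    sic = np.log(sigma2) + (p + 1) * np.log(n) / n
    return beta, sic

def recursive_ar_forecast_errors(x, R, max_p=12):
    """One-step-ahead forecast errors for observations R..T-1 using an expanding sample."""
    x = np.asarray(x, dtype=float)
    errors = []
    for t in range(R, len(x)):
        hist = x[:t]                                  # data available before period t
        fits = [fit_ar(hist, p, max_p) for p in range(1, max_p + 1)]
        p_best = int(np.argmin([sic for _, sic in fits])) + 1
        beta = fits[p_best - 1][0]
        lags = hist[::-1][:p_best]                    # x_{t-1}, ..., x_{t-p}
        errors.append(x[t] - (beta[0] + beta[1:] @ lags))
    return np.array(errors)
```

For the monthly PPI example, `R` would be the number of observations through December 1985 and the function would return 228 forecast errors.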

III. DATA

The crude oil data come from two sources. Monthly data on West Texas Intermediate (WTI) for 1946-2004 were downloaded from the Federal Reserve Economic Database (FRED) provided by the Federal Reserve Bank of St. Louis. Daily WTI data for 1986-2004, used to construct the realized volatility of oil prices, were downloaded from the Web site of the Energy Information Administration. (10) Throughout the entire analysis, [DELTA][o.sub.t] denotes the first difference of the logarithm of the monthly WTI series. Our results are almost identical when the real price of oil (price of WTI deflated by the CPI) is used instead of the nominal oil price.

The macroeconomic series, [x.sub.t], include both aggregate and disaggregated data. The aggregate series come from three different groups, all downloaded from FRED and described in Table 1: (1) price data, including the CPI excluding energy and the PPI; (2) measures of aggregate output, including GDP, industrial production, and the unemployment rate; and (3) financial and monetary policy variables, including the federal funds rate, 1- and 10-yr interest rates, the term spread (the difference between 10- and 1-yr interest rates), the default spread (the difference between AAA and BAA bond yields), real M2 balances, and the velocity of M2 with respect to GDP. This set of variables is sufficiently broad to include most of the theoretically plausible oil price-macroeconomic relationships. The analysis proceeds using quarterly data when [x.sub.t] is equal to GDP or M2 velocity, while in all other cases, monthly data are used.

The disaggregated data are monthly industry-level industrial production figures for 11 industries (primary metals, fabricated metals, machinery, electrical equipment, aerospace, furniture, apparel, paper, chemicals, and plastics and rubber), as reported in Federal Reserve Statistical Release G.17. Lee and Ni (2002) provided evidence of an in-sample relationship between production in many of these industries and the price of oil over the period 1959-1997. We take the first difference of the natural logarithm of the CPI, PPI, GDP, industrial production, real M2 balances, and all industry-level industrial production series. The other variables enter the forecasting models in levels.

IV. EMPIRICAL RESULTS

This section discusses results of the analysis described above. We begin with the benchmark results, which are based on one-step-ahead forecasts of the aggregate macroeconomic variables. We then check the robustness of our conclusions along several dimensions, including the use of industry-level data, longer forecast horizons, and rolling data windows to control for the possibility of parameter instability. In each case, we see that our conclusions are largely unaffected by changes in the methodology.

A. Benchmark Results

Table 2 reports results of the tests described above for each of the 12 macroeconomic series. In the first column for each test are p values for the CCS test of the null hypothesis of no (linear) predictability of oil prices for the given macroeconomic variable. The linear oil shock measure is similar to the oil shock in the classic paper by Hamilton (1983). At a 5% level, we reject the null hypothesis for only one series, PPI inflation. This result is to be expected because oil prices enter the PPI directly, albeit as just one component of finished energy goods. Energy goods in December 2004 had a relative importance of 17% of finished goods in the construction of the PPI. (11) The important question is whether oil prices contain information about future movements in the nonenergy sectors of the economy. Interestingly, there is no evidence that oil prices help to forecast CPI inflation once energy prices are removed, with a p value for the CPI excluding energy of .89.

A limitation of any statistical test is that it is informative only about statistical significance and says nothing about the economic significance of a particular relationship. There is no reason to believe that the CCS test lacks power for our application; see Chao, Corradi, and Swanson (2001) for simulation evidence on the power of the CCS test. Nonetheless, to ensure that the benchmark results are not driven by the use of a test with low power, the second column for each test in Table 2 reports the [R.sup.2] from a regression of [[??].sub.1t] on [Z.sub.t-1]. (12) We interpret the [R.sup.2] as the potential improvement on the AR forecast from using the information in oil prices. Under the null hypothesis, the [R.sup.2] of this regression will be close to 0. The [R.sup.2] for PPI inflation is 11%, which is approximately equal to the relative importance of energy goods in the construction of the PPI. For most of the other variables, the [R.sup.2] is essentially 0, with the largest (the default spread) being 6%. Failure of the CCS test to reject is therefore not surprising. In-sample [R.sup.2] calculations, available upon request, show the same pattern: the [R.sup.2] for PPI inflation goes from 5% to 18% when oil prices are added to the model, while the [R.sup.2] for the other variables changes very little after adding oil prices. There is clearly no exploitable linear relationship between oil prices and output (GDP growth, industrial production growth, or the unemployment rate), and the case for forecasting variables related to current and future monetary policy is similarly weak. In summary, our findings for the benchmark comparison are due to the inability of oil price movements to improve upon simple AR forecasts, not our choice of test statistic. (13,14)
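The [R.sup.2] diagnostic can be illustrated with a short NumPy sketch: regress the benchmark forecast errors on a constant and the lagged regressors, then report the share of error variance explained. The function name and the inclusion of an intercept are assumptions made for illustration.

```python
import numpy as np

def r2_on_regressors(u1_hat, Z):
    """R^2 from an OLS regression of forecast errors on a constant and the columns of Z."""
    u = np.asarray(u1_hat, dtype=float)
    X = np.column_stack([np.ones(len(u)), np.asarray(Z, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, u, rcond=None)
    resid = u - X @ beta
    total = np.sum((u - u.mean()) ** 2)      # total sum of squares of the errors
    return 1.0 - (resid @ resid) / total
```

A value near 0 indicates that the lagged regressors explain almost none of the AR forecast errors, so little could be gained by adding them to the forecasting model.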

B. Alternative Specifications

As discussed earlier, the recent literature has moved away from the linear oil shock measure used by Hamilton (1983). It is therefore natural to ask whether the findings of the previous section are robust to changes in specification. Columns 4 and 5 of Table 2 report p values for the CCS test of whether the NOPI helps to predict the macroeconomic variables and the corresponding [R.sup.2] of a regression of [[??].sub.1t] on [Z.sub.t-1]. The [R.sup.2] for PPI inflation (approximately 15%) is a little stronger than that for the benchmark comparison, but once again, this should be expected because oil prices enter the PPI directly. It should be noted that the NOPI series has far less variability than the first difference of oil prices because most observations are 0, which may make it difficult for the CCS test to reject. Nonetheless, the largest [R.sup.2] for any of the other series is 7% and in most cases is very close to 0. The CCS test and the [R.sup.2] are largely in agreement, so that forecasts of output and the different measures of current and future monetary policy are not likely to be improved upon by conditioning on lagged NOPI.

The results for realized volatility of oil prices uncover our first evidence of monetary policy (through the federal funds rate) reacting to oil shocks. Forecasts of GDP and the CPI can also potentially be improved by controlling for oil price volatility. One interpretation is that, as suggested by Bernanke, Gertler, and Watson (1997), the Federal Reserve attempts to influence CPI inflation and GDP growth by changing its federal funds rate target in response to oil price shocks. Whatever the reason, forecasting models of oil volatility and the CPI, GDP, and the federal funds rate may be worthy of further analysis. It should be noted that the [R.sup.2] for CPI inflation is low (approximately 5%).

Column 8 of Table 2 gives p values of the CS test for nonlinear predictability. CS is a generalized version of the CCS test. It allows the researcher to choose a generic weight function w(*) which is nonlinear in [Z.sub.t-1] so that the test has improved power against many nonlinear alternative models. The results of this test are basically the same as for the CCS tests. Only in the case of PPI inflation does the CS test reject, suggesting that oil prices may help to forecast only PPI inflation; the CS test apparently detects the same relationship as the CCS test.

The last column in Table 2 is the relative mean-squared error of a fully nonparametric VAR model, with oil prices, against a nonparametric AR model. Details on the construction of these forecasts are discussed in Section II. The nonparametric model does not nest the previous two nonlinear specifications, NOPI and realized volatility, because the right-hand-side variables are different. NOPI compares the present month's oil price with the highest oil price observed in the previous year, while the realized volatility series is constructed using daily oil price data. The right-hand-side variables for the nonparametric forecasting models do not include either the highest oil price of the previous year or the daily observations on oil prices. (15) The advantage of the nonparametric model is that it allows for any type of nonlinearity in the relationship between the macroeconomic variables and the lagged oil price changes. The results indicate that there is very little to be gained from a nonparametric specification. For 10 of the 12 cases, the relative MSE is greater than 1, meaning that the model with oil prices does worse than the AR model, and for the other two cases, the relative MSE is close to 1. The distribution of predictive ability test statistics, such as the CCS test or the Diebold-Mariano (DM) test, has not been derived for nonparametric estimation models, but formal tests are really not necessary given that the relative MSE is seldom below 1.

We can summarize our findings as follows. The only variable for which forecasts are consistently improved upon by controlling for oil price fluctuations is PPI inflation. This is not surprising, as oil prices enter the PPI directly, causing oil prices and the PPI to be related by construction. Adding oil prices as additional predictors, we do not find any improvement in forecasts of either CPI inflation or various measures of output and monetary policy. Our findings are robust when oil price changes are replaced by either the NOPI or the oil price volatility, with a few notable exceptions for oil price volatility, which suggests a direction for future research. We emphasize that because our results are based on the application of the CCS test, we have not relied on a specific alternative model (such as a potentially over-parameterized linear VAR model or incorrectly specified parametric nonlinear model) and do not rely on a particular estimation strategy (such as ordinary least squares estimation with no restrictions on the parameters). The Federal Reserve, for instance, often relies on structural model forecasts, a point made clearly by Meyer (1997). Additionally, our nonparametric estimation results do not impose any parametric regression functional forms under either the null or the alternative hypothesis.

C. Changes in Methodology

We now investigate the robustness of our results to changes in the methodology. We first consider the use of disaggregated rather than aggregate data. This is motivated by Lee and Ni (2002) to avoid the possibility that oil shocks affect some sectors of the economy but that using aggregate data makes it difficult to uncover those effects. Additionally, stock traders are often interested in the performance of individual sectors of the economy rather than the aggregate economy. Table 3 replicates Table 2 for industry-level industrial production growth in 11 sectors that might plausibly be expected to be affected by oil shocks. As before, the CCS test finds no evidence that the linear oil shock measure has predictive power for output in any of the industries, and the largest [R.sup.2] is less than 4%. The MSE for a VAR model including oil prices relative to the MSE of an AR model is greater than 1 in all cases, so that the model with oil prices never predicts better. (16) This does not mean that our results conflict with Lee and Ni (2002). Our interest is in out-of-sample forecast performance, whereas the goal of their paper was to characterize the relationship between oil prices and industry output. Many factors can explain the poor historical forecast performance of models with oil without suggesting any inadequacy in the specification of Lee and Ni (2002).

For the NOPI and realized volatility oil shock measures, the results are marginally more encouraging. In fact, for machinery and furniture, the [R.sup.2] is more than 10% even if the CCS test is unable to reject the null hypothesis of no predictive ability for either variable. There is also some evidence that NOPI shocks improve predictions of output growth in the plastics/rubber and machinery industries. The last two columns of Table 3 show little to be gained from relaxing assumptions about functional form. The CS test rejects for only one series, and the MSE of the nonparametric VAR model is not less than that of the AR model for any of the industries. Overall, there is little evidence in Table 3 that overturns our conclusion that oil shocks are not a useful predictor of aggregate economic variables. Oil prices only have predictive power in a few special cases, and the predictive power comes from a nonlinear transformation of the oil price series, which is much different from the way oil prices typically enter into policy discussions or reports in the business press.

Another potential explanation is that we have to this point considered only one-step-ahead forecasts. While this is the leading case, in practice it may take time for changes in oil prices to have nontrivial effects on output, inflation, or monetary policy (although to some extent we do capture this with our forecasts of long-term interest rates and the term spread). To investigate the importance of the forecast horizon, we have done the CCS and CS tests for 6-, 12-, and 24-month-ahead forecasts, which are horizons that are potentially of greater interest. We use an "h-step-ahead" forecasting procedure, as defined by Marcellino, Stock, and Watson (2006), with the estimation equation for the AR model being

[x.sub.t] = [[beta].sub.0] + [h+p-1.summation over (i=h)] [[beta].sub.i][x.sub.t-i] + [[epsilon].sub.t],

where h is the forecast horizon and p is the lag length. In other words, [x.sub.t] is regressed on information available h periods earlier, and forecasts are calculated by plugging time t information into the estimated equation. As before, following the suggestion of Inoue and Kilian (2006), the lag length p is chosen by the SIC prior to making each forecast. These results, reported in Table 4, are consistent with our previous results.
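The direct h-step-ahead procedure can be sketched with NumPy as follows. The sketch regresses [x.sub.t] on a constant and the p lags dated t-h through t-h-p+1, which is one reading of "information available h periods earlier", and fixes p instead of selecting it by the SIC. The function name and these indexing choices are assumptions.

```python
import numpy as np

def direct_forecast(x, h, p):
    """Direct h-step-ahead AR forecast of the value h periods after the last observation."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    first = h + p - 1                          # first period with all p regressors available
    y = x[first:]
    X = np.column_stack([np.ones(T - first)] +
                        [x[first - i:T - i] for i in range(h, h + p)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    latest = x[::-1][:p]                       # most recent p observations, newest first
    return beta[0] + beta[1:] @ latest
```

Unlike an iterated forecast, this approach estimates a separate regression for each horizon h and never feeds forecasts back in as regressors.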

Finally, we consider the possibility that the results in Table 2 are due to parameter instability. Stock and Watson (1996) have highlighted problems with instability in macroeconomic relationships. Numerous other papers have shown the problems instability can cause for out-of-sample forecast performance. (17) Many of the papers that have used nonlinear transformations of oil prices, including Ferderer (1996), Hamilton (1996, 2003), and Lee, Ni, and Ratti (1995), were motivated by claims of instability in the oil price-GDP relationship. To some extent, changes in the oil price-macroeconomy relationship will not cause problems for the CCS test, as it requires estimation of only the benchmark AR model. Nonetheless, to allow for the possibility that parameter instability is driving our results, we compute forecasts for the AR model with parameters estimated using rolling 10-yr data windows, along the lines of Swanson (1998), and compare these forecasts to the AR model that serves as our benchmark in Table 2. Thus, rather than making forecasts using all data that would have been available at the time the forecast was made, we use only the most recent 10 yr of data.
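The comparison between rolling-window and full-sample (recursive) estimation can be illustrated with a short sketch. It is a simplified stand-in for the procedure described above: it assumes monthly data (so 10 yr is 120 observations), fixes the AR lag length at one instead of selecting it by SIC, and uses hypothetical function names.

```python
import numpy as np

def ar1_forecast(sample):
    """Fit x[t] = b0 + b1*x[t-1] by OLS on `sample`; forecast the next value."""
    y, lag = sample[1:], sample[:-1]
    X = np.column_stack([np.ones(len(lag)), lag])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[0] + beta[1] * sample[-1]

def mse_ratio_rolling_vs_recursive(x, window=120):
    """MSE of rolling-window forecasts divided by MSE of recursive forecasts."""
    x = np.asarray(x, dtype=float)
    err_rec, err_roll = [], []
    for t in range(window, len(x)):
        err_rec.append(x[t] - ar1_forecast(x[:t]))             # all data through t-1
        err_roll.append(x[t] - ar1_forecast(x[t - window:t]))  # last `window` obs only
    return np.mean(np.square(err_roll)) / np.mean(np.square(err_rec))
```

A ratio below one would indicate that discarding older data improved one-step-ahead forecasts over the evaluation period, and a ratio near or above one would indicate no gain from the rolling window.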

The use of rolling estimation windows does not change our findings. In only 9 out of 48 comparisons does the use of rolling estimation windows lead to slightly improved forecasts. (18) In only one comparison does the use of a rolling estimation window lead to a reduction in MSE of 10% or more. Similar results hold true for industry-level data. We conclude that our results are not driven by parameter instability. (19)

V. CONCLUDING REMARKS

The goal of this paper has been to quantify whether and how much predictions of macroeconomic variables can be improved by including information about oil price movements. We have examined a broad range of variables over a time period for which there has been significant variation in oil prices and find few examples where oil price fluctuations are useful predictors of inflation, output, or monetary policy. The only strong relationships appear to involve oil price volatility and either CPI inflation or GDP growth. For most variables, we cannot reject the hypothesis that oil prices have no predictive ability. We conclude from this that oil prices are generally not informative about the future direction of the economy.

We conclude that it is hard to justify much of the attention given to oil price movements. It is easy to understand why high gas prices, which are driven by oil prices, are unpopular with consumers. It is also easy to understand why firms in industries directly dependent on oil as an input should be concerned with spikes in the price of oil, as that will result in higher final goods prices or lower profits. On the other hand, it is not obvious why the Federal Reserve or firms in other industries should change their behavior in response to oil shocks.

This does not contradict the conclusions of Hamilton (2003) or Lee and Ni (2002), because these papers were interested in the in-sample fit of models that include oil prices, whereas we are interested in evaluating out-of-sample forecasts. Interesting discussions of the difference between in-sample model evaluation and out-of-sample forecast performance can be found in Campbell and Thompson (2005), Inoue and Kilian (2006), and Kilian and Taylor (2003), among others. As these authors point out, there are reasons that a simple benchmark model can be expected to forecast as well as a correctly specified economic model. In cases where a model will be used for forecasting, it may be best to directly evaluate the recent historical forecast performance of that model.

We have primarily relied on the out-of-sample predictive ability tests of Chao, Corradi, and Swanson (2001) and Corradi and Swanson (2002, 2007), so that the failure of oil as a predictor cannot be attributed to arbitrary choices for the specification and estimation of the alternative models (i.e., those that include oil prices). Similarly, unmodeled nonlinearities cannot account for our failure to find predictability, both because the CS test is consistent against generic nonlinear alternatives and because forecast MSE calculations from robust nonparametric regression models and two parametric nonlinear models show no gain from including oil prices. The results of our benchmark comparisons are basically unaffected by the use of rolling estimation windows to account for parameter instability, the use of disaggregated data, or the changes in the forecast horizon. Finally, while the results of the predictive ability tests used here may (as with any test) be the result of low power against certain alternatives, we report measures of economic significance ([R.sup.2]) and find little to be gained even in a best-case scenario. It goes without saying that our findings cannot be driven by a lack of variation in the oil price series.

ABBREVIATIONS

AR: Autoregressive

CCS: Chao, Corradi, and Swanson (2001)

CPI: Consumer Price Index

CS: Corradi and Swanson (2007)

DM: Diebold and Mariano (1995)

FRED: Federal Reserve Economic Database

GDP: Gross Domestic Product

MSE: Mean-Squared Prediction Error

NOPI: Net Oil Price Increase

PPI: Producer Price Index

SIC: Schwarz Information Criterion

VAR: Vector Autoregressive

WTI: West Texas Intermediate

REFERENCES

Andersen, T. G., T. Bollerslev, F. X. Diebold, and P. Labys. "Modeling and Forecasting Realized Volatility." Econometrica, 71, 2003, 529-626.

Barsky, R. B., and L. Kilian. "Do We Really Know that Oil Caused the Great Stagflation? A Monetary Alternative," in NBER Macroeconomics Annual 2001, edited by B. S. Bernanke and K. Rogoff. Cambridge, MA: MIT Press, 2002, 137-83.

--. "Oil and the Macroeconomy Since the 1970's." Journal of Economic Perspectives, 18(4), 2004, 115-34.

Bernanke, B. S. "Irreversibility, Uncertainty, and Cyclical Investment." Quarterly Journal of Economics, 98, 1983, 85-106.

Bernanke, B. S., M. Gertler, and M. Watson. "Systematic Monetary Policy and the Effects of Oil Price Shocks." Brookings Papers on Economic Activity, 1997, 1997, 91-142.

Campbell, J. Y., and S. B. Thompson. "Predicting the Equity Premium Out of Sample: Can Anything Beat the Historical Average?" Review of Financial Studies, 2005, forthcoming.

Chao, J., V. Corradi, and N. R. Swanson. "An Out of Sample Test for Granger Causality." Macroeconomic Dynamics, 5, 2001, 598-620.

Clarida, R., J. Gali, and M. Gertler. "Monetary Policy Rules and Macroeconomic Stability: Evidence and Some Theory." Quarterly Journal of Economics, 115, 2000, 147-80.

Clark, T. E., and M. W. McCracken. "Tests of Equal Forecast Accuracy and Encompassing for Nested Models." Journal of Econometrics, 105, 2001, 85-110.

--. "The Power of Tests of Predictive Ability in the Presence of Structural Breaks." Journal of Econometrics, 124, 2004, 1-31.

--. "The Predictive Content of the Output Gap for Inflation: Resolving In-Sample and Out-of-Sample Evidence." Journal of Money, Credit, and Banking, 2006, 1127-48.

Clark, T. E., and K. W. West. "Approximately Normal Tests for Equal Predictive Accuracy in Nested Models." Journal of Econometrics, 138, 2007, 291-311.

Corradi, V., and N. R. Swanson. "A Consistent Test for Nonlinear Out of Sample Predictive Accuracy." Journal of Econometrics, 110, 2002, 353-81.

--. "Predictive Density Evaluation," in Handbook of Economic Forecasting, edited by C. W. J. Granger, G. Elliott, and A. Timmerman. Amsterdam: Elsevier, 2006, 197-284.

--. "Nonparametric Bootstrap Procedures for Predictive Inference Based on Recursive Estimation Schemes." International Economic Review, 48, 2007, 67-109.

Diebold, F. X., and R. S. Mariano. "Comparing Predictive Accuracy." Journal of Business and Economic Statistics, 13, 1995, 253-63.

Fan, Y., and Q. Li. "Consistent Model Specification Tests: Omitted Variables and Semiparametric Functional Forms." Econometrica, 64, 1996, 865-90.

Ferderer, J. P. "Oil Price Volatility and the Macroeconomy." Journal of Macroeconomics, 18, 1996, 1-26.

Guo, H., and K. L. Kliesen. "Oil Price Volatility and U.S. Macroeconomic Activity." Federal Reserve Bank of St. Louis Review, 87(6), 2005, 669-83.

Hamilton, J. D. "Oil and the Macroeconomy Since World War II." Journal of Political Economy, 91, 1983, 228-48.

--. "A Neoclassical Model of Unemployment and the Business Cycle." Journal of Political Economy, 96, 1988, 593-617.

--. "This is What Happened to the Oil Price-Macroeconomy Relationship." Journal of Monetary Economics, 38, 1996, 215-20.

--. "What is an Oil Shock?" Journal of Econometrics, 113, 2003, 363-98.

Inoue, A., and L. Kilian. "In-Sample or Out-of-Sample Tests of Predictability: Which One Should We Use?" Econometric Reviews, 23, 2004, 371-402.

--. "On the Selection of Forecasting Models." Journal of Econometrics, 130, 2006, 273-306.

Jones, C. M., and G. Kaul. "Oil and the Stock Markets." Journal of Finance, 51, 1996, 463-91.

Kilian, L. "A Comparison of the Effects of Exogenous Oil Supply Shocks on Output and Inflation in the G7 Countries." Journal of the European Economic Association, 2006a, forthcoming.

--. "Exogenous Oil Supply Shocks: How Big Are They and How Much Do They Matter for the U.S. Economy?" Review of Economics and Statistics, 2006b, forthcoming.

Kilian, L., and M. P. Taylor. "Why is it so Difficult to Beat the Random Walk Forecast of Exchange Rates?" Journal of International Economics, 60, 2003, 85-107.

Leduc, S., and K. Sill. "A Quantitative Analysis of Oil-Price Shocks, Systematic Monetary Policy, and Economic Downturns." Journal of Monetary Economics, 51, 2004, 781-808.

Lee, K., and S. Ni. "On the Dynamic Effects of Oil Price Shocks: A Study Using Industry Level Data." Journal of Monetary Economics, 49, 2002, 823-52.

Lee, K., S. Ni, and R. A. Ratti. "Oil Shocks and the Macroeconomy: The Role of Price Variability." Energy Journal, 16, 1995, 39-56.

Marcellino, M., J. H. Stock, and M. W. Watson. "A Comparison of Direct and Iterated Multistep AR Methods for Forecasting Macroeconomic Time Series." Journal of Econometrics, 135, 2006, 499-526.

McCracken, M. W. "Asymptotics for Out of Sample Tests of Granger Causality." Journal of Econometrics, 140, 2007, 719-52.

Meyer, L. H. "The Role for Structural Macroeconomic Models." AEA Panel on Monetary and Fiscal Policy, New Orleans, LA, January 5, 1997. Accessed December 2005 http://www.federalreserve.gov/BOARDDOCS/SPEECHES/19970105.htm.

Stock, J. H., and M. W. Watson. "Evidence on Structural Instability in Macroeconomic Time Series Relations." Journal of Business and Economic Statistics, 14, 1996, 11-30.

Swanson, N. R. "Money and Output Viewed Through a Rolling Window." Journal of Monetary Economics, 41, 1998, 455-74.

Wei, C. "Energy, the Stock Market and the Putty-Clay Investment Model." American Economic Review, 93, 2003, 311-24.

West, K. W. "Asymptotic Inference about Predictive Ability." Econometrica, 64, 1996, 1067-84.

--. "Forecast Evaluation," in Handbook of Economic Forecasting, edited by C. W. J. Granger, G. Elliott, and A. Timmerman. Amsterdam: Elsevier, 2006, 100-34.

(1.) As one example, the 2005 G8 summit documents include a statement on the global economy and oil, which at one point states, "We agreed that secure, reliable and affordable energy sources are fundamental to economic stability and development." (p. 2; the full text of the document can be downloaded at http://www.fco.gov.uk/Files/kfile/PostGS_Gleneagles_GlobalEconomy.pdf). See Hamilton (2003) and Barsky and Kilian (2000) for extensive reviews of the literature testing for a relationship between oil prices and output.

(2.) See Barsky and Kilian (2002, 2004), Bernanke, Gertler, and Watson (1997), Clarida, Gali, and Gertler (2000), Jones and Kaul (1996), Leduc and Sill (2004), Kilian (2006a, 2006b), and Wei (2003), among others, for alternative interpretations of the evidence on the effects of oil shocks.

(3.) See, for example, Ferderer (1996) and Hamilton (1988).

(4.) Inoue and Kilian (2004) discussed issues in choosing between in-sample and out-of-sample model evaluation.

(5.) This notation is standard in the literature on forecast evaluation. See, for example, West (1996).

(6.) Chao, Corradi, and Swanson (2001) derived the distribution of [m.sub.p] and provided the conditions necessary for [m.sub.p] to have a [chi square] distribution under [H.sub.0]. Failure to reject [H.sub.0] implies that the oil price series does not have statistically significant marginal predictive content for the given macroeconomic variable.

(7.) See, for example, Bernanke (1983), Lee, Ni, and Ratti (1995), and Ferderer (1996).

(8.) Realized volatility is constructed as the variance of oil prices over month t: [RV.sub.t] = [[summation].sup.k.sub.i=1][([DELTA][o.sub.i] - [[mu].sub.t]).sup.2]/k, where k is the number of days in month t and [[mu].sub.t] is the mean oil price change in month t.

(9.) To calculate the sample value of Equation (2), 2,000 values of [gamma] were randomly selected from a uniform distribution over [0,5], [m.sub.p]([gamma]) was calculated for each of the chosen [gamma], and the test statistic [M.sub.p] was set equal to the average of the [m.sub.p]([gamma]). If w(x) is a generically comprehensive function, the CS test is consistent against generic nonlinear alternatives. Following Corradi and Swanson (2007), we set [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], where [[gamma].sub.i] and [z.sub.i,t] are the ith elements of [gamma] and [Z.sub.t], respectively, [[bar.z].sub.i] is the sample mean of [z.sub.i,t], and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is the sample standard deviation of [z.sub.i,t] (over t). Critical values are calculated using the block bootstrap methodology of Corradi and Swanson (2007).

(10.) We use two different sources for the WTI data because the FRED data are only available at a monthly frequency, while the daily Energy Information Agency data begin in the mid-1980s.

(11.) http://www.bls.gov/news.release/ppi.t01.htm, Bureau of Labor Statistics, PPI News Release Table 1. Producer price indexes and percent changes by stage of processing.

(12.) The notation for the CCS test is described below Equation (1).

(13.) As is common practice in the forecasting literature, we have also compared the MSE of an unrestricted VAR model to the MSE of the AR model. The MSE of the VAR model is generally close to or greater than the MSE of the AR model, suggesting that there is no gain in forecast accuracy from adding lagged oil prices to the AR model. These results are not reported in order to save space but are available from the authors upon request.

(14.) It is worth noting that our main findings would be strengthened if an adjustment were made to control for the fact that we are repeatedly applying a classical test for a large number of variables. Such an adjustment would make it more difficult to reject the null hypothesis of no predictive ability.

(15.) It is in principle straightforward to estimate a nonparametric model that nests NOPI and realized volatility but as a practical matter that would require the inclusion of either nonstationary variables (NOPI) or so many variables that the curse of dimensionality would make estimation imprecise.

(16.) These results are available upon request.

(17.) See Clark and McCracken (2006) for discussion and references.

(18.) There are 12 macroeconomic variables (those in Table 1) and four forecast horizons (1, 6, 12, and 24 mo). The results of these comparisons are available upon request.

(19.) Note that the CCS test and CS test are not designed for comparison of forecasts when the only difference is that one model uses more observations for estimation.

LANCE BACHMEIER, QI LI and DANDAN LIU *

* We wish to thank John Chao, Lutz Kilian, Norm Swanson, an anonymous referee, and the editor, Dennis Jansen, for many helpful comments. We are especially grateful to Phil Rothman for suggesting this topic.

Bachmeier: Assistant Professor, Department of Economics, Kansas State University, Manhattan, KS 66506. Phone 785-532-4578, Fax 785-532-6919, E-mail lanceb@ksu.edu

Li: Professor, Department of Economics, Texas A&M University, College Station, TX 77843-4228. Phone 979-845-9954, Fax 979-847-8757, E-mail qi@econmail.tamu.edu; and Department of Economics, Tsinghua University, Beijing 100084, P.R. China.

Liu: Assistant Professor, Department of Economics, Bowling Green State University, Bowling Green, OH 43403-0001. Phone 419-372-4879, Fax 419-372-1557, E-mail dliu@bgsu.edu

TABLE 1 Variable Descriptions

Variable Sample Start Date Frequency Transformation

CPI excluding energy January 1959 Monthly Log difference
PPI January 1948 Monthly Log difference
GDP January 1949 Quarterly Log difference
Industrial production January 1948 Monthly Log difference
Unemployment rate January 1950 Monthly None
Federal funds rate January 1956 Monthly None
1-yr T-Bond January 1955 Monthly None
10-yr T-Bond January 1955 Monthly None
Term spread January 1955 Monthly None
Default spread January 1948 Monthly None
Real M2 January 1950 Monthly Log difference
M2 velocity January 1961 Quarterly None

TABLE 2 One-Step-Ahead Forecasts

 [DELTA]WTI NOPI

Variable CCS [R.sup.2] CCS [R.sup.2]

CPI (excluding energy) 0.89 0.006 0.21 0.029
PPI 0.03 0.110 0.28 0.145
GDP 0.77 0.001 0.20 0.072
Industrial production 0.20 0.007 0.34 0.009
Unemployment rate 0.89 0.006 0.48 0.007
Federal funds rate 0.18 0.021 0.45 0.001
1-yr T-Bond 0.26 0.029 0.58 0.005
10-yr T-Bond 0.75 0.010 0.48 0.009
Term spread 0.52 0.008 0.83 0.002
Default spread 0.06 0.064 0.68 0.011
Real M2 0.18 0.003 0.03 0.001
M2 velocity 0.35 0.014 0.84 0.001

 Volatility [DELTA]WTI Kernel

Variable CCS [R.sup.2] CS MSE (VAR)/MSE (AR)

CPI (excluding energy) 0.00 0.052 1.00 0.95
PPI 0.16 0.061 0.03 1.00
GDP 0.01 0.105 0.55 1.27
Industrial production 0.23 0.024 0.36 1.05
Unemployment rate 0.27 0.008 1.00 1.00
Federal funds rate 0.04 0.099 1.00 1.11
1-yr T-Bond 0.16 0.029 1.00 1.11
10-yr T-Bond 0.29 0.012 1.00 1.04
Term spread 0.10 0.010 0.06 1.01
Default spread 0.96 0.001 1.00 1.04
Real M2 0.36 0.001 0.25 1.01
M2 velocity 0.71 0.013 0.29 0.98

Notes: Numbers under the CCS and CS (tests) are the p values. Boldface
numbers emphasize that the p values are less than 5%.

TABLE 3 Industry-Level One-Step-Ahead Forecasts

 [DELTA]WTI NOPI

Variable CCS [R.sup.2] CCS [R.sup.2]

Primary metals 0.71 0.002 0.29 0.010
Fabricated metals 0.12 0.012 0.35 0.008
Machinery 0.92 0.000 0.01 0.040
Electrical equipment 0.89 0.017 0.39 0.006
Motor vehicles 0.95 0.005 0.95 0.005
Aerospace 0.56 0.003 0.79 0.002
Furniture 0.77 0.004 0.43 0.060
Apparel 0.85 0.035 0.19 0.024
Paper 0.84 0.006 0.06 0.025
Chemicals 0.69 0.001 0.06 0.035
Plastics and rubber 0.16 0.013 0.03 0.075

 Volatility [DELTA]WTI Kernel

Variable CCS [R.sup.2] CS MSE (VAR)/MSE (AR)

Primary metals 0.86 0.016 0.38 1.16
Fabricated metals 0.55 0.045 0.29 1.15
Machinery 0.11 0.138 0.99 1.07
Electrical equipment 0.77 0.001 0.03 1.09
Motor vehicles 0.35 0.017 0.06 1.08
Aerospace 0.90 0.001 1.00 1.03
Furniture 0.15 0.122 0.56 1.15
Apparel 0.08 0.019 0.62 1.15
Paper 0.04 0.033 1.00 1.13
Chemicals 0.85 0.005 1.00 1.26
Plastics and rubber 0.79 0.026 0.89 1.18

Notes: Same notes as in Table 2 apply here.

TABLE 4 Multi-Step Forecasts

 h = 6 mo

Variable CCS [R.sup.2] CS

CPI (excluding energy) 0.82 0.009 1.00
PPI 0.16 0.035 0.07
GDP 0.76 0.001 0.75
Industrial production 0.95 0.000 0.79
Unemployment rate 0.94 0.002 1.00
Federal funds rate 0.53 0.003 0.89
1-yr T-Bond 0.54 0.003 1.00
10-yr T-Bond 0.89 0.000 0.72
Term spread 0.39 0.004 0.02
Default spread 0.99 0.000 0.97
Real M2 0.76 0.000 0.69
M2 velocity 0.95 0.001 0.18

 h = 12 mo

Variable CCS [R.sup.2] CS

CPI (excluding energy) 0.03 0.022 1.00
PPI 0.52 0.001 0.91
GDP 0.82 0.001 0.46
Industrial production 0.58 0.001 1.00
Unemployment rate 0.93 0.002 1.00
Federal funds rate 0.43 0.004 0.84
1-yr T-Bond 0.67 0.001 0.68
10-yr T-Bond 0.80 0.000 0.58
Term spread 0.65 0.001 0.00
Default spread 0.12 0.013 0.49
Real M2 0.48 0.001 0.59
M2 velocity 0.44 0.005 0.46

 h = 24 mo

Variable CCS [R.sup.2] CS

CPI (excluding energy) 0.54 0.002 1.00
PPI 0.09 0.010 0.79
GDP 0.39 0.012 0.50
Industrial production 0.17 0.007 0.59
Unemployment rate 0.41 0.005 0.78
Federal funds rate 0.74 0.001 0.53
1-yr T-Bond 0.57 0.002 0.33
10-yr T-Bond 0.68 0.001 0.25
Term spread 0.83 0.000 0.00
Default spread 0.89 0.000 0.42
Real M2 0.21 0.003 0.12
M2 velocity 0.78 0.007 0.11

Notes: Same notes as in Table 2 apply here.