Should oil prices receive so much attention? An evaluation of the predictive power of oil prices for the U.S. economy.
Bachmeier, Lance; Li, Qi; Liu, Dandan; et al.
I. INTRODUCTION
Oil prices are monitored by consumers, firms, financial market
traders, and government officials and are the subject of much media
coverage. This in large part reflects the view that higher oil prices
tend to be followed by inflation and recessions. (1) Early work by
Hamilton (1983) as well as recent papers by Hamilton (2003) and Lee and
Ni (2002) have provided convincing evidence of a relationship between
oil prices and future economic activity. (2) Much of the literature has
focused on whether oil prices improve the fit of a benchmark model via
examination of impulse response functions as done by Lee and Ni (2002)
or hypothesis testing as done by Hamilton (2003). Guo and Kliesen (2005)
discussed a number of theoretical explanations for the oil
price-macroeconomy relationship. The textbook explanation is that higher
oil prices represent an increase in production costs, causing a fall in
output and higher inflation. Another strand of literature has argued
that large oil price movements cause uncertainty about future oil prices
or costly reallocation of labor, (3) causing large oil price movements
to have asymmetric effects on the economy, even to the point that both
positive and negative oil price movements hurt output. Along these
lines, Hamilton (1996) motivated the construction of a "net oil
price increase" (NOPI) variable by arguing that it is the behavior
of oil prices relative to recent experience that matters. This implies
that the only time oil shocks have an impact on gross domestic product
(GDP) is when the price of oil is above the high of the previous year.
For many purposes, though, the relevant question is whether the
information in oil prices can be used to improve forecasts of
macroeconomic variables. Examples include Federal Reserve policymaking (using the price of oil as an "indicator variable"), private
sector planning, and portfolio management. For these applications, it
should be of interest to evaluate the historical out-of-sample forecast
performance of models that include oil prices. Moreover, impulse
response functions and hypothesis tests are not able to provide
information about how much out-of-sample forecasts can be improved. (4)
This paper evaluates forecasts of different measures of output,
inflation, and monetary policy, all of which are important macroeconomic
variables that can plausibly be expected to respond to oil shocks, based
on both economic theory and media reports. We apply standard
out-of-sample predictive ability tests to data covering the period
January 1986 to December 2004. This time period has seen substantial
variation in the price of oil, from a low of $11 per barrel in late 1998
to a high of $53 per barrel in October 2004, and includes two Gulf Wars,
as well as multiple boom-and-bust cycles. If oil prices have value as an
indicator variable, that value should be evident in our sample, as oil
price fluctuations of this magnitude will explain a large proportion of
the variance of the macroeconomic variables.
Our results suggest that there are few cases where models with oil
prices are even able to improve upon the forecasts of simple
autoregressive (AR) models. This finding is robust to changes in
specification that allow for nonlinearities, the use of rolling
estimation windows to account for possible parameter instability, the
use of industry-level data, and changes in the forecast horizon. To
ensure that our results are not due to low power of the predictive
ability tests that we employ, we show that the potential forecasting
gains from including oil prices are in most cases close to 0. We
conclude that oil prices provide little information about the future
direction of the economy.
The rest of the paper proceeds as follows. The next section
describes our methodology, Section III describes the data, Section IV
presents and discusses the results of tests for predictive ability of
oil prices for the macroeconomic variables, and Section V concludes the
paper.
II. METHODOLOGY
Denote by [x.sub.t] the value of a macroeconomic variable that we
want to forecast and by [DELTA][o.sub.t] the percentage change in the
nominal price of crude oil over period t. The null hypothesis to be
tested is that [DELTA]o does not help to predict x, so that the
mean-squared prediction error (MSE) for the model [x.sub.t] =
g([x.sub.t-1], [x.sub.t-2] ..., [x.sub.t-p], [DELTA][o.sub.t-1],
[DELTA][o.sub.t-2],..., [DELTA][o.sub.t-q]) is not less than the MSE for
the model [x.sub.t] = f([x.sub.t-1], [x.sub.t-2], ..., [x.sub.t-p]). Let
[u.sub.1t] = [x.sub.t] - f([x.sub.t-1], [x.sub.t-2], ..., [x.sub.t-p])
be the time t forecast error for the smaller model and [u.sub.2t] =
[x.sub.t] - g([x.sub.t-1], [x.sub.t-2], ..., [x.sub.t-p],
[DELTA][o.sub.t-1], [DELTA][o.sub.t-2], ..., [DELTA][o.sub.t-q]) be the
time t forecast error for the model that includes [DELTA]o. The null
hypothesis can then be stated as
[H.sub.0] : E([u.sup.2.sub.1t]) = E([u.sup.2.sub.2t]).
The alternative hypothesis is [H.sub.a] : E([u.sup.2.sub.1t]) >
E([u.sup.2.sub.2t]).
There are a total of T observations in the data set, P
out-of-sample predictions are made, and the estimation sample is R = T -
P. (5) Let [[??].sub.1t] and [[??].sub.2t] denote some consistent
estimators for [u.sub.1t] and [u.sub.2t]. Then, we expect that
$P^{-1} \sum_{t=R+1}^{T} (\hat{u}_{1t}^{2} - \hat{u}_{2t}^{2}) \to 0$ under [H.sub.0]
and
$P^{-1} \sum_{t=R+1}^{T} (\hat{u}_{1t}^{2} - \hat{u}_{2t}^{2}) \to c > 0$ under [H.sub.a].
Several tests for equal predictive ability of nested models have
been proposed, including Clark and McCracken (2001), McCracken (2007),
and Clark and West (2007); see Corradi and Swanson (2006) and West
(2006) for reviews of the literature and references on predictive
ability testing. The first test we use is the Chao, Corradi, and Swanson
(2001) (hereafter CCS) test. If both f(x) and g(x) are linear functions
so that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], then
[H.sub.0] being true implies that the expected value of the statistic
(1) [m.sub.p] = (1 / [square root of P]) [T.summation over (t =
R+1)] [[??].sub.1t][W.sub.t-1]
is 0 (asymptotically), where [W.sub.t-1] = ([DELTA][o.sub.t-1],
[DELTA][o.sub.t-2], ..., [DELTA][o.sub.t-q])'. (6)
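As a rough illustration of Equation (1), the following Python sketch computes the vector [m.sub.p] from a series of benchmark forecast errors and lagged oil price changes. The function name `ccs_moment`, the list-based data layout, and the toy inputs are assumptions made for this example only. The sketch stops at the moment vector: forming the test statistic also requires an estimate of its covariance matrix (see Chao, Corradi, and Swanson 2001), which is not shown.

```python
import math

def ccs_moment(errors, oil_changes, R, q):
    """Return m_P = P^(-1/2) * sum_{t=R+1}^{T} u1_t * W_{t-1}.

    errors[t-1] holds the AR forecast error for period t (entries for
    t <= R are ignored); oil_changes[t-1] holds the oil price change in t.
    Requires R >= q so that every lag is available.
    """
    T = len(errors)
    P = T - R
    m = [0.0] * q
    for t in range(R + 1, T + 1):                 # t = R+1, ..., T
        u = errors[t - 1]
        for j in range(1, q + 1):                 # W_{t-1} = (do_{t-1}, ..., do_{t-q})
            m[j - 1] += u * oil_changes[t - 1 - j]
    return [value / math.sqrt(P) for value in m]

# Hypothetical toy inputs: T = 4 observations, R = 2, one oil lag.
m_p = ccs_moment([0.0, 0.0, 1.0, -1.0], [1.0, 2.0, 3.0, 4.0], R=2, q=1)
print(m_p)
```

Under the null hypothesis, each element of this vector should be close to 0 in large samples, because the benchmark forecast errors are then uncorrelated with lagged oil price changes.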
A wealth of evidence has accumulated that, at least for output, it
is important to relax assumptions about g(x) being a linear function.
Two commonly used transformations of the oil price series are the NOPI
introduced by Hamilton (1996) and the volatility of oil prices. The
motivation for NOPI is that it is the price of oil relative to recent
experience that matters most to consumers and producers. The motivation
for volatility is that uncertainty about oil prices can cause consumers
to postpone major purchases and firms to postpone investment. (7) Our
second test is therefore that neither the NOPI variable nor the oil
price volatility has marginal predictive content for any of the
macroeconomic variables. NOPI is constructed as in Bernanke, Gertler,
and Watson (1997) to be equal to the maximum of 0 and the percentage
change over the highest price observed in the previous 12 mo. The
availability of daily market prices for crude oil and the fact that oil
prices exhibit substantial day-to-day fluctuations make the realized
volatility of oil prices a natural measure of volatility (see, e.g.,
Andersen et al. 2003). (8) Because neither of these transformations
requires estimation, the CCS test statistic given by Equation (1) can be
used directly, where [W.sub.t-1] represents either lags of NOPI or
realized volatility as opposed to lagged oil price changes.
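The two transformations can be sketched in a few lines of Python. This is a minimal illustration, not the construction used for the results in this paper: the function names and the price series are hypothetical, and realized volatility is computed here as the square root of the sum of squared daily log returns within a month, which is one common definition and an assumption on our part.

```python
import math

def nopi(prices, window=12):
    """Net oil price increase: max(0, % change over the prior `window`-period high)."""
    out = []
    for t in range(window, len(prices)):
        prior_high = max(prices[t - window:t])    # highest price in the previous 12 periods
        pct_change = 100.0 * (prices[t] - prior_high) / prior_high
        out.append(max(0.0, pct_change))
    return out

def realized_volatility(daily_prices):
    """Square root of the sum of squared daily log returns (assumed definition)."""
    returns = [math.log(b / a) for a, b in zip(daily_prices, daily_prices[1:])]
    return math.sqrt(sum(r * r for r in returns))

# Hypothetical monthly prices; the first 12 only serve as the look-back window.
monthly = [20, 21, 19, 18, 22, 23, 21, 20, 19, 18, 17, 16, 24, 22, 30]
nopi_series = nopi(monthly)
print(nopi_series)

# Hypothetical daily prices within one month.
print(realized_volatility([20.0, 20.5, 19.8, 20.1, 20.4]))
```

In this toy series, the second NOPI value is 0 because the price of 22 does not exceed the previous 12-period high of 24, which illustrates why most NOPI observations are 0 in practice.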
The above CCS test relies on the assumption of a linear forecasting
model under both [H.sub.0] and [H.sub.a]. To remedy this, we consider
the integrated conditional moment test of Corradi and Swanson (2002,
2007), which we denote as CS. The version of the CS test that we use
here maintains a linear forecasting model under [H.sub.0]. However, it
allows for general nonlinear forecasting models under [H.sub.a]. For the
case of a quadratic loss function, the CS test statistic is
(2) $M_{P} = \int m_{P}(\gamma)^{2} \phi(\gamma) \, d\gamma$,
where
$m_{P}(\gamma) = \frac{1}{\sqrt{P}} \sum_{t=R+1}^{T} \hat{u}_{1t} \, w(Z_{t-1}, \gamma)$,
[Z.sub.t-1] = ([x.sub.t-1], [x.sub.t-2], ..., [x.sub.t-p],
[DELTA][o.sub.t-1], [DELTA][o.sub.t-2], ..., [DELTA][o.sub.t-q])',
[phi]([gamma]) is the density function for the uniform distribution, and
[gamma] is a vector with p + q elements (the total number of variables
in [Z.sub.t-1]). Under [H.sub.0] E([u.sup.2.sub.1t]) =
E([u.sup.2.sub.2t]), we have E[[u.sub.1t]w([Z.sub.t-1], [gamma])] = 0
for all [gamma]. (9)
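The idea behind the CS statistic can be sketched numerically. The Python code below approximates the integral over [gamma] by averaging over random draws. Several elements are assumptions made for illustration and are not taken from this paper: the function name `cs_statistic`, the support [0, 1] for each element of [gamma], and the weight function w(Z, [gamma]) = exp(sum of [gamma]_i arctan(Z_i)). Critical values are not computed here.

```python
import math
import random

def cs_statistic(errors, Z, n_draws=500, seed=0):
    """Approximate the integral over gamma of m_P(gamma)^2 by Monte Carlo.

    errors: out-of-sample AR forecast errors, one per forecast period.
    Z: matching rows Z_{t-1} of lagged predictors.
    Assumptions (not from the paper): gamma is uniform on [0, 1]^k and
    w(Z, gamma) = exp(sum_i gamma_i * atan(Z_i)).
    """
    rng = random.Random(seed)
    P, k = len(errors), len(Z[0])
    total = 0.0
    for _ in range(n_draws):
        gamma = [rng.uniform(0.0, 1.0) for _ in range(k)]
        m = 0.0
        for u, row in zip(errors, Z):
            m += u * math.exp(sum(g * math.atan(z) for g, z in zip(gamma, row)))
        total += (m / math.sqrt(P)) ** 2          # m_P(gamma)^2 for this draw
    return total / n_draws

# Hypothetical errors and predictors (two lags of x and one lag of oil).
stat = cs_statistic([0.5, -0.2, 0.1], [[0.1, 0.3, 1.2], [0.3, 0.1, -0.4], [0.2, 0.3, 0.6]])
print(stat)
```

Because the weight function is nonlinear in [Z.sub.t-1], a nonzero correlation between the benchmark errors and any sufficiently rich nonlinear function of the predictors tends to push the statistic away from 0.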
An important advantage of the CS test is that it is consistent
against all nonlinear alternatives, without requiring the specification
and estimation of an alternative nonlinear model. A different but more
popular approach for testing whether a variable has predictive power is
to compare the out-of-sample forecasts of a specific nonlinear model to
those of a smaller (nested) linear model. Naturally, that approach can
be more powerful if the researcher has information about the form of
the nonlinearity or if the focus is on evaluating the fit of a
particular nonlinear model. The drawback is that the test results will
depend on the nonlinear model that is chosen as the alternative. It is
difficult in that framework to interpret a failure to reject the null
hypothesis, because it could mean that [H.sub.0] is true but it could
also mean that there is a problem with the estimation methodology and/or
the specific choice of the (alternative) nonlinear model. Moreover, if a
priori the researcher believes that there is more than one plausible
nonlinear alternative, the test statistics need to be modified in order
to control the size of the test. In contrast, the CS test is easy to
implement, has been shown in Monte Carlo experiments by Corradi and
Swanson (2007) to have good power in samples of the size that we have
here, and is consistent against generic nonlinear alternatives.
Therefore, we choose to use the CS test to deal with the case of a
generic nonlinear forecasting model (under the alternative hypothesis).
For our last comparison, we estimate flexible nonlinear models
using a fully nonparametric estimation approach. The functions f(x) and
g(x) are both estimated by nonparametric kernel methods, based on the
Nadaraya-Watson kernel estimator. In this case,
$u_{1t} = x_{t} - \hat{f}(x_{t-1}, x_{t-2}, \ldots, x_{t-p})$
for the AR model and
$u_{2t} = x_{t} - \hat{g}(x_{t-1}, x_{t-2}, \ldots, x_{t-p}, \Delta o_{t-1}, \ldots, \Delta o_{t-q})$
for the vector autoregressive (VAR) model with oil. Nonparametric
models relax any assumption about functional forms, and as a result,
they encompass all nonlinear models including threshold, smooth
transition, and Markov switching models.
The one-step-ahead forecast of [x.sub.t+1] at time t using the
nonparametric AR model is given by
(3) $\hat{x}_{t+1} = \frac{\sum_{s=p+1}^{t} x_{s} K((W_{s-1} - W_{t})/h)}{\sum_{s=p+1}^{t} K((W_{s-1} - W_{t})/h)}$,
where [W.sub.s-1] = ([x.sub.s-1], [x.sub.s-2], ..., [x.sub.s-p]), h
= ([h.sub.1], ..., [h.sub.p]) are the smoothing parameters and K(x) is
the product Gaussian kernel function, that is, K(([W.sub.s-1] -
[W.sub.t])/h) = [[PI].sup.p.sub.j = 1] k(([x.sub.s-j] -
[x.sub.t+1-j])/[h.sub.j]). Similarly, the forecast of [x.sub.t+1] at
time t using the nonparametric VAR model is given by
(4) $\hat{x}_{t+1} = \frac{\sum_{s=\max(p,q)+1}^{t} x_{s} K((Z_{s-1} - Z_{t})/h)}{\sum_{s=\max(p,q)+1}^{t} K((Z_{s-1} - Z_{t})/h)}$,
where [Z.sub.s-1] = ([x.sub.s-1], [x.sub.s-2], ..., [x.sub.s-p],
[DELTA][o.sub.s-1], ... [DELTA][o.sub.s-q]), h denotes the p + q
smoothing parameters (associated with the p + q components of
[Z.sub.t-1]), and K(x) is again the product Gaussian kernel function.
Given that the Schwarz information criterion (SIC) selects either one or
two lags for the parametric models, we use a similar number of lags for
estimation of nonparametric models by setting p = 2 and q = 1 in the
nonparametric estimation.
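The kernel forecasts in Equations (3) and (4) can be illustrated with a short Python sketch of a Nadaraya-Watson estimator with a product Gaussian kernel. The function names, the list-based data layout, and the toy series are assumptions for this example; setting q = 0 gives the AR forecast and q > 0 adds lagged oil price changes.

```python
import math

def gaussian(u):
    """Standard normal density, used as the univariate kernel k."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def nw_forecast(x, do, p, q, h):
    """Nadaraya-Watson forecast of the value following x[-1].

    Regressors are p lags of x and q lags of the oil price change do
    (same length as x); q = 0 gives the nonparametric AR forecast.
    h holds the p + q bandwidths.
    """
    start = max(p, q)
    # Conditioning point: the most recent p values of x and q values of do.
    query = [x[-j] for j in range(1, p + 1)] + [do[-j] for j in range(1, q + 1)]
    num = den = 0.0
    for s in range(start, len(x)):
        row = [x[s - j] for j in range(1, p + 1)] + [do[s - j] for j in range(1, q + 1)]
        weight = 1.0
        for value, target, h_j in zip(row, query, h):
            weight *= gaussian((value - target) / h_j)   # product kernel
        num += weight * x[s]
        den += weight
    return num / den

# Hypothetical series that alternates between 1 and 2, with no oil price changes.
x = [1.0, 2.0] * 6
do = [0.0] * 12
forecast_var = nw_forecast(x, do, p=2, q=1, h=[0.1, 0.1, 0.1])
forecast_ar = nw_forecast(x, do, p=2, q=0, h=[0.1, 0.1])
print(forecast_var, forecast_ar)
```

The forecast is a weighted average of past values of x, with the largest weights on periods whose lagged values most resemble the current ones. In the toy series, the last two observations are (1, 2), which in the past were always followed by 1, so both forecasts are very close to 1.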
A practical difficulty associated with kernel estimation is the
choice of bandwidth (the vector h in Equations 3 and 4). The recursive nature of our analysis makes matters more difficult because h must be
chosen many times. The bandwidth is chosen in this paper by a grid
search over a set of plausible values of h and then setting h equal to
the vector that minimizes the out-of-sample forecast MSE for the 50
observations prior to the time a forecast is made. For example, to make
a one-step-ahead forecast of the consumer price index (CPI) inflation
rate in January 1986, for which the models are estimated with data
available through December 1985, we use the value of h that would have
resulted in the lowest out-of-sample MSE over the period from January
1981 to December 1985. To forecast the CPI inflation rate in February
1986, we use the value of h that produced the lowest MSE over the period
from February 1981 to January 1986. The bandwidth is thus chosen each
time a forecast is made.
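The bandwidth selection procedure can be sketched as follows. This is an illustrative Python version for the nonparametric AR model only; the function names, the grid values, and the sine-wave series are hypothetical, and the `window` argument stands in for the number of prior observations used to evaluate each candidate bandwidth.

```python
import itertools
import math

def nw_ar_forecast(x, p, h):
    """Nadaraya-Watson forecast of the observation following x[-1]."""
    num = den = 0.0
    for s in range(p, len(x)):
        # Product Gaussian kernel; the normalizing constant cancels in the ratio.
        w = math.exp(-0.5 * sum(((x[s - j] - x[-j]) / h[j - 1]) ** 2
                                for j in range(1, p + 1)))
        num += w * x[s]
        den += w
    # Fall back to the sample mean if every weight underflows to zero.
    return num / den if den > 0.0 else sum(x) / len(x)

def select_bandwidth(x, p, grid, window=50):
    """Pick the bandwidth vector minimizing the MSE of one-step-ahead
    forecasts for the last `window` observations of x."""
    best_h, best_mse = None, float("inf")
    for h in itertools.product(grid, repeat=p):
        sq_errors = []
        for t in range(len(x) - window, len(x)):
            forecast = nw_ar_forecast(x[:t], p, h)    # uses only data before x[t]
            sq_errors.append((x[t] - forecast) ** 2)
        mse = sum(sq_errors) / window
        if mse < best_mse:
            best_h, best_mse = h, mse
    return best_h, best_mse

# Hypothetical data and grid of candidate bandwidths.
series = [math.sin(0.3 * i) for i in range(120)]
grid = [0.05, 0.2, 1.0]
best_h, best_mse = select_bandwidth(series, p=2, grid=grid, window=50)
print(best_h, best_mse)
```

In a recursive forecasting exercise, `select_bandwidth` would be called again before every forecast, each time on the data available at that date, which is what makes the procedure computationally demanding.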
The above-cited predictive ability tests (such as CCS and CS) do
not apply to nonparametric models. Fan and Li (1996) proposed a test
comparing two nonparametric regression models but they only considered
the case of in-sample testing. We report the ratio of the MSE for the
two nonparametric regression models:
$\lambda = \frac{\sum_{t=R+1}^{T} \hat{u}_{2t}^{2}}{\sum_{t=R+1}^{T} \hat{u}_{1t}^{2}}$,
where [[??].sub.1t] and [[??].sub.2t] are the nonparametric
residuals from the null and the alternative models, respectively. Note
that [lambda] < 1 corresponds to cases in which the out-of-sample
forecasting MSE of the nonparametric VAR model (with lagged oil price
changes as regressors) is smaller than that of the nonparametric AR
model (without using oil prices).
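For concreteness, the ratio can be computed as in the following Python sketch, in which the function name and the forecast errors are hypothetical.

```python
def relative_mse(errors_var, errors_ar):
    """Ratio of out-of-sample MSEs: model with oil over the AR benchmark."""
    if len(errors_var) != len(errors_ar):
        raise ValueError("both error series must cover the same forecast periods")
    mse_var = sum(e * e for e in errors_var) / len(errors_var)
    mse_ar = sum(e * e for e in errors_ar) / len(errors_ar)
    return mse_var / mse_ar

# Hypothetical forecast errors for three periods.
ratio = relative_mse([0.5, -1.0, 0.5], [1.0, -1.0, 1.0])
print(ratio)  # 0.5
```

A value of 0.5, as in this toy example, would mean that adding oil prices halves the out-of-sample MSE; values above 1 mean the model with oil prices forecasts worse than the benchmark.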
All of the above tests use an estimated AR model as the benchmark.
The one-step-ahead AR model forecasts are produced for each of the
macroeconomic series for the period 1986-2004, so that P = 228 when x is
observed monthly and P = 76 when x is observed quarterly. Following the
recommendation of Inoue and Kilian (2006), in each case we select the
lag length by the SIC before each forecast is made. Thus, consider as an
example the producer price index (PPI) series, for which we have monthly
data from January 1948 to December 2004. To make a forecast for January
1986, 12 AR models (i.e., AR(1), AR(2), ..., AR(12)) of PPI inflation
were estimated using data from January 1948 to December 1985, the SIC
was calculated for each model, and the model with the lowest associated
SIC value was used to forecast PPI inflation for January 1986. To then
make the forecast for February 1986, observations for January 1986 were
added to the data set, the 12 AR models were reestimated, and the model
with the lowest SIC was used to forecast PPI inflation for February
1986. This process was repeated until 228 one-step-ahead out-of-sample
forecasts of PPI inflation were produced for the period January 1986 to
December 2004. Given the importance of lag selection for our exercise,
with the SIC selecting an average of one lag for GDP up to an average of
seven lags for the 1-mo Treasury bill rate, we have also fixed the lag
length to 12 mo for monthly data and four quarters for quarterly data,
with no effect on our results. The error series [[??].sub.1t] is simply
the difference between actual and forecast values of PPI inflation at
each point in time.
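The recursive procedure can be summarized in a short Python sketch. It is a simplified illustration rather than the code used for this paper: the function names are hypothetical, the data are simulated, the SIC is computed as ln(SSR/n) + k ln(n)/n, and all candidate lag lengths are fitted on a common sample, which is an assumption of this example.

```python
import math
import random

def ols(X, y):
    """Solve the normal equations X'X b = X'y by Gaussian elimination."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))   # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            for j in range(c, k + 1):
                A[r][j] -= f * A[c][j]
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (A[i][k] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b

def sic_forecast(x, max_lag):
    """Fit AR(1), ..., AR(max_lag); forecast with the lowest-SIC model."""
    best = None
    for p in range(1, max_lag + 1):
        # Common estimation sample across lag lengths (an assumption here).
        X = [[1.0] + [x[t - j] for j in range(1, p + 1)] for t in range(max_lag, len(x))]
        y = x[max_lag:]
        b = ols(X, y)
        ssr = sum((yi - sum(bi * xi for bi, xi in zip(b, r))) ** 2 for r, yi in zip(X, y))
        n = len(y)
        sic = math.log(ssr / n) + (p + 1) * math.log(n) / n
        if best is None or sic < best[0]:
            forecast = b[0] + sum(b[j] * x[-j] for j in range(1, p + 1))
            best = (sic, p, forecast)
    return best[1], best[2]

def recursive_errors(x, first_forecast, max_lag):
    """One-step-ahead errors, re-selecting the lag before each forecast."""
    errors = []
    for t in range(first_forecast, len(x)):
        _, forecast = sic_forecast(x[:t], max_lag)   # only data before period t
        errors.append(x[t] - forecast)
    return errors

# Simulated AR(1) data standing in for a macroeconomic series.
rng = random.Random(0)
x = [0.0]
for _ in range(199):
    x.append(0.5 * x[-1] + rng.gauss(0.0, 1.0))
errors = recursive_errors(x, first_forecast=150, max_lag=4)
print(len(errors))  # 50
```

Each pass through the loop in `recursive_errors` mirrors one step of the procedure described above: the sample is extended by one observation, all candidate models are reestimated, and the lag length is chosen again.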
III. DATA
The crude oil data come from two sources. Monthly data on West
Texas Intermediate (WTI) for 1946-2004 were downloaded from the Federal
Reserve Economic Database (FRED) provided by the Federal Reserve Bank of
St. Louis. Daily WTI data for 1986-2004, used to construct the realized
volatility of oil prices, were downloaded from the Web site of the
Energy Information Administration. (10) Throughout the entire analysis,
[DELTA][o.sub.t] denotes the first difference of the logarithm of the
monthly WTI series. Our results are almost identical when the real price
of oil (price of WTI deflated by the CPI) is used instead of the nominal
oil price.
The macroeconomic series, [x.sub.t], includes both aggregate and
disaggregated data. The aggregate series comes from three different
groups, all downloaded from FRED and described in Table 1: (1) price
data, including the CPI excluding energy and the PPI; (2) measures of
aggregate output, including GDP, industrial production, and the
unemployment rate; and (3) financial and monetary policy variables,
including the federal funds rate, 1- and 10-yr interest rates, the term
spread (the difference between 10- and 1-yr interest rates), the default
spread (the difference between AAA and BAA bond yields), real M2
balances, and the velocity of M2 with respect to GDP. This set of
variables is sufficiently broad to include most of the theoretically
plausible oil price-macroeconomic relationships. The analysis proceeds
using quarterly data when [x.sub.t] is equal to GDP or M2 velocity,
while in all other cases, monthly data are used.
The disaggregated data are monthly industry-level industrial
production figures for 11 industries, including primary metals, fabricated
metals, machinery, electrical equipment, aerospace, furniture, apparel,
paper, chemicals, and plastics and rubber, as reported in Federal Reserve
Statistical Release G.17. Lee and Ni (2002) provided evidence of an
in-sample relationship between production in many of these industries
and the price of oil over the period 1959-1997. We take the first
difference of the natural logarithm of the CPI, PPI, GDP, industrial
production, real M2 balances, and all industry-level industrial
production series. The other variables enter the forecasting models in
levels.
IV. EMPIRICAL RESULTS
This section discusses results of the analysis described above. We
begin with the benchmark results, which are based on one-step-ahead
forecasts of the aggregate macroeconomic variables. We then check the
robustness of our conclusions along several dimensions, including the
use of industry-level data, longer forecast horizons, and rolling data
windows to control for the possibility of parameter instability. In each
case, we see that our conclusions are largely unaffected by changes in
the methodology.
A. Benchmark Results
Table 2 reports results of the tests described above for each of
the 12 macroeconomic series. In the first column for each test are p
values for the CCS test of the null hypothesis of no (linear)
predictability of oil prices for the given macroeconomic variable. The
linear oil shock measure is similar to the oil shock in the classic
paper by Hamilton (1983). At a 5% level, we reject the null hypothesis
for only one series, PPI inflation. This result is to be expected
because oil prices enter the PPI directly albeit as just one component
of finished energy goods. Energy goods in December 2004 had a relative
importance of 17% of finished goods in the construction of the PPI. (11)
The important question is whether oil prices contain information about
future movements in the nonenergy sectors of the economy. Interestingly,
there is no evidence that oil prices help to forecast CPI inflation once
energy prices are removed, with a p value for the CPI excluding energy
of .89.
A limitation of any statistical test is that it is only informative
about statistical significance, unable to say anything about the
economic significance of a particular relationship. There is no reason
to believe that the CCS test lacks power for our application--see Chao,
Corradi, and Swanson (2001) for simulation evidence on the power of the
CCS test. Nonetheless, to ensure that the benchmark results are not
driven by the use of a test with low power, the second column for each
test in Table 2 reports the [R.sup.2] from a regression of [[??].sub.1t] on
[Z.sub.t-1]. (12) We interpret the [R.sup.2] as the potential
improvement on the AR forecast from using the information in oil prices.
Under the null hypothesis, the [R.sup.2] of this regression will be
close to 0. The [R.sup.2] for PPI inflation is 11%, which is
approximately equal to the relative importance of energy goods in the
construction of the PPI. For most of the other variables, the [R.sup.2]
is essentially 0, with the largest (the default spread) being 6%.
Failure of the CCS test to reject is therefore not surprising. In-sample
[R.sup.2] calculations, available upon request, show the same pattern:
the [R.sup.2] for PPI inflation goes from 5% to 18% when oil prices are
added to the model, while the [R.sup.2] for the other variables changes
very little after adding oil prices. There is clearly no exploitable
linear relationship between oil prices and output (GDP growth,
industrial production growth, or the unemployment rate), and the case
for forecasting variables related to current and future monetary policy
is similarly weak. In summary, our findings for the benchmark comparison
are due to the inability of oil price movements to improve upon simple
AR forecasts, not our choice of test statistic. (13,14)
B. Alternative Specifications
As discussed earlier, the recent literature has moved away from the
linear oil shock measure used by Hamilton (1983). It is therefore
natural to ask whether the findings of the previous section are robust
to changes in specification. Columns 4 and 5 of Table 2 are p values for
the CCS test of whether the NOPI helps to predict the macroeconomic
variables and the corresponding [R.sup.2] of a regression of
[[??].sub.1t] on [Z.sub.t-1]. The [R.sup.2] for PPI
inflation--approximately 15%--is a little stronger than that for the
benchmark comparison, but once again, this should be expected because
oil prices enter the PPI directly. It should be noted that the NOPI
series has far less variability than the first difference of oil prices
because most observations are 0, which may make it difficult for the CCS
test to reject. Nonetheless, the largest [R.sup.2] for any of the other
series is 7% and in most cases is very close to 0. The CCS test and the
[R.sup.2] are largely in agreement, so that forecasts of output and the
different measures of current and future monetary policy are not likely
to be improved upon by conditioning on lagged NOPI.
The results for realized volatility of oil prices uncover our first
evidence of monetary policy (through the federal funds rate) reacting to
oil shocks. Forecasts of GDP and the CPI can also potentially be
improved by controlling for oil price volatility. One
interpretation is that, as suggested by Bernanke, Gertler, and Watson
(1997), the Federal Reserve attempts to influence CPI inflation and GDP
growth by changing its federal funds rate target in response to oil
price shocks. Whatever the reason, forecasting models of oil volatility
and the CPI, GDP, and the federal funds rate may be worthy of further
analysis. It should be noted that the [R.sup.2] for CPI inflation is low
(approximately 5%).
Column 8 of Table 2 gives p values of the CS test for nonlinear
predictability. CS is a generalized version of the CCS test. It allows
the researcher to choose a generic weight function w(*) which is
nonlinear in [Z.sub.t-1] so that the test has improved power against
many nonlinear alternative models. The results of this test are
basically the same as for the CCS tests. Only in the case of PPI
inflation does the CS test reject, suggesting that oil prices may only
help to forecast PPI inflation; the CS test apparently detects the same
relationship as the CCS test.
The last column in Table 2 is the relative mean-squared error of a
fully nonparametric VAR model, with oil prices, against a nonparametric
AR model. Details on the construction of these forecasts are discussed
in Section II. The nonparametric model does not nest the previous two
nonlinear specifications, NOPI and realized volatility, because the
right-hand-side variables are different. NOPI compares the present
month's oil price with the highest oil price observed in the
previous year, while the realized volatility series is constructed using
daily oil price data. The right-hand-side variables for the
nonparametric forecasting models do not include either the highest oil
price of the previous year or the daily observations on oil prices. (15)
The advantage of the nonparametric model is that it allows for any type
of nonlinearity in the relationship between the macroeconomic variables
and the lagged oil price changes. The results indicate that there is
very little to be gained from a nonparametric specification. For 10 of
the 12 cases, the relative MSE is greater than 1, meaning that the model
with oil prices does worse than the AR model, and for the other two
cases, the relative MSE is close to 1. The distribution of predictive
ability test statistics, such as the CCS test or the Diebold-Mariano (DM)
test, has not been derived for nonparametric estimation models, but formal
tests are really not necessary given that the relative MSE is seldom below 1.
We can summarize our findings as follows. The only variable for
which forecasts are consistently improved upon by controlling for oil
price fluctuations is PPI inflation. This is not surprising, as oil
prices enter the PPI directly, causing oil prices and the PPI to be
related by construction. Adding oil prices as additional predictors, we
do not find any improvement in forecasts of either CPI inflation or
various measures of output and monetary policy. Our findings are robust
when oil price changes are replaced by either the NOPI or the oil price
volatility, with a few notable exceptions for oil price volatility,
which suggests a direction for future research. We emphasize that
because our results are based on the application of the CCS test, we
have not relied on a specific alternative model (such as a potentially
over-parameterized linear VAR model or incorrectly specified parametric
nonlinear model) and do not rely on a particular estimation strategy
(such as ordinary least squares estimation with no restrictions on the
parameters). The Federal Reserve, for instance, often relies on
structural model forecasts, a point made clearly by Meyer (1997).
Additionally, our nonparametric estimation results do not impose any
parametric regression functional forms under either the null or the
alternative hypothesis.
C. Changes in Methodology
We now investigate the robustness of our results to changes in the
methodology. We first consider the use of disaggregated rather than
aggregate data. Following Lee and Ni (2002), this addresses the
possibility that oil shocks affect some sectors of the economy but that
using aggregate data makes it difficult to uncover those effects.
Additionally, stock traders are often interested in the performance of
individual sectors of the economy rather than the aggregate economy.
Table 3 replicates Table 2 for industry-level industrial production
growth in 11 sectors that might plausibly be expected to be affected by
oil shocks. As before, the CCS test finds no evidence that the linear
oil shock measure has predictive power for output in any of the
industries, and the largest [R.sup.2] is less than 4%. The MSE for a VAR
model including oil prices relative to the MSE of an AR model is greater
than 1 in all cases, so that the model with oil prices never predicts
better. (16) This does not mean that our results conflict with Lee and
Ni (2002). Our interest is in out-of-sample forecast performance,
whereas the goal of their paper was to characterize the relationship
between oil prices and industry output. Many factors can explain the
poor historical forecast performance of models with oil without
suggesting any inadequacy in the specification of Lee and Ni (2002).
For the NOPI and realized volatility oil shock measures, the
results are marginally more encouraging. In fact, for machinery and
furniture, the [R.sup.2] is more than 10% even if the CCS test is unable
to reject the null hypothesis of no predictive ability for either
variable. There is also some evidence that NOPI shocks improve
predictions of output growth in the plastics/rubber and machinery
industries. The last two columns of Table 3 show little to be gained
from relaxing assumptions about functional form. The CS test rejects for
only one series, and the MSE of the nonparametric VAR model is not less
than that of the AR model for any of the industries. Overall, there is
little evidence in Table 3 that overturns our conclusion that oil shocks
are not a useful predictor of aggregate economic variables. Oil prices
only have predictive power in a few special cases, and the predictive
power comes from a nonlinear transformation of the oil price series,
which is much different from the way oil prices typically enter into
policy discussions or reports in the business press.
Another potential explanation is that we have to this point
considered only one-step-ahead forecasts. While this is the leading
case, in practice it may take time for changes in oil to have nontrivial effects on output, inflation, or monetary policy (although to some
extent we do capture this with our forecasts of long-term interest rates
and the term spread). To investigate the importance of the forecast
horizon, we have done the CCS and CS tests for 6-, 12-, and 24-mo-ahead
forecasts, which are horizons that are potentially of greater interest.
We use an "h-step-ahead" forecasting procedure, as defined by
Marcellino, Stock, and Watson (2006), with the estimation equation for
the AR model being
[x.sub.t] = [[beta].sub.0] + [h+p-1.summation over (i=h)]
[[beta].sub.i][x.sub.t-i] + [[epsilon].sub.t],
where h is the forecast horizon and p is the lag length. In other
words, [x.sub.t] is regressed on information available h periods
earlier, and forecasts are calculated by plugging time t information
into the estimated equation. As before, following the suggestion of
Inoue and Kilian (2006), the lag length p is chosen by the SIC prior to
making each forecast. These results, reported in Table 4, are consistent
with our previous results.
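The direct forecasting approach can be illustrated with a minimal Python sketch. To keep the example short, it is restricted to a single lag, so that [x.sub.t] is regressed on a constant and the value observed h periods earlier; the function name and the toy series are hypothetical.

```python
def direct_forecast(x, h):
    """Direct h-step-ahead forecast from regressing x_t on a constant and x_{t-h}."""
    y = x[h:]                     # x_t for every t with an observation h periods earlier
    z = x[:len(x) - h]            # x_{t-h}, aligned with y
    n = len(y)
    z_bar, y_bar = sum(z) / n, sum(y) / n
    slope = (sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
             / sum((zi - z_bar) ** 2 for zi in z))
    intercept = y_bar - slope * z_bar
    return intercept + slope * x[-1]   # plug in the latest available observation

# Hypothetical series that rises by one unit each period.
trend = [float(i) for i in range(10)]
print(direct_forecast(trend, h=3))
```

For this toy series the last observation is 9, and the fitted relationship adds 3 units over a three-period horizon, so the forecast is 12 up to rounding error. Unlike an iterated forecast, no intermediate one-step-ahead forecasts are needed.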
Finally, we consider the possibility that the results in Table 2
are due to parameter instability. Stock and Watson (1996) have
highlighted problems with instability in macroeconomic relationships.
Numerous other papers have shown the problems instability can cause for
out-of-sample forecast performance. (17) Many of the papers that have
used nonlinear transformations of oil prices, including Ferderer (1996),
Hamilton (1996, 2003), and Lee, Ni, and Ratti (1995), were motivated by
claims of instability in the oil price-GDP relationship. To some extent,
changes in the oil price-macroeconomy relationship will not cause
problems for the CCS test, as it requires estimation of only the
benchmark AR model. Nonetheless, to allow for the possibility that
parameter instability is driving our results, we compute forecasts for
the AR model with parameters estimated using rolling 10-yr data windows,
along the lines of Swanson (1998), and compare these forecasts to the AR
model that serves as our benchmark in Table 2. Thus, rather than making
forecasts using all data that would have been available at the time the
forecast was made, we use only the most recent 10 yr of data.
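The comparison between expanding and rolling estimation windows can be sketched as follows. The Python code is illustrative only: it uses an AR(1) model in place of the SIC-selected specification, the function names are hypothetical, and the simulated series contains an assumed shift in the mean so that the two schemes can differ. A window of 120 observations corresponds to 10 yr of monthly data.

```python
import random

def ar1_forecast(sample):
    """Fit an AR(1) with intercept by OLS on `sample`; forecast the next value."""
    y, z = sample[1:], sample[:-1]
    n = len(y)
    z_bar, y_bar = sum(z) / n, sum(y) / n
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    slope = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y)) / s_zz
    return y_bar + slope * (sample[-1] - z_bar)

def forecast_mse(x, first_forecast, window=None):
    """Out-of-sample MSE with an expanding (window=None) or rolling sample."""
    sq_errors = []
    for t in range(first_forecast, len(x)):
        sample = x[:t] if window is None else x[max(0, t - window):t]
        sq_errors.append((x[t] - ar1_forecast(sample)) ** 2)
    return sum(sq_errors) / len(sq_errors)

# Simulated data with a hypothetical break in the mean at t = 150.
rng = random.Random(1)
x = [0.0]
for t in range(1, 300):
    mean = 0.0 if t < 150 else 3.0
    x.append(mean + 0.5 * (x[-1] - mean) + rng.gauss(0.0, 1.0))

expanding = forecast_mse(x, first_forecast=200)
rolling = forecast_mse(x, first_forecast=200, window=120)
print(expanding, rolling)
```

Dividing the rolling-window MSE by the expanding-window MSE gives a relative measure of the kind used in the comparisons that follow; a ratio below 1 indicates that discarding older data improved the forecasts.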
The use of rolling estimation windows does not change our findings.
In only 9 out of 48 comparisons does the use of rolling estimation
windows lead to slightly improved forecasts. (18) In only one comparison
does the use of a rolling estimation window lead to a reduction in MSE
of 10% or more. Similar results hold true for industry-level data. We
conclude that our results are not driven by parameter instability. (19)
V. CONCLUDING REMARKS
The goal of this paper has been to quantify whether and how much
predictions of macroeconomic variables can be improved by including
information about oil price movements. We have examined a broad range of
variables over a time period for which there has been significant
variation in oil prices and find few examples where oil price
fluctuations are useful predictors of inflation, output, or monetary
policy. The only strong relationships appear to involve oil price
volatility and either CPI inflation or GDP growth. For most variables,
we cannot reject the hypothesis that oil prices have no predictive content.
We conclude from this that oil prices are generally not informative
about the future direction of the economy.
It is therefore hard to justify much of the attention given
to oil price movements. It is easy to understand why high gas prices,
which are driven by oil prices, are unpopular with consumers. It is also
easy to understand why firms in industries directly dependent on oil as
an input should be concerned with spikes in the price of oil, as that
will result in higher final goods prices or lower profits. On the other
hand, it is not obvious why the Federal Reserve or firms in other
industries should change their behavior in response to oil shocks.
This does not contradict the conclusions of Hamilton (2003) or Lee
and Ni (2002), because these papers were interested in the in-sample fit
of models that include oil prices, whereas we are interested in
evaluating out-of-sample forecasts. Interesting discussions of the
difference between in-sample model evaluation and out-of-sample forecast
performance can be found in Campbell and Thompson (2005), Inoue and
Kilian (2006), and Kilian and Taylor (2003), among others. As these
authors point out, there are reasons that a simple benchmark model can
be expected to forecast as well as a correctly specified economic model.
In cases where a model will be used for forecasting, it may be best to
directly evaluate the recent historical forecast performance of that
model.
We have primarily relied on the out-of-sample predictive ability
tests of Chao, Corradi, and Swanson (2001) and Corradi and Swanson
(2002, 2007), so that the failure of oil as a predictor cannot be
attributed to arbitrary choices for the specification and estimation of
the alternative models (i.e., those that include oil prices). Similarly,
unmodeled nonlinearities cannot account for our failure to find
predictability, both because the CS test is consistent against generic
nonlinear alternatives and because forecast MSE calculations from robust
nonparametric regression models and two parametric nonlinear models show
no gain from including oil prices. The results of our benchmark
comparisons are basically unaffected by the use of rolling estimation
windows to account for parameter instability, the use of disaggregated
data, or changes in the forecast horizon. Finally, while the results
of the predictive ability tests used here may (as with any test) be the
result of low power against certain alternatives, we report measures of
economic significance ($R^2$) and find little to be gained even in a
best-case scenario. It goes without saying that our findings cannot be
driven by a lack of variation in the oil price series.
ABBREVIATIONS
AR: Autoregressive
CCS: Chao, Corradi, and Swanson (2001)
CPI: Consumer Price Index
CS: Corradi and Swanson (2007)
DM: Diebold and Mariano (1995)
FRED: Federal Reserve Economic Database
GDP: Gross Domestic Product
MSE: Mean-Squared Prediction Error
NOPI: Net Oil Price Increase
PPI: Producer Price Index
SIC: Schwarz Information Criterion
VAR: Vector Autoregressive
WTI: West Texas Intermediate
REFERENCES
Andersen, T. G., T. Bollerslev, F. X. Diebold, and P. Labys.
"Modeling and Forecasting Realized Volatility." Econometrica,
71, 2003, 529-626.
Barsky, R. B., and L. Kilian. "Do We Really Know that Oil
Caused the Great Stagflation? A Monetary Alternative," in NBER
Macroeconomics Annual 2001, edited by B. S. Bernanke and K. Rogoff.
Cambridge, MA: MIT Press, 2002, 137-83.
--. "Oil and the Macroeconomy Since the 1970's."
Journal of Economic Perspectives, 18(4), 2004, 115-34.
Bernanke, B. S. "Irreversibility, Uncertainty, and Cyclical Investment." Quarterly Journal of Economics, 98, 1983, 85-106.
Bernanke, B. S., M. Gertler, and M. Watson. "Systematic
Monetary Policy and the Effects of Oil Price Shocks." Brookings
Papers on Economic Activity, 1997, 1997, 91-142.
Campbell, J. Y., and S. B. Thompson. "Predicting the Equity
Premium Out of Sample: Can Anything Beat the Historical Average?"
Review of Financial Studies, 2005, forthcoming.
Chao, J., V. Corradi, and N. R. Swanson. "An Out of Sample
Test for Granger Causality." Macroeconomic Dynamics, 5, 2001,
598-620.
Clarida, R., J. Gali, and M. Gertler. "Monetary Policy Rules
and Macroeconomic Stability: Evidence and Some Theory." Quarterly
Journal of Economics, 115, 2000, 147-80.
Clark, T. E., and M. W. McCracken. "Tests of Equal Forecast
Accuracy and Encompassing for Nested Models." Journal of
Econometrics, 105, 2001, 85-110.
--. "The Power of Tests of Predictive Ability in the Presence
of Structural Breaks." Journal of Econometrics, 124, 2004, 1-31.
--. "The Predictive Content of the Output Gap for Inflation:
Resolving In-Sample and Out-of-Sample Evidence." Journal of Money,
Credit, and Banking, 2006, 1127-48.
Clark, T. E., and K. W. West. "Approximately Normal Tests for
Equal Predictive Accuracy in Nested Models." Journal of
Econometrics, 138, 2007, 291-311.
Corradi, V., and N. R. Swanson. "A Consistent Test for
Nonlinear Out of Sample Predictive Accuracy." Journal of
Econometrics, 110, 2002, 353-81.
--. "Predictive Density Evaluation," in Handbook of
Economic Forecasting, edited by C. W. J. Granger, G. Elliott, and A.
Timmermann. Amsterdam: Elsevier, 2006, 197-284.
--. "Nonparametric Bootstrap Procedures for Predictive
Inference Based on Recursive Estimation Schemes." International
Economic Review, 48, 2007, 67-109.
Diebold, F. X., and R. S. Mariano. "Comparing Predictive
Accuracy." Journal of Business and Economic Statistics, 13, 1995,
253-63.
Fan, Y., and Q. Li. "Consistent Model Specification Tests:
Omitted Variables and Semiparametric Functional Forms."
Econometrica, 64, 1996, 865-90.
Ferderer, J. P. "Oil Price Volatility and the
Macroeconomy." Journal of Macroeconomics, 18, 1996, 1-26.
Guo, H., and K. L. Kliesen. "Oil Price Volatility and U.S.
Macroeconomic Activity." Federal Reserve Bank of St. Louis Review,
87(6), 2005, 669-83.
Hamilton, J. D. "Oil and the Macroeconomy Since World War
II." Journal of Political Economy, 91, 1983, 228-48.
--. "A Neoclassical Model of Unemployment and the Business
Cycle." Journal of Political Economy, 96, 1988, 593-617.
--. "This is What Happened to the Oil Price-Macroeconomy
Relationship." Journal of Monetary Economics, 38, 1996, 215-20.
--. "What is an Oil Shock?" Journal of Econometrics, 113,
2003, 363-98.
Inoue, A., and L. Kilian. "In-Sample or Out-of-Sample Tests of
Predictability: Which One Should We Use?" Econometric Reviews, 23,
2004, 371-402.
--. "On the Selection of Forecasting Models." Journal of
Econometrics, 130, 2006, 273-306.
Jones, C. M., and G. Kaul. "Oil and the Stock Markets."
Journal of Finance, 51, 1996, 463-91.
Kilian, L. "A Comparison of the Effects of Exogenous Oil
Supply Shocks on Output and Inflation in the G7 Countries." Journal
of the European Economic Association, 2006a, forthcoming.
--. "Exogenous Oil Supply Shocks: How Big Are They and How
Much Do They Matter for the U.S. Economy?" Review of Economics and
Statistics, 2006b, forthcoming.
Kilian, L., and M. P. Taylor. "Why is it so Difficult to Beat
the Random Walk Forecast of Exchange Rates?" Journal of
International Economics, 60, 2003, 85-107.
Leduc, S., and K. Sill. "A Quantitative Analysis of Oil-Price
Shocks, Systematic Monetary Policy, and Economic Downturns."
Journal of Monetary Economics, 51, 2004, 781-808.
Lee, K., and S. Ni. "On the Dynamic Effects of Oil Price
Shocks: A Study Using Industry Level Data." Journal of Monetary
Economics, 49, 2002, 823-52.
Lee, K., S. Ni, and R. A. Ratti. "Oil Shocks and the
Macroeconomy: The Role of Price Variability." Energy Journal, 16,
1995, 39-56.
Marcellino, M., J. H. Stock, and M. W. Watson. "A Comparison
of Direct and Iterated Multistep AR Methods for Forecasting
Macroeconomic Time Series." Journal of Econometrics, 135, 2006,
499-526.
McCracken, M. W. "Asymptotics for Out of Sample Tests of
Granger Causality." Journal of Econometrics, 140, 2007, 719-52.
Meyer, L. H. "The Role for Structural Macroeconomic
Models." AEA Panel on Monetary and Fiscal Policy, New Orleans, LA,
January 5, 1997. Accessed December 2005 http://www.federalreserve.gov/
BOARDDOCS/SPEECHES/19970105.htm.
Stock, J. H., and M. W. Watson. "Evidence on Structural
Instability in Macroeconomic Time Series Relations." Journal of
Business and Economic Statistics, 14, 1996, 11-30.
Swanson, N. R. "Money and Output Viewed Through a Rolling
Window." Journal of Monetary Economics, 41, 1998, 455-74.
Wei, C. "Energy, the Stock Market and the Putty-Clay
Investment Model." American Economic Review, 93, 2003, 311-24.
West, K. W. "Asymptotic Inference about Predictive
Ability." Econometrica, 64, 1996, 1067-84.
--. "Forecast Evaluation," in Handbook of Economic
Forecasting, edited by C. W. J. Granger, G. Elliott, and A. Timmermann.
Amsterdam: Elsevier, 2006, 100-34.
(1.) As one example, the 2005 G8 summit documents include a
statement on the global economy and oil, which at one point states,
"We agreed that secure, reliable and affordable energy sources are
fundamental to economic stability and development." (p. 2; the full
text of the document can be downloaded at
http://www.fco.gov.uk/Files/kfile/PostGS_Gleneagles_GlobalEconomy.pdf).
See Hamilton (2003) and Barsky and Kilian (2000) for extensive reviews
of the literature testing for a relationship between oil prices and
output.
(2.) See Barsky and Kilian (2002, 2004), Bernanke, Gertler, and
Watson (1997), Clarida, Gali, and Gertler (2000), Jones and Kaul (1996),
Leduc and Sill (2004), Kilian (2006a, 2006b), and Wei (2003), among
others, for alternative interpretations of the evidence on the effects
of oil shocks.
(3.) See, for example, Ferderer (1996) and Hamilton (1988).
(4.) Inoue and Kilian (2004) discussed issues in choosing between
in-sample and out-of-sample model evaluation.
(5.) This notation is standard in the literature on forecast
evaluation. See, for example, West (1996).
(6.) Chao, Corradi, and Swanson (2001) derived the distribution of
$m_p$ and provided the conditions necessary for $m_p$ to have a
$\chi^2$ distribution under $H_0$. Failure to reject $H_0$ implies
that the oil price series does not have statistically significant
marginal predictive content for the given macroeconomic variable.
(7.) See, for example, Bernanke (1983), Lee, Ni, and Ratti (1995),
and Ferderer (1996).
(8.) Realized volatility is constructed as the variance of daily oil
price changes over month t:
$RV_t = \sum_{i=1}^{k} (\Delta o_i - \mu_t)^2 / k$,
where k is the number of days in month t, $\Delta o_i$ is the oil price
change on day i, and $\mu_t$ is the mean oil price change in month t.
(9.) To calculate the sample value of Equation (2), 2,000 values of
$\gamma$ were randomly selected from a uniform distribution over [0, 5],
$m_p(\gamma)$ was calculated for each of the chosen $\gamma$, and
the test statistic $M_p$ was set equal to the average of the
$m_p(\gamma)$. If w(x) is a generically comprehensive function, the
CS test is consistent against generic nonlinear alternatives. Following
Corradi and Swanson (2007), we set [MATHEMATICAL EXPRESSION NOT
REPRODUCIBLE IN ASCII], where $\gamma_i$ and $z_{i,t}$ are the
ith elements of $\gamma$ and $Z_t$, respectively, $\bar{z}_i$ is
the sample mean of $z_{i,t}$, and [MATHEMATICAL EXPRESSION NOT
REPRODUCIBLE IN ASCII] is the sample standard deviation of $z_{i,t}$
(over t). Critical values are calculated using the block bootstrap
methodology of Corradi and Swanson (2007).
(10.) We use two different sources for the WTI data because the
FRED data are only available at a monthly frequency, while the daily
Energy Information Administration data begin in the mid-1980s.
(11.) http://www.bls.gov/news.release/ppi.t01.htm, Bureau of Labor
Statistics, PPI News Release Table 1. Producer price indexes and percent
changes by stage of processing.
(12.) The notation for the CCS test is described below Equation
(1).
(13.) As is common practice in the forecasting literature, we have
also compared the MSE of an unrestricted VAR model to the MSE of the AR
model. The MSE of the VAR model is generally close to or greater than
the MSE of the AR model, suggesting that there is no gain in forecast
accuracy from adding lagged oil prices to the AR model. These results
are not reported in order to save space but are available from the
authors upon request.
(14.) It is worth noting that our main findings would be
strengthened if an adjustment were made to control for the fact that we
are repeatedly applying a classical test for a large number of
variables. Such an adjustment would make it more difficult to reject the
null hypothesis of no predictive ability.
(15.) It is in principle straightforward to estimate a
nonparametric model that nests NOPI and realized volatility but as a
practical matter that would require the inclusion of either
nonstationary variables (NOPI) or so many variables that the curse of
dimensionality would make estimation imprecise.
(16.) These results are available upon request.
(17.) See Clark and McCracken (2006) for discussion and references.
(18.) There are 12 macroeconomic variables (those in Table 1) and
four forecast horizons (1, 6, 12, and 24 mo). The results of these
comparisons are available upon request.
(19.) Note that the CCS test and CS test are not designed for
comparison of forecasts when the only difference is that one model uses
more observations for estimation.
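As an illustration of the realized volatility measure described in note 8, the calculation for a single month can be sketched as follows. This is a minimal sketch, not the code used in the paper: the function name `realized_volatility` and the example price changes are hypothetical, and the input is assumed to be the daily oil price changes for one month.

```python
import numpy as np

def realized_volatility(daily_changes):
    """Variance of daily oil price changes within one month.

    Uses the divisor k (the number of days in the month), as in note 8,
    rather than k - 1.
    """
    d = np.asarray(daily_changes, dtype=float)
    mu_t = d.mean()  # mean oil price change in the month
    return float(np.sum((d - mu_t) ** 2) / d.size)

# Hypothetical daily price changes for a four-day "month".
example_month = [1.0, -1.0, 2.0, -2.0]
rv = realized_volatility(example_month)
print(rv)  # 2.5
```

Because the divisor is k, the result coincides with the population variance of the daily changes in that month.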
LANCE BACHMEIER, QI LI and DANDAN LIU *
* We wish to thank John Chao, Lutz Kilian, Norm Swanson, an
anonymous referee, and the editor, Dennis Jansen, for many helpful
comments. We are especially grateful to Phil Rothman for suggesting this
topic.
Bachmeier: Assistant Professor, Department of Economics, Kansas
State University, Manhattan, KS 66506. Phone 785-532-4578, Fax
785-532-6919, E-mail lanceb@ksu.edu
Li: Professor, Department of Economics, Texas A&M University,
College Station, TX 77843-4228. Phone 979-845-9954, Fax 979-847-8757,
E-mail qi@econmail.tamu.edu; and Department of Economics, Tsinghua
University, Beijing 100084, P.R. China.
Liu: Assistant Professor, Department of Economics, Bowling Green
State University, Bowling Green, OH 43403-0001. Phone 419-372-4879, Fax
419-372-1557, E-mail dliu@bgsu.edu
TABLE 1 Variable Descriptions
Variable                 Sample Start Date   Frequency   Transformation
CPI excluding energy     January 1959        Monthly     Log difference
PPI                      January 1948        Monthly     Log difference
GDP                      January 1949        Quarterly   Log difference
Industrial production    January 1948        Monthly     Log difference
Unemployment rate        January 1950        Monthly     None
Federal funds rate       January 1956        Monthly     None
1-yr T-Bond              January 1955        Monthly     None
10-yr T-Bond             January 1955        Monthly     None
Term spread              January 1955        Monthly     None
Default spread           January 1948        Monthly     None
Real M2                  January 1950        Monthly     Log difference
M2 velocity              January 1961        Quarterly   None
TABLE 2 One-Step-Ahead Forecasts
                          $\Delta$WTI         NOPI
Variable                  CCS     $R^2$       CCS     $R^2$
CPI (excluding energy)    0.89    0.006       0.21    0.029
PPI                       0.03    0.110       0.28    0.145
GDP                       0.77    0.001       0.20    0.072
Industrial production     0.20    0.007       0.34    0.009
Unemployment rate         0.89    0.006       0.48    0.007
Federal funds rate        0.18    0.021       0.45    0.001
1-yr T-Bond               0.26    0.029       0.58    0.005
10-yr T-Bond              0.75    0.010       0.48    0.009
Term spread               0.52    0.008       0.83    0.002
Default spread            0.06    0.064       0.68    0.011
Real M2                   0.18    0.003       0.03    0.001
M2 velocity               0.35    0.014       0.84    0.001

                          Volatility          Kernel $\Delta$WTI   MSE (VAR)/
Variable                  CCS     $R^2$       CS                   MSE (AR)
CPI (excluding energy)    0.00    0.052       1.00                 0.95
PPI                       0.16    0.061       0.03                 1.00
GDP                       0.01    0.105       0.55                 1.27
Industrial production     0.23    0.024       0.36                 1.05
Unemployment rate         0.27    0.008       1.00                 1.00
Federal funds rate        0.04    0.099       1.00                 1.11
1-yr T-Bond               0.16    0.029       1.00                 1.11
10-yr T-Bond              0.29    0.012       1.00                 1.04
Term spread               0.10    0.010       0.06                 1.01
Default spread            0.96    0.001       1.00                 1.04
Real M2                   0.36    0.001       0.25                 1.01
M2 velocity               0.71    0.013       0.29                 0.98
Notes: Numbers under the CCS and CS (tests) are the p values. Boldface
numbers emphasize that the p values are less than 5%.
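To illustrate how the p values in Table 2 are read, and how an adjustment for repeated testing of the kind mentioned in note 14 would affect them, the following sketch screens the CCS p values for the change in WTI. The Bonferroni rule used here is an assumption chosen for simplicity; the paper does not specify or apply a particular adjustment.

```python
# CCS p values for the change in WTI, copied from Table 2.
p_values = {
    "CPI (excluding energy)": 0.89,
    "PPI": 0.03,
    "GDP": 0.77,
    "Industrial production": 0.20,
    "Unemployment rate": 0.89,
    "Federal funds rate": 0.18,
    "1-yr T-Bond": 0.26,
    "10-yr T-Bond": 0.75,
    "Term spread": 0.52,
    "Default spread": 0.06,
    "Real M2": 0.18,
    "M2 velocity": 0.35,
}

alpha = 0.05

# Variables for which the null of no predictive content is rejected at 5%.
unadjusted = [name for name, p in p_values.items() if p < alpha]

# Bonferroni (an assumed adjustment): compare each p value with alpha / 12.
bonferroni_level = alpha / len(p_values)
adjusted = [name for name, p in p_values.items() if p < bonferroni_level]

print("Rejected at 5%:", unadjusted)             # ['PPI']
print("Rejected after Bonferroni:", adjusted)    # []
```

Under this assumed adjustment the single unadjusted rejection disappears, which is the sense in which controlling for repeated testing would strengthen the finding of no predictive content.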
TABLE 3 Industry-Level One-Step-Ahead Forecasts
                          $\Delta$WTI         NOPI
Variable                  CCS     $R^2$       CCS     $R^2$
Primary metals            0.71    0.002       0.29    0.010
Fabricated metals         0.12    0.012       0.35    0.008
Machinery                 0.92    0.000       0.01    0.040
Electrical equipment      0.89    0.017       0.39    0.006
Motor vehicles            0.95    0.005       0.95    0.005
Aerospace                 0.56    0.003       0.79    0.002
Furniture                 0.77    0.004       0.43    0.060
Apparel                   0.85    0.035       0.19    0.024
Paper                     0.84    0.006       0.06    0.025
Chemicals                 0.69    0.001       0.06    0.035
Plastics and rubber       0.16    0.013       0.03    0.075

                          Volatility          Kernel $\Delta$WTI   MSE (VAR)/
Variable                  CCS     $R^2$       CS                   MSE (AR)
Primary metals            0.86    0.016       0.38                 1.16
Fabricated metals         0.55    0.045       0.29                 1.15
Machinery                 0.11    0.138       0.99                 1.07
Electrical equipment      0.77    0.001       0.03                 1.09
Motor vehicles            0.35    0.017       0.06                 1.08
Aerospace                 0.90    0.001       1.00                 1.03
Furniture                 0.15    0.122       0.56                 1.15
Apparel                   0.08    0.019       0.62                 1.15
Paper                     0.04    0.033       1.00                 1.13
Chemicals                 0.85    0.005       1.00                 1.26
Plastics and rubber       0.79    0.026       0.89                 1.18
Notes: Same notes as in Table 2 apply here.
TABLE 4 Multi-Step Forecasts
h = 6 mo
Variable                  CCS     $R^2$       CS
CPI (excluding energy)    0.82    0.009       1.00
PPI                       0.16    0.035       0.07
GDP                       0.76    0.001       0.75
Industrial production     0.95    0.000       0.79
Unemployment rate         0.94    0.002       1.00
Federal funds rate        0.53    0.003       0.89
1-yr T-Bond               0.54    0.003       1.00
10-yr T-Bond              0.89    0.000       0.72
Term spread               0.39    0.004       0.02
Default spread            0.99    0.000       0.97
Real M2                   0.76    0.000       0.69
M2 velocity               0.95    0.001       0.18

h = 12 mo
Variable                  CCS     $R^2$       CS
CPI (excluding energy)    0.03    0.022       1.00
PPI                       0.52    0.001       0.91
GDP                       0.82    0.001       0.46
Industrial production     0.58    0.001       1.00
Unemployment rate         0.93    0.002       1.00
Federal funds rate        0.43    0.004       0.84
1-yr T-Bond               0.67    0.001       0.68
10-yr T-Bond              0.80    0.000       0.58
Term spread               0.65    0.001       0.00
Default spread            0.12    0.013       0.49
Real M2                   0.48    0.001       0.59
M2 velocity               0.44    0.005       0.46

h = 24 mo
Variable                  CCS     $R^2$       CS
CPI (excluding energy)    0.54    0.002       1.00
PPI                       0.09    0.010       0.79
GDP                       0.39    0.012       0.50
Industrial production     0.17    0.007       0.59
Unemployment rate         0.41    0.005       0.78
Federal funds rate        0.74    0.001       0.53
1-yr T-Bond               0.57    0.002       0.33
10-yr T-Bond              0.68    0.001       0.25
Term spread               0.83    0.000       0.00
Default spread            0.89    0.000       0.42
Real M2                   0.21    0.003       0.12
M2 velocity               0.78    0.007       0.11
Notes: Same notes as in Table 2 apply here.