Forecasting the Swiss economy using VECX* models: an exercise in forecast combination across models and observation windows.
Assenmacher-Wesche, Katrin; Pesaran, M. Hashem
This paper uses vector error correction models of Switzerland for
forecasting output, inflation and the short-term interest rate. It
considers three different ways of dealing with forecast uncertainties.
First, it investigates the effect on forecasting performance of
averaging over forecasts from different models. Second, it considers
averaging forecasts from different estimation windows. It is found that
averaging over estimation windows is at least as effective as averaging
over different models and both complement each other. Third, it examines
whether using weighting schemes from the machine learning literature
improves the average forecast. Compared to equal weights, the effect of
alternative weighting schemes on forecast accuracy is small in the
present application.
Key words: Bayesian model averaging; choice of observation window;
long-run structural vector autoregression
JEL Classifications: C53; C32
I. Introduction
Forecasting macroeconomic variables is of importance for market
participants and policymakers alike. Although great care is generally
taken in designing a specific forecasting model, the true forecast
uncertainty is often underestimated because various sources of
forecasting errors, like parameter and model uncertainties, are not
taken into account properly. (1) This paper considers the problem of
forecast uncertainty in the context of a long-run structural vector
error correcting model of the Swiss economy. The model includes the
effective nominal exchange rate of the Swiss franc, real gross domestic
product (GDP), the real money stock, measured by M2, the three-month
interest rate, inflation and the ratio of domestic to foreign prices as
endogenous variables, and foreign output, the foreign interest rate and
the oil price as exogenous variables. We first present an overidentified
long-run vector error correction model with exogenous variables (VECX*
model) and use it for forecasting. The model contains five long-run
relations identified as the purchasing power parity, money demand,
output convergence, uncovered interest parity, and the Fisher equation.
We then allow for forecast uncertainty along three different
dimensions. First, we deal with model uncertainty. When deciding on a
specific model, one always has to make choices like, e.g., the number of
lags to include, the number of cointegrating relations to assume, the
long-run restrictions to impose, and the data-generating processes to
adopt for the exogenous variables. In this paper, we confine ourselves
to a class of models that differ only with respect to these
characteristics instead of considering entirely different model types.
To allow for model uncertainty we apply Bayesian model averaging and
combine forecasts from several plausible specifications of the model.
Second, economic relations can be subject to structural breaks.
Pesaran and Timmermann (2007) proposed taking this into account by
estimating a model over different observation windows and then pooling
the forecasts. While estimation is more efficient if all available data
are used when the models are stable, the occurrence of structural
breaks, which are often difficult to identify and measure accurately
with statistical methods, might bias the forecasts. One pragmatic way to
deal with this is to average forecasts from models estimated over
different estimation windows. Since economic theory is more informative
regarding the nature of the long-run relations, in this exercise we do
not allow for parameter uncertainty of the long-run coefficients, but
consider alternative estimates of the short-run coefficients computed
over different observation windows starting in the fourth quarter of
each year between 1965 and 1976.
Third, we assess the usefulness of different weighting schemes in
model averaging, such as equal weights, Akaike (AIC) weights and
weighting schemes advanced in the machine-learning literature (Yang,
2004; Sancetta, 2006).
We find that averaging forecasts from different models reduces the
forecast error considerably. In addition, averaging the forecasts over
estimation windows is at least as effective as model averaging in
improving forecast precision. Moreover, averaging across the two
dimensions is complementary and leads to further reductions in
forecast errors. By contrast, in our application, choice of the weights
when combining forecasts does not seem to be that important.
The paper is set out as follows. Section 2 discusses the
econometric methodology and presents the estimates for the baseline
version of our forecasting model. Section 3 evaluates the forecasts from
the baseline version of the model. Section 4 explores the effect of
averaging forecasts across different models and estimation windows. We
find that the forecast average across all models and estimation windows
outperforms our long-run restricted VECX* model as well as a univariate
AR(1) benchmark model. In addition, we try different weighting schemes
and assess their influence on the forecasting performance of the model.
Though one would expect that excluding models performing poorly from the
average forecast should improve results, we find that schemes weighting
models approximately equally perform better. Finally, Section 5 offers
some conclusions.
2. The VECX* model
The model used for forecasting is a structural cointegrated vector
error-correction model that relates the core macroeconomic variables of
the Swiss economy (denoted by the vector [x.sub.t]) to current and
lagged values of a number of key foreign variables (denoted by the
vector [x.sup.*.sub.t]), which we call the Swiss VECX* model. The model
is developed along the same lines as the model for the UK in Garratt,
Lee, Pesaran and Shin (2003a, 2006). A detailed documentation of the
model can be found in Assenmacher-Wesche and Pesaran (2008).
We use quarterly data starting in 1965, so that after differencing
and accounting for the necessary lags the model is estimated on data
starting in 1965Q4. We stop the estimation in 1999Q4 and use data from
2000Q1 to 2006Q3 to evaluate the recursive out-of-sample forecasting
performance.
The choice of the variables is influenced by the purpose of the
model, namely forecasting the rate of inflation and modelling the
monetary transmission process. Therefore, the model will incorporate
those key relations from economic theory that can be expected to have an
impact on the inflation rate. One of these relations is money demand,
which postulates a long-run relation between the real money stock,
denoted by [m.sub.t], the logarithm of real gross domestic product
(GDP), [y.sub.t], and the nominal interest rate, [r.sub.t], which we
take to be the three-month LIBOR rate. (2) Another is the Fisher
interest-rate parity, which establishes a long-run relation between the
interest rate and inflation, [[pi].sub.t]. For Switzerland as a small,
open economy the exchange rate, [e.sub.t], has an important influence on
economic activity. Therefore, purchasing power parity, which links the
nominal exchange rate to the ratio of the domestic to the foreign price
level, [p.sub.t] - [p.sup.*.sub.t], is included. In addition, we
consider the price of oil, [p.sup.oil.sub.t], as the most important
commodity price, which is expected to have direct and indirect impacts
on domestic as well as on world inflation. Finally, international
business cycles and interest-rate cycles are allowed to have an
influence on the domestic economy by considering long-run relations
between domestic and foreign real GDP and interest rates. (3) The latter
two variables, together with the oil price, are regarded as weakly
exogenous variables.
Foreign output, [y.sup.*.sub.t], and the foreign CPI,
[p.sup.*.sub.t], are computed as weighted averages, using three-year
moving averages of the trade shares with Switzerland. For example, the
foreign output is computed as
$$y^*_t = \sum_{j=1}^{N} \bar{w}_{jt}\, y_{jt},$$
where $y_{jt}$ is the log real output of country $j$, and $\bar{w}_{jt}$ are
the average trade weights.
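As an illustration of this construction, the following sketch computes a trade-weighted foreign output series from per-country log output and trade data; the pandas-based layout, column conventions and function name are assumptions of this example rather than the authors' code.

```python
import pandas as pd

def foreign_output(log_output: pd.DataFrame, trade: pd.DataFrame) -> pd.Series:
    """Trade-weighted foreign output: y*_t = sum_j w_bar_jt * y_jt.

    log_output: quarterly log real GDP, one column per partner country j.
    trade:      quarterly trade volumes (imports plus exports) with Switzerland,
                same columns as log_output.
    """
    # Three-year (twelve-quarter) moving average of each partner's trade with Switzerland.
    smoothed = trade.rolling(window=12, min_periods=12).mean()
    # Normalise so that the partners' weights sum to one in every quarter.
    weights = smoothed.div(smoothed.sum(axis=1), axis=0)
    # Weighted average of the partners' log real output.
    return (weights * log_output).sum(axis=1, min_count=1)
```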
The trade weights are based on Switzerland's fifteen largest
trade partners and are computed as averages of Switzerland's
imports from and exports to the country in question divided by the total
trade of all the fifteen countries. (4) Trade to these fifteen countries
on average accounts for 82 per cent of total Swiss foreign trade. For
the construction of foreign financial variables we use weighted averages
of the US and the Euro Area variables, with the weights based on the
three-year moving averages of the trade shares of these two regions with
Switzerland. Specifically, the foreign interest rate is computed as the
weighted average of the three-month interest rates of the Euro Area and
the US, and the Swiss exchange rate is computed as the weighted average
of the log exchange rate of the Swiss franc in terms of the US dollar
and the euro. (5) This seems justified considering the dominant role
played by these economies in the evolution of the financial market
interconnections of the Swiss economy and the rest of the world. (6)
Before turning to the estimation of the VECX* model we perform unit
root tests, which indicate that the variables can be regarded as I(1).
Initially all estimations and tests were carried out over the period
1965Q4 to 1999Q4. We reserve the rest of the available data to
investigate the forecasting performance of the model. When computing recursive out-of-sample forecasts, we only use information that would
have been available to a forecaster at that point in time. Nevertheless,
we use final vintage data so that our exercise is not an analysis of how
well forecast averaging performs in real time. (7)
We start from a conditional VECX* model for the endogenous
variables in error-correction form with a restricted trend,
$$\Delta x_t = \Pi_x \left[ z_{t-1} - \gamma\,(t-1) \right] + \Lambda\, \Delta x^*_t + \sum_{i=1}^{p-1} \Psi_i\, \Delta z_{t-i} + c_0 + v_t, \qquad (1)$$
and a marginal model for the exogenous variables,
$$\Delta x^*_t = \sum_{i=1}^{p-1} \Gamma_{*i}\, \Delta z_{t-i} + a_{x^*0} + u_{x^*t}. \qquad (2)$$
The $9 \times 1$ vector of variables $z_t = (x'_t, x^{*\prime}_t)'$ in the
model contains six endogenous variables, $x_t = (e_t, m_t, y_t, r_t, \pi_t,
p_t - p^*_t)'$, and three weakly exogenous variables,
$x^*_t = (y^*_t, r^*_t, p^{oil}_t)'$.
Regarding the lag order of the underlying VAR (p) the Akaike
criterion indicates two lags whereas the Schwarz and the Hannan-Quinn
criteria prefer a single lag. Though we start with the same number of
lags on the endogenous and exogenous variables in equation (1), we later
will set certain coefficients in [[PSI].sub.i] to zero and distinguish
between the number of lags on the endogenous variables, [p.sub.x], and
the exogenous variables, [p.sub.x*]. In addition, the lag length in the
marginal model can differ from the number of lags considered in the
conditional model. In the forecasting exercise below we shall also
consider the effects on the forecasts of the endogenous variables of
choosing different marginal models for the exogenous variables,
[DELTA][x.sup.*.sub.t].
We next test for the number of cointegrating relations. Using
simulated critical values the trace and the maximum eigenvalue ([lambda]-max) statistics suggest that r=3 at the 10 per cent level of
significance, though the trace test only marginally rejects the
hypothesis that r=4. However, Assenmacher-Wesche and Pesaran (2008),
using data over a more recent sample (1976-2006), find five
cointegrating relations, which is more in line with the long-run theory.
In what follows we also assume five long-run relations, as economic
theory suggests, but investigate the effect of dropping cointegrating
relations later when dealing with model uncertainty.
We proceed to impose economically meaningful over-identifying
restrictions on [beta] that are in accordance with theoretical priors,
namely the purchasing power parity (PPP), money demand (MD), output
convergence based on the gap between domestic and foreign output (GAP),
interest rate parity between the domestic and foreign interest rate
(UIP), and a Fisher equation linking the domestic interest rate with
inflation (denoted by FIP). The estimates of these relations, computed
over the sample period 1965Q4-1999Q4, and their 95 per cent confidence
bounds, based on a non-parametric bootstrap with 1000 replications, are
as follows:
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (3)
We impose a unitary income elasticity of money demand since the
estimated coefficient was close to one. We do not report estimates for
the constant term because it will be re-estimated in the recursive
out-of-sample forecasting exercise.
A likelihood ratio (LR) test of the 22 over-identifying
restrictions gives a test statistic of 106.21, compared to a
bootstrapped critical value of 61.66 for the 5 per cent level of
significance and of 71.04 for the 1 per cent significance level. The
test therefore rejects the restrictions at conventional significance
levels (the p-value is 0.1 per cent).
Since the purpose of this paper is to assess the effect of model
uncertainty on forecast performance, we impose all theoretically
motivated constraints on the long-run relations in the long-run
restricted VECX* (2,2) model and investigate the effects of relaxing
some of these restrictions later. Moreover, model uncertainty of this
type can be taken into account using Bayesian model averaging
techniques, which give a theoretical framework for considering forecasts
from various specifications (see Geweke and Whiteman, 2006). We
therefore not only explore the forecast results for our long-run
theory-consistent VECX* (2,2) specification, but also consider the
effects of changes in the number of cointegrating relations, the
identification restrictions and the lag order on the forecasting
performance of the model.
3. Forecasting with the VECX* (2,2) model
Macroeconometric forecasting is subject to different types of uncertainties that may
impact on the accuracy of a model's forecasts. These include future
uncertainty, parameter uncertainty (for a given model), and model
uncertainty. (8) Future uncertainty refers to the uncertainty that
surrounds the realisation of future shocks (innovations) to the model
under consideration. Parameter uncertainty refers to the robustness of
forecasts with respect to a given set of parameter values (for a
specific model).
The standard approach to future and parameter uncertainty is to
report confidence intervals instead of point forecasts. Nevertheless,
confidence intervals are of limited usefulness if forecasts from
multiple models are considered. Model uncertainty arises because there
is no consensus about the true model. Though tests can be applied to
search for an appropriate model specification, results are often
inconclusive and depend on the order in which the tests were performed,
so that different plausible specifications can be maintained at the end
of the search process. In addition, macroeconometric models are likely
to be subject to structural breaks due to policy changes and shifts in
tastes and technology. As Clements and Hendry (1998, 1999, 2006)
emphasise, structural breaks are often the main source of forecast
failure and represent the most serious form of model uncertainty.
In this paper we follow Pesaran and Timmermann (2007) and attempt
to deal with model uncertainty and structural breaks by pooling
forecasts from the same model but estimated over different sample
periods, as well as by pooling forecasts estimated over the same sample
period but obtained from different models. The latter type of pooling
has been the subject of an extensive literature on classical methods of
forecast combination and Bayesian model averaging, whilst the former is
new and to our knowledge has not been applied before. (9) The pooling of
forecasts from models estimated over different estimation windows is
viewed as a relatively robust and simple procedure compared to dealing
with possible structural breaks that are difficult to detect and to
exploit in forecasting in a timely manner. On this also see Pesaran and
Pick (2007).
In the following, we shall first examine the forecasting
performance of the VECX* (2,2) model discussed in Section 2 that imposes
the 22 over-identifying restrictions derived from economic theory. We
refer to these as 'long-run restricted VECX* (2,2)' forecasts.
We shall then proceed to investigate how forecasts change with different
specifications of the conditional and the marginal model, and whether
forecasts improve when they are averaged over different model
specifications. When pooling forecasts from different estimation windows
we will consider windows starting between 1965Q4 and 1976Q4 and assess
whether averaging of forecasts from different estimation windows helps
improve the forecasting performance. (10) Since we will average
forecasts over different model specifications and over different
estimation windows, we need a terminology to distinguish between these
two types of averaging. We will refer to the average forecast over
different models for a specific estimation window as the AveM forecast,
whereas the average forecast over estimation windows for a specific
model will be denoted by AveW. We also consider pooling of forecasts
from different models, estimated over different estimation windows. We
shall refer to these as AveAve forecasts to highlight the two distinct
dimensions over which the forecast averaging has been carried out.
Finally, we will assess the effect on forecasting performance of using
different weighting schemes to construct the AveM forecast.
To construct the forecasts we need both the conditional and the
marginal models as set out in equations (1) and (2). Combining them we
have
$$z_t = \sum_{i=1}^{p} \Phi_i\, z_{t-i} + a_0 + a_1 t + u_t,$$
where $z_t = (x'_t, x^{*\prime}_t)'$, $\Phi_1 = I_m - \Pi_x + \Gamma_1$,
$\Phi_i = \Gamma_i - \Gamma_{i-1}$ for $i = 2, \ldots, p-1$, and
$\Phi_p = \Gamma_{p-1}$. The coefficient matrices $\Gamma_i$, $a_0$ and
$a_1$ include the parameters from both the marginal and conditional
models and are defined as
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].
In order to avoid deterministic trends in interest rates,
[a.sub.x*0] is set to zero in the foreign interest-rate equation.
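For concreteness, the sketch below (our own illustration, not the authors' code) shows how point forecasts can be generated by iterating this combined representation forward with future innovations set to zero; the array layout and function name are assumptions of the example.

```python
import numpy as np

def forecast_levels(z_hist, Phi, a0, a1, horizon):
    """Iterate z_t = sum_i Phi[i] z_{t-i} + a0 + a1*t forward with zero future shocks.

    z_hist : (T, k) array of observed data, most recent observation last.
    Phi    : list of (k, k) coefficient matrices [Phi_1, ..., Phi_p].
    a0, a1 : (k,) intercept and trend coefficient vectors.
    horizon: number of quarters to forecast ahead.
    """
    z = [row.copy() for row in z_hist]
    T, p = len(z_hist), len(Phi)
    forecasts = []
    for h in range(1, horizon + 1):
        t = T + h  # the deterministic trend continues from the estimation sample
        z_next = a0 + a1 * t
        for i in range(1, p + 1):
            z_next = z_next + Phi[i - 1] @ z[-i]
        z.append(z_next)
        forecasts.append(z_next)
    return np.array(forecasts)  # (horizon, k) array of level forecasts
```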
We consider forecast horizons of up to eight quarters ahead since
this is the relevant time horizon for central banks when setting
interest rates. Our strategy for forecast evaluation is as follows. The
model is estimated to the end of 1999Q4 and one- to eight-quarter-ahead
forecasts are then produced for 2000Q1-2002Q4. The sample period is
extended by one observation, the short-run parameters are re-estimated
to the end of 2000Q1 and another set of forecasts is generated, this
time for 2000Q2-2003Q1. Since the long-run coefficients of the model
presumably change only slowly, we do not re-estimate them. This
procedure is repeated until the end of the available sample, 2006Q3, is
reached. At the end of the sample, however, we are not able to evaluate
the forecasts for longer time horizons. For the model estimated up to
2006Q2, for instance, we can only compare the one-quarter-ahead forecast
with the actual data for 2006Q3. For that reason, the forecast
statistics rely on a different number of forecasts for each horizon,
ranging from 27 forecast errors for the one-quarter-ahead forecasts to
20 for the eight-quarter-ahead forecasts.
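The recursive evaluation scheme can be summarised in the following sketch, where estimate_short_run and forecast are hypothetical placeholders, passed in as arguments, for the model-specific estimation and forecasting routines; the long-run coefficients are held fixed, as described above.

```python
import pandas as pd

def recursive_evaluation(data, estimate_short_run, forecast,
                         first_origin="1999Q4", last_target="2006Q3", max_h=8):
    """Expanding-window evaluation: re-estimate the short-run parameters every quarter.

    data               : Series/DataFrame indexed by pandas quarterly Periods.
    estimate_short_run : callable(sample) -> fitted model (long-run coefficients fixed).
    forecast           : callable(model, h) -> h-step-ahead forecast.
    """
    periods = pd.period_range(first_origin, last_target, freq="Q")
    errors = {h: [] for h in range(1, max_h + 1)}
    for origin in periods[:-1]:                    # forecast origins 1999Q4, ..., 2006Q2
        model = estimate_short_run(data.loc[:origin])
        for h in range(1, max_h + 1):
            target = origin + h
            if target > periods[-1]:               # target lies beyond the evaluation sample
                break
            errors[h].append(data.loc[target] - forecast(model, h))
    return errors                                  # 27 errors at h=1 down to 20 at h=8
```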
The forecasting performance clearly depends on the evaluation
period chosen. In this respect, the period from 2000Q1-2006Q3 provides a
number of challenges for the various forecasts that we consider. Over
the whole of the forecast period, inflation was low and the quarterly
changes of the price level fluctuated in a narrow band between -1.0 and
2.3 per cent per annum. Similarly, interest rates were low compared to
their historical values whereas real money growth was strong during 2002
and 2003 and peaked at 28 per cent per annum in 2003Q2. Since the
evaluation period is somewhat atypical, it will be particularly
interesting to see if the AveAve pooling of forecasts can lead to
forecast improvements as compared to forecasts from the best (in-sample)
model.
We evaluate the forecasts in terms of their root mean squared
forecast error (RMSFE), which constitutes a specific, although widely
used loss function. (11) Let [z.sub.t+h] be the level of the variable
that we wish to forecast, i.e., the level of output, inflation, or the
interest rate. Denote the forecast of this variable formed at time t by
$\hat{z}(t+h,t)$, and define the $h$-step ahead forecast changes as
$\hat{x}_t(h) = [\hat{z}(t+h,t) - z_t]/h$ and the associated $h$-step ahead
realised changes as $x_t(h) = (z_{t+h} - z_t)/h$. The $h$-step ahead
forecast error is then computed as
$$e_t(h) = x_t(h) - \hat{x}_t(h) = [z_{t+h} - \hat{z}(t+h,t)]/h.$$
For a forecast evaluation period from $T+1$ to $T+n$, the RMSFE is
defined as
$$\mathrm{RMSFE} = 100\,\sqrt{(n-h+1)^{-1} \sum_{t=T}^{T+n-h} e^2_t(h)}.$$
For convenience, we report the RMSFE in per cent.
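A small helper illustrating this calculation, a sketch under the definitions above with the array inputs assumed to be aligned by forecast origin, is:

```python
import numpy as np

def rmsfe(actual_levels, forecast_levels, h):
    """RMSFE (in per cent) of the average per-quarter change over h quarters.

    actual_levels   : array of z_{t+h} for each forecast origin t.
    forecast_levels : array of z_hat(t+h, t) for the same origins.
    The errors are e_t(h) = [z_{t+h} - z_hat(t+h, t)] / h.
    """
    errors = (np.asarray(actual_levels) - np.asarray(forecast_levels)) / h
    return 100.0 * np.sqrt(np.mean(errors ** 2))
```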
Table 1 shows the RMSFE per quarter in per cent for the forecasts
based on the VECX* (2,2) model for the longest estimation window, using
all available data from 1965Q4 onward. The forecasts for the exogenous
variables are from a marginal model that regresses the change in the
exogenous variables, [DELTA][x.sup.*.sub.t];, on the change in the
endogenous and exogenous variables, [DELTA][z.sub.t-1]. We denote this
marginal model by [M.sup.*.sub.a], which is also estimated sequentially
over the same sample period as the conditional model. (12) The average
RMSFE per quarter decreases with a longer forecast horizon. The reason
is that we focus on the average change per quarter in the variable over
h quarters. Though the change per quarter at longer forecast horizons is
small, this generally accumulates to a substantial deviation of the
forecast level from the actual level of the variable at long horizons.
The RMSFE for output growth is between 0.57 and 0.33 per cent per
quarter, whereas the RMSFE for inflation is 0.27 per cent for the
one-quarter horizon but decreases to 0.07 per cent at the eight-quarter
horizon. The RMSFE for the interest rate is lowest, lying at around 0.07
per cent per quarter.
Summing up, the long-run restricted VECX* (2,2) model performs
reasonably well and we will take it as one of our reference models when
investigating if forecasts can be improved by double averaging (i.e., by
following the AveAve procedure discussed above).
4. Pooling of forecasts
There is now a sizable literature showing that averaging over
different forecasts can lead to forecast improvements. The problem of
interest can be described as estimating the forecast probability density
function, $\Pr(Z_{T+1,h} \mid Z_{w,T})$, of a vector of variables
$Z_{T+1,h} = (z_{T+1}, \ldots, z_{T+h})$ conditional on the available
observations at the end of period $T$, $Z_{w,T} = (z_{T-w+1}, z_{T-w+2},
\ldots, z_T)$, where $h$ denotes the forecast horizon and $w$ is the size
of the observation window. For a given model, $M_m$, and a given
estimation window, $w$, the forecast probability density function
$\Pr(Z_{T+1,h} \mid Z_{w,T})$ can be estimated by
$\hat{p}(Z_{T+1,h} \mid Z_{w,T}, M_m)$, which involves estimating model
$M_m$ over the estimation window of size $w$ ending at $T$.
end of estimation sample at T. In the face of model uncertainty,
assuming that there are M models under consideration and using Bayes
formula, we have the familiar Bayesian Model Averaging expression given
by
$$\hat{\Pr}(Z_{T+1,h} \mid Z_{w,T}) = \sum_{m=1}^{M} \hat{p}(Z_{T+1,h} \mid Z_{w,T}, M_m)\, \hat{p}(M_m \mid Z_{w,T}), \qquad (4)$$
where $\hat{p}(Z_{T+1,h} \mid Z_{w,T}, M_m)$ is the predictive density of
$Z_{T+1,h}$, conditional on model $M_m$ and the observation window $w$,
and $\hat{p}(M_m \mid Z_{w,T})$ is the posterior probability of model
$M_m$, also estimated over the observation window $w$.
If a particular model, [M.sub.m], is stable over time, the best
estimator of $\Pr(Z_{T+1,h} \mid M_m)$ would be based on all available
information, i.e., the longest estimation window possible. Standard
applications of Bayesian Model Averaging implicitly assume that all
models under consideration are stable. But in reality some or all the
models under consideration could be subject to structural breaks and
different choices of estimation samples might be warranted. The optimal
choice of the estimation window depends on the nature of the breaks
(their frequency and intensity) and is in general rather difficult to
ascertain. In the presence of unknown structural breaks, averaging over
different estimation windows is recommended (Pesaran, Pettenuzzo and
Timmermann, 2006; Pesaran and Timmermann, 2007). While leaving out
observations at the beginning of the sample will lead to less precise
coefficient estimates, one probably discards observations that stem from
a different regime and thus deteriorate forecasts. If the structural
breaks are unknown, there is a trade-off between both effects.
A pragmatic solution to the model instability problem would be to
consider a number of alternative windows, starting from a minimum window
size to the largest permitted by the available data set, and then
average the forecasts across the windows--what we have termed the AveW
forecasts. (13) The minimum window size can be determined as a multiple
of the number of parameters being estimated, or could be based on
information regarding a known structural break nearest to the forecast
date, T. The maximum window size can be set, subject to data
availability, to be sufficiently large so that a satisfactory
approximation to the asymptotic theory that underlies the estimation of
model [M.sub.m] can be achieved. In most macroeconomic applications,
including the one in this paper, the maximum window size coincides with
the longest observation window that is available. This might not,
however, be the case when forecasting high frequency financial data.
Allowing for both model and estimation window uncertainty yields
the following AveAve formula
$$\hat{\Pr}(Z_{T+1,h} \mid Z_{T,T}) = \frac{1}{W} \sum_{w=T-W+1}^{T} \sum_{m=1}^{M} \hat{p}(Z_{T+1,h} \mid Z_{w,T}, M_m)\, \hat{p}(M_m \mid Z_{w,T}),$$
where $Z_{T,T} = (z_1, \ldots, z_T)$ denotes all the available
observations, $\hat{p}(M_m \mid Z_{w,T})$ is the weight attached to model
$M_m$, $m = 1, 2, \ldots, M$, estimated over the estimation window
$w = T, T-1, \ldots, T-W+1$, at the end of period $T$; the windows are
arranged from the longest window of size $T$ to the shortest window of
size $T-W+1$.
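In terms of point forecasts, and with equal weights over windows, the AveAve combination reduces to a double average over models and windows. The sketch below illustrates this; the layout of the forecast array and the equal-weight default are assumptions of the example rather than the authors' code.

```python
import numpy as np

def ave_ave(forecasts, model_weights=None):
    """Combine point forecasts of one variable across models and estimation windows.

    forecasts     : (n_windows, n_models) array of h-step point forecasts, one per
                    model/window pair.
    model_weights : optional (n_windows, n_models) weights summing to one within each
                    window (e.g. AIC weights); equal weights are used if omitted.
    """
    forecasts = np.asarray(forecasts, dtype=float)
    n_windows, n_models = forecasts.shape
    if model_weights is None:
        model_weights = np.full_like(forecasts, 1.0 / n_models)
    ave_m = (model_weights * forecasts).sum(axis=1)  # AveM: average over models per window
    return ave_m.mean()                              # AveW step: equal average over windows
```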
Bayesian model averaging requires the specification of the prior
probability of model [M.sub.m] and of the prior probability of the
model's coefficients, [[theta].sub.m], conditional on [M.sub.m], for
m=1,2, ...,M. In our applications we focus on equal weights. This approach is
justified if the data-generation process is subject to structural breaks
and uncertainty over which model is the right one is diffused. It
entails the risk, however, that one considers bad models in the average
that should better have been left out. We first present forecast
averages that weight all forecasts equally, before we investigate other
weighting schemes that have been proposed in the literature. (14)
4.1 Average over different model specifications (AveM)
When averaging forecasts from different model specifications, we
first need to define the class of models to be considered. To improve
forecast performance by pooling forecasts from several models, it is
important that the models considered are statistically viable and
economically meaningful. This is especially relevant when equal weights
are used, since they do not take account of past model performance. With
this in mind we make the following choices. We base our choice of
alternative models on the long-run restricted VECX*(2,2) model developed
in Section 2. First, we consider uncertainty regarding the number of
cointegrating relations. Second, we will vary the order of the lags on
the endogenous and the exogenous variables, [P.sub.x] and [p.sub.x*], in
the VECX* ([P.sub.x],[P.sub.x*]) specification in equation (1). Third,
we shall consider different specifications for the model we use to
forecast the exogenous variables. (15)
In general, one would expect that imposing long-run equilibrium relations should improve the forecasting performance of a model, at
least over the medium to long-term horizons. Testing the restrictions
implied by economic theory in Section 2, however, gave ambiguous results
as to whether these restrictions are consistent with the data.
Therefore, the first set of models we shall consider differ with respect
to the long-run restrictions that are imposed. While economic theory
suggested five long-run relations, the statistical tests pointed to the
existence of only three or possibly four cointegrating vectors. One way
to deal with this uncertainty is to estimate several models with
different restrictions and to average forecasts across these models.
Since we are uncertain about the true cointegration rank, r, of
[[PI].sub.x], we consider all possible ranks between r=1 and r=5. With
fewer than five cointegrating vectors, we do not know which of the
over-identified economic relations, i.e., PPP, money demand, the output
gap, uncovered interest parity or the Fisher relation, to impose. We
therefore compute forecasts with all possible combinations of
over-identifying restrictions. Specifically, there are five possible
combinations of long-run restrictions when r=1, ten possible
combinations when r=2, and so on. In total, this gives 31 different
model specifications. (16) In addition, we consider models with one to
five exactly identified cointegrating vectors. This gives a total of 36
different model specifications.
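The 36 specifications can be enumerated mechanically, as in the sketch below; the relation labels follow the text, while the representation of an exactly identified specification is our own shorthand.

```python
from itertools import combinations

RELATIONS = ["PPP", "MD", "GAP", "UIP", "FIP"]  # the five candidate long-run relations

def model_specifications():
    """Enumerate the 36 long-run specifications described in the text."""
    specs = []
    # Over-identified models: every non-empty subset of the five relations (2^5 - 1 = 31).
    for r in range(1, len(RELATIONS) + 1):
        for subset in combinations(RELATIONS, r):
            specs.append({"rank": r, "restrictions": subset})
    # Exactly identified models with cointegration rank r = 1, ..., 5.
    for r in range(1, 6):
        specs.append({"rank": r, "restrictions": None})
    return specs

assert len(model_specifications()) == 36
```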
Averaging over forecasts from different specifications of the
long-run restrictions generally improves on forecasts based on the VECX*
(2,2) model. Table 2 shows the RMSFE for output growth, inflation and
the interest rate for the average over the 36 different model
specifications, applying equal weight to each model when computing the
average. (17) At the one-year horizon, we find a reduction in the RMSFE
of between 10 and 20 per cent for output and the interest rate and of
even more than 50 per cent for inflation. (18)
Next, we will consider different lag lengths for the endogenous and
exogenous variables in the conditional model. Using the estimation
sample ending in 1999Q4, the Akaike criterion pointed to the inclusion
of two lags whereas the Schwarz and the Hannan-Quinn criteria favoured
one lag. We therefore consider all possible combinations of one and two
lags for the endogenous and exogenous variables, i.e., in addition to
our long-run restricted VECX* (2,2) model, we compute forecasts from a
VECX* (2,1), a VECX* (1,2) and a VECX* (1,1) model. Testing for
cointegration in these additional three models (again for the estimation
sample ending in 1999Q4), we find a cointegration rank of either r=3 or
r=4. We therefore compute averages over the same 36 model specifications
discussed above also for the VECX* (2,1), VECX* (1,2) and VECX* (1,1)
models.
Averaging forecasts from all models is likely to improve forecast
performance further. In the following, we present the RMSFEs for the
average forecasts from the 36 different specifications of the long-run
relations up to four quarters ahead. The first column in table 3 shows
that the average forecast based on the VECX*(2,2) model performs best
for inflation, whilst those based on the VECX* (2,1) model produce best
forecasts for output and interest rates.
Next we investigate the effect on the forecasting performance of
using different marginal models for the exogenous variables. We will
consider two different specifications. First, we regress the change in
the exogenous variables, [DELTA][x.sup.*.sub.t], on [DELTA][z.sub.t-1]
(i.e., the first lagged change in the endogenous and exogenous
variables), see equation (2). We call this the [M.sup.*.sub.a] model.
Second, we include only the lagged changes in the exogenous variables,
[DELTA][x.sup.*.sub.t-1] as regressors in the marginal model for
[DELTA][x.sup.*.t]. This latter choice can be motivated by Switzerland
being a small economy that has no influence on foreign variables. We
refer to this marginal model as the [M.sup.*.sub.b] model. For
forecasting, both marginal models, [M.sup.*.sub.a] and [M.sup.*.sub.b], are
estimated sequentially over the same sample period as the conditional
model. While we include a constant in the equations (in first
differences) for foreign output and the oil price, the equation for the
foreign interest rate is estimated without a constant in order not to
generate a trend in the level of the interest rate.
To assess the improvement coming from an explicit marginal model
for the exogenous variables, we also compute forecasts with the
exogenous variables set to their unconditional sample mean
([M.sup.*.sub.c]). In effect, this corresponds to regressing each of the
exogenous variables on a constant only. Note that also in this case the
mean is computed sequentially over the same period as the conditional
model (i.e., up to and including period T, T+1, etc.) so that no
post-sample information is used in computing the forecasts of [x.sup.*].
Finally, we set the exogenous variables to their realised values, which
we call the [M.sup.*.sub.d] model. (19) As the realised values of
[x.sup.*] are unknown at the time of forecasting, these forecasts are not
feasible and are provided as a benchmark against which the other
feasible marginal models can be assessed.
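To make the alternative marginal models concrete, the sketch below sets up the regressor matrices that distinguish [M.sup.*.sub.a], [M.sup.*.sub.b] and [M.sup.*.sub.c]; the variable names and the simple least-squares helper are assumptions of this illustration, not the authors' code.

```python
import numpy as np

def marginal_model_regressors(dz_lag1, dxstar_lag1, variant):
    """Regressors for one equation of the marginal model for the exogenous variables.

    dz_lag1     : (T, 9) lagged changes of all endogenous and exogenous variables.
    dxstar_lag1 : (T, 3) lagged changes of the exogenous variables only.
    variant     : 'a' regresses on dz_{t-1}; 'b' on dx*_{t-1}; 'c' on a constant only.
    Note: as in the text, the constant is dropped in the foreign interest-rate equation.
    """
    T = dz_lag1.shape[0]
    const = np.ones((T, 1))
    if variant == "a":
        return np.hstack([const, dz_lag1])      # M*_a
    if variant == "b":
        return np.hstack([const, dxstar_lag1])  # M*_b
    if variant == "c":
        return const                            # M*_c: unconditional sample mean
    raise ValueError("variant must be 'a', 'b' or 'c'")

def ols(y, X):
    """Least-squares coefficients, applied equation by equation."""
    return np.linalg.lstsq(X, y, rcond=None)[0]
```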
Averaging the forecasts from different lag specifications and
marginal models is also likely to result in forecast improvements. Table
3 shows that the [M.sup.*.sub.b] marginal model produces a lower RMSFE
for output and the interest rate, while the [M.sup.*.sub.a] model
generates better forecasts of inflation. Perhaps not surprisingly, the
RMSFE is smallest if the realised values for the exogenous variables are
used. But setting the exogenous variables to their sample means also
produces a low RMSFE that is comparable to those of the other marginal
models. A possible reason is that changes in the exogenous variables, in
particular the oil price, are close to a random walk and thus difficult
to forecast. Finally, the AveM results based on forecasts across the
different marginal models are shown in the third column of table 3. We
compute averages over the [M.sup.*.sub.a] and [M.sup.*.sub.b] models
only since [M.sup.*.sub.c] and [M.sup.*.sub.d] do not constitute proper
models for the exogenous variables.
The last row in each panel of table 3 shows the RMSFE for forecasts
that are averaged across different conditional models. Of particular
interest is the average over both the different conditional and the
marginal models, which is in the third column of the last row in each
panel of table 3. Averaging over all model dimensions produces an RMSFE
that is close to the lowest of all individual RMSFEs in the table. This
leads us to expect a further improvement in forecast performance if
different estimation windows are taken into account - an issue that we
will explore next.
4.2 Averages over estimation windows (AveW)
We investigate the effect of changing the estimation window by
estimating each model on a sample starting in 1965Q4 and then reducing
the estimation sample successively by leaving out one year at a time at
the beginning of the sample. Our shortest estimation window starts in
1976Q4, just after the breakdown of the Bretton Woods System, which
changed the behaviour of many macroeconomic variables
considerably. (20) This gives a total of twelve different estimation
windows. For the over-identified models, the long-run slope coefficients
are kept constant at their 1965Q4-1999Q4 values and are not re-estimated
over the shorter sample periods. (21) Since the long-run relations are
based on economic theory, we can expect them to be more stable across
time than the short-run adjustment coefficients, which are estimated
from the data without any restrictions. Moreover, there is little
agreement in economic theory on the forces that drive the short-run
adjustment of macroeconomic variables to their long-run equilibrium
values. Note that the just-identified [beta] vectors are re-estimated
since we cannot attach an economic interpretation to them.
[FIGURE 1 OMITTED]
[FIGURE 2 OMITTED]
[FIGURE 3 OMITTED]
Figures 1-3 indicate that averaging forecasts from models estimated
over different estimation windows improves the forecasts. The figures
display the distribution of quarterly RMSFEs for forecasts of inflation,
output growth and the short-term interest rate over the next year for
each model, estimated over twelve different estimation windows, starting
between 1965Q4 and 1976Q4. The estimation windows are shown on the
horizontal axis and the RMSFE on the vertical axis. Since we have 36
different specifications for [beta], four different lag lengths and two
marginal models, this gives a total of 288 models for each estimation
window. The whiskers of the error bars indicate the 15th percentile and
the 85th percentile of the RMSFEs, while the lower end of the box marks
the 25th percentile and the upper end the 75th percentile. The line
inside the box represents the median. RMSFEs falling outside the 15th
and the 85th percentile are marked by dots. The RMSFE from our long-run
restricted VECX* (2,2) model is identified by an asterisk. We see that
for the longer estimation windows the VECX* (2,2) does not perform
particularly well, whereas its RMSFE for output growth and inflation is
in the lower quartile range for the estimation windows starting after
1974. This suggests the presence of a structural break, but this
information is, of course, not available ex ante.
4.3 Averaging over models and windows (AveAve)
In the following, we will discuss how forecasting performance
improves when forecasts are averaged both across models and estimation
windows. From figures 1-3 it is apparent that considerable variability
in RMSFEs is present, both across the model and the window dimensions.
In particular, windows starting in 1973Q4 and 1974Q4, i.e., at the time
of the first oil-price shock, display comparatively large RMSFEs. One
can also see, however, that not all models are affected in the same way
by the choice of estimation window. The straight line in figures 1-3
represents the RMSFE for forecasts that are averaged both across models
and estimation windows, denoted as the 'AveAve' forecast. In
all cases the AveAve forecast lies in the lower part of the distribution
of RMSFEs.
[FIGURE 4 OMITTED]
[FIGURE 5 OMITTED]
Figures 4-6 show the RMSFE across different forecast horizons. For
each forecast horizon the AveAve RMSFE is marked by an asterisk and the
AveM RMSFE for the longest estimation window by a circle. Since we
consider forecasts from all models estimated over all estimation
windows, we have 3456 forecasts at each forecast horizon. Again, the
AveAve forecast performs well compared to the RMSFE of individual models
while for inflation the AveM forecast for the longest window performs
almost as well as the AveAve inflation forecast. For output growth and
the interest rate averaging forecasts from models estimated over
different estimation windows results in a further improvement of
forecasts, especially at longer forecast horizons. Note, however, that
the AveM RMSFE for inflation is in the lowest quartile at all forecast
horizons already so that the scope for further improvement is small.
[FIGURE 6 OMITTED]
Averaging forecasts across different dimensions is an attractive
strategy to improve forecast performance. Though some models beat the
AveAve forecast, these models are not the same for the different
variables and also change with the estimation window. It is thus
apparent that the ex ante information needed to pick the best model is
not available in practice. By considering the average over different
windows, the forecaster is able to hedge against a bad forecasting
performance from a particular window. Since a priori one does not know
how the choice of the sample period will affect the forecasting
performance, averaging forecasts from models estimated over different
windows seems a useful practical way of dealing with this uncertainty.
4.4 Evaluating the AveAve forecast
While it is apparent that the AveM and the AveAve forecasts perform
well, it is interesting to know how much one would have gained if one
had picked the best model instead of using average forecasts. Two useful
measures are the percentage of models that have a lower RMSFE than the
AveM forecast and the difference between the average RMSFE of the models
with a lower RMSFE and the AveM RMSFE. Table 4 provides these summary
statistics for the performance of the AveM forecast relative to the
individual forecasts. Regarding the results for the different estimation
windows, fewer than 20 per cent of the models manage to beat the
AveM forecast of inflation, whereas less than 32 per cent of the models
outperform the AveM forecast for output growth. For most estimation
windows, however, these figures are considerably lower. For the interest
rate, the AveM forecast performs slightly worse but still beats at least
50 per cent of the individual model forecasts.
When it comes to the AveAve forecast, results are even more
supportive of the averaging strategy. For inflation and output only 11
per cent of the individual RMSFEs are lower than the RMSFE for the
AveAve forecast, whereas for the interest rate this figure rises to 32
per cent of the individual models. The average gain of using the better
performing models in terms of the percentage reduction in RMSFE is small
and amounts to about 15 per cent for output and the interest rate, and
25 per cent for inflation. One needs to keep in mind, however, that the
information needed to pick the best performing model/window is not known
ex ante.
We now turn to a comparison of the predictive accuracy of the
AveAve forecasts relative to the forecasts from the long-run restricted
VECX* (2,2) model, and an alternative simple benchmark model, namely a
univariate AR(1) model. (22) To assess whether the improvement in
forecasting accuracy is significant, we apply the test of predictive
accuracy proposed by Diebold and Mariano (1995). The test is based on a
comparison of forecast errors from two different models, i and j,
according to some loss function, [L.sub.t], of the forecast errors, and
tests whether the loss differential of two different forecasts is
significantly different from zero. We consider the squared loss,
$L^s_{ij,t} = (x_t - \hat{x}_{t,h,i})^2 - (x_t - \hat{x}_{t,h,j})^2$, and the
absolute loss, $L^a_{ij,t} = |x_t - \hat{x}_{t,h,i}| - |x_t - \hat{x}_{t,h,j}|$,
where $i$ is the AveAve forecast and $j$ the
forecast from either the long-run restricted VECX* (2,2) model or a
univariate AR(1) model. When considering forecasts more than one-step
ahead, the loss differentials will be serially correlated. To estimate
the variance of the loss differential we therefore use a
heteroscedasticity and autocorrelation consistent estimate of the
variance and correct for serial correlation of order h-1, where h is the
forecast horizon. We follow Harvey, Leybourne and Newbold (1997), who
suggest applying a correction factor to the Diebold-Mariano test
statistic and evaluating significance relative to the critical value
from the Student's t distribution. We consider only forecasts up to
four steps ahead since for longer horizons the number of independent
observations becomes too small to expect significant results.
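A minimal implementation of the modified Diebold-Mariano test described above might look as follows; it assumes a rectangular-kernel HAC estimate of the variance with truncation lag h-1 and applies the Harvey, Leybourne and Newbold (1997) correction before comparison with the Student's t distribution.

```python
import numpy as np
from scipy import stats

def modified_dm_test(loss_diff, h):
    """Diebold-Mariano test with the Harvey-Leybourne-Newbold small-sample correction.

    loss_diff : array of loss differentials L_{ij,t} (forecast i minus forecast j).
    h         : forecast horizon; serial correlation up to order h-1 is allowed for.
    Returns the corrected test statistic and its two-sided p-value.
    """
    d = np.asarray(loss_diff, dtype=float)
    n = d.size
    d_bar = d.mean()
    # HAC variance of the mean with a rectangular kernel truncated at lag h-1.
    var = np.mean((d - d_bar) ** 2)
    for lag in range(1, h):
        cov = np.mean((d[lag:] - d_bar) * (d[:-lag] - d_bar))
        var += 2.0 * cov
    dm = d_bar / np.sqrt(var / n)
    # Harvey, Leybourne and Newbold (1997) correction factor.
    correction = np.sqrt((n + 1 - 2 * h + h * (h - 1) / n) / n)
    stat = correction * dm
    p_value = 2.0 * stats.t.sf(np.abs(stat), df=n - 1)
    return stat, p_value
```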
The upper panel of table 5 shows that the AveAve forecast
outperforms the forecast of the long-run restricted VECX* (2,2) model,
which is indicated by a negative test statistic. In particular, the
AveAve forecast is significantly better than the long-run restricted
VECX* (2,2) model when considering the squared forecast errors, except
for output growth three and four quarters ahead, and inflation one,
three and four quarters ahead. Deschamps (2007) notes that even for h=1
forecast errors need not be serially uncorrelated if the parameter
values of the true model are unknown, and hence a semiparametric
estimate of the variance may be necessary also in this case. Indeed, if
a correction for first-order autocorrelation is applied, the test
statistic becomes -1.807 and thus significant. Regarding the absolute
loss, the AveAve forecast is significantly better than the VECX* (2,2)
forecast for the interest rate and output growth up to two quarters
ahead, but not for inflation.
The lower panel of table 5 shows that, compared to the forecast
from a univariate AR(1) model, the AveAve forecast improves
significantly for inflation and the interest rate but not for output.
(23) Again, the one-step-ahead test statistic for the squared loss for
inflation becomes -3.869 and thus significant if serial correlation is
allowed for. The fact that the AveAve forecast does not lead to a better
prediction of output growth indicates that the additional information
coming from the other variables in the model does not help to improve
forecasts over the information embodied in past output growth. This,
however, might be a consequence of the particular forecast period
chosen, which includes a high degree of uncertainty in the financial
markets during 2001/2002 that subsequently led to a recession, and a
steep rise in the oil price in 2004 that coincided with an economic
recovery.
Summing up, averaging forecasts from different windows and models
seems to perform well and is worthy of further consideration.
4.5 Results for different weighting schemes
While up to now we have used equal weights, we next turn to the
question of how best to combine the forecasts from different models,
i.e., the effect of different weighting schemes on the average
forecasts. In addition to equal weights, we consider weighting by the
AIC criterion (see Pesaran, Schleicher, and Zaffaroni, 2007), the
weighting scheme proposed by Yang (2004) and the online weights
discussed in Sancetta (2006). A description of the weighting schemes can
be found in the appendix. First, we discuss the evolution of weights
during the forecast horizon before we look at the influence on the RMSFE
for the inflation forecast for up to four quarters ahead. The
alternative weighting schemes are compared with respect to the
conditional models only, and the uncertainty associated with the choice
of the marginal models is dealt with by simple averaging.
[FIGURE 7 OMITTED]
Different weighting schemes imply markedly different weights with
which the forecasts from a particular model enter the average. Figure 7
shows the evolution of the weights for the longest estimation window
over the forecast period. Since it is impossible to depict the weight
for each individual model, we show the sum of weights for the VECX*
(2,2), the VECX* (2,1), the VECX* (1,2) and the VECX* (1,1) models.
The online weights stay close to the equal weights, whereas the AIC
weights tend to place most of the weight on the VECX* (1,2) model with
only the long-run output gap relation imposed. The weighting scheme by
Yang (2004) starts out with equally weighted models for the first period
but re-adjusts the weights quickly, favouring a single model type at a
time.
In choosing the weights, the forecaster faces a trade-off. On the
one hand, the worst (historically) performing models should be excluded
from the combined forecast. On the other hand, if model averaging is to
provide a hedge against the failure of a particular model, convergence
of the weights to a single model is not attractive. Since the AIC
weights use the exponential difference between model m's AIC and
the maximum AIC over all models, small differences in the log-likelihood
will result in a large change in the weight. There is no guarantee,
however, that the historically best model according to the AIC will
always produce good forecasts. Therefore, weighting schemes that retain
a broader portfolio of models, even if their performance was not among
the best ones, may work better in practice.
Table 6 shows the RMSFE for the inflation forecast up to four
quarters ahead with different weighting schemes. Apparently, equal
weights perform quite well when compared to more sophisticated weighting
schemes. (24) The online-weighting scheme is able to reduce the RMSFE
slightly as compared to equal weights for some of the estimation windows
but not for the AveAve forecast. By contrast, the AIC weights and the
weighting scheme by Yang (2004) are unable to outperform equal
weighting. This may be due to the fact that we consider quite similar
models so that the advantages of keeping a large portfolio of models
outweigh the benefit of excluding the worst performing ones.
5. Conclusions
In this paper, we developed a long-run structural model for
Switzerland and tested for long-run relationships derived from economic
theory. We found five cointegrating relations that we identified as PPP,
money demand, international output growth, uncovered interest parity and
the Fisher interest parity. We then investigated forecasting performance
of different versions of this model, maintaining different assumptions
with respect to the long-run relations, the lag length and the
specification of the marginal model. Furthermore, we considered
forecasts constructed from models that were estimated over different
estimation windows.
We found that forecast averaging tends to improve forecasting
performance and provides a hedge against poor forecast outcomes. While
averaging across different models lowers the RMSFE of forecasts,
averaging over estimation windows leads to an additional reduction in
the forecast error and is thus at least as important as model averaging.
Finally, we found that equal weights perform reasonably well when
aggregating forecasts. The rationale behind this finding is that
convergence of weights towards a single model is not attractive in
practice if the researcher does not know whether the true model is among
the set of models under consideration. In such cases, average forecasts
based on a portfolio of models estimated over different windows are
likely to perform better than forecasts based on a single model
considered to be the 'best' based on in-sample information.
Appendix: Weighting schemes
Let $f_{mth}$ be the $m$th model's $h$-step ahead forecast of a scalar
random variable, $z$, formed at date $t$ for date $t+h$, with
$m = 1, 2, \ldots, M$ and $t = 1, 2, \ldots$. Let $\omega_{mth} > 0$, with
$\sum_{m=1}^{M} \omega_{mth} = 1$, be the weight attached to this
forecast at time $t$ in arriving at the pooled forecast defined by
$$f_{t,h}(\omega) = \sum_{m=1}^{M} \omega_{mth}\, f_{mth}.$$
Many different weighting schemes can be considered. One possibility
is to use equal-weighted combinations defined as
$$f_{t,h}(1/M) = \frac{1}{M} \sum_{m=1}^{M} f_{mth}.$$
Another possibility is to approximate $\Pr(M_m \mid Z_{T,T})$ by Akaike
weights or Schwarz weights. The latter give a Bayesian approximation
when the estimation sample is sufficiently large (see Pesaran,
Schleicher and Zaffaroni, 2007).
Here, we will consider AIC weights that are computed as follows:
$$\bar{\omega}_{m,t-1} = \frac{\exp(\Delta_{m,t-1})}{\sum_{j=1}^{M} \exp(\Delta_{j,t-1})},$$
where $\Delta_{m,t-1} = \mathrm{AIC}_{m,t-1} - \max_j(\mathrm{AIC}_{j,t-1})$
and $\mathrm{AIC}_{m,t-1} = LL_{m,t-1} - \theta_{m,t-1}$, with
$LL_{m,t-1}$ the maximised logarithm of the likelihood function of model
$m$ and $\theta_{m,t-1}$ its number of estimated parameters. (25)
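Numerically, these weights amount to a softmax over the AIC differences, as in the brief sketch below (function and argument names are ours).

```python
import numpy as np

def aic_weights(log_likelihoods, n_params):
    """AIC weights: softmax of AIC_m - max_j AIC_j, with AIC_m = LL_m - theta_m."""
    aic = np.asarray(log_likelihoods, dtype=float) - np.asarray(n_params, dtype=float)
    delta = aic - aic.max()          # differences are <= 0, so exp() cannot overflow
    weights = np.exp(delta)
    return weights / weights.sum()
```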
Yang (2004) proposes the following weights for h=1 (see his
equation (4) on page 186)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],
where [f.sub.mt] is the one-step ahead forecast of [z.sub.t] formed
at time t, and the model priors, [[pi].sub.m], can be set to 1/M. This
formula uses an expanding window for the construction of weights and can
be modified to use a rolling window of size D,
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],
An h-step ahead version can be written as
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],
where $s^2_{m\tau h}$ are computed from an expanding window (or a
rolling window of size $h' > h$),
$$s^2_{m\tau h} = \frac{1}{h'} \sum_{i=\tau-h'-h+1}^{\tau-h} (z_i - f_{mih})^2,$$
where $h' = \tau - h$ in the case of an expanding window.
Alternatively, weighting schemes from the machine-learning literature
can be used. One such scheme uses the following algorithm (Sancetta,
2006): Let $t = \tau$ be the initial forecast date and set
$\omega_{m\tau h} = 1/M$. For dates $t = \tau + h, \tau + h + 1, \ldots$,
use the following formula to update the weights
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],
where
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],
[z.sub.t-h] is the realised value of z at the end of date t - h,
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].
Note that by construction the new weights satisfy
$\hat{\omega}_{m,t-h,h} > 0$ and $\sum_{m=1}^{M} \hat{\omega}_{m,t-h,h} = 1$.
In the empirical application we set $A = 10^5$, $\alpha = 0.5$ and
$\gamma = 0.05$. (26)
REFERENCES
Assenmacher-Wesche, K. and Pesaran, M.H. (2008), A VECX* model of
the Swiss economy', Swiss National Bank Economy Studies
(forthcoming).
Clark, T.E. and McCracken, M.W. (2004), 'Improving forecast
accuracy by combining recursive and rolling forecasts', Federal
Reserve Bank of Kansas City, mimeo.
--(2006), 'Averaging forecasts for VARs with uncertain
instabilities', Federal Reserve Bank of Kansas City, Research
Working Paper No. RWP-06-12.
Clements, M.P. and Hendry, D.F. (1998), Forecasting Economic Time
Series, Cambridge, Cambridge University Press.
--(1999), Forecasting Non-Stationary Economic Time Series,
Cambridge (Mass.), MIT Press.
--(2006), 'Forecasting with breaks', in Elliot, G.,
Granger, C.W.J. and Timmermann, A. (eds), Handbook of Economic
Forecasting, Amsterdam, Elsevier, pp. 605-57.
Deschamps, P.J. (2007), 'Comparing smooth transition and
Markov switching autoregressive models of US unemployment',
Universite de Fribourg, mimeo.
Diebold, F.X. and Mariano, R.S. (1995), 'Comparing predictive
accuracy', Journal of Business and Economic Statistics, 13, pp.
134-44.
Elliot, G. and Timmermann, A. (2007), 'Economic
forecasting', CEPR Discussion Paper 6158.
Garratt, A., Lee, K., Pesaran, M.H. and Shin, Y. (2003a), 'A
long run structural macroeconometric model of the UK', Economic
Journal, 113, pp. 412-55.
--(2003b), 'Forecast uncertainties in macroeconomic modeling:
an application to the UK economy', Journal of the American
Statistical Association, 98, pp. 829-38.
--(2006), Global and National Macroeconometric Modelling: A Long
Run Structural Approach, Oxford, Oxford University Press.
Geweke, J. and Whiteman, C.H. (2006), 'Bayesian
forecasting', in Elliot, G., Granger, C.W.J. and Timmermann, A.
(eds), Handbook of Economic Forecasting, Amsterdam, Elsevier, pp. 3-80.
Harvey, D., Leybourne, S.J. and Newbold, P. (1997), 'Testing
the equality of prediction mean squared errors', International
Journal of Forecasting, 13, pp. 281-91.
Jordan, T.J. and Savioz, M.R. (2003), 'Does it make sense to
combine forecasts from VAR models? An empirical analysis with inflation
forecasts for Switzerland', Schweizerische Nationalbank,
Quartalsheft, 4, pp. 80-93.
Pesaran, M.H., Pettenuzzo, D. and Timmermann, A. (2006),
'Forecasting time series subject to multiple structural
breaks', Review of Economic Studies, 73, pp. 1057-84.
Pesaran, M.H. and Pick, A. (2007), 'Forecasting random walk
models under drift instability', University of Cambridge, mimeo.
Pesaran, M.H., Schleicher, C. and Zaffaroni, P. (2007), 'Model
averaging in risk management with an application to futures
markets', mimeo.
Pesaran, M.H. and Skouras, S. (2002), 'Decision-based methods
for forecast evaluation', in Clements, M.P. and Hendry, D.F. (eds),
A Companion to Economic Forecasting, Oxford, Basil Blackwell, pp.
241-67.
Pesaran, M.H. and Timmermann, A. (2007), 'Selection of
estimation window in the presence of breaks', Journal of
Econometrics, 137, pp. 134-61.
Sancetta, A. (2006), 'Online forecast combination for
dependent heterogeneous data', University of Cambridge, mimeo.
Smith, J. and Wallis, K.F. (2007), 'A simple explanation of
the forecast combination puzzle', University of Warwick, mimeo.
Timmermann, A. (2006), 'Forecast combinations', in
Elliott, G., Granger, C.W.J. and Timmermann, A. (eds), Handbook of
Economic Forecasting, Amsterdam, Elsevier, pp. 135-96.
Yang, Y. (2004), 'Combining forecasting procedures: some theoretical
results', Econometric Theory, 20, pp. 176-222.
NOTES
(1) For a recent review of the literature on forecasting see
Elliott and Timmermann (2007).
(2) We measure real money by the logarithm of M2, deflated with the
consumer price index (CPI).
(3) Interest rates are expressed as 0.25 ln(1 + R/100), where R is the
interest rate in per cent per annum, to make their units of measurement
compatible with the rate of inflation, which is computed as the first
difference of the logarithm of the quarterly price level. The short
illustration below spells out this conversion.
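A one-line sketch of this conversion (the function name and example value are purely illustrative):

    import math

    def quarterly_rate(R):
        # 0.25*ln(1 + R/100): annualised rate in per cent -> quarterly rate in log units,
        # comparable to inflation measured as a quarterly log difference
        return 0.25 * math.log(1.0 + R / 100.0)

    quarterly_rate(2.0)   # approximately 0.0050, i.e. about half a per cent per quarter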
(4) Data on imports and exports are from the 'Eidgenössische
Zollverwaltung'. We use trade data up to period t since these data
arrive in a timely fashion. In forecasting we therefore do not make use
of information that is not available at the time the forecasts are made.
(5) The interest rate and the exchange rate for the Euro Area are
linked to German data before 1999.
(6) The appendix in Assenmacher-Wesche and Pesaran (2008) contains
detailed information on how the data were constructed.
(7) Clark and McCracken (2006) investigate forecast averaging as a
method to deal with data revisions in real time.
(8) See, e.g., Garratt, Lee, Pesaran and Shin (2003b).
(9) Timmermann (2006) surveys the literature on forecast
combinations, while Geweke and Whiteman (2006) discuss forecast
combinations in a Bayesian setting. Clark and McCracken (2004) combine
rolling and recursive forecasts but do not average over forecasts
derived from different models estimated over different observation
windows.
(10) As discussed in Pesaran and Timmermann (2007), it is also
possible to combine forecasts from different estimation windows using
time-varying weights based on the past performance of different
forecasts using a cross-validation approach. However, such a procedure
is data intensive and does not seem suitable for quarterly
macroeconometric forecasting.
(11) Other possible loss functions are the bias, which measures how far
the mean of the forecast is from the mean of the actual series, or the
proportion of correctly predicted directions of change in a variable.
Pesaran and Skouras (2002) discuss other decision-based methods for
forecast evaluation.
(12) We shall discuss the effects of using different marginal
models and estimation windows on the forecast performance later on.
(13) See Pesaran and Pick (2007) for some theoretical results on
the AveW procedure.
(14) The weighting schemes are discussed in the appendix.
(15) Of course, it would be possible to consider other
alternatives, such as VECX* models in inflation and output growth but
with fewer or more variables than considered in this paper. However,
this particular strategy for generating alternative forecasting models
will not be pursued here.
(16) Precisely, there are [2.sup.5] - 1 = 31 combinations, since we exclude
the model without any long-run restrictions; the short sketch below
enumerates them.
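The enumeration is trivial, but for concreteness here is a minimal sketch that counts the non-empty subsets of the five long-run relations (the labels LR1 to LR5 are placeholders, not identifiers used in the paper):

    from itertools import combinations

    relations = ["LR1", "LR2", "LR3", "LR4", "LR5"]   # placeholders for the five long-run relations

    # all non-empty subsets of the five relations
    subsets = [c for r in range(1, len(relations) + 1)
               for c in combinations(relations, r)]
    print(len(subsets))   # 31 = 2**5 - 1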
(17) We still consider a VECX* model with two lags of the
endogenous and the exogenous variables and the [M.sup.*.sub.a] marginal
model for the exogenous variables. Both models are estimated over the
longest estimation window, starting in 1965Q1.
(18) The advantages of averaging forecasts from different models
when forecasting Swiss inflation are documented in Jordan and Savioz
(2003).
(19) The [M.sup.*.sub.d] model corresponds to what is done in
so-called 'scenario forecasts' where the exogenous variables
are assumed to be known.
(20) Since the model contains a fairly large number of estimated
coefficients, a further reduction in sample size does not seem
appropriate.
(21) The constants in equation (3) are re-estimated together with
the short-run coefficients.
(22) In the literature, univariate AR(1) models are often chosen as the
benchmark for forecast evaluation since they are hard to outperform
despite their simplicity.
(23) These results remain unchanged when the average over
estimation windows (AveW) for the AR(1) model is considered as the
benchmark instead.
(24) This result is often found in the forecast combination
literature, though it is not yet completely understood (see Timmermann
2006). Smith and Wallis (2007) explain this 'forecast combination
puzzle' by the finite-sample error in estimating the combining
weights.
(25) For the exactly identified models, [[theta].sub.m] is given by
[expression not reproducible in the source].
(26) We chose a value of A much higher than recommended by Sancetta
(2006) since otherwise the online weights were indistinguishable from
equal weights. It is interesting to note that results remain basically
unaffected if we change the weights ex-post by choosing [alpha] = {0.5,
0.4, 0.3, 0.2} and [gamma] = {0.05, 0.10}. Since our evaluation sample
with at most 27 observations is rather short, we are unlikely to benefit
from online weighting.
Katrin Assenmacher-Wesche * and M. Hashem Pesaran **
* Swiss National Bank, e-mail: katrin.assenmacher-wesche@snb.ch. **
Cambridge University, CIMF, and USC. The views expressed in this paper
are solely our own and not necessarily shared by the Swiss National
Bank. We are grateful to Mahdi Barakchian, Sylvia Kaufmann, James
Mitchell, Andreas Pick, Alessio Sancetta, Ron Smith and participants of
the Oxford Forecasting Workshop and the BuBa-OeNB-SNB Workshop for
helpful comments on an earlier version.
Table 1. RMSFE for long-run restricted VECX*(2,2) model with
over-identified [beta]
RMSFE in %
Horizon           #    [y.sub.t]   [[pi].sub.t]   [r.sub.t]
1 step ahead     27    0.572       0.272          0.070
2 steps ahead    26    0.457       0.155          0.068
3 steps ahead    25    0.428       0.122          0.066
4 steps ahead    24    0.402       0.101          0.068
8 steps ahead    20    0.328       0.069          0.063
Note: Sequential out-of-sample forecasts from 2000Q1 to 2006Q3;
estimation period 1965Q4-1999Q4. The forecast statistics pertain to
forecasts h steps ahead, divided by the forecast horizon h. Forecasts
of the exogenous variables come from the [M.sup.*.sub.a] marginal
model. # indicates the number of point forecasts available to compute
the RMSFE.
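For concreteness, the following is a minimal sketch of the RMSFE calculation under one reading of the convention stated in the note (the division by the horizon h applied to the forecast errors); the function and variable names are illustrative assumptions, not the authors' code.

    import numpy as np

    def rmsfe_per_quarter(actual, forecast, h):
        # h-step-ahead forecast errors, scaled by the horizon h as in the table notes,
        # summarised as a root mean squared forecast error
        errors = (np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)) / h
        return np.sqrt(np.mean(errors ** 2))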
Table 2. RMSFE for average forecast over different [beta] of
VECX*(2,2) model
RMSFE in %
Horizon           #    [y.sub.t]   [[pi].sub.t]   [r.sub.t]
1 step ahead     27    0.540       0.236          0.067
2 steps ahead    26    0.407       0.113          0.062
3 steps ahead    25    0.363       0.082          0.058
4 steps ahead    24    0.327       0.066          0.060
8 steps ahead    20    0.232       0.039          0.062
Note: Sequential out-of-sample forecasts from 2000Q1 to 2006Q3;
estimation period 1965Q4-1999Q4. The forecast statistics pertain to
forecasts h steps ahead, divided by the forecast horizon h. Forecasts
of the exogenous variables come from the [M.sup.*.sub.a] marginal
model. # indicates the number of point forecasts available to compute
the RMSFE.
Table 3. RMSFE for average four-quarter-ahead forecast
across different model dimensions
              [M.sup.*.sub.a]  [M.sup.*.sub.b]  Average  [M.sup.*.sub.c]  [M.sup.*.sub.d]
[y.sub.t]
VECX*(2,2)         0.327            0.318        0.315        0.314            0.335
VECX*(2,1)         0.315            0.307        0.302        0.306            0.330
VECX*(1,2)         0.352            0.336        0.342        0.331            0.271
VECX*(1,1)         0.331            0.313        0.319        0.314            0.299
Average            0.325            0.316        0.316        0.313            0.305
[[pi].sub.t]
VECX*(2,2)         0.066            0.069        0.067        0.065            0.044
VECX*(2,1)         0.068            0.069        0.068        0.066            0.045
VECX*(1,2)         0.072            0.075        0.073        0.073            0.068
VECX*(1,1)         0.075            0.077        0.076        0.075            0.064
Average            0.069            0.071        0.070        0.069            0.052
[r.sub.t]
VECX*(2,2)         0.060            0.058        0.058        0.057            0.027
VECX*(2,1)         0.056            0.053        0.054        0.054            0.028
VECX*(1,2)         0.063            0.060        0.061        0.058            0.026
VECX*(1,1)         0.060            0.056        0.058        0.057            0.024
Average            0.059            0.056        0.058        0.056            0.025
Note: Sequential out-of-sample forecasts from 2000Q1 to 2006Q3.
The table shows the RMSFE of the average forecast over different
[beta] per quarter. [M.sup.*.sub.a] and [M.sup.*.sub.b] indicate
the marginal models described in Section 4.1; [M.sup.*.sub.c]
and [M.sup.*.sub.d] set the exogenous variables to their sample
mean or their realised value, respectively. Average indicates the
average over the [M.sup.*.sub.a] and [M.sup.*.sub.b] marginal
models. The marginal models are estimated over the same sample as
the conditional model. All results are averaged over the
different choices for [beta].
Table 4. Summary of performance of AveM forecast
relative to individual forecasts across estimation windows
                     [y.sub.t]             [[pi].sub.t]          [r.sub.t]
Window           Percent  Exceedence    Percent  Exceedence    Percent  Exceedence
1965 Q4          13.542     0.057       16.667     0.021       31.944     0.006
1966 Q4          13.542     0.057       19.097     0.021       30.903     0.007
1967 Q4          11.458     0.057       15.972     0.022       32.986     0.007
1968 Q4           8.681     0.055       19.444     0.020       30.556     0.008
1969 Q4          13.889     0.061       18.403     0.019       26.736     0.010
1970 Q4          27.431     0.081       16.667     0.021       30.208     0.008
1971 Q4          31.250     0.086       13.194     0.025       42.014     0.008
1972 Q4          29.167     0.085        7.986     0.027       46.181     0.007
1973 Q4          16.667     0.070       18.403     0.040       50.000     0.010
1974 Q4           6.250     0.051        5.556     0.021       30.556     0.010
1975 Q4          14.236     0.025        4.861     0.021       31.944     0.007
1976 Q4          15.972     0.023        7.292     0.020       31.944     0.007
AveAve           10.619     0.054       10.735     0.017       32.060     0.010
AveAve RMSFE      0.313                  0.069                  0.054
Note: Sequential out-of-sample forecasts from 2000Q1 to 2006Q3.
Forecasts are averaged over all models and pertain to the
four-quarter-ahead forecast. Per cent shows the share of models
whose RMSFE is below the AveW RMSFE. Exceedence gives the average
RMSFE loss of not using those models that perform better than the
AveW forecast. For comparison, the last row shows the RMSFE of
the AveAve forecast.
Table 5. Predictive accuracy of AveAve forecast
Squared loss
Horizon      [y.sub.t]   [[pi].sub.t]   [r.sub.t]
Against long-run restricted VECX*(2,2) model
1 -1.954# -0.766 -1.793#
2 -2.088# -2.167# -3.160#
3 -1.507 -1.670 -2.316#
4 -1.334 -1.421 -1.829#
Against AR(1) model
1 -0.251 -1.626 -3.173#
2 -0.743 -3.197# -2.407#
3 -0.130 -3.356# -1.900#
4 0.287 -4.594# -1.657
Absolute loss
Horizon      [y.sub.t]   [[pi].sub.t]   [r.sub.t]
Against long-run restricted VECX*(2,2) model
1 -2.737# -0.462 -2.317#
2 -1.900# -1.422 -3.632#
4 -1.273 -1.536 -2.449#
Against AR(1) model
1 0.546 -1.905# -4.801#
2 -0.558 -4.102# -2.930#
3 -0.372 -3.685# -2.538#
4 -0.249 -35.001# -2.423#
Note: Significant test statistics at the 5 per cent level are
indicated with #. A negative entry indicates that the AveAve forecast
outperforms the alternative model. The AR(1) model is estimated over
the longest estimation window.
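The entries in table 5 are tests of equal predictive accuracy under squared and absolute loss. Given the references to Diebold and Mariano (1995) and Harvey, Leybourne and Newbold (1997), a Diebold-Mariano-type statistic is the natural reading; the sketch below shows the basic one-step version of such a statistic purely as an illustration, without the HAC variance and small-sample correction needed at longer horizons, and is not necessarily the exact variant behind the table.

    import numpy as np

    def dm_statistic(e_candidate, e_benchmark, loss="squared"):
        # loss differential between two sets of forecast errors; a negative
        # statistic means the candidate (here: AveAve) has the smaller average loss
        e1 = np.asarray(e_candidate, dtype=float)
        e2 = np.asarray(e_benchmark, dtype=float)
        if loss == "squared":
            d = e1 ** 2 - e2 ** 2
        else:
            d = np.abs(e1) - np.abs(e2)
        return np.sqrt(len(d)) * d.mean() / d.std(ddof=1)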
Table 6. RMSFE for inflation in per cent for AveM
forecast using different weights
Estimation window   Equal weights   AIC weights   Yang (2004) weights   Online weights
1965 Q4 0.068 0.071 0.086 0.067
1966 Q4 0.069 0.072 0.079 0.068
1967 Q4 0.068 0.070 0.077 0.067
1968 Q4 0.069 0.072 0.085 0.068
1969 Q4 0.069 0.074 0.092 0.069
1970 Q4 0.068 0.074 0.076 0.068
1971 Q4 0.068 0.072 0.074 0.067
1972 Q4 0.069 0.071 0.078 0.067
1973 Q4 0.070 0.072 0.087 0.069
1974 Q4 0.080 0.073 0.091 0.076
1975 Q4 0.078 0.093 0.085 0.075
1976 Q4 0.077 0.082 0.082 0.075
AveAve 0.069 0.073 0.076 0.069
Note: Sequential out-of-sample forecasts from 2000Q1 to 2006Q3.
The table shows the RMSFE for the AveM forecast, estimated over
different estimation windows. Forecasts are averaged over the
[M.sup.*.sub.a] and [M.sup.*.sub.b] marginal models, applying equal
weights. The marginal models are estimated over the same window as
the conditional model.