The GDP fan charts: an empirical evaluation.
Dowd, Kevin
This paper evaluates the probability density forecasts reflected in
the Bank of England's real GDP growth fan charts. Evaluation is
carried out using tests that allow for data dependence and using two GDP
growth estimates. Results suggest there are problems with the shorter
horizon forecasts, but conclusions about the performance of longer-term
forecasts depend to some extent on the GDP estimates used in the
assessment.
Key words: GDP forecasting; density forecasting; fan charts
JEL Classifications: C4; N I
1. Introduction
Since 1996, the Bank of England has been publishing 'fan
charts' in its quarterly Inflation Report. These represent
forecasts of probability density functions for a chosen macroeconomic variable--which might be inflation or real GDP growth--and show the
Bank's forecasts of the most likely outcome surrounded by forecasts
of prediction intervals at various probability levels. Each interval is
shaded, with the 10 per cent interval darkest and the shading becoming
lighter as we move to broader intervals. Forecasts are given for
horizons ranging from the current quarter up to eight quarters ahead,
and typically 'fan out' and become more dispersed as the
horizon increases.
GDP fan charts have appeared in each Inflation Report since
November 1997. (1) Given the period that has elapsed since, it is
natural to ask how good the fan-chart forecasts have turned out to be,
and a number of recent papers have asked this question of the
better-known inflation fan charts (see, e.g., Wallis, 2003, 2004;
Clements, 2004; Cogley et al., 2005; Dowd, 2004, 2007a,c; and Elder et
al., 2005). (2) However, little attention has so far been paid to the
GDP fan charts. (3) This paper examines their performance by applying a
number of forecast evaluation procedures to them. In doing so, a major
complication is the need to carry out tests that take account of
dependence in the data, a problem that also arises when evaluating the
inflation fan charts. Evaluation of the GDP charts is also complicated
further because the realised GDP growth rate is never
'observed' in the same way in which we 'observe' an
inflation index. (4) This requires us to use estimates for realised GDP
growth, and this raises the issue of the robustness of our results to
the estimates used.
2. The real GDP fan charts
The Bank's fan charts are based on an assumption that real GDP
growth obeys a two-piece normal (2PN) density function. The 2PN pdf is
usually defined as:
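$$ f(x) = \begin{cases} C \exp\left[-\dfrac{(x-\mu)^2}{2\sigma_1^2}\right], & x \le \mu \\ C \exp\left[-\dfrac{(x-\mu)^2}{2\sigma_2^2}\right], & x \ge \mu \end{cases} $$ (1)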
where C = k[([[sigma].sub.1] + [[sigma].sub.2]).sup.-1], k =
[square root of 2/[pi]] and [mu] is the mode (see, e.g., John, 1982 or
Wallis, 2004). The distribution takes the lower half of a normal
distribution with parameters [mu] and [[sigma].sub.1], and the upper
half of a normal with parameters [mu] and [[sigma].sub.2] . These halves
are scaled to give the same mode value, and the distribution is
negatively (positively) skewed if [[sigma].sub.1] > [[sigma].sub.2]
([[sigma].sub.1] < [[sigma].sub.2]).
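As an illustration, the density in (1) and the cdf obtained by integrating it piece by piece can be sketched in a few lines of Python. This is a minimal sketch rather than the Bank's own implementation: the function names and the parameter values in the usage lines are hypothetical, chosen only to show the mechanics.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used only for its cdf


def two_piece_normal_pdf(x, mu, sigma1, sigma2):
    """Density (1): sigma1 applies at or below the mode mu, sigma2 above it."""
    c = math.sqrt(2.0 / math.pi) / (sigma1 + sigma2)
    s = sigma1 if x <= mu else sigma2
    return c * math.exp(-((x - mu) ** 2) / (2.0 * s * s))


def two_piece_normal_cdf(x, mu, sigma1, sigma2):
    """Cdf implied by (1), integrating each half-normal piece separately."""
    w = sigma1 + sigma2
    if x <= mu:
        return 2.0 * sigma1 / w * _STD.cdf((x - mu) / sigma1)
    return sigma1 / w + 2.0 * sigma2 / w * (_STD.cdf((x - mu) / sigma2) - 0.5)


# Hypothetical parameter values (not taken from any published fan chart).
mu, sigma1, sigma2 = 2.5, 0.8, 1.0
p_below_mode = two_piece_normal_cdf(mu, mu, sigma1, sigma2)
```

With these values the probability mass below the mode is sigma1 / (sigma1 + sigma2), which is less than one half because sigma1 < sigma2, i.e., the density is positively skewed.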
However, the Bank uses a 2PN based on an alternative
parameterisation:
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (2)
where -1 < [gamma] < 1, and [gamma] and [sigma] are related
to [[sigma].sub.1] and [[sigma].sub.2] via:
(1 + [gamma])[[sigma].sup.] (3)
For each horizon, the Bank publishes forecasts of the mean, median
and mode ([mu]), a skew parameter (which is not [gamma], but the
difference between the mean and the mode [mu]) and uncertainty [sigma]. (5) This
uncertainty parameter is not the same as the standard deviation, except
where the skew is zero. Once the MPC specifies the values of [mu],
[sigma] and the skew parameter for each horizon, the model is complete
and the density forecasts can be ascertained from it. (6)
3. A framework to test density forecasts
Let [x.sub.t] be the
realised value of real GDP growth for quarter t, and assume for the
moment that [x.sub.t] is observable. Each realised value is to be
compared against the relevant forecasted density over each forecast
horizon. Now let [p.sub.k,t] be the value of [x.sub.t] mapped to its
value on the k-period-ahead forecast cdf, where k = 0,1, ..., 8. This
mapping is known as the Probability Integral Transform (PIT).
Under the null hypothesis that the model is adequate, [p.sub.k,t]
should be uniformly distributed over the interval [0,1], so we can
evaluate the forecasts by comparing the empirical distribution of
[p.sub.k,t] against the predicted uniform distribution.
However, we cannot test the null by a naive application of standard
uniformity tests (e.g., KS tests), because these presuppose that the
[p.sub.k,t] are independent, and there are two reasons why we cannot
make this assumption here. First, since real GDP growth is measured as
the rate of growth of real GDP over the past four quarters, successive
growth rates thus measured will be rolling moving averages of quarterly
growth rates and lack independence. But even if quarterly GDP growth was
itself independent--and correlogram analysis suggests it is not--the
[p.sub.k,t] would not be, except where k = 0. For k > 0, successive
values of [p.sub.k,t] would share common random factors (i.e., the
quarterly growth rates), and these create further dependence in the
[p.sub.k,t]. We therefore need a testing framework that allows for
[p.sub.k,t] to be dependent.
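The first source of dependence can be illustrated with a small simulation. The sketch below assumes, purely for illustration, that quarterly growth rates are iid normal and that four-quarter growth is approximately the sum of the last four quarterly rates; the parameter values and function name are hypothetical.

```python
import random


def lag1_autocorr(xs):
    """Sample first-order autocorrelation of a sequence."""
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[t] - mean) * (xs[t - 1] - mean) for t in range(1, n))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den


rng = random.Random(0)
# Hypothetical iid quarterly growth rates (per cent); an assumption, not data.
quarterly = [rng.gauss(0.6, 0.5) for _ in range(20000)]
# Four-quarter growth approximated as a rolling sum of four quarterly rates.
four_quarter = [sum(quarterly[t - 3:t + 1]) for t in range(3, len(quarterly))]

r_quarterly = lag1_autocorr(quarterly)
r_four_quarter = lag1_autocorr(four_quarter)
```

Even though the quarterly rates are independent by construction, successive four-quarter rates share three of their four components, so their first-order autocorrelation is close to the theoretical value of 3/4.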
Before addressing this problem, it is convenient if we follow
Berkowitz (2001) and run the [p.sub.k,t] through an inverse standard
normal function, i.e.:
[z.sub.k,t] = [[PHI].sup.-1]([p.sub.k,t]) (4)
Under the null, these Berkowitz-transformed [z.sub.k,t]
observations should be distributed as standard normal. However, they are
not predicted to be iid (except for k = 0), because lack of iid-ness in
the [p.sub.k,t] implies lack of iid-ness in the [z.sub.k,t].
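The PIT and the transformation (4) can be sketched as follows. For simplicity the sketch uses a normal forecast density as a stand-in for the Bank's 2PN density; the forecast parameters, the outcomes and the function names are all hypothetical, and the clipping constant is an added safeguard rather than part of the method.

```python
from statistics import NormalDist


def pit(outcome, forecast_cdf):
    """Probability integral transform: the forecast cdf evaluated at the outcome."""
    return forecast_cdf(outcome)


def berkowitz_transform(p, eps=1e-10):
    """Equation (4): z = inverse standard normal cdf of p.

    p is clipped away from 0 and 1 because the inverse cdf is unbounded there.
    """
    p = min(max(p, eps), 1.0 - eps)
    return NormalDist().inv_cdf(p)


# Hypothetical forecast density and realised outcomes (illustrative only).
forecast = NormalDist(mu=2.5, sigma=0.8)
outcomes = [2.1, 3.0, 2.5, 3.9]

pits = [pit(x, forecast.cdf) for x in outcomes]
z_values = [berkowitz_transform(p) for p in pits]
```

With a normal forecast density the transformed values reduce to the standardised forecast errors, (x - 2.5) / 0.8 here, which is a convenient check on the mechanics.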
We now face the problem of testing the [z.sub.k,t] for standard
normality in a context where they follow an unknown dependence
structure. Furthermore, because sample sizes are small, it would be
unwise to rely on tests derived from large-sample theory. (7)
One way forward is to postulate that [z.sub.k,t] follows an ARMA
process and use Box-Jenkins analysis to obtain a parsimonious fit. For
example, we might find that the dependence can be adequately represented
by a first order autoregression (AR(1)):
[z.sub.k,t] = [[mu].sub.k] + [[rho].sub.k][z.sub.k,t-1] +
[[epsilon].sub.k,t], [absolute value [[rho].sub.k]] < 1 (5)
where the errors [[epsilon].sub.k,t] are iid normal, the intercept
parameters [[mu].sub.k] are predicted to be 0 and the autoregression
parameters [[rho].sub.k] are not expected to be 0 except where k = 0.
(8) Alternatively, we might fit an MA (9) or a more general ARMA
process. (10)
We can now test the mean and variance predictions using the
following Monte Carlo procedure. For each value of k:
* We obtain the PITs and put these through the Berkowitz
transformation (4) to obtain a [z.sub.k,t] sample. Denote this by
[[??].sub.k,t].
* We fit a parsimonious ARMA process (i.e., (5) or its higher-order
AR, MA or ARMA equivalent) to the [[??].sub.k,t], taking care to ensure
that the residuals appear to be independent.
* We use the fitted ARMA process to simulate a large number m of
possible [[??].sub.k,t] series, each of which has the dependence
structure of the fitted ARMA process, and let us denote the ith such
series as [[??].sup.i.sub.k,t]. (11)
* We estimate the values of the mean and variance of each
[[??].sup.i.sub.k,t] series. (12) This gives us a 'sample' of
m mean values and a 'sample' of m variance values. The
m x 0.025th and m x 0.975th highest means then give us the 95 per cent
confidence interval for the mean test statistic, and the m x 0.025th and
m x 0.975th highest variances give us the 95 per cent confidence interval
for the variance test statistic, under the null hypothesis that
[[??].sub.k,t] is standard normal but has the dependence structure of
the fitted ARMA process.
* We estimate the values of the mean and variance of our
'real' sample [[??].sub.k,t] ; for each of these two
statistics, the forecasts pass the test if the sample value lies within
the estimated confidence interval, and otherwise fail. (13)
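The steps above can be sketched as follows for the AR(1) case. This is a minimal illustration rather than the code used for the reported results: the function names are hypothetical, the number of trials is kept smaller than the 20000 reported in the notes to table 3, and the inputs are the k = 0 National Statistics values reported in tables 2 and 3 (n = 40, AR(1) coefficient 0.554, sample mean 0.743, sample variance 1.966).

```python
import random
import statistics


def simulate_ar1(n, rho, rng):
    """One AR(1) path whose marginal distribution is N(0, 1) under the null.

    Innovations have variance 1 - rho**2 and the path starts from the
    unconditional mean of 0.
    """
    sd = (1.0 - rho * rho) ** 0.5
    z, path = 0.0, []
    for _ in range(n):
        z = rho * z + rng.gauss(0.0, sd)
        path.append(z)
    return path


def monte_carlo_bounds(n, rho, m=5000, seed=0):
    """Simulated 95 per cent bounds for the sample mean and sample variance."""
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(m):
        path = simulate_ar1(n, rho, rng)
        means.append(statistics.fmean(path))
        variances.append(statistics.variance(path))
    means.sort()
    variances.sort()
    lo, hi = int(0.025 * m), int(0.975 * m) - 1
    return (means[lo], means[hi]), (variances[lo], variances[hi])


def passes(value, bounds):
    """A statistic passes if it lies inside the simulated interval."""
    return bounds[0] <= value <= bounds[1]


mean_bounds, var_bounds = monte_carlo_bounds(n=40, rho=0.554)
mean_ok = passes(0.743, mean_bounds)
var_ok = passes(1.966, var_bounds)
```

Higher-order AR, MA or ARMA fits would need a different simulator in place of simulate_ar1, but the percentile and comparison steps would be unchanged.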
4. Data issues
Testing the GDP fan charts requires quarterly observations, or
estimates, of real GDP growth between the quarter in question and the
comparable quarter a year before. Ideally, we would like a series that
is accurate and timely, in the sense that it was also available to the
MPC at or close to the time when the forecasts became due. The need for
an accurate series is obvious, but the timeliness criterion is also
important to avoid anachronism. (14) However, any available series
involves a trade-off between these two desirable characteristics.
From the accuracy point of view, the most natural series is the
latest 4-quarter growth (IHYR) series produced by National Statistics.
However, this series is subject to revisions, and these can be
substantial ones made years later. (15) Hence, there is no a priori guide over which vintage of IHYR series to use: the most accurate can
appear years later and have no timeliness. On the other hand, we can
also use timely series, but these may be inaccurate, because they cannot
take account of revisions made afterwards. Given that no series meets
both criteria, this study uses two polar opposite alternatives:
* The first is the latest available IHYR series, as of 29 October,
2007. (16) This has the advantage of being the most accurate series
available at the time of writing, but its disadvantage is its lack of
timeliness.
* The second is the MPC's 'best estimate' each
quarter, using the information then available to it, of that
quarter's year-on-year real GDP growth rate. (17) This series has
the advantage of being very timely, but its disadvantage is that it can
take no account of later revisions to the GDP growth rate. (18)
So the first series is good on accuracy, but poor on timeliness,
whilst the second series is good on timeliness, but poor on accuracy,
i.e., the two series are polar opposites on the accuracy and timeliness
criteria. They also make for a good comparison for another reason; a
priori, we might expect the first series to produce results that are
biased against the Bank's model (because the Bank cannot anticipate
later revisions to the GDP series), and we might expect the second
series to be biased in favour of the Bank's model (because this
series is itself a set of forecasts from the model being tested). (19)
Figure 1 provides plots of these series over our sample period,
which spans 1997Q4 to 2007Q3, and therefore has 40 observations in all.
The series exhibit roughly similar shapes, but the latest National
Statistics (NS) series shows greater economic growth over the earlier
part of our period; this reflects subsequent changes which have revised
growth upwards.
[FIGURE 1 OMITTED]
Table 1 provides some summary statistics. The NS series has a
higher mean (as we would expect). Both series have similar variances,
differ somewhat in their extremes, and are positively (but not highly
positively) correlated.
The other data used are the fan chart parameter forecasts. These
consist of 40 sets of parameter forecasts, one for each published chart.
Each set consists of nine values for each of the mode, skew and
uncertainty parameters, for k = 0,1, ..., 8. As explained in section 2,
we use these to obtain the density forecasts, and this gives us nine
sets of density forecasts for each quarter, for horizons ranging from k
= 0 to k = 8.
5. Results
Preliminary results (20)
We first report the results of some preliminary analysis. Figure 2
shows plots of the predicted and empirical [p.sub.k,t] for each horizon.
Under the null, we would expect the plots to be 'close' to the
45[degrees] line. The notes also show the least-squares slopes of the
empirical [p.sub.k,t] series, which should be close to 1 under the null.
However, most plots are not 'close' to the 45[degrees] line, although
the fit appears somewhat better for medium- and long-horizon
forecasts. For their part, the estimated slopes suggest
that performance is best for medium-horizon forecasts.
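One way to compute such a slope is sketched below. The exact construction used for figure 2 is not spelled out here, so the sketch assumes that the sorted empirical PITs are regressed on the uniform plotting positions (i + 0.5)/n; both that convention and the function name are assumptions made for illustration.

```python
def pit_slope(pits):
    """Least-squares slope of sorted PITs on assumed uniform plotting positions."""
    n = len(pits)
    empirical = sorted(pits)
    predicted = [(i + 0.5) / n for i in range(n)]  # assumed plotting positions
    mx = sum(predicted) / n
    my = sum(empirical) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(predicted, empirical))
    sxx = sum((x - mx) ** 2 for x in predicted)
    return sxy / sxx


# Two artificial PIT samples (illustrative only).
uniform_like = [(i + 0.5) / 40 for i in range(40)]
bunched = [0.25 + 0.5 * u for u in uniform_like]  # PITs crowded into the middle

slope_uniform = pit_slope(uniform_like)
slope_bunched = pit_slope(bunched)
```

A perfectly uniform sample gives a slope of 1, whereas PITs bunched towards the middle of the unit interval, as would happen if the forecast densities were too dispersed, give a slope below 1.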
Summary statistics for the [[??].sub.k,t] sample moments are shown
in table 2. The sample values tend not to be 'close' to the
predicted values, and the low-horizon forecasts tend to perform more
poorly than the others. The variances also clearly suggest that the
better performing forecasts are the medium-horizon ones. However, the
two GDP series conflict in other ways, so it is difficult to draw clear
conclusions at this stage.
Test results
To investigate further, table 3 shows the 'best fitting'
ARMA processes, their estimated parameters and their standard errors,
and P-values of a portmanteau statistic of the ARMA residuals. The
best-fitting process is an AR(1) process if we use the NS series, and is
usually (but not always) an MA(1) if we use the MPC series. (The
exception is an iid process for k = 1.) The P-values for the portmanteau
statistic for the most part tend to suggest that the residuals have no
significant dependence structure, and confirm the goodness of the fits.
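A portmanteau statistic of this kind can be sketched as follows. The sketch assumes the Ljung-Box form of the Q-statistic taken over three lags; the function names are hypothetical, and the P-value helper covers only a chi-squared distribution with 2 degrees of freedom (three lags less one fitted ARMA coefficient, which is an assumed adjustment), for which the survival function has the closed form exp(-Q/2).

```python
import math


def ljung_box_q(residuals, max_lag=3):
    """Ljung-Box Q-statistic computed from residual autocorrelations."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    denom = sum(d * d for d in dev)
    total = 0.0
    for lag in range(1, max_lag + 1):
        r = sum(dev[t] * dev[t - lag] for t in range(lag, n)) / denom
        total += r * r / (n - lag)
    return n * (n + 2) * total


def chi2_sf_df2(q):
    """Upper-tail probability of a chi-squared variate with 2 degrees of freedom."""
    return math.exp(-q / 2.0)


# Artificial residuals with strong negative dependence (illustrative only).
alternating = [1.0, -1.0] * 20
q_stat = ljung_box_q(alternating)
p_value = chi2_sf_df2(q_stat)
```

A small P-value indicates remaining dependence in the residuals, in which case a richer ARMA specification would be worth considering.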
The table also shows the P-values of the mean and variance tests
obtained using the Monte-Carlo procedure outlined earlier.
[FIGURE 2 OMITTED]
In interpreting these results, we might note that under the null
hypothesis, the [[??].sub.0,t] (available only for the NS GDP proxy) is
predicted to be iid N(0,1), and this prediction is rejected; the AR(1)
parameter is significant (which rejects the iid prediction) and the
P-values of the mean and variance tests clearly reject the
'standard' aspect of the standard normality prediction. (21)
As for the predictions as they apply to other [[??].sub.k,t] series, the
forecasts easily pass the mean tests, but often have difficulty with the
variance ones. More precisely, the NS series fails the variance
prediction over very low horizons and the MPC series (strongly)
fails the variance prediction over all horizons. Comparing results
across the two GDP estimates: if we use the National Statistics proxy,
the model usually performs adequately except for very short horizons;
and if we use the MPC proxy, the model performs poorly, especially
regarding the variance prediction.
These results are open to error in the event of a misspecified
dependence structure, so it is important to check their robustness.
Accordingly, table 4 reports the P-values for these two sets of tests
under each of three possible alternative dependence structures applied
to all [[??].sub.k,t] : no dependence (i.e., iid), AR(1) and MA(1). I am
not suggesting that these provide good fits in each case; I merely
postulate them here to assess the robustness of our earlier
'best-fit' results to changes in the assumed dependence
structure.
The results in table 4 suggest that the poor performance of the NS
[[??].sub.0,t] is fairly robust; these forecasts fail five of the six
tests as one looks across the first line of the table. These results
also confirm that there are problems with low-horizon NS forecasts, and
to a lesser extent also suggest problems with the NS forecasts over
other horizons as well. Turning to the MPC forecasts, these clearly and
robustly perform well when evaluated on the mean test, but (with only
two exceptions) always perform poorly on the variance test. However, we
need to interpret these results with some caution, because most of these
results will be based on mis-specified temporal dependence processes.
Nonetheless, they are useful in so far as they confirm the robustness of
the problems identified in table 3.
6. Conclusions
Any overall assessment of these results is a matter of judgement.
However, I would summarise the results as suggesting, first, that there
is strong evidence against at least some of the very low-horizon (i.e.,
k = 0 and k = 1) forecasts. Second, for longer horizon forecasts we get
a more mixed picture: we get fairly defensible results if we use the
National Statistics estimate for GDP, but the forecasts are problematic
(especially as regards the variance results) if we use the MPC estimate.
Thus, the general assessment one comes to depends to some extent on
which estimate one believes to be 'best'. (22)
REFERENCES
Akritidis, L. (2003), 'Revisions to quarterly GDP growth and
expenditure components', Economic Trends, December, pp. 69-85.
Berkowitz, J. (2001), 'Testing density forecasts, with
applications to risk management', Journal of Business and Economic
Statistics, 19, pp. 465-74.
Castle, J. and Ellis, C. (2002), 'Building a real-time
database for GDP(E)', Bank of England Quarterly Bulletin, Spring,
pp. 42-9.
Clements, M.P. (2004), 'Evaluating the Bank of England density
forecasts of inflation', Economic Journal, 114, pp. 855-77.
Cogley, T., Morozov, S. and Sargent, T.J. (2005), 'Bayesian
fan charts for UK inflation: forecasting and sources of uncertainty in
an evolving monetary system', Journal of Economic Dynamics and
Control, 29, pp. 1893-1925.
Corradi, V. and Swanson, N.S. (2006), 'Predictive density
evaluation', Chapter 5 in Elliott, G., Granger, C.W.J. and
Timmermann, A. (eds), Handbook of Economic Forecasting, Volume I,
Amsterdam, Elsevier, pp. 197-284.
Dowd, K. (2004), 'The inflation 'fan charts': an
evaluation', Greek Economic Review, 23, pp. 99-111.
--(2007a), 'Too good to be true? The (in)credibility of the UK
inflation fan charts', Journal of Macroeconomics, 29, pp. 91-102.
--(2007b), 'Validating multiple-period density forecasting
models', Journal of Forecasting, 26, pp. 251-70.
--(2007c), 'Backtesting the RPIX inflation fan charts',
Journal of Risk Model Validation, 1 (3), pp. 1-19.
Elder, R., Kapetanios, G., Taylor, T. and Yates, T. (2005),
'Assessing the MPC's fan charts', Bank of England
Quarterly Bulletin, Autumn, pp. 326-48.
John, S. (1982), 'The three-parameter two-piece normal family
of distributions and its fitting', Communications in
Statistics--Theory and Methods 11, pp. 879-85.
Tsay, R.S. (2005), Analysis of Financial Time Series, Second
edition, Hoboken, NJ, Wiley.
Wallis, K.F. (2003), 'Chi-squared tests of interval and
density forecasts, and the Bank of England's fan charts',
International Journal of Forecasting, 19, pp. 165-75.
--(2004), 'An assessment of Bank of England and National
Institute inflation forecast uncertainties', National Institute
Economic Review, 189, July, pp. 64-71.
NOTES
(1) The Bank publishes two fan charts for each variable: these are
based on the alternative assumptions that short-term market interest
rates will remain constant or follow market expectations over the
forecast horizon. This paper uses data from the constant-rate version,
but we get similar results with the other.
(2) A number of these studies have reported that the inflation fan
charts performed well over very short forecast horizons, but performed
poorly over longer horizons. The exception is Elder et al. (2005);
although they report a set of P-values that suggest that some of the
longer-horizon forecasts are questionable (Elder et al., 2005, Table B),
they treat their results with caution and conclude that the RPIX fan
chart forecasts are "reasonably accurate" overall (Elder et
al., 2005, p. 342). However, their density-forecast tests were rather
limited and their test results do not contradict the evidence of poor
performance reported by other studies. I would therefore conclude that
the pre-existing literature suggests that there are problems with the
RPIX fan chart forecasts even though Elder et al. found no major
problems with them.
(3) Again, the exception is Elder et al. (2005). They apply
Kolmogorov-Smirnov (KS) and Berkowitz-LR tests to the GDP fan chart
forecasts, and this latter test was carried out under the assumption
that the relevant data follow an AR(1) process. However, the former test
is not appropriate because the data are not predicted to be independent
and, whereas the latter test may be reasonable, the AR(1) assumption is
arbitrary. By contrast, the present paper carries out tests suitable to
dependent data, identifies the best-fitting ARMA processes, and checks
the robustness of test results to the fitted processes. It also
addresses the issue of the 'unobservability' of real GDP
growth by carrying out tests on alternative real GDP estimates.
(4) Strictly speaking, one might argue that real GDP growth is
'observed', but only some time after the event. However, what
matters here is that real GDP growth is not observed in real time.
(5) The parameter forecast data are downloaded from the Bank of
England website at http://www.bankofengland.co.uk/inflationreport/gdpinternet.xls.
Note that growth is measured as the
growth rate of real GDP relative to real GDP four quarters previously;
it is not the quarterly rate of growth expressed as an annualised
percentage.
(6) Thus, [mu] and [sigma] are given by the Bank, but the value of
[gamma] needs to be derived from these two parameters and the skew
parameter. Details of how this can be done are given by Wallis (2004).
(7) These are demanding requirements. The first rules out most of
the standard textbook tests, and the second rules out the more recently
developed tests that can accommodate dependence, parameter risk, etc.
For a survey of these, see, e.g., Corradi and Swanson (2006).
(8) AR(1) processes are assumed by Dowd (2004) and Elder et al.
(2005) in their fan chart studies. However, the present paper allows for
more general ARMA processes and derives the best-fits.
(9) The possibility of MA features is suggested by the fact that
successive values of [z.sub.k,t] share common factors and by the fact
that real GDP growth is taken as a four-quarter average of quarterly
growth rates. However, these considerations do not guarantee that the
[z.sub.k,t] process will be a 'pure' MA because we do not know
the dependence structure of the quarterly growth rates.
(10) Whatever process we fit, the best we can do is to aim for a
parsimonious approximation to it. It is therefore important to check
that the residuals from our fitted process appear to be independent, and
to carry out checks of the robustness of our main results to any fitted
structure.
(11) Each such series is constructed as follows: we set suitable
initial values for the time 0 parameters, simulate an innovation
[[epsilon].sup.i.sub.k,1] from the appropriate normal distribution and use
the fitted ARMA process to obtain the corresponding simulated value of
[[??].sup.i.sub.k,1]. We then simulate an innovation
[[epsilon].sup.i.sub.k,2] from the same distribution, and use the fitted
ARMA process to obtain a simulated value of [[??].sup.i.sub.k,2]; and
then repeat again and again until we have a complete simulated
[[??].sup.i.sub.k,t] path. So, for example, if the ARMA process is an
AR(1), we would simulate [[??].sup.i.sub.k,t] using an estimate of (5)
with a zero mean, i.e., using [[??].sup.i.sub.k,t] =
[[??].sub.k][[??].sup.i.sub.k,t-1] + [[epsilon].sup.i.sub.k,t], where
[[??].sub.k] is our estimate of [[rho].sub.k]. To do so, we set the
initial value [[??].sup.i.sub.k,0] equal to the unconditional expected
value of [[??].sup.i.sub.k,t], i.e., 0, simulate a value of
[[epsilon].sup.i.sub.k,1] from a normal distribution with mean 0 and
variance 1 - [[??].sup.2.sub.k], and then obtain [[??].sup.i.sub.k,1] =
[[??].sub.k][[??].sup.i.sub.k,0] + [[epsilon].sup.i.sub.k,1] =
[[epsilon].sup.i.sub.k,1]. We then simulate a value of
[[epsilon].sup.i.sub.k,2] from the same normal distribution and obtain
[[??].sup.i.sub.k,2] from [[??].sup.i.sub.k,2] =
[[??].sub.k][[??].sup.i.sub.k,1] + [[epsilon].sup.i.sub.k,2], and proceed
in the same way to obtain [[??].sup.i.sub.k,3], [[??].sup.i.sub.k,4] etc.
Simulating the [[epsilon].sup.i.sub.k,t] from a normal with mean 0 and
variance 1 - [[??].sup.2.sub.k] ensures that the [[??].sup.i.sub.k,t]
follow a standard normal, making use of the well-known relationship
between the variance of an AR(1) process and that of its residuals (see,
e.g., Tsay, 2005, p. 34).
(12) We focus on the mean and variance predictions because the ARMA
framework assumes that the [[epsilon].sub.k,t] are normal, and this
(arguably) undermines any rationale for using it to test for departures
from normality, i.e., skewness and excess kurtosis.
(13) The tests suggested in the text have the attractions that they
take account of the dependence structure of the data and suffer from no
discernible small sample problems. However, two alternatives should be
noted. (1) We could decompose the [[??].sub.k,t] sample into bins and
carry out a textbook chi-squared test of whether observed frequencies
within each bin are sufficiently close to their predicted values. I
preferred not to use this test because it does not take account of the
[[??].sup.i.sub.k,t] dependence structure (and the simulation-based
tests used in this paper do take account of it) and because of doubts
about its small sample properties when [[??].sup.i.sub.k,t] is
dependent; in particular, if [[??].sup.i.sub.k,t] is dependent and the
sample is small, then observations may be more clustered around the
initial starting value than they 'should' be under the
independence assumption on which the chi-squared test is predicated.
Applying a chi-squared test therefore fails to allow for this
clustering. (2) Another possible approach is an 'iid resample'
method recently proposed by Dowd (2007b); this test makes use of a
bootstrap algorithm which chooses resamples that are iid by
construction. Using this bootstrap would allow one to apply standard
iid-based tests to the resamples drawn from the original sample.
However, the 'cost' of this approach is loss of power, and
tests based on this approach turn out to have very little power in
samples as small as the ones available to us here.
(14) For example, it might be that the MPC has been forecasting
well using the data it had at the time, and some researcher comes to
different conclusions years later using a revised GDP series. The
researcher might be using a better series, but the relevance of the
exercise would be doubtful; this would be akin to criticising the
builders of the pyramids because they didn't use hydraulic cranes
that were invented long afterwards.
(15) The first preliminary estimates are made in the quarter
concerned. These are followed by the first release, seven weeks after
the end of a quarter, the National Accounts quarterly series twelve
weeks after the end of a quarter, and then the various Blue Book and
Post Blue Book estimates. To add to which, past data are sometimes
periodically revised later (e.g., to incorporate the change to the
European System of Accounts (ESA95) in 1998, to incorporate changes in
chain-weighting, and so on). These changes have led to notable increases
in estimates of real GDP growth over the period from late 1998 to late
2001. For more on these issues, see, e.g., Akritidis (2003) and Elder et
al. (2005).
(16) This series was downloaded on 29 October, 2007 from the
National Statistics website at www.statistics.gov.uk.
(17) This series is obtained from the fan chart parameters as the
current-period forecast of the real-growth mode for that quarter, the
mode being that of the constant-rate fan chart.
(18) However, when using this series as a proxy for GDP growth in
some quarter, we can only use it to evaluate forecasts made in earlier
periods. We cannot take the MPC contemporary forecast as a proxy for
realised GDP growth and then use this proxy to evaluate that same
quarter's GDP growth forecast, because we would be using the
MPC's contemporary forecast to check itself!
(19) The Bank has done work on the construction of real-time
databases for real GDP, and a collection of different
'vintages' of data is available on its website (at
www.213.225.136.206/ statistics/gdpdatabse/index.htm). Such a dataset
would in principle allow a more thorough comparison of different real
GDP growth series, but is of limited use for our purposes because it
ends in 2001Q4. For more details, see Castle and Ellis (2002).
(20) Calculations were carried out using specially written MATLAB functions, which are available on request.
(21) Standard textbook tests of these predictions are also
applicable and give similar results.
(22) One might also draw a third conclusion. The forecasts mostly
run into problems with the variance prediction, and this applies to some
extent to the National Statistics estimates as well as to MPC ones.
Where this occurs, it suggests that the forecasts have problems getting
the 'right' forecast dispersion. The sample mean [[??].sub.k,t]
results in table 2 would then suggest that forecasts assessed using the
National Statistics GDP estimates tend to under-estimate the dispersion
of future GDP growth over relatively low horizons, whereas forecasts
assessed using the MPC estimates tend to over-estimate the dispersion of
GDP growth over all horizons. (The reason is that low-horizon National
Statistics estimates produce sample [[??].sub.k,t] variances above the
predicted value of 1, implying that the forecasts under-estimate future
GDP growth dispersion, and the reverse is true for the MPC estimates.)
Once again, the message is that one's conclusions on the forecast
performance of the GDP fan charts depend, in part, on the GDP estimates
used to assess them.
Kevin Dowd, Centre for Risk and Insurance Studies, Nottingham
University Business School. e-mail: Kevin.Dowd@Nottingham.ac.uk. The
author would like to thank two anonymous referees, Ken Wallis and Martin
Weale, for helpful feedback, and the ESRC for support under grant
RES-000-27-0014. The usual caveat applies.
Table 1. Summary statistics for different real GDP growth series

Summary statistic      Latest NS series   MPC series
Mean                   2.855              2.430
Variance               0.454              0.516
Minimum                1.600              0.790
Maximum                4.300              4.020
N                      40                 40
Pearson correlation    0.465
Rank correlation       0.513

Notes: The headline series is the latest available data for National
Statistics' series IHYR; and the contemporary MPC estimates are the
MPC's 'best estimates' of contemporary real GDP growth, as given in
the Bank's GDP growth fan chart parameters. Both series refer to
growth of real GDP over the previous four quarters, and the sample
period spans 1997Q4 to 2007Q3.
Table 2. Summary statistics for standard normal inverses of [[??].sub.k,t] series

Parameter   k=0      k=1      k=2      k=3      k=4      k=5      k=6      k=7      k=8
(a) Using latest National Statistics real GDP growth series
Mean        0.743    0.602    0.443    0.333    0.296    0.222    0.174    0.155    0.133
Variance    1.966    1.625    1.372    1.171    0.949    0.710    0.527    0.428    0.341
Skewness    -0.643   -0.220   -0.107   -0.211   -0.050   0.082    -0.002   -0.058   -0.096
Kurtosis    2.685    2.096    2.317    2.870    3.427    3.078    3.054    2.510    1.842
N           40       39       38       37       36       35       34       33       32
(b) Using MPC contemporaneous mode real GDP growth forecasts
Mean        NA       0.005    -0.052   -0.089   -0.061   -0.100   -0.106   -0.085   -0.067
Variance    NA       0.329    0.498    0.578    0.533    0.460    0.399    0.345    0.299
Skewness    NA       -0.537   0.288    0.417    0.266    0.282    0.236    -0.238   0.120
Kurtosis    NA       3.643    2.863    3.370    2.530    2.334    2.389    2.168    2.460
N           NA       39       38       37       36       35       34       33       32

Notes: As for table 1. Results refer to sample parameters of PIT
series put through standard normal inverse transformations, where
PITs are obtained using Bank of England real GDP growth fan chart
forecasts over k quarters ahead, with sample sizes N. Under the
null we would expect the means and skewnesses to be 0, the variances
to be 1, and the kurtoses to be 3.
Table 3. Results for mean and variance tests based on 'best fitting'
ARMA representations of the [[??].sub.k,t] dependence structure

                      Parameters (standard errors)                     Test statistic P-values
Horizon  'Best fit'   const.          AR(1)           MA(1)           Q-stat     Mean test    Variance test
(a) Using latest National Statistics real GDP growth series
0        AR(1)        0.722 (0.089)   0.554 (0.125)   NA              0.296      0.0041 **    0.0033 **
1        AR(1)        0.666 (0.128)   0.708 (0.116)   NA              0.209      0.0514       0.0279 *
2        AR(1)        0.593 (0.148)   0.776 (0.107)   NA              0.101      0.1467       0.0732
3        AR(1)        0.549 (0.132)   0.771 (0.103)   NA              0.429      0.2181       0.1386
4        AR(1)        0.528 (0.136)   0.829 (0.087)   NA              0.010 *    0.2696       0.2120
5        AR(1)        0.532 (0.121)   0.786 (0.102)   NA              0.035      0.3098       0.4888
6        AR(1)        0.525 (0.107)   0.755 (0.112)   NA              0.243      0.3400       0.2195
7        AR(1)        0.521 (0.096)   0.712 (0.123)   NA              0.392      0.3532       0.0750
8        AR(1)        0.515 (0.099)   0.743 (0.119)   NA              0.560      0.3785       0.0335 *
(b) Using MPC contemporaneous mode real GDP growth forecasts
1        iid          0.507 (0.031)   NA              NA              0.830      0.4861       0 **
2        MA(1)        0.480 (0.056)   NA              0.784 (0.096)   0.470      0.4055       0 **
3        MA(1)        0.465 (0.051)   NA              0.511 (0.145)   0.122      0.3425       0 **
4        MA(1)        0.476 (0.054)   NA              0.733 (0.106)   0.209      0.3945       0.0005 **
5        MA(1)        0.466 (0.051)   NA              0.560 (0.151)   0.035 *    0.3361       0 **
6        MA(1)        0.463 (0.047)   NA              0.634 (0.139)   0.016 *    0.3315       0 **
7        MA(1)        0.474 (0.048)   NA              0.596 (0.139)   0.262      0.3603       0 **
8        MA(1)        0.475 (0.044)   NA              0.483 (0.160)   0.129      0.3885       0 **

Notes: As per earlier tables. For columns 3-5, the table gives
estimated parameter values obtained using EViews 5 followed by
their estimated standard errors in brackets. Column 6 gives the
P-value for the portmanteau (or Q-stat) for the residuals taken
up to 3 lags. P-values are calculated using the relevant fitted
process using 20000 simulation trials. * indicates significance
at 5% level, ** indicates significance at 1% level.
Table 4. P-values for mean and variance tests based on alternative
assumed ARMA representations of the [[??].sub.k,t] dependence structure

         Assuming iid                  Assuming AR(1)                Assuming MA(1)
Horizon  Mean test     Variance test   Mean test     Variance test   Mean test     Variance test
(a) Using latest National Statistics real GDP growth series
0        0 **          0.0002 **       0.0041 **     0.0033 **       0.0003 **     0.3794
1        0 **          0.0084 **       0.0514        0.0279 *        0.0032 **     0.2468
2        0.0045 **     0.0687          0.1467        0.0732          0.0261 *      0.0883
3        0.0193 *      0.2225          0.2181        0.1386          0.0709        0.0347 *
4        0.0379 *      0.4415          0.2696        0.2120          0.1009        0.0040 **
5        0.0954        0.1070          0.3098        0.4888          0.1740        0.0004 **
6        0.1511        0.0109 *        0.3400        0.2195          0.2311        0 **
7        0.1924        0.0020 **       0.3532        0.0750          0.2628        0 **
8        0.2246        0.0001 **       0.3785        0.0335 *        0.2928        0 **
(b) Using MPC contemporaneous mode real GDP growth forecasts
1        0.4861        0 **            0.4870        0 **            0.4869        0 **
2        0.3717        0.0047 **       0.4137        0.0176 *        0.4055        0 **
3        0.2947        0.0210 *        0.3766        0.0793          0.3425        0 **
4        0.3567        0.0111 *        0.4164        0.0574          0.3945        0.0005 **
5        0.2802        0.0028 **       0.3800        0.0333 *        0.3361        0 **
6        0.2721        0.0013 **       0.3793        0.0181 *        0.3315        0 **
7        0.3105        0.0002 **       0.3922        0.0037 **       0.3603        0 **
8        0.3521        0.0001 **       0.4194        0.0021 **       0.3885        0 **