Article information

Title: The GDP fan charts: an empirical evaluation.
Author: Dowd, Kevin
Journal: National Institute Economic Review
Print ISSN: 0027-9501
Year of publication: 2008
Issue: January
Language: English
Publisher: National Institute of Economic and Social Research
Abstract: Key words: GDP forecasting; density forecasting; fan charts
Keywords: Economic conditions; Economic forecasting; Gross domestic product

The GDP fan charts: an empirical evaluation.

Dowd, Kevin

This paper evaluates the probability density forecasts reflected in the Bank of England's real GDP growth fan charts. Evaluation is carried out using tests that allow for data dependence and using two GDP growth estimates. Results suggest there are problems with the shorter horizon forecasts, but conclusions about the performance of longer-term forecasts depend to some extent on the GDP estimates used in the assessment.

Key words: GDP forecasting; density forecasting; fan charts

JEL Classifications: C4; N1

1. Introduction

Since 1996, the Bank of England has been publishing 'fan charts' in its quarterly Inflation Report. These represent forecasts of probability density functions for a chosen macroeconomic variable--which might be inflation or real GDP growth--and show the Bank's forecasts of the most likely outcome surrounded by forecasts of prediction intervals at various probability levels. Each interval is shaded, with the 10 per cent interval darkest and the shading becoming lighter as we move to broader intervals. Forecasts are given for horizons ranging from the current quarter up to eight quarters ahead, and typically 'fan out' and become more dispersed as the horizon increases.

GDP fan charts have appeared in each Inflation Report since November 1997. (1) Given the period that has elapsed since, it is natural to ask how good the fan-chart forecasts have turned out to be, and a number of recent papers have asked this question of the better-known inflation fan charts (see, e.g., Wallis, 2003, 2004; Clements, 2004; Cogley et al., 2005; Dowd, 2004, 2007a,c; and Elder et al., 2005). (2) However, little attention has so far been paid to the GDP fan charts. (3) This paper examines their performance by applying a number of forecast evaluation procedures to them. In doing so, a major complication is the need to carry out tests that take account of dependence in the data, a problem that also arises when evaluating the inflation fan charts. Evaluation of the GDP charts is also complicated further because the realised GDP growth rate is never 'observed' in the same way in which we 'observe' an inflation index. (4) This requires us to use estimates for realised GDP growth, and this raises the issue of the robustness of our results to the estimates used.

2. The real GDP fan charts

The Bank's fan charts are based on an assumption that real GDP growth obeys a two-piece normal (2PN) density function. The 2PN pdf is usually defined as:

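$$f(x) = \begin{cases} C \exp\left[-\dfrac{(x-\mu)^2}{2\sigma_1^2}\right], & x \le \mu \\[1ex] C \exp\left[-\dfrac{(x-\mu)^2}{2\sigma_2^2}\right], & x \ge \mu \end{cases} \qquad (1)$$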

where $C = k(\sigma_1 + \sigma_2)^{-1}$, $k = \sqrt{2/\pi}$ and $\mu$ is the mode (see, e.g., John, 1982 or Wallis, 2004). The distribution takes the lower half of a normal distribution with parameters $\mu$ and $\sigma_1$, and the upper half of a normal with parameters $\mu$ and $\sigma_2$. These halves are scaled to give the same mode value, and the distribution is negatively (positively) skewed if $\sigma_1 > \sigma_2$ ($\sigma_1 < \sigma_2$).
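The two-piece normal in the $(\mu, \sigma_1, \sigma_2)$ parameterisation can be sketched in a few lines of Python. This is only an illustrative sketch, not the Bank's or the author's code: the function names are hypothetical, and the cdf is obtained by integrating each half of the density, which gives a probability mass of $\sigma_1/(\sigma_1+\sigma_2)$ below the mode.

```python
import math


def norm_cdf(u):
    """Standard normal cdf, computed from the error function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def two_piece_normal_pdf(x, mu, sigma1, sigma2):
    """2PN density with mode mu: sigma1 applies below mu, sigma2 above."""
    c = math.sqrt(2.0 / math.pi) / (sigma1 + sigma2)
    s = sigma1 if x <= mu else sigma2
    return c * math.exp(-((x - mu) ** 2) / (2.0 * s ** 2))


def two_piece_normal_cdf(x, mu, sigma1, sigma2):
    """2PN cdf, found by integrating the two scaled normal halves."""
    total = sigma1 + sigma2
    if x <= mu:
        return 2.0 * sigma1 / total * norm_cdf((x - mu) / sigma1)
    upper = norm_cdf((x - mu) / sigma2) - 0.5
    return sigma1 / total + 2.0 * sigma2 / total * upper


# Hypothetical parameter values, chosen only for illustration.
mass_below_mode = two_piece_normal_cdf(2.5, 2.5, 0.5, 1.0)
```

With $\sigma_1 = \sigma_2$ both functions reduce to the ordinary normal density and cdf, which is a convenient sanity check.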

However, the Bank uses a 2PN based on an alternative parameterisation:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (2)

where $-1 < \gamma < 1$, and $\gamma$ and $\sigma$ are related to $\sigma_1$ and $\sigma_2$ via:

(1 + [gamma])[[sigma].sup.] (3)

For each horizon, the Bank publishes forecasts of the mean, median and mode ($\mu$), a skew parameter (which is not $\gamma$, but the difference between the mean and $\mu$) and uncertainty $\sigma$. (5) This uncertainty parameter is not the same as the standard deviation, except where the skew is zero. Once the MPC specifies the values of $\mu$, $\sigma$ and the skew parameter for each horizon, the model is complete and the density forecasts can be ascertained from it. (6)

3. A framework to test density forecasts

Let $x_t$ be the realised value of real GDP growth for quarter t, and assume for the moment that $x_t$ is observable. Each realised value is to be compared against the relevant forecasted density over each forecast horizon. Now let $p_{k,t}$ be the value of $x_t$ mapped to its value on the k-period-ahead forecast cdf, where k = 0,1, ..., 8. This mapping is known as the Probability Integral Transform (PIT).

Under the null hypothesis that the model is adequate, $p_{k,t}$ should be uniformly distributed over the interval [0,1], so we can evaluate the forecasts by comparing the empirical distribution of $p_{k,t}$ against the predicted uniform distribution.

However, we cannot test the null by a naive application of standard uniformity tests (e.g., KS tests), because these presuppose that the $p_{k,t}$ are independent, and there are two reasons why we cannot make this assumption here. First, since real GDP growth is measured as the rate of growth of real GDP over the past four quarters, successive growth rates thus measured will be rolling moving averages of quarterly growth rates and lack independence. Second, even if quarterly GDP growth were itself independent--and correlogram analysis suggests it is not--the $p_{k,t}$ would not be, except where k = 0. For k > 0, successive values of $p_{k,t}$ would share common random factors (i.e., the quarterly growth rates), and these create further dependence in the $p_{k,t}$. We therefore need a testing framework that allows for $p_{k,t}$ to be dependent.
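The first source of dependence can be illustrated with a short simulation. The sketch below assumes, purely for illustration, that four-quarter growth is approximately the sum of four quarterly growth rates and that the quarterly rates are iid normal with hypothetical mean and standard deviation. Successive four-quarter rates then share three of their four quarterly components, so their lag-1 autocorrelation should be close to 3/4 and should die out at lag 4.

```python
import random


def autocorr(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((v - mean) ** 2 for v in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean)
              for t in range(lag, n))
    return num / denom


rng = random.Random(0)  # fixed seed so the illustration is repeatable

# Hypothetical iid quarterly growth rates (per cent per quarter).
quarterly = [rng.gauss(0.6, 0.5) for _ in range(20000)]

# Four-quarter growth approximated as a rolling sum of quarterly rates.
annual = [sum(quarterly[t - 3:t + 1]) for t in range(3, len(quarterly))]

rho1 = autocorr(annual, 1)  # overlap of three quarters
rho4 = autocorr(annual, 4)  # no overlap
```

Even with independent quarterly growth, the overlapping measure is strongly autocorrelated, which is why independence-based uniformity tests are inappropriate here.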

Before addressing this problem, it is convenient if we follow Berkowitz (2001) and run the $p_{k,t}$ through an inverse standard normal function, i.e.:

$$z_{k,t} = \Phi^{-1}(p_{k,t}) \qquad (4)$$

Under the null, these Berkowitz-transformed $z_{k,t}$ observations should be distributed as standard normal. However, they are not predicted to be iid (except for k = 0), because lack of iid-ness in the $p_{k,t}$ implies lack of iid-ness in the $z_{k,t}$.
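The PIT and the Berkowitz transformation (4) can be sketched as follows. This is a minimal illustration using Python's standard library; the function names are hypothetical, the forecast cdf is a plain normal stand-in rather than the Bank's 2PN, and the clipping of p away from 0 and 1 is an added safeguard (an assumption, not part of the paper) because the inverse normal cdf is undefined at those end points.

```python
from statistics import NormalDist


def pit(realised, forecast_cdf):
    """Probability integral transform: realised value mapped onto the forecast cdf."""
    return forecast_cdf(realised)


def berkowitz_transform(p, eps=1e-12):
    """Inverse standard normal of a PIT value, as in equation (4)."""
    p = min(max(p, eps), 1.0 - eps)  # assumption: guard against p = 0 or 1
    return NormalDist().inv_cdf(p)


# Hypothetical forecast: normal with mean 2.5 and standard deviation 0.8.
forecast = NormalDist(mu=2.5, sigma=0.8)
p_value = pit(3.3, forecast.cdf)
z_value = berkowitz_transform(p_value)
```

For a normal forecast density the transformed value is simply the standardised outcome, so a realised value one forecast standard deviation above the forecast mean maps to z close to 1.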

We now face the problem of testing the $z_{k,t}$ for standard normality in a context where they follow an unknown dependence structure. Furthermore, because sample sizes are small, it would be unwise to rely on tests derived from large-sample theory. (7)

One way forward is to postulate that $z_{k,t}$ follows an ARMA process and use Box-Jenkins analysis to obtain a parsimonious fit. For example, we might find that the dependence can be adequately represented by a first order autoregression (AR(1)):

$$z_{k,t} = \mu_k + \rho_k z_{k,t-1} + \varepsilon_{k,t}, \qquad |\rho_k| < 1 \qquad (5)$$

where the errors $\varepsilon_{k,t}$ are iid normal, the intercept parameters $\mu_k$ are predicted to be 0 and the autoregression parameters $\rho_k$ are not expected to be 0 except where k = 0. (8) Alternatively, we might fit an MA (9) or a more general ARMA process. (10)
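The restrictions that the null places on (5) can be written out as a short worked derivation. The symbol $\sigma_{\varepsilon,k}^{2}$ for the innovation variance is introduced here only for illustration and does not appear in the original; the result is the standard AR(1) variance relationship for a stationary process.

```latex
\begin{align*}
\operatorname{Var}(z_{k,t}) &= \rho_k^{2}\,\operatorname{Var}(z_{k,t-1}) + \sigma_{\varepsilon,k}^{2}
  \;\Longrightarrow\;
  \operatorname{Var}(z_{k,t}) = \frac{\sigma_{\varepsilon,k}^{2}}{1-\rho_k^{2}}, \\
\operatorname{Var}(z_{k,t}) = 1
  &\;\Longrightarrow\; \sigma_{\varepsilon,k}^{2} = 1-\rho_k^{2}, \\
\operatorname{E}(z_{k,t}) = \frac{\mu_k}{1-\rho_k} = 0
  &\;\Longrightarrow\; \mu_k = 0 .
\end{align*}
```

So a standard normal $z_{k,t}$ with AR(1) dependence requires a zero intercept and innovations whose variance shrinks as $|\rho_k|$ grows.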

We can now test the mean and variance predictions using the following Monte Carlo procedure. For each value of k:

* We obtain the PITs and put these through the Berkowitz transformation (4) to obtain a $z_{k,t}$ sample. Denote this by $\hat{z}_{k,t}$.

* We fit a parsimonious ARMA process (i.e., (5) or its higher-order AR, MA or ARMA equivalent) to the $\hat{z}_{k,t}$, taking care to ensure that the residuals appear to be independent.

* We use the fitted ARMA process to simulate a large number m of possible $\hat{z}_{k,t}$ series, each of which has the dependence structure of the fitted ARMA process, and let us denote the ith such series as $\hat{z}^{i}_{k,t}$. (11)

* We estimate the values of the mean and variance of each $\hat{z}^{i}_{k,t}$ series. (12) This gives us a 'sample' of m mean values and a 'sample' of m variance values. The (m x 0.025)th and (m x 0.975)th highest means then give us the 95 per cent confidence interval for the mean test statistic, and the (m x 0.025)th and (m x 0.975)th highest variances give us the 95 per cent confidence interval for the variance test statistic, under the null hypothesis that $\hat{z}_{k,t}$ is standard normal but has the dependence structure of the fitted ARMA process.

* We estimate the values of the mean and variance of our 'real' sample $\hat{z}_{k,t}$; for each of these two statistics, the forecasts pass the test if the sample value lies within the estimated confidence interval, and otherwise fail. (13)
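The procedure above can be sketched for the AR(1) case as follows. This is a hedged illustration rather than the author's MATLAB implementation: the function names are hypothetical, the number of trials is kept small, the sample variance uses the usual n - 1 divisor (an assumption), and each simulated path starts from the unconditional mean of 0 with innovation variance $1 - \rho^{2}$, following the description in note 11.

```python
import random
import statistics


def simulate_ar1_path(rho, n, rng):
    """One zero-mean AR(1) path of length n under the standard normal null."""
    innov_sd = (1.0 - rho ** 2) ** 0.5  # innovation sd implied by unit variance
    z_prev = 0.0                        # start at the unconditional mean
    path = []
    for _ in range(n):
        z_prev = rho * z_prev + rng.gauss(0.0, innov_sd)
        path.append(z_prev)
    return path


def monte_carlo_intervals(rho, n, m=2000, seed=0):
    """Simulated 95 per cent intervals for the sample mean and variance."""
    rng = random.Random(seed)
    means, variances = [], []
    for _ in range(m):
        path = simulate_ar1_path(rho, n, rng)
        means.append(statistics.fmean(path))
        variances.append(statistics.variance(path))
    means.sort()
    variances.sort()
    lo, hi = int(0.025 * m), int(0.975 * m) - 1
    return (means[lo], means[hi]), (variances[lo], variances[hi])


def passes(value, interval):
    """The forecasts pass if the sample statistic lies inside the interval."""
    return interval[0] <= value <= interval[1]


# Inputs taken from tables 2 and 3 for k = 0 (National Statistics series).
mean_ci, var_ci = monte_carlo_intervals(rho=0.554, n=40)
mean_ok = passes(0.743, mean_ci)
var_ok = passes(1.966, var_ci)
```

The sketch returns pass or fail decisions rather than P-values, and it makes no attempt to reproduce the published figures exactly; extending it to MA or general ARMA fits would only require a different path simulator.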

4. Data issues

Testing the GDP fan charts requires quarterly observations, or estimates, of real GDP growth between the quarter in question and the comparable quarter a year before. Ideally, we would like a series that is accurate and timely, in the sense that it was also available to the MPC at or close to the time when the forecasts became due. The need for an accurate series is obvious, but the timeliness criterion is also important to avoid anachronism. (14) However, any available series involves a trade-off between these two desirable characteristics.

From the accuracy point of view, the most natural series is the latest 4-quarter growth (IHYR) series produced by National Statistics. However, this series is subject to revisions, and these can be substantial ones made years later. (15) Hence, there is no a priori guide over which vintage of IHYR series to use: the most accurate can appear years later and have no timeliness. On the other hand, we can also use timely series, but these may be inaccurate, because they cannot take account of revisions made afterwards. Given that no series meets both criteria, this study uses two polar opposite alternatives:

* The first is the latest available IHYR series, as of 29 October, 2007. (16) This has the advantage of being the most accurate series available at the time of writing, but its disadvantage is its lack of timeliness.

* The second is the MPC's 'best estimate' each quarter, using the information then available to it, of that quarter's year-on-year real GDP growth rate. (17) This series has the advantage of being very timely, but its disadvantage is that it can take no account of later revisions to the GDP growth rate. (18)

So the first series is good on accuracy, but poor on timeliness, whilst the second series is good on timeliness, but poor on accuracy, i.e., the two series are polar opposites on the accuracy and timeliness criteria. They also make for a good comparison for another reason; a priori, we might expect the first series to produce results that are biased against the Bank's model (because the Bank cannot anticipate later revisions to the GDP series), and we might expect the second series to be biased in favour of the Bank's model (because this series is itself a set of forecasts from the model being tested). (19)

Figure 1 provides plots of these series over our sample period, which spans 1997Q4 to 2007Q3, and therefore has 40 observations in all. The series exhibit roughly similar shapes, but the latest National Statistics (NS) series shows greater economic growth over the earlier part of our period; this reflects subsequent changes which have revised growth upwards.

[FIGURE 1 OMITTED]

Table 1 provides some summary statistics. The NS series has a higher mean (as we would expect). Both series have similar variances, differ somewhat in their extremes, and are positively (but not highly positively) correlated.

The other data used are the fan chart parameter forecasts. These consist of 40 sets of parameter forecasts, one for each published chart. Each set consists of nine values for each of the mode, skew and uncertainty parameters, for k = 0,1, ..., 8. As explained in section 2, we use these to obtain the density forecasts, and this gives us nine sets of density forecasts for each quarter, for horizons ranging from k = 0 to k = 8.

5. Results

Preliminary results (20)

We first report the results of some preliminary analysis. Figure 2 shows plots of the predicted and empirical $p_{k,t}$ for each horizon. Under the null, we would expect the plots to be 'close' to the 45° line. The notes also show the least-squares slopes of the empirical $p_{k,t}$ series, which should be close to 1 under the null. However, most plots are not 'close' to these expectations, but give the impression that plots are perhaps better for medium- and long-horizon forecasts. For their part, the estimated slopes suggest that performance is best for medium-horizon forecasts.

Summary statistics for the $\hat{z}_{k,t}$ sample moments are shown in table 2. The sample values tend not to be 'close' to the predicted values, and the low-horizon forecasts tend to perform more poorly than the others. The variances also clearly suggest that the better performing forecasts are the medium-horizon ones. However, the two GDP series conflict in other ways, so it is difficult to draw clear conclusions at this stage.

Test results

To investigate further, table 3 shows the 'best fitting' ARMA processes, their estimated parameters and their standard errors, and P-values of a portmanteau statistic of the ARMA residuals. The best-fitting process is an AR(1) process if we use the NS series, and is usually (but not always) an MA(1) if we use the MPC series. (The exception is an iid process for k = 1.) The P-values for the portmanteau statistic for the most part tend to suggest that the residuals have no significant dependence structure, and confirm the goodness of the fits. The table also shows the P-values of the mean and variance tests obtained using the Monte-Carlo procedure outlined earlier.
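A portmanteau check of this kind can be sketched as below. The paper does not spell out the formula, so the sketch assumes the Ljung-Box form of the Q-statistic, taken up to 3 lags as stated in the notes to table 3. It also assumes, for the P-value, a chi-squared reference distribution with 2 degrees of freedom (3 lags less one fitted ARMA parameter). Both choices, and the function names, are illustrative assumptions.

```python
import math


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q-statistic of a residual series up to max_lag lags."""
    n = len(residuals)
    mean = sum(residuals) / n
    centred = [r - mean for r in residuals]
    denom = sum(c * c for c in centred)
    total = 0.0
    for h in range(1, max_lag + 1):
        r_h = sum(centred[t] * centred[t - h] for t in range(h, n)) / denom
        total += r_h ** 2 / (n - h)
    return n * (n + 2) * total


def chi2_sf_df2(x):
    """Upper-tail probability of a chi-squared variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


# Deliberately dependent toy residuals: a perfectly alternating series.
toy_residuals = [1.0, -1.0] * 20
q_stat = ljung_box_q(toy_residuals, max_lag=3)
p_val = chi2_sf_df2(q_stat)
```

A small P-value, as the alternating toy series produces, signals remaining dependence in the residuals; a large one is consistent with an adequate ARMA fit.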

[FIGURE 2 OMITTED]

In interpreting these results, we might note that under the null hypothesis, the $\hat{z}_{0,t}$ (available only for the NS GDP proxy) is predicted to be iid N(0,1), and this prediction is rejected; the AR(1) parameter is significant (which rejects the iid prediction) and the P-values of the mean and variance tests clearly reject the 'standard' aspect of the standard normality prediction. (21) As for the predictions as they apply to other $\hat{z}_{k,t}$ series, the forecasts easily pass the mean tests, but often have difficulty with the variance ones. More precisely, the NS series fails the variance prediction over very low horizons and the MPC series always (strongly) fails the variance prediction over all horizons. Comparing results across the two GDP estimates: if we use the National Statistics proxy, the model usually performs adequately except for very short horizons; and if we use the MPC proxy, the model performs poorly, especially regarding the variance prediction.

These results are open to error in the event of a misspecified dependence structure, so it is important to check their robustness. Accordingly, table 4 reports the P-values for these two sets of tests under each of three possible alternative dependence structures applied to all $\hat{z}_{k,t}$: no dependence (i.e., iid), AR(1) and MA(1). I am not suggesting that these provide good fits in each case; I merely postulate them here to assess the robustness of our earlier 'best-fit' results to changes in the assumed dependence structure.

The results in table 4 suggest that the poor performance of the NS $\hat{z}_{0,t}$ is fairly robust; these forecasts fail five of the six tests as one looks across the first line of the table. These results also confirm that there are problems with low-horizon NS forecasts, and to a lesser extent also suggest problems with the NS forecasts over other horizons as well. Turning to the MPC forecasts, these clearly and robustly perform well when evaluated on the mean test, but (with only two exceptions) always perform poorly on the variance test. However, we need to interpret these results with some caution, because most of these results will be based on mis-specified temporal dependence processes. Nonetheless, they are useful in so far as they confirm the robustness of the problems identified in table 3.

6. Conclusions

Any overall assessment of these results is a matter of judgement. However, I would summarise the results as suggesting, first, that there is strong evidence against at least some of the very low-horizon (i.e., k = 0 and k = 1) forecasts. Second, for longer horizon forecasts we get a more mixed picture: we get fairly defensible results if we use the National Statistics estimate for GDP, but the forecasts are problematic (especially as regards the variance results) if we use the MPC estimate. Thus, the general assessment one comes to depends to some extent on which estimate one believes to be 'best'. (22)

REFERENCES

Akritidis, L. (2003), 'Revisions to quarterly GDP growth and expenditure components', Economic Trends, December, pp. 69-85.

Berkowitz, J. (2001), 'Testing density forecasts, with applications to risk management', Journal of Business and Economic Statistics, 19, pp. 465-74.

Castle, J. and Ellis, C. (2002), 'Building a real-time database for GDP(E)', Bank of England Quarterly Bulletin, Spring, pp. 42-9.

Clements, M.P. (2004), 'Evaluating the Bank of England density forecasts of inflation', Economic Journal, 114, pp. 855-77.

Cogley, T., Morozov, S. and Sargent, T.J. (2005), 'Bayesian fan charts for UK inflation: forecasting and sources of uncertainty in an evolving monetary system', Journal of Economic Dynamics and Control, 29, pp. 1893-1925.

Corradi, V. and Swanson, N.S. (2006), 'Predictive density evaluation', Chapter 5 in Elliott, G., Granger, C.W.J. and Timmermann, A. (eds), Handbook of Economic Forecasting, Volume I, Amsterdam, Elsevier, pp. 197-284.

Dowd, K. (2004), 'The inflation 'fan charts': an evaluation', Greek Economic Review, 23, pp. 99-111.

--(2007a), 'Too good to be true? The (in)credibility of the UK inflation fan charts', Journal of Macroeconomics, 29, pp. 91-102.

--(2007b), 'Validating multiple-period density forecasting models', Journal of Forecasting, 26, pp. 251-70.

--(2007c), 'Backtesting the RPIX inflation fan charts', Journal of Risk Model Validation, 1 (3), pp. 1-19.

Elder, R., Kapetanios, G., Taylor, T. and Yates, T. (2005), 'Assessing the MPC's fan charts', Bank of England Quarterly Bulletin, Autumn, pp. 326-48.

John, S. (1982), 'The three-parameter two-piece normal family of distributions and its fitting', Communications in Statistics--Theory and Methods 11, pp. 879-85.

Tsay, R.S. (2005), Analysis of Financial Time Series, Second edition, Hoboken, NJ, Wiley.

Wallis, K.F. (2003), 'Chi-squared tests of interval and density forecasts, and the Bank of England's fan charts', International Journal of Forecasting, 19, pp. 165-75.

--(2004), 'An assessment of Bank of England and National Institute inflation forecast uncertainties', National Institute Economic Review, 189, July, pp. 64-71.

NOTES

(1) The Bank publishes two fan charts for each variable: these are based on the alternative assumptions that short-term market interest rates will remain constant or follow market expectations over the forecast horizon. This paper uses data from the constant-rate version, but we get similar results with the other.

(2) A number of these studies have reported that the inflation fan charts performed well over very short forecast horizons, but performed poorly over longer horizons. The exception is Elder et al. (2005); although they report a set of P-values that suggest that some of the longer-horizon forecasts are questionable (Elder et al., 2005, Table B), they treat their results with caution and conclude that the RPIX fan chart forecasts are "reasonably accurate" overall (Elder et al., 2005, p. 342). However, their density-forecast tests were rather limited and their test results do not contradict the evidence of poor performance reported by other studies. I would therefore conclude that the pre-existing literature suggests that there are problems with the RPIX fan chart forecasts even though Elder et al. found no major problems with them.

(3) Again, the exception is Elder et al. (2005). They apply Kolmogorov-Smirnov (KS) and Berkowitz-LR tests to the GDP fan chart forecasts, and this latter test was carried out under the assumption that the relevant data follow an AR(1) process. However, the former test is not appropriate because the data are not predicted to be independent and, whereas the latter test may be reasonable, the AR(1) assumption is arbitrary. By contrast, the present paper carries out tests suitable to dependent data, identifies the best-fitting ARMA processes, and checks the robustness of test results to the fitted processes. It also addresses the issue of the 'unobservability' of real GDP growth by carrying out tests on alternative real GDP estimates.

(4) Strictly speaking, one might argue that real GDP growth is 'observed', but only some time after the event. However, what matters here is that real GDP growth is not observed in real time.

(5) The parameter forecast data are downloaded from the Bank of England website at http://www.bankofengland.co.uk/inflationreport/gdpinternet.xls. Note that growth is measured as the growth rate of real GDP relative to real GDP four quarters previously; it is not the quarterly rate of growth expressed as an annualised percentage.

(6) Thus, $\mu$ and $\sigma$ are given by the Bank, but the value of $\gamma$ needs to be derived from these two parameters and the skew parameter. Details of how this can be done are given by Wallis (2004).

(7) These are demanding requirements. The first rules out most of the standard textbook tests, and the second rules out the more recently developed tests that can accommodate dependence, parameter risk, etc. For a survey of these, see, e.g., Corradi and Swanson (2006).

(8) AR(1) processes are assumed by Dowd (2004) and Elder et al. (2005) in their fan chart studies. However, the present paper allows for more general ARMA processes and derives the best-fits.

(9) The possibility of MA features is suggested by the fact that successive values of $z_{k,t}$ share common factors and by the fact that real GDP growth is taken as a four-quarter average of quarterly growth rates. However, these considerations do not guarantee that the $z_{k,t}$ process will be a 'pure' MA because we do not know the dependence structure of the quarterly growth rates.

(10) Whatever process we fit, the best we can do is to aim for a parsimonious approximation to it. It is therefore important to check that the residuals from our fitted process appear to be independent, and to carry out checks of the robustness of our main results to any fitted structure.

(11) Each such series is constructed as follows: we set suitable initial values for time 0 parameters, simulate a value of $\hat{\varepsilon}^{i}_{k,1}$ from the appropriate normal distribution and use the fitted ARMA process to obtain the corresponding simulated value of $\hat{z}^{i}_{k,1}$. We then simulate a value of $\hat{\varepsilon}^{i}_{k,2}$ from the same distribution, and use the fitted ARMA process to obtain a simulated value of $\hat{z}^{i}_{k,2}$; and then repeat again and again until we have a complete simulated $\hat{z}^{i}_{k,t}$ path. So, for example, if the ARMA process is an AR(1), we would simulate $\hat{z}^{i}_{k,t}$ using an estimate of (5) with a zero mean, i.e., using $\hat{z}^{i}_{k,t} = \hat{\rho}_{k}\hat{z}^{i}_{k,t-1} + \hat{\varepsilon}^{i}_{k,t}$, where $\hat{\rho}_{k}$ is our estimate of $\rho_{k}$. To do so, we set the initial value for $\hat{z}^{i}_{k,0}$ equal to the unconditional expected value of $\hat{z}^{i}_{k,t}$, i.e., 0, simulate a value of $\hat{\varepsilon}^{i}_{k,1}$ from a normal distribution with mean 0 and variance $1 - \hat{\rho}^{2}_{k}$, and then obtain $\hat{z}^{i}_{k,1} = \hat{\rho}_{k}\hat{z}^{i}_{k,0} + \hat{\varepsilon}^{i}_{k,1} = \hat{\varepsilon}^{i}_{k,1}$. We then simulate a value of $\hat{\varepsilon}^{i}_{k,2}$ from the same normal distribution and obtain $\hat{z}^{i}_{k,2}$ from $\hat{z}^{i}_{k,2} = \hat{\rho}_{k}\hat{z}^{i}_{k,1} + \hat{\varepsilon}^{i}_{k,2}$, and proceed in the same way to obtain $\hat{z}^{i}_{k,3}$, $\hat{z}^{i}_{k,4}$ etc. Simulating the $\hat{\varepsilon}^{i}_{k,t}$ from a normal with mean 0 and variance $1 - \hat{\rho}^{2}_{k}$ ensures that the $\hat{z}^{i}_{k,t}$ follow a standard normal, making use of the well-known relationship between the variance of an AR(1) process and that of its residuals (see, e.g., Tsay, 2005, p. 34).

(12) We focus on the mean and variance predictions because the ARMA framework assumes that the $\varepsilon_{k,t}$ are normal, and this (arguably) undermines any rationale for using it to test for departures from normality, i.e., skewness and excess kurtosis.

(13) The tests suggested in the text have the attractions that they take account of the dependence structure of the data and suffer from no discernible small sample problems. However, two alternatives should be noted. (1) We could decompose the $\hat{z}_{k,t}$ sample into bins and carry out a textbook chi-squared test of whether observed frequencies within each bin are sufficiently close to their predicted values. I preferred not to use this test because it does not take account of the $\hat{z}^{i}_{k,t}$ dependence structure (and the simulation-based tests used in this paper do take account of it) and because of doubts about its small sample properties when $\hat{z}^{i}_{k,t}$ is dependent; in particular, if $\hat{z}^{i}_{k,t}$ is dependent and the sample is small, then observations may be more clustered around the initial starting value than they 'should' be under the independence assumption on which the chi-squared test is predicated. Applying a chi-squared test therefore fails to allow for this clustering. (2) Another possible approach is an 'iid resample' method recently proposed by Dowd (2007b); this test makes use of a bootstrap algorithm which chooses resamples that are iid by construction. Using this bootstrap would allow one to apply standard iid-based tests to the resamples drawn from the original sample. However, the 'cost' of this approach is loss of power, and tests based on this approach turn out to have very little power in samples as small as the ones available to us here.

(14) For example, it might be that the MPC has been forecasting well using the data it had at the time, and some researcher comes to different conclusions years later using a revised GDP series. The researcher might be using a better series, but the relevance of the exercise would be doubtful; this would be akin to criticising the builders of the pyramids because they didn't use hydraulic cranes that were invented long afterwards.

(15) The first preliminary estimates are made in the quarter concerned. These are followed by the first release, seven weeks after the end of a quarter, the National Accounts quarterly series twelve weeks after the end of a quarter, and then the various Blue Book and Post Blue Book estimates. To add to which, past data are sometimes periodically revised later (e.g., to incorporate the change to the European System of Accounts (ESA95) in 1998, to incorporate changes in chain-weighting, and so on). These changes have led to notable increases in estimates of real GDP growth over the period from late 1998 to late 2001. For more on these issues, see, e.g., Akritidis (2003) and Elder et al. (2005).

(16) This series was downloaded on 29 October, 2007 from the National Statistics website at www.statistics.gov.uk.

(17) This series is obtained from the fan chart parameters as the current-period forecast of the real-growth mode for that quarter, the mode being that of the constant-rate fan chart.

(18) However, when using this series as a proxy for GDP growth in some quarter, we can only use it to evaluate forecasts made in earlier periods. We cannot take the MPC contemporary forecast as a proxy for realised GDP growth and then use this proxy to evaluate that same quarter's GDP growth forecast, because we would be using the MPC's contemporary forecast to check itself!

(19) The Bank has done work on the construction of real-time databases for real GDP, and a collection of different 'vintages' of data is available on its website (at www.213.225.136.206/ statistics/gdpdatabse/index.htm). Such a dataset would in principle allow a more thorough comparison of different real GDP growth series, but is of limited use for our purposes because it ends in 2001Q4. For more details, see Castle and Ellis (2002).

(20) Calculations were carried out using specially written MATLAB functions, which are available on request.

(21) Standard textbook tests of these predictions are also applicable and give similar results.

(22) One might also draw a third conclusion. The forecasts mostly run into problems with the variance prediction, and this applies to some extent to the National Statistics estimates as well as to MPC ones. Where this occurs, it suggests that the forecasts have problems getting the 'right' forecast dispersion. The sample mean $\hat{z}_{k,t}$ results in table 2 would then suggest that forecasts assessed using the National Statistics GDP estimates tend to under-estimate the dispersion of future GDP growth over relatively low horizons, whereas forecasts assessed using the MPC estimates tend to over-estimate the dispersion of GDP growth over all horizons. (The reason is that low-horizon National Statistics estimates produce sample $\hat{z}_{k,t}$ variances above the predicted value of 1, implying that the forecasts under-estimate future GDP growth dispersion, and the reverse is true for the MPC estimates.) Once again, the message is that one's conclusions on the forecast performance of the GDP fan charts depend, in part, on the GDP estimates used to assess them.

Kevin Dowd, Centre for Risk and Insurance Studies, Nottingham University Business School. e-mail: Kevin.Dowd@Nottingham.ac.uk. The author would like to thank two anonymous referees, Ken Wallis and Martin Weale, for helpful feedback, and the ESRC for support under grant RES-000-27-0014. The usual caveat applies.

Table 1. Summary statistics for different real GDP growth series

Summary statistic      Latest NS series    MPC series

Mean                   2.855               2.430
Variance               0.454               0.516
Minimum                1.600               0.790
Maximum                4.300               4.020
N                      40                  40
Pearson correlation    0.465
Rank correlation       0.513

Notes: The headline series is the latest available data for National Statistics' series IHYR; and the contemporary MPC estimates are the MPC's 'best estimates' of contemporary real GDP growth, as given in the Bank's GDP growth fan chart parameters. Both series refer to growth of real GDP over the previous four quarters, and the sample period spans 1997Q4 to 2007Q3.

Table 2. Summary statistics for the Berkowitz-transformed (standard normal inverse) PIT series $\hat{z}_{k,t}$

Parameter    k=0       k=1       k=2       k=3       k=4       k=5       k=6       k=7       k=8

(a) Using latest National Statistics real GDP growth series

Mean         0.743     0.602     0.443     0.333     0.296     0.222     0.174     0.155     0.133
Variance     1.966     1.625     1.372     1.171     0.949     0.710     0.527     0.428     0.341
Skewness    -0.643    -0.220    -0.107    -0.211    -0.050     0.082    -0.002    -0.058    -0.096
Kurtosis     2.685     2.096     2.317     2.870     3.427     3.078     3.054     2.510     1.842
N            40        39        38        37        36        35        34        33        32

(b) Using MPC contemporaneous mode real GDP growth forecasts

Mean         NA        0.005    -0.052    -0.089    -0.061    -0.100    -0.106    -0.085    -0.067
Variance     NA        0.329     0.498     0.578     0.533     0.460     0.399     0.345     0.299
Skewness     NA       -0.537     0.288     0.417     0.266     0.282     0.236    -0.238     0.120
Kurtosis     NA        3.643     2.863     3.370     2.530     2.334     2.389     2.168     2.460
N            NA        39        38        37        36        35        34        33        32

Notes: As for table 1. Results refer to sample parameters of PIT series put through standard normal inverse transformations, where PITs are obtained using Bank of England real GDP growth fan chart forecasts over k quarters ahead, with sample sizes N. Under the null we would expect the mean and skewnesses to be 0, the variances to be 1, and the kurtoses to be 3.

Table 3. Results for mean and variance tests based on 'best fitting' ARMA representations of the $\hat{z}_{k,t}$ dependence structure

                        Parameters (standard errors)                         Test stat P-values
Horizon   'Best fit'    const.           AR(1)            MA(1)            Q-stat P-value   Mean test    Variance test

(a) Using latest National Statistics real GDP growth series

0         AR(1)         0.722 (0.089)    0.554 (0.125)    NA               0.296            0.0041 **    0.0033 **
1         AR(1)         0.666 (0.128)    0.708 (0.116)    NA               0.209            0.0514       0.0279 *
2         AR(1)         0.593 (0.148)    0.776 (0.107)    NA               0.101            0.1467       0.0732
3         AR(1)         0.549 (0.132)    0.771 (0.103)    NA               0.429            0.2181       0.1386
4         AR(1)         0.528 (0.136)    0.829 (0.087)    NA               0.010 *          0.2696       0.2120
5         AR(1)         0.532 (0.121)    0.786 (0.102)    NA               0.035            0.3098       0.4888
6         AR(1)         0.525 (0.107)    0.755 (0.112)    NA               0.243            0.3400       0.2195
7         AR(1)         0.521 (0.096)    0.712 (0.123)    NA               0.392            0.3532       0.0750
8         AR(1)         0.515 (0.099)    0.743 (0.119)    NA               0.560            0.3785       0.0335 *

(b) Using MPC contemporaneous mode real GDP growth forecasts

1         iid           0.507 (0.031)    NA               NA               0.830            0.4861       0 **
2         MA(1)         0.480 (0.056)    NA               0.784 (0.096)    0.470            0.4055       0 **
3         MA(1)         0.465 (0.051)    NA               0.511 (0.145)    0.122            0.3425       0 **
4         MA(1)         0.476 (0.054)    NA               0.733 (0.106)    0.209            0.3945       0.0005 **
5         MA(1)         0.466 (0.051)    NA               0.560 (0.151)    0.035 *          0.3361       0 **
6         MA(1)         0.463 (0.047)    NA               0.634 (0.139)    0.016 *          0.3315       0 **
7         MA(1)         0.474 (0.048)    NA               0.596 (0.139)    0.262            0.3603       0 **
8         MA(1)         0.475 (0.044)    NA               0.483 (0.160)    0.129            0.3885       0 **

Notes: As per earlier tables. For the const., AR(1) and MA(1) columns, the table gives estimated parameter values obtained using EViews 5, followed by their estimated standard errors in brackets. The Q-stat column gives the P-value for the portmanteau statistic (or Q-stat) for the residuals taken up to 3 lags. P-values for the mean and variance tests are calculated using the relevant fitted process using 20000 simulation trials. * indicates significance at 5% level, ** indicates significance at 1% level.

Table 4. P-values for mean and variance tests based on alternative assumed ARMA representations of the $\hat{z}_{k,t}$ dependence structure

           Assuming iid                  Assuming AR(1)                Assuming MA(1)
Horizon    Mean test    Variance test    Mean test    Variance test    Mean test    Variance test

(a) Using latest National Statistics real GDP growth series

0          0 **         0.0002 **        0.0041 **    0.0033 **        0.0003 **    0.3794
1          0 **         0.0084 **        0.0514       0.0279 *         0.0032 **    0.2468
2          0.0045 **    0.0687           0.1467       0.0732           0.0261 *     0.0883
3          0.0193 *     0.2225           0.2181       0.1386           0.0709       0.0347 *
4          0.0379 *     0.4415           0.2696       0.2120           0.1009       0.0040 **
5          0.0954       0.1070           0.3098       0.4888           0.1740       0.0004 **
6          0.1511       0.0109 *         0.3400       0.2195           0.2311       0 **
7          0.1924       0.0020 **        0.3532       0.0750           0.2628       0 **
8          0.2246       0.0001 **        0.3785       0.0335 *         0.2928       0 **

(b) Using MPC contemporaneous mode real GDP growth forecasts

1          0.4861       0 **             0.4870       0 **             0.4869       0 **
2          0.3717       0.0047 **        0.4137       0.0176 *         0.4055       0 **
3          0.2947       0.0210 *         0.3766       0.0793           0.3425       0 **
4          0.3567       0.0111 *         0.4164       0.0574           0.3945       0.0005 **
5          0.2802       0.0028 **        0.3800       0.0333 *         0.3361       0 **
6          0.2721       0.0013 **        0.3793       0.0181 *         0.3315       0 **
7          0.3105       0.0002 **        0.3922       0.0037 **        0.3603       0 **
8          0.3521       0.0001 **        0.4194       0.0021 **        0.3885       0 **