Adjusted P and F tests and the Keynesian-classical debate.
Bischoff, Charles W.
I. Introduction
The debate over the manner in which changes in money growth rates affect real economic activity, and over the role that systematic
monetary policy plays in this relationship, has been a protracted one,
yet it remains unresolved. This paper shows that a model proposed by
Fischer [13] in which multi-period-ahead expectational errors in money
growth are crucial for the determination of real economic aggregates is
empirically superior to two leading alternatives when the three are
estimated on a sample of U.S. quarterly data spanning the 1970s. In
addition, we show that the test used to demonstrate these results, a
version of Davidson and MacKinnon's [9] P test which we adjust
experimentally to remove small sample biases in its size, is, in this
application, more powerful than the more commonly-used F-ratio form of
the likelihood ratio test.(1) We use these results on relative power to
explain and reconcile some conflicting decisions to which the two
alternative tests lead.
The alternative specifications against which the Fischer model is
tested are an equilibrium business cycle model in which only
expectational errors over a short time horizon matter for real
variables,(2) and a model in which actual money growth affects the real
economy regardless of whether it is anticipated or not.(3) The direct
comparison of these three models is carried out using true ex ante
forecasts of money growth rather than an ex post decomposition of money
growth into an anticipated and an unanticipated part by way of the
frequently-used auxiliary equation approach.
Despite the many studies evaluating the New Classical and the AUDI models, neither the view that only unanticipated money growth matters
nor the view that only actual money growth matters has appeared
persuasive. It is well-known that these models and the Fischer model
have radically divergent implications regarding the efficacy of policy
conducted according to systematic rules,(4) rendering the choice of
specification crucial from a policy analysis perspective. Our evidence
of the superiority of Fischer's model thus supports the view that
anticipatable monetary policy can have real effects, but that these
effects differ from those following unanticipated policy changes.
II. Model Specifications and Alternative Tests
The models evaluated in this study are single-equation reduced form expressions which explain deviations in unemployment from its natural
rate by means of a distributed lag on alternative measures of money
growth. The forms of the three models of unemployment, which we identify
as the Barro, Fischer, and AUDI models respectively, are given by the
following equations:
UN_t = UN_{nt} + \sum_{i=0}^{k_1} \beta_{1i} ({}_{t-i-1}m_{t-i} - E_{t-i-1}m_{t-i}) + u_{1t},    (1)
UN_t = UN_{nt} + \sum_{i=0}^{k_2} \beta_{2i} ({}_{t-i-1}m_t - E_{t-i-1}m_t) + u_{2t},    (2)
UN_t = UN_{nt} + \sum_{i=0}^{k_3} \beta_{3i} {}_{t-i-1}m_{t-i} + u_{3t},    (3)
where UN_t is a transformation of the rate of unemployment, UN_{nt} is the natural rate of the transformed unemployment rate, {}_{t-i-1}m_{t-j} is the annualized rate of change in the money supply between periods t - i - 1 and t - j, and E_{t-i-1}m_{t-j} is the anticipated value of {}_{t-i-1}m_{t-j} based on information available in period t - i - 1. In each case, UN_{nt} is represented by including a constant and the fiscal variable G/y, the ratio of federal government purchases to output,(5) in the estimated equations. The residuals u_{jt} are specified to be second-order autoregressive transformations of independent and identically distributed normal random variables, a specification that adequately captures the serial correlation present in the quarterly data used.
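For concreteness, the following minimal sketch (in Python, with hypothetical stand-ins for the anticipations data described in section III) illustrates how the distributed-lag money-growth regressors of equations (1)-(3) can be constructed; the array names m_act and m_exp and all numerical values are illustrative assumptions, not the paper's data.

```python
import numpy as np

# Hypothetical stand-ins for the anticipations data of section III:
#   m_act[s, h] = annualized actual money growth between quarters s and s + h,
#   m_exp[s, h] = the anticipation of that growth formed in quarter s.
T, H = 48, 9                          # quarters of data, maximum forecast horizon
rng = np.random.default_rng(0)
m_act = rng.normal(6.0, 2.0, size=(T, H + 1))
m_exp = m_act + rng.normal(0.0, 1.0, size=(T, H + 1))

K = 8                                 # distributed-lag length (i = 0, ..., 7)

def regressors(t):
    """Money-growth regressors of models (1)-(3) for quarter t."""
    barro   = [m_act[t - i - 1, 1]     - m_exp[t - i - 1, 1]     for i in range(K)]
    fischer = [m_act[t - i - 1, i + 1] - m_exp[t - i - 1, i + 1] for i in range(K)]
    audi    = [m_act[t - i - 1, 1]                               for i in range(K)]
    return np.array(barro), np.array(fischer), np.array(audi)

# Usable sample: the first nine quarters are lost to the eight-quarter lag and,
# in the paper, to the AR(2) error transformation.
X_barro, X_fischer, X_audi = map(
    np.array, zip(*(regressors(t) for t in range(K + 1, T)))
)
```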
There are many studies which have tested versions of the Barro model
in equation (1) and the AUDI model in equation (3), beginning with the
papers by Barro [1; 2], but the evidence has been quite mixed. The
conclusions reached by Barro include the findings that ". . . the
hypothesis that only the unanticipated part of money growth is relevant
to unemployment is accepted . . ." while ". . . the reverse
hypothesis that the [unanticipated money growth] values are irrelevant
to unemployment, given the [anticipated money growth] values, can easily
be rejected [1,109]." Mishkin, however, claims that ". . .
anticipated monetary policy does not appear to be less important than
unanticipated monetary policy. In fact, the opposite seems to be the
case [23, 118],"(6) although he does not directly test for the
significance of unanticipated money growth given its anticipated
component. Carns and Lombra [6] find that an ex ante measure of
unanticipated money growth serves to overturn Barro's finding of
insignificance of the anticipated part while supporting the conclusion
that unanticipated money matters.
Frydman and Rappaport consider the relative importance of
unanticipated and anticipated money growth and claim that "raw
money growth affects real output in the short run, irrespective of whether it is rationally anticipated or not [14, 702]." However,
though they demonstrate that a model characterized by their AUDI
hypothesis is not rejected by an equilibrium business cycle model, their
testing procedure does not allow a test of whether the equilibrium
business cycle model is rejected by the AUDI hypothesis.
McAleer, Pesaran, and Bera [20] find that their "New
Classical" model does not reject a "Keynesian"
specification, and as their approach allows for a switching of the roles
that the models play in the hypothesis test, they can also claim some
support for an apparent rejection of the New Classical model by the
Keynesian model. They point out, though, that this conclusion is suspect
because of biases in their test statistic. In fact, in a related study,
McAleer and McKenzie [21] present evidence that neither of two versions
of the New Classical model can be rejected by Keynesian models on the
basis of at least one test.(7) The recent exchange between Pesaran [27;
28] and Rush and Waldo [30] also serves to illustrate the tenuousness of
the conclusions when these models are tested against each other.
The empirical methods used in the above articles provide
illustrations of the types of tests of non-nested non-linear models
mentioned in the introduction. One approach involves embedding any two,
or all three, of the above competing models in an artificial composite
model, and then testing each of the original models against it. Such a
procedure therefore yields a set of linear restrictions that the null model imposes on some of the parameters in the artificial composite
model. For example, a test of the Barro model as the null hypothesis against the Fischer alternative(8) would require estimating the
artificial model
UN_t = UN_{nt} + \sum_{i=0}^{k_1} \beta_{1i} ({}_{t-i-1}m_{t-i} - E_{t-i-1}m_{t-i}) + \sum_{j=1}^{k_2} \beta_{2j} ({}_{t-j-1}m_t - E_{t-j-1}m_t) + u_t    (4)
and then testing the restrictions \beta_{2j} = 0 for j = 1 through k_2. The composite equations necessary for testing other
competing specifications, and the implied restrictions, would be
constructed similarly.
Various tests of such restrictions are available, with those based on
the likelihood ratio principle being frequently used in this literature.
These tests involve a comparison of the values of the sum of squared
residuals from the restricted and the unrestricted or composite models,
based either on their difference or on a ratio of the two. One form of
the likelihood ratio test is based on the F-ratio(9) F = [(RSSR -
USSR)/q]/[USSR/(n - k)] where RSSR and USSR are the restricted and
unrestricted sum of squared residuals respectively, q is the number of
restrictions imposed by the null model on the composite equation, and n
- k is the degrees of freedom in the composite equation. No exact small
sample test statistic is available in the non-linear case, and the
second-order autoregressive error process makes our models
non-linear.(10) Since the chi-square form tends to reject more
frequently than does the F form, the F-ratio was used as the likelihood
ratio test. Tests were conducted in which each of the three models
served as the null against alternative composite models which embedded the null and each of the two competing models individually. In addition
to these six tests, each model served as a null against an alternative
embedding all three models simultaneously. Thus, we carried out a total
of nine F tests.
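A small sketch of the F-ratio computation just described follows, assuming a Barro null tested against the composite model (4); the sum-of-squared-residual values are made up purely for illustration, and q = 7 corresponds to the restrictions \beta_{2j} = 0 for j = 1, ..., 7.

```python
import numpy as np
from scipy import stats

def f_ratio(rssr, ussr, q, dof):
    """Pseudo-F statistic: F = [(RSSR - USSR)/q] / [USSR/(n - k)]."""
    return ((rssr - ussr) / q) / (ussr / dof)

# Made-up numbers for a Barro null against the composite model (4):
rssr, ussr = 0.95, 0.60      # restricted and unrestricted sums of squared residuals
q = 7                        # restrictions beta_{2j} = 0 for j = 1, ..., 7
n, k = 30, 19                # sample size and composite-model coefficients
F = f_ratio(rssr, ussr, q, n - k)
p_asymptotic = stats.f.sf(F, q, n - k)   # asymptotic p-value; the paper instead
print(F, p_asymptotic)                   # uses Monte Carlo size-adjusted critical values
```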
The other procedure which we considered employs the P test
appropriate for testing non-linear non-nested hypotheses, a test based
on the suggestions of Cox [7; 8] and proposed by Davidson and MacKinnon
[9], in which each specification is evaluated by its respective ability
to explain the variation in the dependent variable left unexplained by
the other specification. Examples of applications of the P test to
situations in which the non-linearities in estimation arise from
autoregressive specifications of the residuals are found in Bernanke,
Bohn, and Reiss [5] and McAleer, Pesaran, and Bera [20]. The test
involves first using an appropriate non-linear least-squares estimation
procedure to obtain estimates of the null and alternative models. Then
ordinary least squares estimates of the coefficient \alpha and the vector b in the auxiliary equation
UN_t - \widehat{UN}^0_t = \alpha(\widehat{UN}^1_t - \widehat{UN}^0_t) + H_t b + e_t,    (5)
which can be more conveniently written as
UN_t = (1 - \alpha)\widehat{UN}^0_t + \alpha\widehat{UN}^1_t + H_t b + e_t,    (6)
are obtained. In these equations, \widehat{UN}^0_t and \widehat{UN}^1_t are the predicted values of unemployment under the null and alternative models respectively, and H_t is the vector of derivatives of \widehat{UN}^0_t with respect to each of the coefficients in the null model, evaluated at its non-linear least-squares estimates. When \alpha is set to zero in either equation (5) or (6), the result is a Taylor-series approximation around the non-linear estimate of the null model, suggesting that a test of the null model can be based on a test of the significance of the OLS estimate of \alpha, \hat{\alpha}, using its "t-ratio." A finding that \hat{\alpha} is statistically significant would then lead to a rejection of the null by the alternative on the basis of this P test procedure.
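The following sketch illustrates the auxiliary regression of equation (5), under the assumption that the fitted values and the derivative matrix H have already been obtained from the non-linear least-squares estimation; the function and variable names are ours, not the paper's.

```python
import numpy as np

def p_test_t_ratio(y, fit_null, fit_alt, H):
    """P test of equation (5): regress y - fit_null on (fit_alt - fit_null) and
    the columns of H (derivatives of the null model's fitted values with respect
    to its coefficients, at the NLS estimates); return the 't-ratio' on alpha."""
    X = np.column_stack([fit_alt - fit_null, H])
    z = y - fit_null
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[0] / np.sqrt(cov[0, 0])
```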
The same nine hypothesis tests were conducted using the P test as
were conducted with the F test procedure, six pairwise comparisons and
three tests of each null against the two alternatives simultaneously.
These last three tests of a null against a composite alternative were
conducted on the basis of a suggestion by Davidson and MacKinnon [9,
783](11) which involves a strategy similar to that just outlined, but
applied to the extended auxiliary equation
UN_t - \widehat{UN}^0_t = \alpha_1(\widehat{UN}^1_t - \widehat{UN}^0_t) + \alpha_2(\widehat{UN}^2_t - \widehat{UN}^0_t) + H_t b + e_t,
where all previously defined variables are as before and \widehat{UN}^2_t represents the predicted value from the second alternative specification. The hypothesis that both \alpha_1 and \alpha_2 are zero is tested using a standard
F test with two numerator-degrees-of-freedom, and a rejection of the
null hypothesis results when the F statistic exceeds its conventional
critical value. This extension of the non-nested testing procedure has
heretofore not been applied in any empirical studies.
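A sketch of this joint test against two alternatives follows, again assuming the fitted values and H are in hand; the function name and arguments are illustrative assumptions.

```python
import numpy as np

def joint_p_test_f(y, fit_null, fit_alt1, fit_alt2, H):
    """Extended auxiliary regression: regress y - fit_null on both contrasts
    (fit_alt1 - fit_null), (fit_alt2 - fit_null) and H, and test
    alpha_1 = alpha_2 = 0 with a standard F statistic (2 numerator df)."""
    def rss(X, z):
        b, *_ = np.linalg.lstsq(X, z, rcond=None)
        e = z - X @ b
        return e @ e
    z = y - fit_null
    X_u = np.column_stack([fit_alt1 - fit_null, fit_alt2 - fit_null, H])
    rss_u = rss(X_u, z)
    rss_r = rss(np.asarray(H), z)            # restricted: alpha_1 = alpha_2 = 0
    dof = len(y) - X_u.shape[1]
    return ((rss_r - rss_u) / 2) / (rss_u / dof)
```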
As with the asymptotic F test procedure, the P test is strictly valid
only asymptotically since the "t-statistic" on \hat{\alpha} has a standard normal distribution in the limit but
an unknown distribution in small samples. An evaluation of the
small-sample properties of each of the tests and a comparison of the two
based on this evidence were therefore conducted on the basis of Monte
Carlo evidence. This experimental evidence also allowed straight-forward
adjustments in the critical regions of the various tests which yield a
uniform 5% probability of type I error.
These Monte Carlo comparisons of the small sample properties of the P
and F tests of non-linear non-nested models are the first to appear in
the literature. Pesaran [26] and Godfrey and Pesaran [15] provide
comparisons of the properties of the (unadjusted) J test, similar to the
P test, and the F test of a number of non-nested linear models. Rather
than rely on a comparison of our results to experimental evidence
strictly applicable only under different circumstances, we chose to
conduct our own experiments.
Since all of the eighteen tests we studied were found to over-reject
the null hypothesis when decisions were made on the basis of asymptotic
critical values, adjustments which reduce the frequency of rejection
would affect the power of all of the tests adversely. Other studies,
which considered only linear models, have found that non-nested tests
closely related to the P test are more powerful in small samples than F
tests when tests are based on asymptotic critical values.(12) However,
as the P tests as conducted here required more substantial small-sample
adjustments in their critical regions, the question of relative power
after such adjustments are made is raised. The findings presented below
suggest that even after the adjustments in size are made, the P tests
are still uniformly more powerful, often substantially so.
III. Data
The available raw data included the anticipations of the old M1 money
supply for the current through nine quarters ahead for the sample period
1970II through 1979IV.(13) From these, annualized expected growth rates
over an (i - j + 1)-quarter horizon, denoted above by E_{t-i-1}m_{t-j}, were generated. Actual money growth rates for corresponding time periods, the {}_{t-i-1}m_{t-j}'s, were
based on revised figures for the old M1 series, used in order to
maintain consistency with the anticipations data. The unemployment rate
U, real output y, and the fiscal variable G used were the total labor
force unemployment rate, the level of real GNP (billions of 1972
dollars), and the level of real federal government purchases of goods
and services (billions of 1972 dollars), respectively. Regressions were
of the transformed variable UN = ln(U/(1 - U)) on a constant, the ratio
G/y, and the money growth rates relevant to each model.(14)
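As an illustration of these transformations, a brief sketch follows; the series values and the log-difference annualization convention are assumptions made for illustration only, since the paper does not spell out its growth-rate formula.

```python
import numpy as np

# Hypothetical series (stand-ins for the actual data): the unemployment rate is
# taken as a fraction, and one possible annualization convention (log differences)
# is assumed.
U  = np.array([0.059, 0.058, 0.056])         # unemployment rate
M1 = np.array([221.0, 224.1, 227.5, 230.8])  # old M1, quarterly averages

UN = np.log(U / (1.0 - U))                   # dependent variable UN = ln(U/(1 - U))

def annualized_growth(M, s, h):
    """Annualized growth of M over the h quarters starting in quarter s."""
    return (400.0 / h) * np.log(M[s + h] / M[s])

print(UN, annualized_growth(M1, 0, 2))
```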
In order to preserve degrees of freedom and to allow a more extensive
comparison to the literature, we considered specifications with
distributed lags of length eight on the various measures of money
growth, so that in equations (1), (2) and (3), k_i = 7 for i = 1, 2, 3.(15) We also found evidence in all models of a second-order autoregressive error process, which coincides with much of the
literature. Thus, nine quarters were dropped from the beginning of the
sample, leaving thirty observations on which to estimate models
involving twelve coefficients, and composite models involving between
nineteen and twenty-seven coefficients.
While more degrees of freedom would be desirable, we proceeded with
the available sample for a number of reasons. First, the expectations we
used were not being generated prior to 1970II, precluding any extension
of the sample backwards in time. Second, the Chase data were proprietary
information and were not available to us beyond 1979IV.
In addition, the announced change in 1979IV in the way the Federal
Reserve conducted monetary policy is likely to subject the estimated
equations to structural shifts akin to "Lucas critique"
complications if the estimation spanned this period and the coefficients
connecting money growth (whether anticipated or unanticipated) to
unemployment changed as a result of the 1979 regime change. If one uses
actual ex ante expectations of money growth during a time of change in
the conduct of monetary policy, as occurred under Volcker in 1979IV and
1982III, one does not need to worry about the shift being properly
reflected in the new specification of the monetary rule, but only that
the expectations are the person's best guess at the time they are
formed and recorded. However, the coefficients connecting errors in
expectations to the unemployment rate may shift as a result of the
regime change.(16) Thus we would expect that our equations, as estimated
on data from the 1970's, should not apply to the period between
1979IV and 1982III, nor to the period after 1982III.(17)
It is also not possible to go beyond 1979IV without splicing two
different data series due to the change in the definition of the money
supply being forecasted. Although the behavior of M1 during the
1980's has raised questions about its appropriateness, most studies
spanning periods through the seventies used M1 on the basis, in part, of
the better fit it provided compared to M2.(18) Thus, M1 is arguably the
better monetary aggregate to use, and limiting attention to the
seventies avoids the possibility of a structural shift due to the change
in its definition.
A similar analysis of the period since 1982III using M2 growth would
be of interest, but that is another study. Nor are the 1970s so far in the past that the question of which monetary model applied then lacks interest. If
frequent changes in monetary policy occur, only relatively short and
intermittent episodes may be available for econometric analysis since,
in view of the "Lucas critique," precise quantitative results
should not be expected to carry over from one regime to another.
In any case, small sample sizes are typically a problem only insofar as they prevent the rejection of hypotheses or force critical regions to be based on asymptotic theory. Neither of these concerns applies to this study. Our sample, according to the
size-adjusted P test procedure, is large enough to reject the Barro and
AUDI models when the Fischer model serves as the alternative.(19)
Furthermore, the tests are based on empirical distributions appropriate
to the available sample size, not on some inappropriate asymptotic
distribution. In light of these arguments, we felt it potentially very
informative to proceed with the data available.
IV. Estimation and Monte Carlo Experiments
The hypothesis testing procedures first required estimating each of
the models represented in equations (1), (2), and (3). From these
results the fitted values \widehat{UN}^j_t required for the P tests
and the restricted sums-of-squared-residuals used in the F tests were
obtained. The estimation results(20) are given in equations (1'), (2'), and (3').
[Estimation results for equations (1'), (2'), and (3') omitted.]
A ranking of the estimated standard errors of these regressions, the \hat{\sigma}'s, suggests choosing the Fischer model over the others. Indeed, since the models contain the same number of parameters and are estimated on the same sample, conventional discrimination criteria based on goodness-of-fit such as the adjusted coefficient of determination \bar{R}^2 and the
Akaike Information Criterion would lead to this decision. Nonetheless,
this evidence is merely indicative, while the hypothesis tests outlined
above can provide more exacting evidence upon which to base such a
decision.
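For reference, a sketch of how such goodness-of-fit comparisons can be computed from each model's sum of squared residuals; the particular AIC formula shown is one common variant and is assumed here for illustration.

```python
import numpy as np

def fit_criteria(ssr, y, k):
    """Adjusted R-squared and one common form of the Akaike Information
    Criterion for a least-squares regression with k coefficients."""
    n = len(y)
    tss = np.sum((y - y.mean()) ** 2)
    r2_adj = 1.0 - (ssr / (n - k)) / (tss / (n - 1))
    aic = n * np.log(ssr / n) + 2 * k
    return r2_adj, aic
# With the same n and k across models, ranking by ssr, by r2_adj, or by aic
# necessarily selects the same model.
```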
The test statistic values resulting from these two procedures are
presented in the first row of figures in Table I.
It can be seen from equation (2'), the estimate of the Fischer
model, that the largest and most significant coefficients are those on
expectational errors over a time horizon of six and seven quarters. This
implies that if multi-period-ahead expectational errors are an important
explanatory variable in an unemployment equation, it is largely due to
the existence of contracts which are fairly long-term in nature. That
nominal wage rigidities that persist for this length of time are an
important source of variations in unemployment in turn suggests that
policy actions which are implemented within this interval of time can
have real effects.
Upon specifying one of the three models as the null hypothesis, the
point estimates of this null model were used to generate 10,000
simulated realizations for the unemployment variable used as the measure
of real economic activity.(21) These data were then used to generate
10,000 values each of the three P test statistics and the three F test
statistics that were calculated under this null model. This was then
repeated posing each of the other two models as the null model in turn.
The proportion of times that the simulated test statistics exceed the
realized values in the first row of Table I provides estimates of the
small-sample-adjusted p-values associated with the observed test
statistics. These are reported in the second row of Table I. The 10,000
repetitions allow us to associate with an estimated p-value of 0.05 a
standard deviation of approximately (0.05 \times 0.95/10,000)^{1/2} = 0.0022 and thus a fairly narrow confidence interval around our point estimates [10, 739]. If this adjusted p-value is less than the
conventional 0.05,(22) we reject the null model.
An alternative way of adjusting the tests for any small-sample biases
is to calculate the 95th percentile of the empirical distribution and
use this as an estimate of the adjusted critical value.(23) Comparing
the actual test statistic's outcome to this adjusted critical value
then determines the statistical decision. The adjusted critical values
are contained in the third row of Table I. Critical values drawn from
relevant asymptotic distributions and the corresponding unadjusted
p-values implied by these are included in the lower half of Table I for
comparison.
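The size-adjustment step just described can be sketched as follows, assuming the vector of statistics simulated under the null is already in hand; the function and argument names are ours.

```python
import numpy as np

def size_adjustment(simulated_stats, observed_stat, level=0.05):
    """Small-sample adjustments from statistics simulated under the null:
    the adjusted p-value is the fraction of simulated statistics exceeding
    the observed one, and the adjusted critical value is the (1 - level)
    quantile of the empirical distribution."""
    sims = np.asarray(simulated_stats)
    p_adjusted = np.mean(sims > observed_stat)
    critical_adjusted = np.quantile(sims, 1.0 - level)
    se_p = np.sqrt(level * (1.0 - level) / sims.size)   # about 0.0022 for 10,000 draws
    return p_adjusted, critical_adjusted, se_p
```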
Following the above procedure ensures that our various hypothesis
tests conform to the conventional 5% significance level standard.(24) In
order to evaluate both the relative merits of the alternative testing
procedures and to interpret the possible conflicting decisions that the
tests yield, the power of the various adjusted tests was then estimated.
To accomplish this, several sets of 1000 simulated realizations for unemployment were generated as described in footnote 21, but using the data and estimated coefficients of the models under
the alternative hypotheses as the base to which the generated errors
were added. These data were then used to generate 1000 observations each
on the P and F test statistics as applied to each of the hypotheses when
the alternative hypotheses were "true". The empirical
distributions then have upper tail areas above the corresponding
adjusted critical value from the previous stage of analysis which
approximate the power of that particular test against the specified
alternative. These estimates of the power of the adjusted tests are in
rows four through six of Table I. The power levels of the unadjusted
tests are also presented in the lower half of the table in order to
provide an indication of the consequences for power of the
size-adjustments that were performed.
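The corresponding power estimates are then simple tail frequencies over the statistics simulated under the alternative, as in the following sketch.

```python
import numpy as np

def estimated_power(stats_under_alternative, adjusted_critical_value):
    """Fraction of statistics, simulated with the alternative model as the
    data-generating process, lying above the size-adjusted critical value."""
    return float(np.mean(np.asarray(stats_under_alternative) > adjusted_critical_value))
```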
V. Empirical Results
The size-adjusted test results presented in Table I indicate three
rejections of the null hypothesis at the 5% level, all of these being
based on P tests. These rejections are of Barro's model by
Fischer's (at a 0.53% level), the AUDI model by Fischer's (at
a 0.96% level), and Barro's by the others jointly (at a 2.62%
level). The remaining case where Fischer's model serves as an
alternative, that where the AUDI null is tested against the other two
jointly, fails to reject the null at a 5% level but does yield a
rejection at the 9.41% level. Thus only when Fischer's model serves
as an alternative hypothesis does a rejection occur, and then only when
the P test is used. None of the tests where the Barro model or the AUDI
model (or both jointly) serve as the alternative hypothesis yield a
rejection of the null model, and in particular, in those cases where the
Fischer model acts as the null hypothesis, most of the p-values are over
80%. The P test results thus strongly support the Fischer model over the
other two specifications, even after adjusting for the P test's
propensity towards over-rejection.(25)
The conflicting decisions implied by the two tests in those cases
where the P test leads to a rejection can largely be resolved by the
experimental evidence on their relative powers. As seen in Table I, when
the Barro model serves as the null against either both alternatives or
the Fischer alternative, it is rejected at the conventional significance
level according to the adjusted P statistic but not according to the
adjusted F statistic. Similarly, the AUDI model is rejected by the
Fischer model according to the adjusted P statistic but not the adjusted
F statistic. In each of these cases, the estimated power levels shown in
line five of Table I indicate that the probability of a type II error,
that of failing to reject a false null model, is at least five times as
great for the adjusted F test when the Fischer model is taken to be the
true alternative.
A comparison of the adjusted power levels for the P and F tests
overall indicates that in all cases the P test is substantially more
powerful. Nine of the twelve estimates of the P test power exceed 50%
and six exceed 75%, while seven of the power estimates of the F tests do
not even achieve a level of 10%, and only two exceed 50%.(26)
These test results can also clarify some of the results presented in
other studies on the basis of alternative tests. Frydman and
Rappaport's [14] failure to reject their AUDI model by a Barro and
Rush [4] alternative model, for example, is supported by our P test
results.(27) However, we also find that neither can the AUDI model
reject the Barro and Rush model, a result that Frydman and
Rappaport's procedure does not allow them to consider. Moreover,
the failure of our adjusted P test to yield a rejection of the Barro
model is not likely to be due to low power since by our experimental
determination the power of the test is over 87% in this case. This
raises doubts regarding Frydman and Rappaport's conclusion that
"the AUDI hypothesis receives substantial empirical support [14,
693]."
McAleer and McKenzie [21] claim a rejection of their New Classical
model by their Keynesian model on the basis of a J test (a test
identical to our P test in the case of linear models) but fail to reject
the New Classical specification on the basis of an asymptotic F test.
They then speculate that
[g]iven the published results on asymptotic local power of various
non-nested tests, the failure of the asymptotic F test to reject the
null may simply reflect lower power relative to the J and JA tests [21,
374].
Thus the conflict in decisions indicated by their tests is attributed
by them to a likely underrejection of the New Classical model on the
basis of the F test, implying that the Keynesian model is the correct
one. Our results suggest otherwise.
When the Barro null was tested against the AUDI alternative, which
most closely approximates McAleer and McKenzie's test of the New
Classical null versus the Keynesian alternative, we also found that,
before adjusting the two tests, the decisions they led to were in
conflict.(28) We also found, though, that the bias in the size of our
unadjusted P test (and by extension their J test) was very large,
requiring an adjustment which in fact reversed the initial decision to
reject the null. Thus, by our adjusted tests, the Barro model was not
rejected by the Keynesian (AUDI) model. Not only was the conflict
between the two tests resolved, but the adjustments also yielded
estimated significance levels for the two tests which were essentially
identical. This clearly raises the possibility that it is the properties of the J test that McAleer and McKenzie use, and not those of the F test, that account for the disagreement between their tests.
Our interpretation, in conjunction with the failure of the New
Classical model to reject the Keynesian model, presents a different
conclusion. These adjusted test results do not seem to be the product
either of size biases or, in the case of the adjusted P test, lack of
power. Instead, we suggest that they result from neither of these two models being the appropriate specification; the New Keynesian (Fischer) model appears to be the closest to the appropriate model.
VI. Summary and Conclusions
This paper has reconsidered the empirical relevance of unanticipated
money growth as a determinant of real economic behavior as measured by
the rate of unemployment. Attention has been paid in particular to the
role that multi-period-ahead expectational errors might play, as
suggested by Fischer's important contracting-based model. The
Fischer model was tested against two competing alternatives, a New
Classical model in which only short-term expectational errors matter and
a Keynesian model where actual money matters, both of which have been
frequently and recently evaluated empirically.
Our statistical testing of these models has diverged from the
conventional approach in a number of ways. Most importantly, two
different testing techniques, Davidson and MacKinnon's [9] P test
and the conventional F test, were applied, and their small-sample
properties were assessed on the basis of Monte Carlo experiments. These
experiments provide a means of adjusting for biases in size, and allow a
comparison of the power levels of the size-adjusted tests.
In addition, this paper has avoided some econometric difficulties
that have plagued other similar studies by its use of actual ex ante
forecasts of money growth rates in the estimation and testing of the
Fischer and New Classical models, rather than using proxy measures of
anticipations derived as predicted values from an auxiliary equation.
The test results presented here lend strong support to the model in
which a mechanism operating through multi-period-ahead expectational
errors on money growth explains the behavior of unemployment. This
conclusion is reinforced when Monte Carlo evidence on the properties of
small-sample tests used by others to evaluate similar models is used to
interpret their mixed results. We reconfirm the view that pre-announced
monetary policy actions can have real effects, and that the effects of
these anticipated changes differ from those due to similar but
unanticipated changes. This view directly conflicts with the policy
implications drawn from the other two models, both of which have
received a great deal more attention in the empirical literature.
Our results also provide a stark contrast between the properties of
the traditional F test and the less familiar P test. Even after
adjustments are made to their respective critical regions to ensure that
they have comparable size, the P test is found to be substantially more
powerful than the F test. Given the ease with which the P test can be
implemented and adjusted as described here, we suggest the more frequent
use of such a procedure when confronted with competing non-linear
non-nested specifications, particularly when samples are of the size
commonly found in applied macroeconometric studies.
1. Examples of related studies which use the likelihood ratio
principle include those by Barro [1; 2], Barro and Rush [4], Mishkin
[23], Carns and Lombra [6], and Frydman and Rappaport [14].
2. Such models, also referred to in the literature as "New
Classical" or "monetary misperceptions" models, were
first evaluated empirically by Barro [1; 2] in prominent studies which
have produced numerous responses, both supportive and critical of
Barro's initial findings. Although real business cycle models are
also commonly specified as equilibrium models, we do not consider any
version of such a model here as money plays no crucial role in them.
3. Models of this type have been referred to as "AUDI" (an
acronym for Anticipated-Unanticipated Distinction Irrelevant) by Frydman
and Rappaport [14], and "Keynesian" by Pesaran [27; 28],
McAleer and McKenzie [21], and others, although they could easily be
given a "monetarist" interpretation.
4. Mishkin [24, 718-33] provides a non-technical assessment of the
policy implications of each of the three models considered here.
5. This fiscal variable is the one used by Barro and Rush and is
included to capture shifts in the natural rate of their transformed
unemployment rate [4, 31-2], not as an aggregate demand-management
policy variable. As is reported below, when G/y is included in the
estimable equations as part of UN_{nt}, the resulting coefficient is
always insignificantly different from zero by conventional standards.
Nevertheless, we retained this variable in the specifications to
maintain comparability with Barro and Rush.
6. We note that this particular quotation applies to results based on
equations with longer distributed lags, polynomial distributed lag
smoothness constraints imposed on the coefficients, a different measure
of unemployment, and different variables reflecting the natural rate,
relative to Barro's and our papers. When an equation with seven
lags is considered, the claim that only unanticipated money matters is
not rejected at the 5% level, as stated in Mishkin's Table 6.1,
Column 2.2 [23, 117].
7. Rao [29], in a derivative study, also finds that the Keynesian
model fails to reject the New Classical model at the 5% level in one of
four cases after his test statistic is adjusted for bias.
8. Although, strictly speaking, the test is not against the Fischer
model alone but against an artificial model which includes the Fischer
model as well as the Barro null.
9. Davidson and MacKinnon [10, 92] recognize that "it may be
stretching terminology somewhat to call [the F-ratio] an LR test, but it
is certainly reasonable to say that [it] is based on the likelihood
ratio principle if the latter is broadly defined to mean basing a test
on the difference between the values of an objective function at the
restricted and unrestricted estimates." This is the interpretation
employed in this paper.
10. In the case of models and restrictions which are linear, the
F-ratio provides an exact test. Evans and Savin [12] present some
evidence on the small-sample properties of the asymptotic forms based on
the chi-square distribution versus exact forms of the likelihood ratio
tests. When the model is non-linear, however, the F-ratio is distributed
according to the F distribution only asymptotically. This leads Davidson
and MacKinnon to refer to the F-ratio calculated for non-linear models
as a pseudo-F statistic [10, 92].
11. Also see MacKinnon [19, 96-7].
12. Pesaran [25; 26] finds this to be the case when comparing the
Cox-type N test and the Davidson-MacKinnon J test to the F test, and
Godfrey and Pesaran [15] report similar results for an adjusted N test
and the W test.
13. The anticipations were ex ante forecasts made by Chase
Econometric Associates, Inc., and its predecessor organizations. We are
grateful to Michael K. Evans, Steven J. Elgart, and Leon W. Taub for
making these forecasts available to us.
All other studies of the effects of unanticipated money, with the
exception of Carns and Lombra [6], use auxiliary equations to decompose actual money growth into anticipated and unanticipated components. The
proxies generated by this approach are estimates of actual forecasts and
as such are not without their difficulties. They require, for one, an ex
post specification by the econometrician of the information set
available to forecasters at the time their forecasts were formed.
Empirical results have frequently been found, however, to lack
robustness to changes in the content of the information set or to
changes in the measurement of variables included. See the exchanges, for
example, between Barro [1; 3] and Small [31] and between Pesaran [27;
28] and Rush and Waldo [30]. Also, the processes by which the variables
in the information set are used by forecasters to generate expectations
are assumed to be linear and time-invariant. If in fact the actual
expectations formulation process is altered in response to structural
shifts or trends in the economic environment, the proxies generated by
the auxiliary equation will be subject to further misspecification. This
would be the case, for example, if the model were estimated over a
period during which there were a change in policy regime which altered
the process by which rational agents formed expectations.
The use of ex ante forecasts, being actual observations rather than
econometric estimates, avoids these potential problems. The forecasts
used here were formed by a major forecasting firm, Chase Econometrics Incorporated, using the information set it felt was most relevant at the time that each of its forecasts was formed. Furthermore, it was quite
possible for Chase to change its forecasting procedure as and if it felt
changing conditions warranted. Thus the use of these actual forecasts
avoids misspecification by the econometrician of the actual
period-by-period expectations-generating process.
The use of an auxiliary equation, which often takes the form of a
difference equation with additional explanatory variables which must
also be forecasted, is likely to produce errors in estimating ex ante
forecasts which increase in variance with the forecast horizon. In an
equation representing the Fischer model in which multi-period forecast
errors appear, this will result in explanatory variables which are prone
to measurement error. The use of actual ex ante forecasts, even from a
sample of size one, should lessen this problem.
The Chase forecasts are at least equal in accuracy and other
desirable properties to other measures found in the literature. In a
study comparing the forecasts of Chase, Data Resources Inc., and Wharton
EFA, McNees [22, 41] finds that in terms of mean absolute errors of
their money forecasts, Chase does as well as DRI and better than
Wharton. An empirical comparison of Chase's one-period-ahead errors
with those contained in Barro and Rush's [4, 40-6] Table 2.3 also
favors the Chase measure using either mean absolute errors or root mean
square errors. Using the test by which Carns and Lombra [6] test for the
rationality of DRI's one-period-ahead forecasts, we found that the
Chase forecasts are also rational. In addition to these features of the
ex ante forecasts we used, support for their use can also be based on
Carns and Lombra's argument that they provide a check on previous
studies' results.
14. This specification coincides with that reported in Column (9) of
Table 2.1 in Barro and Rush [4].
15. Barro and Rush [4], Mishkin [23], Carns and Lombra [6], and
Frydman and Rappaport [14] all considered distributed lags of this
length when estimating models on quarterly data. This choice of lag
length is particularly useful in contrasting our results with these
studies since it is under this specification that Frydman and Rappaport
claim the strongest support for their AUDI model relative to the Barro
model while Mishkin finds less evidence to support his rejection of the
Barro model using a short lag length.
16. Our equations (1), (2), and (3) are essentially Phillips Curves,
and the slopes of these relationships are not invariant to changes in
the monetary regime, a point which is emphasized by Lucas [16; 17; 18].
17. Indications of coefficient instability in samples spanning these
regime changes are found in at least two studies, although neither
reports formal test statistics. Both Rush and Waldo [30, 501] and
Pesaran [28, 506] report evidence of instability when their samples are
extended through 1985.
18. For instance, Barro [1, 108] and Mishkin [23, 124] both refer to
a deterioration of fit with M2.
19. It is true that a larger sample, if available, might lead to
rejection of the Fischer model with either of the other two models as
alternatives. This simply recognizes that none of the models considered
here is literally true and thus each will certainly be rejected on the
basis of a large enough sample. For example, an extended version of the
Fischer model (see Fischer's [13, 203] footnote 20) could be
considered which would include all of the unanticipated money growth
terms in the Barro-Rush model. It would not be difficult to imagine an
even more general model which includes the actual money growth rates of
the AUDI specification as well. If this model accurately reflected the
data generating process, we would expect that, with a large enough
sample, each of the three models we consider here would be rejected by
each of the others since each would explain some variation in output not
explained by the others.
20. The non-linear estimation procedure used was iterative Cochrane-Orcutt using TSP Version 4.2. t-statistics are in parentheses,
and all have a 5% critical value of 2.101. \hat{\sigma} denotes the estimated standard error of the regression, and DW
is the Durbin-Watson d-statistic.
21. Serially correlated random errors were formed by generating
independent normally distributed errors with standard deviation given by
the estimate from the null model, then passing them through the filter
defined by the estimated autoregressive process. This was done for a
sample extending back to 1945I in order to essentially eliminate the
influence that initial zero values for the errors had on the effective
sample. These generated errors were then used with the actual money
growth data and expectations data to form the simulated unemployment
rate data. The seed used by the pseudo-random number generator was also
generated (pseudo-) randomly and then retrieved for the purpose of
allowing replication. Program code and seed values are available from
the authors upon request.
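A sketch of this filtering step follows; the AR(2) coefficients, innovation standard deviation, and burn-in length shown are illustrative assumptions rather than the paper's values.

```python
import numpy as np

def ar2_errors(n, rho1, rho2, sigma, burn_in=140, seed=None):
    """Pass i.i.d. normal innovations through an AR(2) filter, discarding a long
    burn-in so that the initial zero values have essentially no influence."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=n + burn_in)
    u = np.zeros(n + burn_in)
    for t in range(2, n + burn_in):
        u[t] = rho1 * u[t - 1] + rho2 * u[t - 2] + eps[t]
    return u[burn_in:]
```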
22. If one chooses to be more conservative, then we reject the null
model if this adjusted p-value is less than 0.05 minus 1.96 standard
errors of size 0.0022, or less than 0.0457.
23. This approach is taken by, for example, Dickey and Fuller [11]
when they present finite sample properties of statistics applicable to
unit root tests.
24. The prior specification of some other significance level such as
1% or 10% requires making obvious changes to our procedure.
25. We can indeed see, by comparing the unadjusted and adjusted
p-values associated with these test statistics, that the adjustments are
sometimes substantial, as well as important in the sense that they
sometimes overturn initial rejections. There are four cases in which an
apparent 5% level rejection is reversed on the basis of
small-sample-size adjustments, three of these being to P tests, and two
not even achieving an adjusted 10% level rejection.
26. Both of these situations are ones where the Fischer model serves
as the sole alternative hypothesis. Those cases where the F test power
levels are not much in excess of 5% have a probability of correctly
rejecting a false null that is no better than the probability of
incorrectly rejecting a true one; the test statistic has essentially no
usefulness since its small sample distribution, at least at the upper
tail, is basically identical under both the null and alternative
hypotheses.
27. It must be noted, of course, that our dependent variable and our
measure of anticipated money differ from Frydman and Rappaport's,
as does our sample period.
28. We note that our empirical results are based on different data,
on a different sample period and frequency, although our sample size,
30, is comparable to the size of the two samples used by McAleer and
McKenzie, 28 and 40. However, the F test was confirmed to be less
powerful than the P test, as they conjectured.
References
1. Barro, Robert J., "Unanticipated Money Growth and
Unemployment in the United States." American Economic Review, March
1977, 101-15.
2. -----, "Unanticipated Money, Output, and the Price Level in
the United States." Journal of Political Economy, August 1978,
549-80.
3. -----, "Unanticipated Money Growth and Unemployment in the
United States: Reply." American Economic Review, December 1979,
1004-1009.
4. ----- and Mark Rush. "Unanticipated Money and Economic
Activity," in Rational Expectations and Economic Policy, edited by
Stanley Fischer. Chicago: University of Chicago Press, 1980, pp. 23-73.
5. Bernanke, Ben, Henning Bohn and Peter C. Reiss, "Alternative
Non-Nested Specification Tests of Time-Series Investment Models."
Journal of Econometrics, March 1988, 293-326.
6. Carns, Frederick, and Raymond Lombra, "Rational Expectations
and Short-Run Neutrality: A Reexamination of the Role of Anticipated
Money Growth." Review of Economics and Statistics, November 1983,
639-43.
7. Cox, David R., "Tests of Separate Families of
Hypotheses," in Proceedings of the Fourth Berkeley Symposium on
Mathematical Statistics and Probability. Berkeley: University of
California Press, 1961.
8. -----, "Further Results on Tests of Separate Families of
Hypotheses." Journal of the Royal Statistical Society, 1962, Series
B, 406-24.
9. Davidson, Russell and James G. MacKinnon, "Several Tests for
Model Specification in the Presence of Alternative Hypotheses."
Econometrica, May 1981, 781-93.
10. ----- and -----. Estimation and Inference in Econometrics. New
York: Oxford University Press, 1993.
11. Dickey, David A. and Wayne A. Fuller, "Likelihood Ratio
Statistics for Autoregressive Time Series with a Unit Root."
Econometrica, July 1981, 1057-72.
12. Evans, G. B. A. and N. E. Savin, "Conflict Among the
Criteria Revisited: The W, LR and LM Tests." Econometrica, May
1982, 737-48.
13. Fischer, Stanley, "Long-Term Contracts, Rational
Expectations, and the Optimal Money Supply Rule." Journal of
Political Economy, February 1977, 191-205.
14. Frydman, Roman and Peter Rappaport, "Is the Distinction
Between Anticipated and Unanticipated Money Growth Relevant in
Explaining Aggregate Output?" American Economic Review, September
1987, 693-703.
15. Godfrey, L. G. and M. Hashem Pesaran, "Tests of Non-Nested
Regression Models: Small Sample Adjustments and Monte Carlo
Evidence." Journal of Econometrics, 1983, 133-54.
16. Lucas, Robert E., Jr., "Expectations and the Neutrality of
Money." Journal of Economic Theory, April 1972, 103-24.
17. -----, "Some International Evidence on Output-Inflation
Tradeoffs." American Economic Review, June 1973, 326-34.
18. -----. "Econometric Policy Evaluation: A Critique," in
The Phillips Curve and Labor Markets, Carnegie-Rochester Conference
Series, edited by Karl Brunner and Allan Meltzer. New York:
North-Holland, 1976, pp. 19-46.
19. MacKinnon, James G., "Model Specification Tests Against
Non-Nested Alternatives." Econometric Reviews, 1983, 85-157.
20. McAleer, Michael, M. Hashem Pesaran and Anil K. Bera,
"Alternative Approaches to Testing Non-Nested Models with
Autocorrelated Disturbances." Communications in Statistics, Series
A, 1990, 3619-44.
21. ----- and C. R. McKenzie, "Keynesian and New Classical
Models of Unemployment Revisited." The Economic Journal, May 1991,
359-81.
22. McNees, Stephen K., "The Forecasting Record for the
1970s." New England Economic Review, September/October 1979, 33-53.
23. Mishkin, Frederic S. A Rational Expectations Approach to
Macroeconometrics: Testing Policy Ineffectiveness and Efficient-Markets
Models. Chicago: University of Chicago Press, 1983.
24. -----. Money, Banking, and Financial Markets, 3rd ed. New York:
Harper Collins Publishers, 1992.
25. Pesaran, M. Hashem, "On the General Problem of Model
Selection." Review of Economic Studies, April 1974, 153-71.
26. -----, "Comparison of Local Power of Alternative Tests of
Non-Nested Regression Models." Econometrica, September 1982,
1287-1305.
27. -----, "A Critique of the Proposed Tests of the Natural
Rate-Rational Expectations Hypothesis." The Economic Journal,
September 1982, 529-54.
28. -----, "On the Policy Ineffectiveness Proposition and a
Keynesian Alternative: A Rejoinder." The Economic Journal, June
1988, 504-508.
29. Rao, B. Bhaskara, "Some Further Evidence on the Policy
Ineffectiveness Proposition." The Economic Journal, September 1992,
1244-50.
30. Rush, Mark and Douglas Waldo, "On the Policy Ineffectiveness
Proposition and a Keynesian Alternative." The Economic Journal,
June 1988, 498-503.
31. Small, David H., "Unanticipated Money Growth and
Unemployment in the United States: Comment." American Economic
Review, December 1979, 996-1009.