Adjusted P and F tests and the Keynesian-classical debate.
Bischoff, Charles W.
I. Introduction
The debate over the manner in which changes in money growth rates affect real economic activity, and over the role that systematic
monetary policy plays in this relationship, has been a protracted one,
yet it remains unresolved. This paper shows that a model proposed by
Fischer [13] in which multi-period-ahead expectational errors in money
growth are crucial for the determination of real economic aggregates is
empirically superior to two leading alternatives when the three are
estimated on a sample of U.S. quarterly data spanning the 1970s. In
addition, we show that the test used to demonstrate these results, a
version of Davidson and MacKinnon's [9] P test which we adjust
experimentally to remove small sample biases in its size, is, in this
application, more powerful than the more commonly-used F-ratio form of
the likelihood ratio test.(1) We use these results on relative power to
explain and reconcile some conflicting decisions to which the two
alternative tests lead.
The alternative specifications against which the Fischer model is
tested are an equilibrium business cycle model in which only
expectational errors over a short time horizon matter for real
variables,(2) and a model in which actual money growth affects the real
economy regardless of whether it is anticipated or not.(3) The direct
comparison of these three models is carried out using true ex ante
forecasts of money growth rather than an ex post decomposition of money
growth into an anticipated and an unanticipated part by way of the
frequently-used auxiliary equation approach.
Despite the many studies evaluating the New Classical and the AUDI models, neither the view that only unanticipated money growth matters
nor the view that only actual money growth matters has appeared
persuasive. It is well-known that these models and the Fischer model
have radically divergent implications regarding the efficacy of policy
conducted according to systematic rules,(4) rendering the choice of
specification crucial from a policy analysis perspective. Our evidence
of the superiority of Fischer's model thus supports the view that
anticipatable monetary policy can have real effects, but that these
effects differ from those following unanticipated policy changes.
II. Model Specifications and Alternative Tests
The models evaluated in this study are single-equation reduced form expressions which explain deviations in unemployment from its natural
rate by means of a distributed lag on alternative measures of money
growth. The forms of the three models of unemployment, which we identify
as the Barro, Fischer, and AUDI models respectively, are given by the
following equations:
UN_t = UN_{nt} + \sum_{i=0}^{k_1} \beta_{1i} ({}_{t-i-1}m_{t-i} - E_{t-i-1}m_{t-i}) + u_{1t},    (1)
UN_t = UN_{nt} + \sum_{i=0}^{k_2} \beta_{2i} ({}_{t-i-1}m_t - E_{t-i-1}m_t) + u_{2t},    (2)
UN_t = UN_{nt} + \sum_{i=0}^{k_3} \beta_{3i} {}_{t-i-1}m_{t-i} + u_{3t},    (3)
where UN_t is a transformation of the rate of unemployment, UN_{nt} is the natural rate of the transformed unemployment rate, {}_{t-i-1}m_{t-j} is the annualized rate of change in the money supply between periods t - i - 1 and t - j, and E_{t-i-1}m_{t-j} is the anticipated value of {}_{t-i-1}m_{t-j} based on information available in period t - i - 1. In each case, UN_{nt} is represented by including a constant and the fiscal variable G/y, the ratio of federal government purchases to output,(5) in the estimated equations. The residuals u_{jt} are specified to be second-order autoregressive transformations of independent and identically distributed normal random variables, a specification that adequately captures the serial correlation present in the quarterly data used.
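For concreteness, the following minimal sketch (in Python, with hypothetical stand-ins for the anticipations data described in section III) illustrates how the distributed-lag money-growth regressors of equations (1)-(3) can be constructed; the array names m_act and m_exp and all numerical values are illustrative assumptions, not the paper's data.

```python
import numpy as np

# Hypothetical stand-ins for the anticipations data of section III:
#   m_act[s, h] = annualized actual money growth between quarters s and s + h,
#   m_exp[s, h] = the anticipation of that growth formed in quarter s.
T, H = 48, 9                          # quarters of data, maximum forecast horizon
rng = np.random.default_rng(0)
m_act = rng.normal(6.0, 2.0, size=(T, H + 1))
m_exp = m_act + rng.normal(0.0, 1.0, size=(T, H + 1))

K = 8                                 # distributed-lag length (i = 0, ..., 7)

def regressors(t):
    """Money-growth regressors of models (1)-(3) for quarter t."""
    barro   = [m_act[t - i - 1, 1]     - m_exp[t - i - 1, 1]     for i in range(K)]
    fischer = [m_act[t - i - 1, i + 1] - m_exp[t - i - 1, i + 1] for i in range(K)]
    audi    = [m_act[t - i - 1, 1]                               for i in range(K)]
    return np.array(barro), np.array(fischer), np.array(audi)

# Usable sample: the first nine quarters are lost to the eight-quarter lag and,
# in the paper, to the AR(2) error transformation.
X_barro, X_fischer, X_audi = map(
    np.array, zip(*(regressors(t) for t in range(K + 1, T)))
)
```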
There are many studies which have tested versions of the Barro model
in equation (1) and the AUDI model in equation (3), beginning with the
papers by Barro [1; 2], but the evidence has been quite mixed. The
conclusions reached by Barro include the findings that ". . . the
hypothesis that only the unanticipated part of money growth is relevant
to unemployment is accepted . . ." while ". . . the reverse
hypothesis that the [unanticipated money growth] values are irrelevant
to unemployment, given the [anticipated money growth] values, can easily
be rejected [1,109]." Mishkin, however, claims that ". . .
anticipated monetary policy does not appear to be less important than
unanticipated monetary policy. In fact, the opposite seems to be the
case [23, 118],"(6) although he does not directly test for the
significance of unanticipated money growth given its anticipated
component. Carns and Lombra [6] find that an ex ante measure of
unanticipated money growth serves to overturn Barro's finding of
insignificance of the anticipated part while supporting the conclusion
that unanticipated money matters.
Frydman and Rappaport consider the relative importance of
unanticipated and anticipated money growth and claim that "raw
money growth affects real output in the short run, irrespective of whether it is rationally anticipated or not [14, 702]." However,
though they demonstrate that a model characterized by their AUDI
hypothesis is not rejected by an equilibrium business cycle model, their
testing procedure does not allow a test of whether the equilibrium
business cycle model is rejected by the AUDI hypothesis.
McAleer, Pesaran, and Bera [20] find that their "New
Classical" model does not reject a "Keynesian"
specification, and as their approach allows for a switching of the roles
that the models play in the hypothesis test, they can also claim some
support for an apparent rejection of the New Classical model by the
Keynesian model. They point out, though, that this conclusion is suspect
because of biases in their test statistic. In fact, in a related study,
McAleer and McKenzie [21] present evidence that neither of two versions
of the New Classical model can be rejected by Keynesian models on the
basis of at least one test.(7) The recent exchange between Pesaran [27;
28] and Rush and Waldo [30] also serves to illustrate the tenuousness of
the conclusions when these models are tested against each other.
The empirical methods used in the above articles provide
illustrations of the types of tests of non-nested non-linear models
mentioned in the introduction. One approach involves embedding any two,
or all three, of the above competing models in an artificial composite
model, and then testing each of the original models against it. Such a
procedure therefore yields a set of linear restrictions that the null model imposes on some of the parameters in the artificial composite
model. For example, a test of the Barro model as the null hypothesis against the Fischer alternative(8) would require estimating the
artificial model
UN_t = UN_{nt} + \sum_{i=0}^{k_1} \beta_{1i} ({}_{t-i-1}m_{t-i} - E_{t-i-1}m_{t-i}) + \sum_{j=1}^{k_2} \beta_{2j} ({}_{t-j-1}m_t - E_{t-j-1}m_t) + u_t    (4)
and then testing the restrictions \beta_{2j} = 0 for j = 1 through k_2. The composite equations necessary for testing other
competing specifications, and the implied restrictions, would be
constructed similarly.
Various tests of such restrictions are available, with those based on
the likelihood ratio principle being frequently used in this literature.
These tests involve a comparison of the values of the sum of squared
residuals from the restricted and the unrestricted or composite models,
based either on their difference or on a ratio of the two. One form of
the likelihood ratio test is based on the F-ratio(9) F = [(RSSR -
USSR)/q]/[USSR/(n - k)] where RSSR and USSR are the restricted and
unrestricted sum of squared residuals respectively, q is the number of
restrictions imposed by the null model on the composite equation, and n
- k is the degrees of freedom in the composite equation. No exact small
sample test statistic is available in the non-linear case, and the
second-order autoregressive error process makes our models
non-linear.(10) Since the chi-square form tends to reject more
frequently than does the F form, the F-ratio was used as the likelihood
ratio test. Tests were conducted in which each of the three models
served as the null against alternative composite models which embedded the null and each of the two competing models individually. In addition
to these six tests, each model served as a null against an alternative
embedding all three models simultaneously. Thus, we carried out a total
of nine F tests.
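A small sketch of the F-ratio computation just described follows, assuming a Barro null tested against the composite model (4); the sum-of-squared-residual values are made up purely for illustration, and q = 7 corresponds to the restrictions \beta_{2j} = 0 for j = 1, ..., 7.

```python
import numpy as np
from scipy import stats

def f_ratio(rssr, ussr, q, dof):
    """Pseudo-F statistic: F = [(RSSR - USSR)/q] / [USSR/(n - k)]."""
    return ((rssr - ussr) / q) / (ussr / dof)

# Made-up numbers for a Barro null against the composite model (4):
rssr, ussr = 0.95, 0.60      # restricted and unrestricted sums of squared residuals
q = 7                        # restrictions beta_{2j} = 0 for j = 1, ..., 7
n, k = 30, 19                # sample size and composite-model coefficients
F = f_ratio(rssr, ussr, q, n - k)
p_asymptotic = stats.f.sf(F, q, n - k)   # asymptotic p-value; the paper instead
print(F, p_asymptotic)                   # uses Monte Carlo size-adjusted critical values
```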
The other procedure which we considered employs the P test
appropriate for testing non-linear non-nested hypotheses, a test based
on the suggestions of Cox [7; 8] and proposed by Davidson and MacKinnon
[9], in which each specification is evaluated by its respective ability
to explain the variation in the dependent variable left unexplained by
the other specification. Examples of applications of the P test to
situations in which the non-linearities in estimation arise from
autoregressive specifications of the residuals are found in Bernanke,
Bohn, and Reiss [5] and McAleer, Pesaran, and Bera [20]. The test
involves first using an appropriate non-linear least-squares estimation
procedure to obtain estimates of the null and alternative models. Then
ordinary least squares estimates of the coefficient \alpha and the vector b in the auxiliary equation
UN_t - \widehat{UN}^0_t = \alpha(\widehat{UN}^1_t - \widehat{UN}^0_t) + H_t b + e_t,    (5)
which can be more conveniently written as
UN_t = (1 - \alpha)\widehat{UN}^0_t + \alpha\widehat{UN}^1_t + H_t b + e_t,    (6)
are obtained. In these equations, \widehat{UN}^0_t and \widehat{UN}^1_t are the predicted values of unemployment under the null and alternative models respectively, and H_t is the vector of derivatives of \widehat{UN}^0_t with respect to each of the coefficients in the null model, evaluated at its non-linear least-squares estimates. When \alpha is set to zero in either equation (5) or (6), the result is a Taylor-series approximation around the non-linear estimate of the null model, suggesting that a test of the null model can be based on a test of the significance of the OLS estimate of \alpha, \hat{\alpha}, using its "t-ratio." A finding that \hat{\alpha} is statistically significant would then lead to a rejection of the null by the alternative on the basis of this P test procedure.
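The following sketch illustrates the auxiliary regression of equation (5), under the assumption that the fitted values and the derivative matrix H have already been obtained from the non-linear least-squares estimation; the function and variable names are ours, not the paper's.

```python
import numpy as np

def p_test_t_ratio(y, fit_null, fit_alt, H):
    """P test of equation (5): regress y - fit_null on (fit_alt - fit_null) and
    the columns of H (derivatives of the null model's fitted values with respect
    to its coefficients, at the NLS estimates); return the 't-ratio' on alpha."""
    X = np.column_stack([fit_alt - fit_null, H])
    z = y - fit_null
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[0] / np.sqrt(cov[0, 0])
```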
The same nine hypothesis tests were conducted using the P test as
were conducted with the F test procedure, six pairwise comparisons and
three tests of each null against the two alternatives simultaneously.
These last three tests of a null against a composite alternative were
conducted on the basis of a suggestion by Davidson and MacKinnon [9,
783](11) which involves a strategy similar to that just outlined, but
applied to the extended auxiliary equation
UN_t - \widehat{UN}^0_t = \alpha_1(\widehat{UN}^1_t - \widehat{UN}^0_t) + \alpha_2(\widehat{UN}^2_t - \widehat{UN}^0_t) + H_t b + e_t,
where all previously defined variables are as before and \widehat{UN}^2_t represents the predicted value from the second alternative specification. The hypothesis that both \alpha_1 and \alpha_2 are zero is tested using a standard
F test with two numerator-degrees-of-freedom, and a rejection of the
null hypothesis results when the F statistic exceeds its conventional
critical value. This extension of the non-nested testing procedure has
heretofore not been applied in any empirical studies.
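A sketch of this joint test against two alternatives follows, again assuming the fitted values and H are in hand; the function name and arguments are illustrative assumptions.

```python
import numpy as np

def joint_p_test_f(y, fit_null, fit_alt1, fit_alt2, H):
    """Extended auxiliary regression: regress y - fit_null on both contrasts
    (fit_alt1 - fit_null), (fit_alt2 - fit_null) and H, and test
    alpha_1 = alpha_2 = 0 with a standard F statistic (2 numerator df)."""
    def rss(X, z):
        b, *_ = np.linalg.lstsq(X, z, rcond=None)
        e = z - X @ b
        return e @ e
    z = y - fit_null
    X_u = np.column_stack([fit_alt1 - fit_null, fit_alt2 - fit_null, H])
    rss_u = rss(X_u, z)
    rss_r = rss(np.asarray(H), z)            # restricted: alpha_1 = alpha_2 = 0
    dof = len(y) - X_u.shape[1]
    return ((rss_r - rss_u) / 2) / (rss_u / dof)
```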
As with the asymptotic F test procedure, the P test is strictly valid
only asymptotically since the "t-statistic" on \hat{\alpha} has a standard normal distribution in the limit but
an unknown distribution in small samples. An evaluation of the
small-sample properties of each of the tests and a comparison of the two
based on this evidence were therefore conducted on the basis of Monte
Carlo evidence. This experimental evidence also allowed straight-forward
adjustments in the critical regions of the various tests which yield a
uniform 5% probability of type I error.
These Monte Carlo comparisons of the small sample properties of the P
and F tests of non-linear non-nested models are the first to appear in
the literature. Pesaran [26] and Godfrey and Pesaran [15] provide
comparisons of the properties of the (unadjusted) J test, similar to the
P test, and the F test of a number of non-nested linear models. Rather
than rely on a comparison of our results to experimental evidence
strictly applicable only under different circumstances, we chose to
conduct our own experiments.
Since all of the eighteen tests we studied were found to over-reject
the null hypothesis when decisions were made on the basis of asymptotic
critical values, adjustments which reduce the frequency of rejection
would affect the power of all of the tests adversely. Other studies,
which considered only linear models, have found that non-nested tests
closely related to the P test are more powerful in small samples than F
tests when tests are based on asymptotic critical values.(12) However,
as the P tests as conducted here required more substantial small-sample
adjustments in their critical regions, the question of relative power
after such adjustments are made is raised. The findings presented below
suggest that even after the adjustments in size are made, the P tests
are still uniformly more powerful, often substantially so.
III. Data
The available raw data included the anticipations of the old M1 money
supply for the current through nine quarters ahead for the sample period
1970II through 1979IV.(13) From these, annualized expected growth rates
over an (i - j + 1)-quarter horizon, denoted above by E_{t-i-1}m_{t-j}, were generated. Actual money growth rates for corresponding time periods, the {}_{t-i-1}m_{t-j}'s, were
based on revised figures for the old M1 series, used in order to
maintain consistency with the anticipations data. The unemployment rate
U, real output y, and the fiscal variable G used were the total labor
force unemployment rate, the level of real GNP (billions of 1972
dollars), and the level of real federal government purchases of goods
and services (billions of 1972 dollars), respectively. Regressions were
of the transformed variable UN = ln(U/(1 - U)) on a constant, the ratio
G/y, and the money growth rates relevant to each model.(14)
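As an illustration of these transformations, a brief sketch follows; the series values and the log-difference annualization convention are assumptions made for illustration only, since the paper does not spell out its growth-rate formula.

```python
import numpy as np

# Hypothetical series (stand-ins for the actual data): the unemployment rate is
# taken as a fraction, and one possible annualization convention (log differences)
# is assumed.
U  = np.array([0.059, 0.058, 0.056])         # unemployment rate
M1 = np.array([221.0, 224.1, 227.5, 230.8])  # old M1, quarterly averages

UN = np.log(U / (1.0 - U))                   # dependent variable UN = ln(U/(1 - U))

def annualized_growth(M, s, h):
    """Annualized growth of M over the h quarters starting in quarter s."""
    return (400.0 / h) * np.log(M[s + h] / M[s])

print(UN, annualized_growth(M1, 0, 2))
```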
In order to preserve degrees of freedom and to allow a more extensive
comparison to the literature, we considered specifications with
distributed lags of length eight on the various measures of money
growth, so that in equations (1), (2) and (3), k_i = 7 for i = 1, 2, 3.(15) We also found evidence in all models of a second-order autoregressive error process, which coincides with much of the
literature. Thus, nine quarters were dropped from the beginning of the
sample, leaving thirty observations on which to estimate models
involving twelve coefficients, and composite models involving between
nineteen and twenty-seven coefficients.
While more degrees of freedom would be desirable, we proceeded with
the available sample for a number of reasons. First, the expectations we
used were not being generated prior to 1970II, precluding any extension
of the sample backwards in time. Second, the Chase data were proprietary
information and were not available to us beyond 1979IV.
In addition, the announced change in 1979IV in the way the Federal
Reserve conducted monetary policy is likely to subject the estimated
equations to structural shifts akin to "Lucas critique"
complications if the estimation spanned this period and the coefficients
connecting money growth (whether anticipated or unanticipated) to
unemployment changed as a result of the 1979 regime change. If one uses
actual ex ante expectations of money growth during a time of change in
the conduct of monetary policy, as occurred under Volcker in 1979IV and
1982III, one does not need to worry about the shift being properly
reflected in the new specification of the monetary rule, but only that
the expectations are the person's best guess at the time they are
formed and recorded. However, the coefficients connecting errors in
expectations to the unemployment rate may shift as a result of the
regime change.(16) Thus we would expect that our equations, as estimated
on data from the 1970's, should not apply to the period between
1979IV and 1982III, nor to the period after 1982III.(17)
It is also not possible to go beyond 1979IV without splicing two
different data series due to the change in the definition of the money
supply being forecasted. Although the behavior of M1 during the
1980's has raised questions about its appropriateness, most studies
spanning periods through the seventies used M1 on the basis, in part, of
the better fit it provided compared to M2.(18) Thus, M1 is arguably the
better monetary aggregate to use, and limiting attention to the
seventies avoids the possibility of a structural shift due to the change
in its definition.
A similar analysis of the period since 1982III using M2 growth would
be of interest, but that is another study. Nor are the 1970s so far in the past that the question of which monetary model applied then lacks interest. If
frequent changes in monetary policy occur, only relatively short and
intermittent episodes may be available for econometric analysis since,
in view of the "Lucas critique," precise quantitative results
should not be expected to carry over from one regime to another.
In any case, small sample sizes are typically a problem only insofar as they prevent the rejection of hypotheses or force critical regions to be based on asymptotic theory. Neither of these concerns applies to this study. Our sample, according to the
size-adjusted P test procedure, is large enough to reject the Barro and
AUDI models when the Fischer model serves as the alternative.(19)
Furthermore, the tests are based on empirical distributions appropriate
to the available sample size, not on some inappropriate asymptotic
distribution. In light of these arguments, we felt it potentially very
informative to proceed with the data available.
IV. Estimation and Monte Carlo Experiments
The hypothesis testing procedures first required estimating each of
the models represented in equations (1), (2), and (3). From these
results the fitted values \widehat{UN}^j_t required for the P tests
and the restricted sums-of-squared-residuals used in the F tests were
obtained. The estimation results(20) are given in equations (1'), (2'), and (3').
[Estimation results for equations (1'), (2'), and (3') omitted.]
A ranking of the estimated standard errors of these regressions, the \hat{\sigma}'s, suggests choosing the Fischer model over the others. Indeed, since the models contain the same number of parameters and are estimated on the same sample, conventional discrimination criteria based on goodness-of-fit such as the adjusted coefficient of determination \bar{R}^2 and the
Akaike Information Criterion would lead to this decision. Nonetheless,
this evidence is merely indicative, while the hypothesis tests outlined
above can provide more exacting evidence upon which to base such a
decision.
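For reference, a sketch of how such goodness-of-fit comparisons can be computed from each model's sum of squared residuals; the particular AIC formula shown is one common variant and is assumed here for illustration.

```python
import numpy as np

def fit_criteria(ssr, y, k):
    """Adjusted R-squared and one common form of the Akaike Information
    Criterion for a least-squares regression with k coefficients."""
    n = len(y)
    tss = np.sum((y - y.mean()) ** 2)
    r2_adj = 1.0 - (ssr / (n - k)) / (tss / (n - 1))
    aic = n * np.log(ssr / n) + 2 * k
    return r2_adj, aic
# With the same n and k across models, ranking by ssr, by r2_adj, or by aic
# necessarily selects the same model.
```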
The test statistic values resulting from these two procedures are
presented in the first row of figures in Table I.
It can be seen from equation (2'), the estimate of the Fischer
model, that the largest and most significant coefficients are those on
expectational errors over a time horizon of six and seven quarters. This
implies that if multi-period-ahead expectational errors are an important
explanatory variable in an unemployment equation, it is largely due to
the existence of contracts which are fairly long-term in nature. That
nominal wage rigidities that persist for this length of time are an
important source of variations in unemployment in turn suggests that
policy actions which are implemented within this interval of time can
have real effects.
Upon specifying one of the three models as the null hypothesis, the
point estimates of this null model were used to generate 10,000
simulated realizations for the unemployment variable used as the measure
of real economic activity.(21) These data were then used to generate
10,000 values each of the three P test statistics and the three F test
statistics that were calculated under this null model. This was then
repeated posing each of the other two models as the null model in turn.
The proportion of times that the simulated test statistics exceed the
realized values in the first row of Table I provides estimates of the
small-sample-adjusted p-values associated with the observed test
statistics. These are reported in the second row of Table I. The 10,000
repetitions allow us to associate with an estimated p-value of 0.05 a
standard deviation of approximately (0.05 \times 0.95/10,000)^{1/2} = 0.0022 and thus a fairly narrow confidence interval around our point estimates [10, 739]. If this adjusted p-value is less than the
conventional 0.05,(22) we reject the null model.
An alternative way of adjusting the tests for any small-sample biases
is to calculate the 95th percentile of the empirical distribution and
use this as an estimate of the adjusted critical value.(23) Comparing
the actual test statistic's outcome to this adjusted critical value
then determines the statistical decision. The adjusted critical values
are contained in the third row of Table I. Critical values drawn from
relevant asymptotic distributions and the corresponding unadjusted
p-values implied by these are included in the lower half of Table I for
comparison.
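The size-adjustment step just described can be sketched as follows, assuming the vector of statistics simulated under the null is already in hand; the function and argument names are ours.

```python
import numpy as np

def size_adjustment(simulated_stats, observed_stat, level=0.05):
    """Small-sample adjustments from statistics simulated under the null:
    the adjusted p-value is the fraction of simulated statistics exceeding
    the observed one, and the adjusted critical value is the (1 - level)
    quantile of the empirical distribution."""
    sims = np.asarray(simulated_stats)
    p_adjusted = np.mean(sims > observed_stat)
    critical_adjusted = np.quantile(sims, 1.0 - level)
    se_p = np.sqrt(level * (1.0 - level) / sims.size)   # about 0.0022 for 10,000 draws
    return p_adjusted, critical_adjusted, se_p
```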
Following the above procedure ensures that our various hypothesis
tests conform to the conventional 5% significance level standard.(24) In
order to evaluate both the relative merits of the alternative testing
procedures and to interpret the possible conflicting decisions that the
tests yield, the power of the various adjusted tests was then estimated.
To accomplish this, several sets of 1000 simulated realizations for unemployment were generated as described in footnote 21, but using the data and estimated coefficients of the models under
the alternative hypotheses as the base to which the generated errors
were added. These data were then used to generate 1000 observations each
on the P and F test statistics as applied to each of the hypotheses when
the alternative hypotheses were "true". The empirical
distributions then have upper tail areas above the corresponding
adjusted critical value from the previous stage of analysis which
approximate the power of that particular test against the specified
alternative. These estimates of the power of the adjusted tests are in
rows four through six of Table I. The power levels of the unadjusted
tests are also presented in the lower half of the table in order to
provide an indication of the consequences for power of the
size-adjustments that were performed.
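The corresponding power estimates are then simple tail frequencies over the statistics simulated under the alternative, as in the following sketch.

```python
import numpy as np

def estimated_power(stats_under_alternative, adjusted_critical_value):
    """Fraction of statistics, simulated with the alternative model as the
    data-generating process, lying above the size-adjusted critical value."""
    return float(np.mean(np.asarray(stats_under_alternative) > adjusted_critical_value))
```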
V. Empirical Results
The size-adjusted test results presented in Table I indicate three
rejections of the null hypothesis at the 5% level, all of these being
based on P tests. These rejections are of Barro's model by
Fischer's (at a 0.53% level), the AUDI model by Fischer's (at
a 0.96% level), and Barro's by the others jointly (at a 2.62%
level). The remaining case where Fischer's model serves as an
alternative, that where the AUDI null is tested against the other two
jointly, fails to reject the null at a 5% level but does yield a
rejection at the 9.41% level. Thus only when Fischer's model serves
as an alternative hypothesis does a rejection occur, and then only when
the P test is used. None of the tests where the Barro model or the AUDI
model (or both jointly) serve as the alternative hypothesis yield a
rejection of the null model, and in particular, in those cases where the
Fischer model acts as the null hypothesis, most of the p-values are over
80%. The P test results thus strongly support the Fischer model over the
other two specifications, even after adjusting for the P test's
propensity towards over-rejection.(25)
The conflicting decisions implied by the two tests in those cases
where the P test leads to a rejection can largely be resolved by the
experimental evidence on their relative powers. As seen in Table I, when
the Barro model serves as the null against either both alternatives or
the Fischer alternative, it is rejected at the conventional significance
level according to the adjusted P statistic but not according to the
adjusted F statistic. Similarly, the AUDI model is rejected by the
Fischer model according to the adjusted P statistic but not the adjusted
F statistic. In each of these cases, the estimated power levels shown in
line five of Table I indicate that the probability of a type II error,
that of failing to reject a false null model, is at least five times as
great for the adjusted F test when the Fischer model is taken to be the
true alternative.
A comparison of the adjusted power levels for the P and F tests
overall indicates that in all cases the P test is substantially more
powerful. Nine of the twelve estimates of the P test power exceed 50%
and six exceed 75%, while seven of the power estimates of the F tests do
not even achieve a level of 10%, and only two exceed 50%.(26)
These test results can also clarify some of the results presented in
other studies on the basis of alternative tests. Frydman and
Rappaport's [14] failure to reject their AUDI model by a Barro and
Rush [4] alternative model, for example, is supported by our P test
results.(27) However, we also find that neither can the AUDI model
reject the Barro and Rush model, a result that Frydman and
Rappaport's procedure does not allow them to consider. Moreover,
the failure of our adjusted P test to yield a rejection of the Barro
model is not likely to be due to low power since by our experimental
determination the power of the test is over 87% in this case. This
raises doubts regarding Frydman and Rappaport's conclusion that
"the AUDI hypothesis receives substantial empirical support [14,
693]."
McAleer and McKenzie [21] claim a rejection of their New Classical
model by their Keynesian model on the basis of a J test (a test
identical to our P test in the case of linear models) but fail to reject
the New Classical specification on the basis of an asymptotic F test.
They then speculate that
[g]iven the published results on asymptotic local power of various
non-nested tests, the failure of the asymptotic F test to reject the
null may simply reflect lower power relative to the J and JA tests [21,
374].
Thus the conflict in decisions indicated by their tests is attributed
by them to a likely underrejection of the New Classical model on the
basis of the F test, implying that the Keynesian model is the correct
one. Our results suggest otherwise.
When the Barro null was tested against the AUDI alternative, which
most closely approximates McAleer and McKenzie's test of the New
Classical null versus the Keynesian alternative, we also found that,
before adjusting the two tests, the decisions they led to were in
conflict.(28) We also found, though, that the bias in the size of our
unadjusted P test (and by extension their J test) was very large,
requiring an adjustment which in fact reversed the initial decision to
reject the null. Thus, by our adjusted tests, the Barro model was not
rejected by the Keynesian (AUDI) model. Not only was the conflict
between the two tests resolved, but the adjustments also yielded
estimated significance levels for the two tests which were essentially
identical. This clearly raises the possibility that it is the properties of the J test that McAleer and McKenzie use, and not those of the F test, that account for the disagreement between their tests.
Our interpretation, in conjunction with the failure of the New
Classical model to reject the Keynesian model, presents a different
conclusion. These adjusted test results do not seem to be the product
either of size biases or, in the case of the adjusted P test, lack of
power. Instead, we suggest that they result from neither of these two models being the appropriate specification; the New Keynesian (Fischer) model appears to be the closest to the appropriate model.
VI. Summary and Conclusions
This paper has reconsidered the empirical relevance of unanticipated
money growth as a determinant of real economic behavior as measured by
the rate of unemployment. Attention has been paid in particular to the
role that multi-period-ahead expectational errors might play, as
suggested by Fischer's important contracting-based model. The
Fischer model was tested against two competing alternatives, a New
Classical model in which only short-term expectational errors matter and
a Keynesian model where actual money matters, both of which have been
frequently and recently evaluated empirically.
Our statistical testing of these models has diverged from the
conventional approach in a number of ways. Most importantly, two
different testing techniques, Davidson and MacKinnon's [9] P test
and the conventional F test, were applied, and their small-sample
properties were assessed on the basis of Monte Carlo experiments. These
experiments provide a means of adjusting for biases in size, and allow a
comparison of the power levels of the size-adjusted tests.
In addition, this paper has avoided some econometric difficulties
that have plagued other similar studies by its use of actual ex ante
forecasts of money growth rates in the estimation and testing of the
Fischer and New Classical models, rather than using proxy measures of
anticipations derived as predicted values from an auxiliary equation.
The test results presented here lend strong support to the model in
which a mechanism operating through multi-period-ahead expectational
errors on money growth explains the behavior of unemployment. This
conclusion is reinforced when Monte Carlo evidence on the properties of
small-sample tests used by others to evaluate similar models is used to
interpret their mixed results. We reconfirm the view that pre-announced
monetary policy actions can have real effects, and that the effects of
these anticipated changes differ from those due to similar but
unanticipated changes. This view directly conflicts with the policy
implications drawn from the other two models, both of which have
received a great deal more attention in the empirical literature.
Our results also provide a stark contrast between the properties of
the traditional F test and the less familiar P test. Even after
adjustments are made to their respective critical regions to ensure that
they have comparable size, the P test is found to be substantially more
powerful than the F test. Given the ease with which the P test can be
implemented and adjusted as described here, we suggest the more frequent
use of such a procedure when confronted with competing non-linear
non-nested specifications, particularly when samples are of the size
commonly found in applied macroeconometric studies.
1. Examples of related studies which use the likelihood ratio
principle include those by Barro [1; 2], Barro and Rush [4], Mishkin
[23], Carns and Lombra [6], and Frydman and Rappaport [14].
2. Such models, also referred to in the literature as "New
Classical" or "monetary misperceptions" models, were
first evaluated empirically by Barro [1; 2] in prominent studies which
have produced numerous responses, both supportive and critical of
Barro's initial findings. Although real business cycle models are
also commonly specified as equilibrium models, we do not consider any
version of such a model here as money plays no crucial role in them.
3. Models of this type have been referred to as "AUDI" (an
acronym for Anticipated-Unanticipated Distinction Irrelevant) by Frydman
and Rappaport [14], and "Keynesian" by Pesaran [27; 28],
McAleer and McKenzie [21], and others, although they could easily be
given a "monetarist" interpretation.
4. Mishkin [24, 718-33] provides a non-technical assessment of the
policy implications of each of the three models considered here.
5. This fiscal variable is the one used by Barro and Rush and is
included to capture shifts in the natural rate of their transformed
unemployment rate [4, 31-2], not as an aggregate demand-management
policy variable. As is reported below, when G/y is included in the
estimable equations as part of UN_{nt}, the resulting coefficient is
always insignificantly different from zero by conventional standards.
Nevertheless, we retained this variable in the specifications to
maintain comparability with Barro and Rush.
6. We note that this particular quotation applies to results based on
equations with longer distributed lags, polynomial distributed lag
smoothness constraints imposed on the coefficients, a different measure
of unemployment, and different variables reflecting the natural rate,
relative to Barro's and our papers. When an equation with seven
lags is considered, the claim that only unanticipated money matters is
not rejected at the 5% level, as stated in Mishkin's Table 6.1,
Column 2.2 [23, 117].
7. Rao [29], in a derivative study, also finds that the Keynesian
model fails to reject the New Classical model at the 5% level in one of
four cases after his test statistic is adjusted for bias.
8. Although, strictly speaking, the test is not against the Fischer
model alone but against an artificial model which includes the Fischer
model as well as the Barro null.
9. Davidson and MacKinnon [10, 92] recognize that "it may be
stretching terminology somewhat to call [the F-ratio] an LR test, but it
is certainly reasonable to say that [it] is based on the likelihood
ratio principle if the latter is broadly defined to mean basing a test
on the difference between the values of an objective function at the
restricted and unrestricted estimates." This is the interpretation
employed in this paper.
10. In the case of models and restrictions which are linear, the
F-ratio provides an exact test. Evans and Savin [12] present some
evidence on the small-sample properties of the asymptotic forms based on
the chi-square distribution versus exact forms of the likelihood ratio
tests. When the model is non-linear, however, the F-ratio is distributed
according to the F distribution only asymptotically. This leads Davidson
and MacKinnon to refer to the F-ratio calculated for non-linear models
as a pseudo-F statistic [10, 92].
11. Also see MacKinnon [19, 96-7].
12. Pesaran [25; 26] finds this to be the case when comparing the
Cox-type N test and the Davidson-MacKinnon J test to the F test, and
Godfrey and Pesaran [15] report similar results for an adjusted N test
and the W test.
13. The anticipations were ex ante forecasts made by Chase
Econometric Associates, Inc., and its predecessor organizations. We are
grateful to Michael K. Evans, Steven J. Elgart, and Leon W. Taub for
making these forecasts available to us.
All other studies of the effects of unanticipated money, with the
exception of Carns and Lombra [6], use auxiliary equations to decompose actual money growth into anticipated and unanticipated components. The
proxies generated by this approach are estimates of actual forecasts and
as such are not without their difficulties. They require, for one, an ex
post specification by the econometrician of the information set
available to forecasters at the time their forecasts were formed.
Empirical results have frequently been found, however, to lack
robustness to changes in the content of the information set or to
changes in the measurement of variables included. See the exchanges, for
example, between Barro [1; 3] and Small [31] and between Pesaran [27;
28] and Rush and Waldo [30]. Also, the processes by which the variables
in the information set are used by forecasters to generate expectations
are assumed to be linear and time-invariant. If in fact the actual
expectations formulation process is altered in response to structural
shifts or trends in the economic environment, the proxies generated by
the auxiliary equation will be subject to further misspecification. This
would be the case, for example, if the model were estimated over a
period during which there were a change in policy regime which altered
the process by which rational agents formed expectations.
The use of ex ante forecasts, being actual observations rather than
econometric estimates, avoids these potential problems. The forecasts
used here were formed by a major forecasting firm, Chase Econometrics Incorporated, using the information set it felt was most relevant at the time that each of its forecasts was formed. Furthermore, it was quite
possible for Chase to change its forecasting procedure as and if it felt
changing conditions warranted. Thus the use of these actual forecasts
avoids misspecification by the econometrician of the actual
period-by-period expectations-generating process.
The use of an auxiliary equation, which often takes the form of a
difference equation with additional explanatory variables which must
also be forecasted, is likely to produce errors in estimating ex ante
forecasts which increase in variance with the forecast horizon. In an
equation representing the Fischer model in which multi-period forecast
errors appear, this will result in explanatory variables which are prone
to measurement error. The use of actual ex ante forecasts, even from a
sample of size one, should lessen this problem.
The Chase forecasts are at least equal in accuracy and other
desirable properties to other measures found in the literature. In a
study comparing the forecasts of Chase, Data Resources Inc., and Wharton
EFA, McNees [22, 41] finds that in terms of mean absolute errors of
their money forecasts, Chase does as well as DRI and better than
Wharton. An empirical comparison of Chase's one-period-ahead errors
with those contained in Barro and Rush's [4, 40-6] Table 2.3 also
favors the Chase measure using either mean absolute errors or root mean
square errors. Using the test by which Carns and Lombra [6] test for the
rationality of DRI's one-period-ahead forecasts, we found that the
Chase forecasts are also rational. In addition to these features of the
ex ante forecasts we used, support for their use can also be based on
Carns and Lombra's argument that they provide a check on previous
studies' results.
14. This specification coincides with that reported in Column (9) of
Table 2.1 in Barro and Rush [4].
15. Barro and Rush [4], Mishkin [23], Carns and Lombra [6], and
Frydman and Rappaport [14] all considered distributed lags of this
length when estimating models on quarterly data. This choice of lag
length is particularly useful in contrasting our results with these
studies since it is under this specification that Frydman and Rappaport
claim the strongest support for their AUDI model relative to the Barro
model while Mishkin finds less evidence to support his rejection of the
Barro model using a short lag length.
16. Our equations (1), (2), and (3) are essentially Phillips Curves,
and the slopes of these relationships are not invariant to changes in
the monetary regime, a point which is emphasized by Lucas [16; 17; 18].
17. Indications of coefficient instability in samples spanning these
regime changes are found in at least two studies, although neither
reports formal test statistics. Both Rush and Waldo [30, 501] and
Pesaran [28, 506] report evidence of instability when their samples are
extended through 1985.
18. For instance, Barro [1, 108] and Mishkin [23, 124] both refer to
a deterioration of fit with M2.
19. It is true that a larger sample, if available, might lead to
rejection of the Fischer model with either of the other two models as
alternatives. This simply recognizes that none of the models considered
here is literally true and thus each will certainly be rejected on the
basis of a large enough sample. For example, an extended version of the
Fischer model (see Fischer's [13, 203] footnote 20) could be
considered which would include all of the unanticipated money growth
terms in the Barro-Rush model. It would not be difficult to imagine an
even more general model which includes the actual money growth rates of
the AUDI specification as well. If this model accurately reflected the
data generating process, we would expect that, with a large enough
sample, each of the three models we consider here would be rejected by
each of the others since each would explain some variation in output not
explained by the others.
20. The non-linear estimation procedure used was iterative Cochrane-Orcutt using TSP Version 4.2. t-statistics are in parentheses,
and all have a 5% critical value of 2.101. \hat{\sigma} denotes the estimated standard error of the regression, and DW
is the Durbin-Watson d-statistic.
21. Serially correlated random errors were formed by generating
independent normally distributed errors with standard deviation given by
the estimate from the null model, then passing them through the filter
defined by the estimated autoregressive process. This was done for a
sample extending back to 1945I in order to essentially eliminate the
influence that initial zero values for the errors had on the effective
sample. These generated errors were then used with the actual money
growth data and expectations data to form the simulated unemployment
rate data. The seed used by the pseudo-random number generator was also
generated (pseudo-) randomly and then retrieved for the purpose of
allowing replication. Program code and seed values are available from
the authors upon request.
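A sketch of this filtering step follows; the AR(2) coefficients, innovation standard deviation, and burn-in length shown are illustrative assumptions rather than the paper's values.

```python
import numpy as np

def ar2_errors(n, rho1, rho2, sigma, burn_in=140, seed=None):
    """Pass i.i.d. normal innovations through an AR(2) filter, discarding a long
    burn-in so that the initial zero values have essentially no influence."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=n + burn_in)
    u = np.zeros(n + burn_in)
    for t in range(2, n + burn_in):
        u[t] = rho1 * u[t - 1] + rho2 * u[t - 2] + eps[t]
    return u[burn_in:]
```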
22. If one chooses to be more conservative, then we reject the null
model if this adjusted p-value is less than 0.05 minus 1.96 standard
errors of size 0.0022, or less than 0.0457.
23. This approach is taken by, for example, Dickey and Fuller [11]
when they present finite sample properties of statistics applicable to
unit root tests.
24. The prior specification of some other significance level such as
1% or 10% requires making obvious changes to our procedure.
25. We can indeed see, by comparing the unadjusted and adjusted
p-values associated with these test statistics, that the adjustments are
sometimes substantial, as well as important in the sense that they
sometimes overturn initial rejections. There are four cases in which an
apparent 5% level rejection is reversed on the basis of
small-sample-size adjustments, three of these being to P tests, and two
not even achieving an adjusted 10% level rejection.
26. Both of these situations are ones where the Fischer model serves
as the sole alternative hypothesis. Those cases where the F test power
levels are not much in excess of 5% have a probability of correctly
rejecting a false null that is no better than the probability of
incorrectly rejecting a true one; the test statistic has essentially no
usefulness since its small sample distribution, at least at the upper
tail, is basically identical under both the null and alternative
hypotheses.
27. It must be noted, of course, that our dependent variable and our
measure of anticipated money differ from Frydman and Rappaport's,
as does our sample period.
28. We note that our empirical results are based on different data,
on a different sample period and frequency, although our sample size,
30, is comparable to the size of the two samples used by McAleer and
McKenzie, 28 and 40. However, the F test was confirmed to be less
powerful than the P test, as they conjectured.
References
1. Barro, Robert J., "Unanticipated Money Growth and
Unemployment in the United States." American Economic Review, March
1977, 101-15.
2. -----, "Unanticipated Money, Output, and the Price Level in
the United States." Journal of Political Economy, August 1978,
549-80.
3. -----, "Unanticipated Money Growth and Unemployment in the
United States: Reply." American Economic Review, December 1979,
1004-1009.
4. ----- and Mark Rush. "Unanticipated Money and Economic
Activity," in Rational Expectations and Economic Policy, edited by
Stanley Fischer. Chicago: University of Chicago Press, 1980, pp. 23-73.
5. Bernanke, Ben, Henning Bohn and Peter C. Reiss, "Alternative
Non-Nested Specification Tests of Time-Series Investment Models."
Journal of Econometrics, March 1988, 293-326.
6. Carns, Frederick, and Raymond Lombra, "Rational Expectations
and Short-Run Neutrality: A Reexamination of the Role of Anticipated
Money Growth." Review of Economics and Statistics, November 1983,
639-43.
7. Cox, David R., "Tests of Separate Families of
Hypotheses," in Proceedings of the Fourth Berkeley Symposium on
Mathematical Statistics and Probability. Berkeley: University of
California Press, 1961.
8. -----, "Further Results on Tests of Separate Families of
Hypotheses." Journal of the Royal Statistical Society, 1962, Series
B, 406-24.
9. Davidson, Russell and James G. MacKinnon, "Several Tests for
Model Specification in the Presence of Alternative Hypotheses."
Econometrica, May 1981, 781-93.
10. ----- and -----. Estimation and Inference in Econometrics. New
York: Oxford University Press, 1993.
11. Dickey, David A. and Wayne A. Fuller, "Likelihood Ratio
Statistics for Autoregressive Time Series with a Unit Root."
Econometrica, July 1981, 1057-72.
12. Evans, G. B. A. and N. E. Savin, "Conflict Among the
Criteria Revisited: The W, LR and LM Tests." Econometrica, May
1982, 737-48.
13. Fischer, Stanley, "Long-Term Contracts, Rational
Expectations, and the Optimal Money Supply Rule." Journal of
Political Economy, February 1977, 191-205.
14. Frydman, Roman and Peter Rappaport, "Is the Distinction
Between Anticipated and Unanticipated Money Growth Relevant in
Explaining Aggregate Output?" American Economic Review, September
1987, 693-703.
15. Godfrey, L. G. and M. Hashem Pesaran, "Tests of Non-Nested
Regression Models: Small Sample Adjustments and Monte Carlo
Evidence." Journal of Econometrics, 1983, 133-54.
16. Lucas, Robert E., Jr., "Expectations and the Neutrality of
Money." Journal of Economic Theory, April 1972, 103-24.
17. -----, "Some International Evidence on Output-Inflation
Tradeoffs." American Economic Review, June 1973, 326-34.
18. -----. "Econometric Policy Evaluation: A Critique," in
The Phillips Curve and Labor Markets, Carnegie-Rochester Conference
Series, edited by Karl Brunner and Allan Meltzer. New York:
North-Holland, 1976, pp. 19-46.
19. MacKinnon, James G., "Model Specification Tests Against
Non-Nested Alternatives." Econometric Reviews, 1983, 85-157.
20. McAleer, Michael, M. Hashem Pesaran and Anil K. Bera,
"Alternative Approaches to Testing Non-Nested Models with
Autocorrelated Disturbances." Communications in Statistics, Series
A, 1990, 3619-44.
21. ----- and C. R. McKenzie, "Keynesian and New Classical
Models of Unemployment Revisited." The Economic Journal, May 1991,
359-81.
22. McNees, Stephen K., "The Forecasting Record for the
1970s." New England Economic Review, September/October 1979, 33-53.
23. Mishkin, Frederic S. A Rational Expectations Approach to
Macroeconometrics: Testing Policy Ineffectiveness and Efficient-Markets
Models. Chicago: University of Chicago Press, 1983.
24. -----. Money, Banking, and Financial Markets, 3rd ed. New York:
Harper Collins Publishers, 1992.
25. Pesaran, M. Hashem, "On the General Problem of Model
Selection." Review of Economic Studies, April 1974, 153-71.
26. -----, "Comparison of Local Power of Alternative Tests of
Non-Nested Regression Models." Econometrica, September 1982,
1287-1305.
27. -----, "A Critique of the Proposed Tests of the Natural
Rate-Rational Expectations Hypothesis." The Economic Journal,
September 1982, 529-54.
28. -----, "On the Policy Ineffectiveness Proposition and a
Keynesian Alternative: A Rejoinder." The Economic Journal, June
1988, 504-508.
29. Rao, B. Bhaskara, "Some Further Evidence on the Policy
Ineffectiveness Proposition." The Economic Journal, September 1992,
1244-50.
30. Rush, Mark and Douglas Waldo, "On the Policy Ineffectiveness
Proposition and a Keynesian Alternative." The Economic Journal,
June 1988, 498-503.
31. Small, David H., "Unanticipated Money Growth and
Unemployment in the United States: Comment." American Economic
Review, December 1979, 996-1009.