Errors-in-Variables Bounds in a Tobit Model of Endogenous Protection.
Kishore Gawande [*]
Alok K. Bohara [+]
The errors-in-variables (EIV) problem is pervasive in econometrics but has not received the attention it deserves, perhaps because it is
difficult to resolve. The first objective of this paper is to
demonstrate the effectiveness of recently developed methods to deal with
the EIV problem in models with censoring. The second objective of this
paper is to empirically examine, in light of the EIV problem, theories
of endogenous protection that have become important in trade theory for
their ability to explain why nations do not follow the traditional
economic maxim of free trade. These theories emphasizing
political-economic factors have gained momentum based on a set of
empirical studies that have sought to prove their validity. Whether
inferences about the theories of endogenous protection are gravely
affected by errors in variables is examined using data on U.S. nontariff
barriers with respect to nine developed countries. The theoretical
developments in Klepper (1988) and Klepper and Leamer (1984) are
combined with a result from Levine (1986), which usefully extends the
use of EIV diagnostics to a model with censoring.
1. Introduction
The predominantly nonexperimental nature of economic data compels
the use of proxies, imperfectly measured variables, and dirty data. This
paper is motivated by the cogent arguments in favor of sensitivity
analyses made by Leamer (1983, 1985). In this paper the recent
theoretical advances in the errors-in-variables (EIV) literature by
Klepper and Leamer (1984) and Klepper (1988), which have focused on the
linear regression model, are applied to a Tobit model via a result in
Levine (1986).
The empirical literature on endogenous protection provides a rich
context within which to study the sensitivity of inferences to the EIV
problem. Studies of endogenous protection based on the seminal empirical
work of Pincus (1975), Caves (1976), Ray (1981), and Baldwin (1985),
among others, have significantly influenced traditional thinking in the
area of trade. This literature is a prime example of empirical work that has led
theoretical development and continues to influence it. But inferences
from econometric studies of endogenous protection are suspected to be
fragile because there is widespread use of proxies and variables that
are poorly measured. The EIV problem is not confined to the
variables measured with error: the extensive use of mismeasured
variables and proxies may lead to spurious estimates on even
well-measured variables.
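This contamination effect can be illustrated with a small simulation (a sketch with made-up numbers, not the paper's data): classical measurement error in one regressor attenuates its own coefficient and simultaneously biases the coefficient on a correlated, perfectly measured regressor.

```python
# Sketch: measurement error in x1 biases the OLS estimate on a *clean*,
# correlated regressor x2. All numbers are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
# True regressors: x1_star and x2 are correlated; both true betas are 1.0.
x1_star = rng.normal(size=n)
x2 = 0.7 * x1_star + rng.normal(scale=0.7, size=n)
y = 1.0 * x1_star + 1.0 * x2 + rng.normal(size=n)

# x1 is observed with classical measurement error; x2 is measured exactly.
x1 = x1_star + rng.normal(scale=1.0, size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
# Population values here: coefficient on x1 shrinks to ~1/3, while the
# coefficient on the well-measured x2 is pushed up to ~1.48.
print(beta_hat)
```

The coefficient on the clean variable x2 absorbs part of the effect of the mismeasured x1, which is exactly why inferences about well-measured regressors are also suspect.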
This paper seeks to make two contributions. First, using
cross-country and cross-industry data on nontariff barriers, the
sensitivity of inferences about the validity of theories of endogenous
protection to classical errors in variables is investigated. Second, the
applicability of the EIV methodology to limited-dependent variable
models is demonstrated. The paper proceeds as follows. In section 2, the
choice of regressors is motivated, an endogenous protection equation is
estimated, and inferences are made under the presumption that there are
no errors in variables. In section 3, the EIV methodology and its
extension to the Tobit model is described. In section 4, two kinds of
EIV analyses are performed, one that leads to bounds on estimated
coefficients and another that exposes those inferences that are fragile
on account of errors in variables. Section 5 concludes.
2. Inferences About Endogenous Protection Theories
Empirical Specification
Trade protection in the United States has been modeled in the
empirical literature as including four components: (i) a self-interested political component that is a response to protectionist pressures, which
is substantially influenced by the lobbying efforts of private agents,
(ii) an altruistic political component influenced by welfare-oriented
motives of the government, (iii) a retaliatory component that serves as
a strategic deterrent against undesirable protectionist policies of its
partners, and (iv) a component motivated by comparative advantage. Their
empirical relevance has been demonstrated by Caves (1976), among others,
using tariff data from the Kennedy round, by Ray (1981) and Baldwin
(1985) using tariff data during the Tokyo round of cuts, and by Trefler
(1993) using aggregate U.S. NTB (non-tariff barrier) data from 1983.
A feature of this study is the use of bilateral cross-industry NTB
data between the United States and nine developed partner countries.
NTBs include all trade barriers other than ad valorem tariffs. Prominent
examples of NTBs are antidumping duties to thwart dumping below fair
price, countervailing duties to counter a partner's export subsidies,
quotas whose licenses may be distributed to domestic agents, and
voluntary export restraints where the partner country voluntarily
restricts its exports. Leamer (1990) provides an exhaustive taxonomy of
nearly 50 NTBs. After the Tokyo round tariff cuts, the new protectionism in developed countries took the form of NTBs. In the United States,
their use sharply escalated after 1979 and continued to rise through the
1980s. Data from 1983 are used in this study and capture a period in
which the use of NTBs was widespread.
NTBs are measured in this study as coverage ratios; that is, the
fraction of imports covered by some NTB or other. Following
Baldwin's framework, the specification employed in the econometric
analysis is
[N.sub.ij] = [X1.sub.ij][[alpha].sub.1] + [X2.sub.ij][[alpha].sub.2] + [X3.sub.ij][[alpha].sub.3] + [beta][[N.sup.*].sub.ij] + [[D.sup.*].sub.j][[gamma].sub.j] + [[varepsilon].sub.ij],
[[varepsilon].sub.ij] [sim] N(0, [[sigma].sup.2]), i = 1, [ldots], 435, j = 1, [ldots], 9. (1)
United States NTBs on good i against country j, [N.sub.ij], are
determined by a self-interested political component, whose variables are
represented by the vector [X1.sub.ij], an altruistic political component
represented by [X2.sub.ij], the theory of comparative costs represented
by [X3.sub.ij], and an offensive component, [beta][[N.sup.*].sub.ij],
designed to thwart foreign NTBs. Country-effect dummy variables are
included in [[D.sup.*].sub.j]. The parameters [[alpha].sub.1],
[[alpha].sub.2], [[alpha].sub.3], and [beta] are assumed stable across
industries and countries, and Equation 1 is estimated by pooling
industry and country data. Here, cross-industry data at the four-digit
SIC level of disaggregation are pooled across nine countries: Belgium,
Finland, France, Germany, Italy, Japan, the Netherlands, Norway, and the
United Kingdom. The errors [[varepsilon].sub.ij] are assumed to be
homoskedastic across countries and goods.
The choice of regressors in Equation 1 is influenced by
Baldwin's study but also includes newly constructed variables, and
an additional model, namely, the strategically retaliatory model. Table
1 shows the association of regressors with the underlying theory and the
expected sign on each coefficient. The Appendix details the construction
of the variables.
(i) Strategic retaliation: The aim of using retaliatory NTBs, as
opposed to employing them for purely protectionist purposes, is to deter
undesirable foreign trade policy at minimum domestic cost. In a
game-theoretic political economy model based on fairly realistic
assumptions, Baldwin (1990) shows the existence of optimal nonnegative retaliatory trade barriers. His argument is motivated by the ability of
retaliatory measures to discourage special-interest pressures in the
foreign country that led to the formation of the foreign trade barrier
in the first place. Grossman and Helpman (1995) describe a model with a
noncooperative trade-war equilibrium and a bargaining equilibrium
resulting from trade talks. In Baron's (1997) case study of the
Kodak-Fujifilm trade war, the use of nonmarket strategies such as
applying pressure on the government to impose sanctions is highlighted
using a bargaining model. Although these theories about retaliation and
bargaining are more properly tested using relative levels of protection
in the two countries, the use of foreign NTBs as a regressor can be used
to infer whether the United States retaliates against high NTBs abroad
(positive sign on [[N.sup.*].sub.ij]) or whether high NTBs abroad are an
indication of greater bargaining strength in the partner country
(negative sign on [[N.sup.*].sub.ij]).
(ii) Special-interest or pressure group model: The special-interest
group model associated with Olson (1965) and Pincus (1975), and
subsequently formalized by Brock and Magee (1978) and Findlay and
Wellisz (1982), suggests measures of special-interest pressure. The
concentration ratio (CONC4) and measures of scale economies (SCALE) have
traditionally been used as proxies for special-interest pressures
because the stakes from protection are highest in industries with a high
degree of concentration or scale economies. In addition to these
proxies, a more direct measure of pressure--corporate PAC (Political
Action Committee) campaign contributions scaled by industry value added (PACCVA83)--is employed and presumed to be positively related to the
level of protection. More recently, protection has been modeled by
Grossman and Helpman (1994) as the outcome of a menu auction, in which
industry lobbies each bid on a menu of trade tax vectors. The government
then sets a specific trade tax vector and collects from each lobby its
bid on that specific vector. In their model, which abstracts from market
structure issues, the prediction is that the inverse import penetration
ratio (CONS/[M.sub.ij]) should be positively related to protection. [1]
(iii) Adding machine model: The adding machine model due to Caves
(1976) focuses on the voting strength of the industry and suggests that
the number of employees (NE82) and degree of unionization (UNION) in that
industry and the industry's labor intensity (LABINT82) are all
positively related to the level of NTBs. The number of states in which
production is located (REPRST) is another measure of the spread and
consequently of voting power. Further, this model predicts that
industries with a large number of unconcentrated firms are more likely
to receive protection than concentrated industries, so that the level of
protection is expected to be negatively associated with the
concentration ratio (CONC4). The special-interest group model and the
adding machine model fall under the category of political models
motivated by self-interest, and variables that represent them are
collected in X1.
(iv) Public interest: The set of models emphasizing the public
interest ranges from the status quo model based on Corden's (1974)
conservative social welfare function to models of government altruism (e.g., Lavergne 1983) emphasizing equity issues. The status quo model,
used by Baldwin with some success in explaining tariff cuts during the
Tokyo round, suggests that the proportion of unskilled workers (P_UNSK)
and tariff protection (TAR) will be positively related to the level of
NTB protection. The equity model has been offered as an explanation as
to why industries without much political clout such as apparel and
textiles have been successful in obtaining protection. It suggests that
industries with low average earnings (AVEARN) or high labor intensity
(LABINT82) will likely obtain high levels of protection. The variables
associated with the status quo and equity models are contained in X2.
(v) Comparative cost-comparative advantage: The traditional
comparative advantage model of trade suggests that NTBs are positively
related to import penetration ([M.sub.ij]/CONS) and negatively to
exports ([X.sub.ij]/CONS). Because the United States has been shown to
have a comparative advantage in skill-intensive industries, industries
with a high proportion of scientists (P_SCI) and
managers (P_MAN) are expected to require less protection. A large change
in import penetration (DPEN7982) may be a sign of an industry that has
lost its comparative advantage and hence is a candidate for protection.
Other control variables included in X3, in addition to the comparative
cost variables, address the concerns of incorporating the effects of
real exchange rates into the cross-sectional analysis. The absolute
values of both the real-exchange-rate (RER) elasticity of imports
(MELAST) and the RER elasticity of exports (XELAST) are expected to be
positively related to the level of protection during this period. [2]
The extended period of RER appreciation between 1981 and 1984 led to a
rise in import penetration
and a lowering of exports, fueling protectionist pressures in industries
with high (absolute) RER elasticities.
Inference from ML Estimates
The dependent variable, [N.sub.ij], is measured only when it takes
a positive value. Foreign export subsidies, if not countervailed in the
United States, act like negative NTBs, and the absence of any export
subsidy data leads to censoring in the lhs variable. Also, theoretically
there exist arguments for direct import subsidies (see Vousden 1990;
Grossman and Helpman 1994). With intra-industry trade in intermediate
goods, which characterizes a large part of the trade among the countries
in the present analysis, there is certainly the possibility of
subsidizing imports. Hence, the dependent variable is censored at
zero, requiring the use of a Tobit specification.
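The censored-regression likelihood this calls for can be sketched as follows (simulated data and illustrative variable names, not the paper's NTB dataset): the Tobit log-likelihood combines the normal density for uncensored observations with the normal CDF for observations censored at zero.

```python
# Minimal Tobit (left-censored at zero) ML sketch on simulated data.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(1)
n = 5000
beta_true, sigma_true = np.array([0.5, 1.0]), 1.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y_latent = X @ beta_true + rng.normal(scale=sigma_true, size=n)
y = np.maximum(y_latent, 0.0)          # censoring: negative values unobserved

def neg_loglik(theta):
    b, log_s = theta[:-1], theta[-1]
    s = np.exp(log_s)                  # parameterize sigma to stay positive
    xb = X @ b
    ll = np.where(y > 0,
                  stats.norm.logpdf(y, loc=xb, scale=s),   # uncensored
                  stats.norm.logcdf(-xb / s))              # censored at 0
    return -ll.sum()

res = optimize.minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
print(res.x[:2], np.exp(res.x[-1]))    # ML estimates of beta and sigma
```

On simulated data the recovered coefficients are close to the true values, while plain OLS on the censored y would be biased.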
The ML estimates from two models in Table 2 indicate that every
political economy model finds some support from the data. In the first
model, country effect dummies are included, while in the second model
both country dummies and industry dummies for four industry groups--food
processing, resource-based manufacturing, general manufacturing, and
capital-intensive industries--are included. Both models lead to similar
estimates. Of the group of variables representing any theory, at least
one variable has the expected sign and a large t-value. The model of
retaliation is clearly supported, with an estimate for the retaliation
coefficient that is both statistically and economically significant. The
pressure group model finds strong support from the significant estimate
on corporate PAC spending (PACCVA83). This variable provides clear-cut
and direct inference about special interest groups rather than the
indirect inference through the coefficient on the industry concentration
ratio (CONC4). The adding machine model finds support from the estimate
on number employed (NE82) and on the geographic spread of firms within
an industry (REPRST), validating Caves' (1976) theory that voting
power in terms of numbers is an important determinant of whether an
industry receives protection. However, the unexpected sign on labor
intensity (LABINT82) is contrary, and this finding is hard to
rationalize.
The data provide mixed inferences about models of political
altruism. The positive ML estimate on average earnings does not support
the status quo model of Corden (1974), but the positive coefficient on
ad valorem tariffs (TAR) indicates that the same industries that were
earlier protected by tariffs now receive NTB protection, thus
undermining the multilateral cuts from the Tokyo round. The status quo
model predicts that industries that were most highly protected before
the Tokyo round implementations would, to prevent damage from a sudden
removal of protection, be supported in some alternative way. The results
show that NTBs filled the gap in protection left by the Tokyo round
tariff cuts. The equity model does not
receive support from the data. Although the coefficient on the
proportion unskilled (P_UNSK) is positive, it has a low t-value. Perhaps
the labor-intensity variable (LABINT82), earlier attributed to the
adding machine model, is more representative of the equity model, and
the negative coefficient is evidence against that model. The model of
comparative costs--comparative advantage is strongly supported by the
data. Bilateral imports and exports ([M.sub.ij]/CONS, [X.sub.ij]/CONS)
have the expected signs and high t-values. Industries with high skill
levels measured by the proportion of scientists and engineers (P_SCI) do
not receive protection largely because, as past empirical studies have
shown, the United States has a comparative advantage in the production
of skill-intensive goods.
A strong criticism of the empirical model is its tenuous connection
with any formal underlying theory. Due to the ad hoc nature of these
models, the variables are at best proxies for the theoretically correct
measure of special interest pressure, voting strength, public interest,
or government altruism. In addition to errors in variables arising from
their use as proxies, the variables are imperfectly measured. It is
therefore relevant to question whether the inferences in Table 2 are
robust to errors in variables. CONC4 and SCALE seek to measure the
stakes to firms from obtaining protection, as well as the ability to
solve the free-riding problem in organizing lobbying activities. REPRST
seeks to measure congressional representation of industries. P_SCI and
P_MAN seek to measure human capital across industries. P_UNSK is a proxy
for those workers for whom protection is the only form of insurance
against unemployment. The trade measures (NTB, import penetration,
export-to-consumption ratio, and tariffs) are subject to measurement
errors because they are concorded from disparate international systems
of data-keeping. PACCVA83, which is constructed from Federal Election
Commission tapes is also subject to measurement error due to concordance problems. Still other variables, namely P_SCI, P_MAN, P_UNSK, MELAST,
and XELAST, are measured at different levels of aggregation. An EIV case
can be made against many variables employed in the Tobit model. The
remainder of this paper is devoted to a sensitivity analysis of the ML
estimates to the EIV problem.
3. Errors-in-Variables Diagnostics in the Tobit Model
Errors-in-Variables Model
Consider the classical EIV model in which the observed variable y
is generated by
y = [beta]'[x.sup.*] + [mu], (2)
where [x.sup.*] is a K X 1 vector of true regressors with mean 0
and covariance matrix [Sigma], [mu] is a classical disturbance with
mean 0 and variance [[sigma].sup.2], and [beta] is a K X 1 vector of
coefficients, whose estimation is the focus of interest. A K X 1 vector
of proxy variables x is observed, which measures [x.sup.*] with error as
x = [x.sup.*] + [epsilon], (3)
where [epsilon] is a K X 1 vector of measurement errors with mean 0
and covariance V = diag([[nu].sub.1], [[nu].sub.2], [ldots],
[[nu].sub.K]), which is assumed to be distributed independently of
[x.sup.*] and [mu]. [3]
The EIV analysis of a model with a single regressor, K = 1, is
well-known. The set of feasible values of the single coefficient [beta]
lies between the direct-regression estimate from the regression of y on
x, and the reverse-regression estimate computed by regressing x on y and
then expressing y as a function of x. For example, if the regression of
x on y yields the fitted equation x = a + by, the reverse-regression
estimates on the intercept and slope are, respectively, a/b and 1/b.
With many regressors, K [greater than] 1, the generalization of this
result involves the direct-regression estimate plus the K reverse
regressions where the K regressors [x.sub.i], i = 1, [ldots], K are each
regressed on y and their reverse regression estimates then computed.
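For the K = 1 case, the bracketing of the true slope by the direct and reverse regressions can be sketched numerically (illustrative simulated data; with mean-zero data the intercepts drop out, so the slopes reduce to covariance ratios):

```python
# Single-regressor (K = 1) EIV bounds: the direct regression of y on x and
# the (inverted) reverse regression of x on y bracket the true slope.
import numpy as np

rng = np.random.default_rng(2)
n, beta = 50_000, 2.0
x_star = rng.normal(size=n)
y = beta * x_star + rng.normal(scale=0.5, size=n)
x = x_star + rng.normal(scale=0.5, size=n)    # classical measurement error

C = np.cov(x, y)                              # 2 x 2 sample covariance matrix
b_direct = C[0, 1] / C[0, 0]                  # y on x: attenuated toward zero
b_rev = C[1, 1] / C[0, 1]                     # 1 / (slope of x on y)

print(b_direct, b_rev)                        # true beta = 2.0 lies between
```

With these population values the direct estimate is about 1.6 and the reverse estimate about 2.1, so the interval contains the true slope of 2.0.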
Klepper and Leamer (1984) show that if every coefficient has the
same sign in the direct and all K reverse regressions, then the set of
feasible values of [beta] can be bounded. However, if the signs of any
coefficient differ across the direct and reverse regressions, none of
the coefficients can be bounded. [4] It is then necessary to invoke additional prior information to bound the feasible set. Klepper (1988)
describes how to use reasonable prior information to bound the feasible
values of [beta], and once this is accomplished, how to further tighten
the bounds on individual coefficients. The steps involved in
Klepper's method are explained below.
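The sign check behind the Klepper-Leamer bounds can be sketched on simulated data with K = 2 mismeasured regressors (data and parameters are illustrative; each reverse regression regresses one x on y and the other x's and is then inverted to express y as a function of the x's, as described above):

```python
# Klepper-Leamer sign check: direct regression plus K reverse regressions.
# If all K+1 estimates of every coefficient share a sign, bounds exist.
import numpy as np

rng = np.random.default_rng(3)
n, K = 20_000, 2
beta = np.array([1.0, 0.5])
x_star = rng.normal(size=(n, K))
y = x_star @ beta + rng.normal(scale=0.5, size=n)
x = x_star + rng.normal(scale=0.4, size=(n, K))   # measurement error

def ols(Y, X):
    return np.linalg.lstsq(X, Y, rcond=None)[0]

estimates = [ols(y, x)]                 # direct regression (data mean ~0)
for k in range(K):                      # the K reverse regressions
    others = np.delete(np.arange(K), k)
    coefs = ols(x[:, k], np.column_stack([y, x[:, others]]))
    c, d = coefs[0], coefs[1:]
    b = np.empty(K)
    b[k] = 1.0 / c                      # invert x_k = c*y + d'x_others
    b[others] = -d / c
    estimates.append(b)

estimates = np.array(estimates)
# Bounds exist iff every column (coefficient) has a single sign:
bounded = np.all(np.sign(estimates) == np.sign(estimates[0]), axis=0)
print(estimates.min(0), estimates.max(0), bounded)
```

Here the signs agree across all regressions, so the minimum and maximum over the K + 1 estimates bound each true coefficient.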
The limitations of the Klepper--Leamer methodology should be noted.
First, it is assumed that response errors are additive white noise and
uncorrelated with all other latent and measured variables in the model.
Even though economic data may not satisfy these assumptions, relaxing
them makes the EIV problem even more intractable. Krasker and Pratt
(1986), Bekker, Kapteyn, and Wansbeek (1987), and Erickson (1989) show
that bounds will not generally exist if the no-correlation assumption
between measurement errors and the (true) equation errors is dropped.
More results with weaker assumptions are called for to provide general
solutions to the EIV problem with economic data (e.g., Erickson 1993;
Iwata 1992). Second, applying the Klepper--Leamer results to the Tobit
model requires the use of Levine's (1986) result (see the
Appendix). Levine's result is based on the assumption of joint
normality of the explanatory variables, which we also assume in our
application. A more fundamental problem here is that Levine and Klepper
both derive their bounds in terms of population moments. Because there
is uncertainty about the population Tobit model for the mismeasured
variables, there is uncertainty about the bounds. Hence, standard errors
for the Klepper-Leamer bounds are also required. This concern is
addressed here by computing not just the EIV bounds, but also their
standard errors.
Errors-in-Variables Diagnostics
We begin our EIV analysis of model 1 by computing the direct and
reverse regressions. [5] The upper and lower bounds from the set of
direct and reverse regressions are reported in Table 3. For the set of
feasible values of [beta] to be bounded, the coefficient on each
regressor must have the same sign within its interval; that is, no
interval may contain zero. Otherwise the feasible set of values for
[beta] cannot be bounded, and the EIV problem prevents any inference
about [beta]. Clearly, based on the intervals from the direct- and
reverse-regression estimates presented in Table 3, the set of political
economy coefficients cannot be bounded in the presence of errors in
variables.
To bound the feasible set of coefficients, prior information must
be introduced. Klepper (1988) focuses on two types of prior information:
(i) prior bound on [R.sup.*2], the (hypothetical) true R-squared of the
regression of y on x if all the measurement error in the
[x.sub.i]'s were completely removed, and (ii) prior bounds on
[f.sub.i], the fraction of the variation in each regressor that is
attributable to measurement error. These two sets of bounds are not
independent of each other, and their connection is made evident below.
Klepper's method includes the computation of two sets of
diagnostics. The first is an upper bound M, such that if the true
[R.sup.2] of the regression can be constrained below M (using prior
information), the feasible set is bounded. Given the constraint [R.sup.*2] [less than] M, the set of feasible values of [beta] is
bounded by the direct regression and K constrained reverse regressions.
The EIV intervals have the same signs as the coefficients from the
direct regression. The intuition behind this result is as follows. If
the true [R.sup.2] is allowed to be unrestricted, then we cannot rule
out some combinations of measurement error variances that imply the true
regressors are collinear. If the true [R.sup.2] may be bounded below M,
it renders infeasible all the combinations of the measurement error
variances that imply the true regressors are collinear (see Klepper
1988). Hence, if prior information in the form of an upper bound on the
true [R.sup.2] is applied to the problem, then the resulting EIV
intervals for [beta] do not contain zero for any coefficient.
The second set of diagnostics involves the identification of two
key variables and the computation of two values for them, denoted
[d.sub.1] and [d.sub.2]. These values may be used to bound the
proportions of the total variances of the key variables that are
attributable to measurement error variances. If either of the two
measurement error variance proportions can be bounded below their
respective d value upon the basis of prior information, then a further
relaxation of the M bound on the true [R.sup.2] is possible. That is,
the feasible set of estimates can be bounded by restricting the true
[R.sup.2] below an even larger M value than permitted earlier. Beliefs
about [f.sub.i] and [R.sup.*2] are related. [6] To bound [R.sup.*2]
below M, the [f.sub.i] need to be correctly bounded. Klepper's
(1988) formula computes the maximum value, [d.sub.i], that each
[f.sub.i] can take and still not violate the upper bound on [R.sup.*2].
Corresponding to this new upper bound on the true [R.sup.2] is
another set of two key variables for which d values may be computed.
Again, by applying prior information to bound either of the measurement
error variance proportions below their respective d values, yet further
relaxation of the M bound on the true [R.sup.2] is possible. [7] This
process can then be repeated to tighten the EIV intervals. In some
cases, very reasonable bounds on the [f.sub.i] and [R.sup.*2] can lead
to fairly tight EIV intervals for [beta]. In other cases, reasonable
bounds on [f.sub.i] and [R.sup.*2] may lead to wide bounds on some
important coefficients. Klepper shows how to tighten the intervals for
those individual coefficients selectively by bounding a key [f.sub.i].
The key variables along the path of iterations are casualties of
the EIV problem, for they cannot be bounded and no inference about their
size and signs is possible.
Priors
Because the unconstrained direct and reverse regressions in Table 3
yield EIV intervals containing zero for all coefficients, we apply prior
information about (i) an upper bound on the true [R.sup.2] and (ii) the
maximum [f.sub.i] values, that is, the proportion of variation in the
explanatory variables accounted for by measurement error.
Our prior on the upper [R.sup.2] bound is based on a set of
cross-industry political economy studies. Leamer's (1990) NTB study
pools across countries and three-digit SITC industries with the
intention of measuring the impact of NTBs on trade. His explanatory
variables are mainly partner-country and industry-group dummies. He
reports [R.sup.2] values of around 0.30. Trefler's (1993) NTB study
employs a simultaneous Tobit model of NTBs and imports for
cross-industry U.S. data at the four-digit SIC level, aggregated (not
pooled) across partners. The reported log-likelihood ratio (LLR) could
be translated into a pseudo-[R.sup.2] measure if the LLR for the
null model were also reported. Baldwin's tariff study (1985, pp.
162-3) finds a value of [R.sup.2] close to 0.40. Other cross-industry
studies of tariffs with similar measures of fit are Ray (1981; U.S.
Kennedy round tariff levels), Pincus (1975; U.S. tariff levels in 1820),
and Caves (1976; Canadian Kennedy round tariffs). Based on these
studies, we are prepared to accept a true [R.sup.2] of between 0.30 and
0.40 for our study. Our regression contains many new variables that have
not been considered before, including PAC spending, partner NTBs, RER
elasticities, and bilateral imports and exports. Further, country-fixed
effects are included. Hence, our prior value of the true [R.sup.2] of
the regression lies in the interval [0.30, 0.40] with a uniform
distribution over values in this interval. Although a more precise
statement may be required in other situations, it is adequate here.
Our priors on [f.sub.i], the measurement error variances as a
fraction of sample variance, are displayed in Table 4. The four
political economy variables NE82, LABINT82, AVEARN, and NEGR82, taken
from the Census of Manufacturing, are presumed to be precisely measured.
So are the partner-country dummies. The Census does not report standard
errors for these variables, which is taken as evidence of the precision
with which they are estimated. However, the Census does report that the
relative standard error--that is, the standard error divided by the
estimate-- on the four-firm concentration ratio (CONC4) and on the
number of firms (used to construct SCALE) is greater than 0.15 for over
80% of the four-digit SIC industries. For these two variables, their
prior value of f is set at 0.40. We illustrate the rationale for this
with CONC4. Suppose the relative standard error for CONC4 averages to
0.30 for the sample. Then, since its sample mean is approximately 0.40,
the actual (average) standard error is 0.40 X 0.30 = 0.12, or a
measurement error variance equal to 0.0144. Because the sample variance
of CONC4 is 0.0425, this implies a value of [f.sub.CONC4] =
0.0144/0.0425 = 0.34. The prior value for [f.sub.CONC4] is therefore put
conservatively at 0.40. Similarly, the prior value for [f.sub.SCALE]
equals 0.40.
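The arithmetic behind the CONC4 prior can be verified directly (numbers taken from the text above):

```python
# Deriving the prior f value for CONC4 from its relative standard error.
rel_se = 0.30        # assumed average relative standard error for CONC4
mean_conc4 = 0.40    # approximate sample mean of CONC4
var_conc4 = 0.0425   # sample variance of CONC4

se = rel_se * mean_conc4       # average standard error: 0.40 * 0.30 = 0.12
me_var = se ** 2               # measurement error variance: 0.0144
f_conc4 = me_var / var_conc4   # fraction of variance from error: ~0.34
print(round(f_conc4, 2))       # rounded up conservatively to a prior of 0.40
```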
If there is a mismatch in the degree of aggregation for any
variable, its prior f value is set at a minimum of 0.40. Because
PACCVA83, UNION, REPRST, P_SCI, P_MAN, and P_UNSK are all measured at
the three-digit SIC level and are replicated at the four-digit level,
they are assigned a prior f value of 0.40. MELAST and XELAST are
estimated at the higher two-digit level of aggregation and replicated at
the four-digit level. Because their measurement error is greater than
that for the variables measured at the three-digit level, both their
prior f values are set at 0.50. Conversion from disparate systems of
data keeping induces measurement error. Whenever a variable requires
concordance between the trade system of data keeping (TSUSA, SITC, etc.)
and the industry system (SIC), the variables are considered to be
measured with moderate error and their prior f values are set at 0.25. While we
have employed reliable converters among these disparate systems, the
mappings are not accurate. For example, in going from the SITC system to
SIC, there are many cases with many-to-one and one-to-many mappings. For
these cases, the mappings are simply proportioned equally, which is an
approximation. Hence, the variables [[N.sup.*].sub.ij], M/CON, X/CON,
DPEN7982, and TAR each have prior f values equal to 0.25.
If anything, the set of priors on f is conservative. The priors
therefore lead to wider bounds on coefficient estimates than if the
measurement errors were believed to be smaller. Another important reason
to err on the conservative side is that statements about prior beliefs
are usually approximate judgments, and eliciting exact prior
information is a difficult, if not impossible, task (Leamer 1978).
4. Inferences with Errors in Variables: EIV Diagnostics and Bounds
Stage I Iterations
Table 5 shows the values of M and [d.sub.i] along the path of
iterations. Consider the first row (iteration J = 0) of the table. M =
0.3375 signifies that if the true R-squared of the regression
([R.sup.*2]) is bounded below 0.3375, the coefficients can be bounded
and the EIV problem resolved. This value for [R.sup.*2] is well within
the prior interval of [0.30, 0.40].
However, for [R.sup.*2] to be bounded below 0.3375, all the
[f.sub.i]'s must also be bounded below their respective [d.sub.i]
values. But that requires restricting the [f.sub.i] values of some
variables far below their admissible prior values (set out in Table 4).
The first row of Table 5 shows that the [f.sub.i] bounds for these
variables are in conflict with their prior bounds: [f.sub.PACCVA83]
[leq] 0.04, [f.sub.REPRST] [leq] 0.095, [f.sub.MELAST] [leq] 0.02, and
[f.sub.XELAST] [leq] 0.14.
To proceed, two key variables must be identified (only two such
variables exist), one of them chosen and the value of its [f.sub.i]
bounded below its computed [d.sub.i] value. These two key variables are
identified in Table 5 as CONC4 and P_MAN (bold) in the first row. There
are now two courses of action: (i) choose the variable for which
[f.sub.i] [less than] [d.sub.i] is satisfied in Table 4 or choose either
variable if both their [d.sub.i] values exceed their prior [f.sub.i]
values or (ii) choose neither if both their [d.sub.i] values are
unacceptably low. In the latter case, the iterations terminate. Because
CONC4 and P_MAN both have admissibly high [d.sub.i] values, we randomly
chose CONC4 and imposed the prior constraint [f.sub.CONC4] [leq] 0.40.
Recomputing the new M bound on [R.sup.*2] yielded a higher value of M =
0.3381. This now became the sufficient upper bound on [R.sup.*2]
required to relax the d([cdotp]) bounds on all the remaining variables
that were measured with error. [8] The key variable CONC4 now had upper
and lower bounds of the opposite signs and became the EIV problem's
first casualty. It could no longer support inferences about the
special-interest models of Olson (1965) and Brock and Magee (1978), and
the adding machine model of Caves (1976).
This process of iterating to find acceptable M bounds and
d([cdotp]) bounds is termed "stage I". [9] After six stage I
iterations, the M bound was still satisfactory, but the two key
variables at this step (REPRST and MELAST) both had unacceptably low
[d.sub.i] values, thus concluding the stage I iterations. The six
iterations were based on the following sequence of bounds on the
fraction of variation attributable to measurement error in key
variables: [f.sub.CONC4] [leq] 0.40, [f.sub.SCALE] [leq] 0.40,
[f.sub.P_MAN] [leq] 0.40, [f.sub.UNION] [leq] 0.40, [f.sub.DPEN7982]
[leq] 0.25, [f.sub.P_UNSK] [leq] 0.40. The M bound on [R.sup.*2] at the
end of the stage I iterations was 0.3706, which produced intervals for
all coefficient estimates except those corresponding to a bounded key
[f.sub.i].
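The node-level decision rule in these stage I iterations can be sketched as follows. The d values in the example are hypothetical stand-ins (the actual values come from Klepper's algorithm and Table 5); the prior f bounds mirror Table 4.

```python
import random

def choose_key_variable(key_vars, d, prior_f_max, rng=None):
    """Stage I node rule (sketch): a key variable is admissible when its
    prior upper bound on f (Table 4) lies at or below its computed d
    value. Pick one admissible variable (at random when both qualify);
    return None when neither is admissible, ending the iterations."""
    rng = rng or random.Random(0)
    admissible = [v for v in key_vars if prior_f_max[v] <= d[v]]
    if not admissible:
        return None  # both d values unacceptably low: stage I terminates
    return rng.choice(admissible)

# Hypothetical numbers mimicking iteration J = 0, where CONC4 and P_MAN
# were the two key variables and both had admissibly high d values:
d = {"CONC4": 0.55, "P_MAN": 0.52}            # hypothetical d values
prior_f_max = {"CONC4": 0.40, "P_MAN": 0.40}  # prior f bounds (Table 4)
chosen = choose_key_variable(["CONC4", "P_MAN"], d, prior_f_max)
```

The chosen variable's f is then constrained below its prior bound and the M bound recomputed, exactly as the text describes for CONC4.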
Stage I Intervals
The stage I intervals are presented in Table 6 under the column
labeled "path 1". For comparison, we also include the a priori sign on each variable, the maximum likelihood estimates, their standard
errors, and the connection of the variable with an underlying theory.
The stage I intervals are constructed after imposing the constraint
[R.sup.*2] [less than] 0.3706. [10] The blanks denote the key variables
selected during the stage I iterations, for which no inferences are
possible. The bounds on the remaining variables can be used to make
inferences about the models of endogenous protection they each
represent.
The computation of standard errors for these bounds is not straightforward.
Because the iterations trace a specific path, the EIV bounds are
conditional on the set of variables that define the path, or the path
set. There is no analytic formula for the bounds even if the path set
were known in advance, rendering analytic techniques such as the
delta method practically useless for our purpose. Hence, we use a
simulation method for computing standard errors on the bounds similar to
the method used by Krinsky and Robb (1986). The details of the
simulation method are provided in the Appendix.
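A minimal stand-alone sketch of this Krinsky-Robb-style simulation, using independent normal draws and a hypothetical nonlinear statistic in place of the actual path-dependent bound computation (which is detailed in the Appendix):

```python
import random
import statistics

def simulated_se(theta_hat, se_theta, stat_fn, draws=1000, seed=0):
    """Simulate the standard error of a nonlinear statistic (here, an
    EIV bound) by redrawing parameters from their estimated sampling
    distribution and recomputing the statistic on each draw, in the
    spirit of Krinsky and Robb (1986). Independent normals are assumed
    for simplicity; the paper uses the full covariance structure."""
    rng = random.Random(seed)
    sims = [
        stat_fn([rng.gauss(m, s) for m, s in zip(theta_hat, se_theta)])
        for _ in range(draws)
    ]
    return statistics.stdev(sims)

# Hypothetical example: a bound that is a nonlinear function of two
# estimated coefficients (a stand-in for the EIV bound of the text).
se = simulated_se([2.0, 0.5], [0.3, 0.1], lambda t: t[0] / (1.0 + t[1]))
```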
The retaliation model of Baldwin (1990) is supported by the
positive interval on [[N.sup.*].sub.ij]. From the point of view of the
model of Grossman and Helpman (1995) and Baron (1997), this result may
be used to infer that the United States has bargaining strength in the
sense that an increase in partner NTBs is met by an increase in U.S.
NTBs. A negative coefficient would imply that a foreign NTB increase
deters U.S. NTBs. The standard errors on the lower and upper bounds of
the stage I interval for [[N.sup.*].sub.ij] show that both bounds are
measured quite precisely and are statistically significantly greater
than zero at the 5% level.
The special-interest model finds support in the positive, though
wide, interval for PACCVA83 (industry PAC spending). Both the lower and
upper bounds are more than two standard errors greater than zero, which
demonstrates that they are precisely estimated. The estimates support the
prior belief that the greater is special-interest pressure in the form
of congressional campaign contributions by industry lobbies, the higher
the protection they receive. The traditionally used proxies for
special-interest motives, namely CONC4 (four-firm concentration) and
SCALE (firm scale of output) do not allow any inference about the
special interest model because of measurement error in regressors that
are correlated with these variables. This reiterates the point made by
Klepper, Kamlet, and Frank (1993), which is that the use of proxies that
are correlated with other variables that are poorly measured may lead to
spurious estimates on the well-measured proxy variables. [11] SCALE and
CONC4 fall into this category of proxies whose usefulness is suspect
due to their correlation with other poorly measured variables. The
results reiterate the fact that it may be well worth the construction of
variables that more directly represent the theory, a point made by
Baldwin (1985) in assessing the empirical literature on the
special-interest model. PACCVA83 is constructed to do just that, and it
is reassuring that it is unaffected by the EIV problem. Its positive
interval unambiguously supports the special-interest model: Politically
active industries succeed in buying protection. However, perfect
measurement is not by itself sufficient to avoid the EIV problem, as
will be seen in the case of the adding machine model.
The one variable most representative of the adding machine model of
Caves (1976), namely NE82 (number of employees), is presumed to be
perfectly measured because the data are from an accurately conducted
census. Importantly, it is not a proxy and precisely represents what it
is designed to measure, namely, voting strength. The bounds on perfectly
measured variables can be computed after determining the bounds on the
mismeasured variables using the method in Bollinger (1996). [12] Even
though NE82 is measured without error, its correlation with other
regressors that are measured with error fatally affects its usefulness
for inferential purposes. The EIV interval on the coefficient on NE82
contains zero. Hence, the variable that is the mainstay of the adding
machine model does not allow unambiguous inference due to the presence
of other mismeasured variables.
The negative interval on LABINT82 (labor intensity) runs counter to
the theory's prediction and is a puzzle. [13] We can only speculate
that the contrary sign is due to specification error other than
measurement error. The positive interval for REPRST (geographic spread
of an industry) provides support for the adding machine model. REPRST
proxies the geographic spread of an industry, and its positive interval
indicates that industries that are geographically concentrated are less
successful in obtaining protection than those that are dispersed and
therefore more widely represented in Congress. Two factors, however,
diminish the case REPRST makes for the adding machine model. First, the
EIV interval for REPRST is too wide to infer whether support for the
adding machine model is significant in magnitude. Second, the standard
error on the lower bound of 1.776 makes the lower bound statistically
insignificant from zero. This implies that under some measurement error
schemes, it is possible that the estimate on REPRST, were all
measurement error removed, could be zero or even negative. Even though
further iterating in the (subsequent) stage II runs can narrow the
interval further, they cannot be expected to solve the imprecision with
which the lower bound is estimated.
The comparative cost-comparative advantage model finds support from
the positive interval for [M.sub.ij]/CONS (bilateral import penetration)
and negative interval for [X.sub.ij]/CONS (bilateral exports scaled by
consumption) just as the theory predicts. [14] Even though the intervals
are wide, the lower bound of 2.870 on import penetration and the upper
bound of -11.90 on the export-to-consumption ratio are economically large
numbers. They imply that if import penetration were to increase by 0.05,
then NTB coverage would increase by 0.05 X 2.87 = 0.143, or if the
export-to-consumption ratio were to increase by 0.05, the coverage ratio
would decline by 0.05 X 11.90 = 0.60. Hence, the support for the theory
of protection according to comparative disadvantage is quite strong. The
standard errors indicate that both the upper and lower bounds of
[M.sub.ij]/CONS and [X.sub.ij]/CONS are precisely measured.
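The back-of-the-envelope magnitudes quoted above can be checked directly:

```python
# A 0.05 increase in import penetration raises the NTB coverage ratio by
# 0.05 times the lower bound of 2.87; a 0.05 increase in the
# export-to-consumption ratio lowers it by 0.05 times 11.90.
m_effect = 0.05 * 2.87    # about 0.14 increase in coverage
x_effect = 0.05 * 11.90   # about 0.60 decrease in coverage
```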
The empirical literature surrounding the Leontief paradox has
established that the United States is human-capital abundant. The
negative interval on P_SCI (proportion of employees who are scientists)
confirms this. However, the interval is too wide to judge whether the
data strongly support the theory of comparative costs and advantages.
Further, the upper bound of -0.357 is statistically no different from
zero at the 5% level of statistical significance. There remains the
possibility that under some measurement error configurations, the true
coefficient on P_SCI is statistically no different from zero. Because it
is widely accepted that management is the source of comparative
advantage in U.S. manufacturing, it is unfortunate that the EIV problem
precludes any inference about the human capital variable P_MAN.
The evidence in favor of the models of public interest is mixed.
Even though AVEARN (average earnings) is a perfectly measured variable,
due to its correlation with mismeasured variables, its interval contains
zero. Hence, AVEARN cannot be used to refute the public interest model.
The positive interval on TAR (tariffs) is reassuring for Corden's
status quo theory for it indicates that the tariff cuts in the Tokyo
round were made up by tariff protection. It validates Corden's
conservative social welfare function, based on the notion that
government prefers the status quo to sudden and drastic income changes.
Industries with high tariffs (which suffered the highest Tokyo Round
cuts) continued to be protected with NTBs. However, this support for the
Corden model is weakened due to the large standard error on the lower
bound for TAR, which makes it possible for the true coefficient to be
statistically no different from zero. P_UNSK (proportion unskilled)
cannot be used to make inferences about the public interest models
because it is chosen as a key variable in the stage I iterations. The
positive interval on NEGR82 (employment growth) is evidence against the
theory of protection based on equity considerations and altruism but
both the upper and lower bounds are measured quite imprecisely, as is
evident from their standard errors. Neither bound is statistically
different from zero.
It is natural to question whether the EIV bounds are robust to the
path of iterations. There are [2.sup.6] possible paths with 6 iterations
(nodes) and a choice of one out of two key variables at each node.
However, the set of key variables through which the paths can be routed
is a smaller subset of the set of mismeasured variables and hence the
number of free paths is limited. This is fortunate because our final
results are robust to the choice of path. We experimented with three
alternative stage I paths. Of the two key variables identified by
Klepper's algorithm at each node, path 1 chooses one at random,
provided its d value is acceptable according to the priors in Table 4.
Path 2 goes through the key variable that has the higher d value. Path 3
goes through that key variable that has the lower d value, provided of
course that it is not unacceptably low; otherwise, it is routed through
the other key variable. As the intervals in Table 6 show, all three
paths have similar progressions and although the sequence may be
different (see notes to the table), the path set making up the set of
key variables chosen along any path is the same for all three paths.
Since the M bound on the true [R.sup.2] depends only on the path set,
not their sequence, the stage I EIV intervals are the same for all three
paths. [15]
Stage II Intervals
The stage I bounds may be too wide for policy purposes or policy
simulations, and it may be necessary to narrow them further.
Klepper's method for tightening the bounds on individual
coefficients is as follows. To tighten an individual bound (say, the
lower bound), the [f.sub.i] value of one key variable (there is only one
such variable corresponding to that bound) must be bounded below its
[d.sub.i] value. In the next iteration, another key variable can be
similarly used to tighten the bound further. These iterations may be
continued until one of two events occurs: either the bound is generated
by a direct regression, in which case the coefficient bound cannot be
tightened any further, or the required [d.sub.i] bound on the key
variable is unacceptably low. This set of iterations is termed stage II
iterations. At their conclusion, the final EIV bounds are produced.
The final EIV intervals are reported in the last two columns of
Table 6. In the first of those columns, the lower and upper bounds for
variable i are expressed in the original units as [[[b.sup.LB].sub.i],
[[b.sup.UB].sub.i]]; in the next column, the interval estimates are
expressed as beta coefficients by standardizing by the sample standard
deviations. The beta coefficient on an rhs variable [x.sub.i] in a
linear regression is the ordinary least squares estimate
[[b.sup.OLS].sub.i] times sd([x.sub.i])/sd(y), where sd(.) denotes
sample standard deviation. Hence, the beta coefficients corresponding to
the interval bounds for variable i are [[[b.sup.LB].sub.i] X
sd([x.sub.i])/sd(y), [[b.sup.UB].sub.i] X sd([x.sub.i])/sd(y)].
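The conversion from interval bounds to beta coefficients can be sketched as follows; the function and data are illustrative.

```python
import statistics

def beta_interval(lb, ub, x, y):
    """Rescale coefficient bounds into beta (standardized) units by
    multiplying each bound by sd(x)/sd(y), as described in the text."""
    scale = statistics.stdev(x) / statistics.stdev(y)
    return lb * scale, ub * scale

# Illustrative data chosen so that sd(x)/sd(y) = 0.5: the bounds are
# halved when expressed as beta coefficients.
lb_beta, ub_beta = beta_interval(2.0, 4.0, [0.0, 2.0], [0.0, 4.0])
```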
Consider the stage II bounds expressed in original units. A
comparison with the stage I bounds shows that in many cases the bounds
remain unchanged: This is true for both bounds for REPRST and MELAST,
the lower bound for PACCVA83, and the lower bound for TAR. In all these cases,
the key variables corresponding to those bounds at the last stage I
iteration (J = 6) all have unacceptably low [d.sub.i] values. Hence, the
stage II iterations terminate at iteration J = 6 without tightening
those bounds. In fact, all the stage II bounds in Table 6 terminated
because the key variable had an unacceptably low [d.sub.i] value. In no
case did the stage II iterations progress beyond J = 7, and usually they
terminated at J = 6. Had they been allowed to progress beyond that, say,
by allowing the key [f.sub.i] value to be bounded below the required
[d.sub.i] value, a further tightening would be possible. For example, if
the [d.sub.i] value for the key variable for REPRST, which was also
REPRST, were allowed to be bounded below 0.15, then the upper bound on
REPRST could be tightened down to 44.7.
Because the stage II iterations did not progress far, if at all,
beyond the stage I iterations, the EIV bounds did not, in general,
narrow significantly beyond the stage I bounds. The inferences are
therefore largely unchanged. The implications of the EIV problem for the
theories of political economy are not very damaging if the focus is
entirely on the sign on the coefficients, but they are damaging if the
size of the intervals is a matter of concern.
The beta coefficients allow a comparison of the width of the final
EIV bounds across variables. They indicate the amount of change in
standard deviation units of NTBs induced by a change of one standard
deviation in the rhs variable. Note that because sd(y) is smaller than
the standard deviation of the latent uncensored dependent variable, the
beta coefficients reported here are overstated. If it is supposed that
an absolute beta value of 0.5 is an economically significant magnitude,
because all the beta coefficient intervals for the mismeasured variables
contain 0.5, the possibility that all these variables are economically
significant cannot be precluded; that is, their true coefficients may
all be of economically significant magnitude. Beyond this, it is
difficult to make inferences about the likelihood of an absolute beta
value exceeding 0.5. At best, informal judgements may be made, such as
that PACCVA83 is highly likely to have a true beta value greater than 0.5.
In sum, the EIV problem does not damage the case for any of the
political economy models. Some variables do fall victim to the EIV
problem, but there is always at least one variable which does allow
inference about the underlying theory. The models of retaliation,
pressure groups, voting strength, and comparative advantage all find
support from the signs on the variables that do allow inference. The
models of public interest get ambiguous or no support. The status quo
model is refuted by the contrary sign on AVEARN while the equity model
is refuted by the contrary signs on AVEARN and NEGR82. On the other
hand, the imprecision with which the lower bound for AVEARN is measured
opens the possibility that the true coefficient on AVEARN is
statistically no different from zero. The high standard errors on both
bounds for NEGR82 actually make this highly likely.
More optimistic priors about the extent of measurement error are
required if narrower bounds are desired. While this has the potential to
strikingly narrow the final bounds, far greater confidence in the priors
would be required. The importance of prior information cannot be
overstated. The analysis began with unbounded EIV intervals that
required prior information to narrow the bounds. The other side of the
coin is, of course, that none of the results are valid unless the prior
information is correct. We have taken care to elaborate how our priors
are formed. Based on those priors, we must accept the resulting EIV
bounds, however wide they are. If the intervals are too wide to be
useful for policy purposes, the problem cannot be overcome under our
priors. Improved prior information calls for the empirical measurement
of measurement error.
5. Conclusion
The literature on endogenous protection contains a variety of
disparate theories that address the motivation for the observed
cross-industry structure of protection. They range from models of
political self-interest to models of government altruism. With the
possible exception of Grossman and Helpman (1994, 1995), endogenous
protection theory has not yielded tight predictions against which the
theory can be tested using precisely measured variables. Hence, a number
of proxy variables, which probably span the range in terms of quality,
have been used in their empirical testing. This is justifiable because it
is costly to construct precisely measured variables. For example, it is
a daunting task to compute the tariff-equivalents of NTBs, which are
probably better measures of the restrictive effect of trade barriers
than the import coverage ratios employed in this paper.
Sturdy inference in this setting would seem to require a
sensitivity analysis of the estimates to the EIV problem. In this paper,
the methodology of Klepper and Leamer (1984) and Klepper (1988) is
employed to perform a sensitivity analysis to classical errors in
variables of estimates from the nonlinear Tobit model. Their methodology
focuses on diagnostics that take the form of imposing restrictions on
the fraction of variation in regressors that are due to measurement
error. Some variables fall victim to the EIV problem and do not allow
any inferences. But reasonable prior restrictions can bound the
coefficients for the remaining variables and allow useful inferences
that are robust to classical errors in variables. We hope that
sensitivity to errors in variables becomes a regular component of
applied econometric studies, not because they are yet another set of
diagnostics, but because economic data are fundamentally prone to
measurement error.
We are grateful to two anonymous referees for insightful comments
that improved this paper considerably. We acknowledge helpful
suggestions by Ed Bedrick. Responsibility for any remaining errors is
ours.
Finally, research on the EIV problem in the political economy of
protection benefits from recent work by Anderson and Neary (1996), who
provide a theoretical basis for the computation of tariff equivalents of
NTBs. Their measures are less prone to measurement errors than the ad
hoc coverage ratios we have used. While their general equilibrium method
requires far greater information than is available for a study at this
scope and hence introduces additional sources of measurement error,
their short-cut partial equilibrium method can potentially be used in
the next generation of empirical work in this area using newer data.
(*.) Department of Economics, University of New Mexico,
Albuquerque, NM 87131, USA; E-mail gawande@unm.edu; corresponding
author.
(+.) Department of Economics, University of New Mexico,
Albuquerque, NM 87131, USA; E-mail bohara@unm.edu.
Received September 1997; accepted May 1999.
(1.) More precisely, the inverse import penetration ratio divided
by the absolute price elasticity of imports should be positively related
to protection.
(2.) MELAST and XELAST are taken from Ceglowski (1989). The
variable MELAST is measured as a negative number. Hence, the coefficient
on MELAST is expected to be negative.
(3.) The model can easily be generalized to include the case where
the dependent variable is also measured with error, but to keep the
exposition simple, we presume that y is measured without error. All the
results hold if the true dependent variable is [y.sup.*], which is
measured as y, where y = [y.sup.*] + [delta] and [delta] is normally
distributed but not correlated with either [y.sup.*] or [epsilon]. Here,
we assume that [delta] is absorbed into the error term [mu] in Equation
2.
(4.) The reason for this is that with more than one variable
measured with error, it may be possible for the measurement error
variances to take on values that imply that the true regressors are
collinear.
(5.) The analysis of model 2 arrives at similar conclusions as the
analysis presented in the paper.
(6.) The example in Klepper, Kamlet, and Frank (1993, p. 196) is
instructive. Suppose all of the residual variation in the regression of
y on x is due to the measurement error in one variable [x.sub.i]. Then
removing the measurement error in [x.sub.i] would raise the R squared of
the regression of y on x to 1, implying [R.sup.*2] equals one. But for
this to happen, [f.sub.i] would have to equal its largest possible
value. Thus, if it is believed that the true [f.sub.i] equaled its
maximum possible value, then it would have to be believed that
[R.sup.*2] equaled one. And if it were believed that [R.sup.*2] were
less than one, then it would have to also be believed that the true
[f.sub.i] was less than its maximum value.
(7.) Since a Tobit model is analyzed in the paper, the [R.sup.*2]
measure is a pseudo-[R.sup.2] computed as 1 -
[[hat{[sigma]}].sup.2]/([[hat{[sigma]}].sup.2] + b'Nb), where
[[hat{[sigma]}].sup.2] is the MLE of the error variance in the Tobit
model, b is the MLE of the Tobit coefficients, and N is the covariance
matrix of the rhs variables x. Levine's (1986) result
stated earlier motivates its use.
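A direct transcription of this pseudo-[R.sup.2], with N supplied as a nested list (a sketch; the actual b, N, and error variance come from the Tobit MLE):

```python
def tobit_pseudo_r2(sigma2_hat, b, N):
    """Pseudo-R^2 for the Tobit model: 1 - sigma^2/(sigma^2 + b'Nb),
    where sigma2_hat is the MLE error variance, b the vector of Tobit
    coefficients, and N the covariance matrix of the rhs variables."""
    k = len(b)
    bNb = sum(b[i] * N[i][j] * b[j] for i in range(k) for j in range(k))
    return 1.0 - sigma2_hat / (sigma2_hat + bNb)
```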
(8.) See footnote 17, p. 198 of Klepper, Kamlet, and Frank (1993)
for an intuitive explanation for why M bound is relaxed through the
[d.sub.i] bounding of one of the key variables. The new upper bound on
[R.sup.*2] of 0.3381 is well within our admissible priors for M.
(9.) For example, the next iteration proceeds in the following
manner. The d([cdotp]) bounds on the remaining variables required to
support the new M bound on [R.sup.*2] at iteration 1 are unacceptably
low, specifically for [[N.sup.*].sub.ij], PACCVA83, REPRST, TAR, M/CON,
X/CON, P_SCI, and MELAST. The new key variables identified at iteration
1 were SCALE and P_MAN. Because their [d.sub.i] bounds were both
admissible, we randomly chose SCALE and constrained [f.sub.SCALE] [leq]
0.40. Hence, SCALE no longer supported unambiguous inference about the
validity of the special interest model. Imposing the constraints [f.sub.CONC4] [leq] 0.40 (from iteration J = 0) and [f.sub.SCALE] [leq]
0.40 (iteration J = 1), yielded a new M bound on [R.sup.*2] with a
marginally higher value of M = 0.3383. This now became the sufficient
upper bound on [R.sup.*2] in order to relax the [d.sub.i] bounds on the
remaining coefficients.
(10.) If [R.sup.*2] is bounded below 0.3706, by Klepper's
method the extreme points of the feasible set are composed of
[2.sup.J](K + 1 - J) regressions, where J is the number of stage I
iterations before the EIV coefficient bounds are computed and K is the
number of regressors measured with error. According to this method, the
EIV intervals for the nonkey variables will be of the same sign as the
direct regression while the EIV intervals for the key variables will all
contain 0. In the model considered here, J = 6 and K = 15 (the four
variables NE82, LABINT82, NEGR82, and AVEARN and the nine dummies are
presumed to be perfectly measured) so the stage I coefficient bounds are
constructed as (elementwise) extreme values from [2.sup.6] x 10 = 640
regressions. Hence, the perfectly measured variables reduce both the
computational load and the width of the intervals on the
mismeasured variables. For the computation of the EIV intervals on the
perfectly measured variables themselves, see footnote 12.
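The count of extreme-point regressions is a one-line computation:

```python
def n_extreme_regressions(J, K):
    """Number of regressions whose (elementwise) extreme values give the
    stage I coefficient bounds: 2^J (K + 1 - J), with J stage I
    iterations and K mismeasured regressors (footnote 10)."""
    return 2 ** J * (K + 1 - J)
```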
(11.) The extent of asymptotic bias in a coefficient will be
greater: (i) the larger the correlation between it and the mismeasured
variable(s), (ii) the smaller the independent explanatory power of the
variable relative to the mismeasured variable(s), and (iii) the larger
the variation in the mismeasured variable(s) due to measurement error
(see Klepper, Kamlet, and Frank (1993, p. 204)).
(12.) The bounds on the variables not measured with error are
computed using Bollinger's (1996) method. Let the regression model
be written by partitioning the variables measured with error, x1: n X
k1, from the perfectly measured variables, x2: n X k2. Let [x1.sup.*] be
the true variables related to x1, with e1 = x1 - [x1.sup.*] being the
measurement error. Then y = [x1.sup.*][beta]1 + x2[beta]2 + [mu] is the
model of interest, with [beta]1: k1 X 1 and [beta]2: k2 X 1, that is,
Equation 2 rewritten after partitioning [x.sup.*]. Let the residuals
from the linear projection of y on x2 be denoted w, the matrix of
residuals from the (hypothetical, since [x1.sup.*] is unmeasurable)
regression of [x1.sup.*] on x2 be denoted [W.sup.*]: n X k1, and the
matrix of residuals from the regression of x1 on x2 be denoted W: n X
k1. Then the regression model w = [W.sup.*][beta]1 + [nu], together with
the measurement error model W = [W.sup.*] + e1, involves only variables
measured with error. However, we know from the standard omitted
variables bias formula that the coefficients b2 from the regression of y
on only x2 are b2 = [beta]2 + G[beta]1, where G is the k2 X k1 matrix of
coefficients from the regression of [x1.sup.*] on x2. Since the
measurement error is white noise, the regression of x1 on x2 gives
consistent estimates of G. Thus, the bounds for [beta]2, the coefficients
on variables not measured with error, are derived, given any bounds on
[beta]1, from the formula [beta]2 = b2 - G[beta]1. In the application,
b2 is the Tobit MLE of the regression of y on x2.
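For the scalar case (k1 = k2 = 1), the mapping [beta]2 = b2 - G[beta]1 turns bounds on [beta]1 into bounds on [beta]2; a minimal sketch with hypothetical numbers:

```python
def beta2_bounds(b2, G, beta1_lo, beta1_hi):
    """Bounds on the coefficient of a perfectly measured variable via
    beta2 = b2 - G*beta1 (Bollinger 1996). b2: estimate from regressing
    y on x2 alone; G: coefficient from regressing x1 on x2;
    [beta1_lo, beta1_hi]: EIV bounds on beta1. Scalar case only."""
    endpoints = (b2 - G * beta1_lo, b2 - G * beta1_hi)
    return min(endpoints), max(endpoints)

# Hypothetical numbers: b2 = 1.0, G = 0.5, beta1 bounded in [0.2, 0.8].
lo2, hi2 = beta2_bounds(1.0, 0.5, 0.2, 0.8)
```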
(13.) The labor intensity variable may also be argued to represent
the pressure group model. The specific-factors model (see, e.g., Mussa
1974) predicts that, because the returns to the specific factor that
benefits from protection increase with the industry's labor
intensity, lobbying by specific factors is an increasing function of
labor intensity. Hence, protection should rise with labor intensity. The
evidence in Table 6 runs counter to this prediction as well.
(14.) The positive estimate on [M.sub.ij]/CONS runs counter to the
Grossman-Helpman (1994) prediction. The main reason is that their
prediction requires the scaling of [M.sub.ij]/CONS by import
elasticities, which is not undertaken here (see footnote 1). Goldberg
and Maggi (1997) find support for the Grossman-Helpman hypothesis using
Trefler's (1993) NTB data. There are a number of reasons why their
results may be correct for their data. First, they aggregate across
partners rather than pool across partners. Hence, they have around 400
observations, and we have nearly 4000. Second, they estimate a system
with a rather different specification than our single-equation Tobit
model. The literature on errors-in-variables bounds for systems of
equations is still in its infancy. Regardless, it will be useful to use
the Goldberg-Maggi data and their specification of the (single)
protection equation, and subject it to an EIV sensitivity analysis as we
have done here.
(15.) The data simulations required to compute the standard errors
on the stage I bounds permitted another view of the robustness of the
EIV bounds to possible paths. We performed approximately 900 data
simulations, of which 300 resulted in the same path set as in Table 6
and 600 with different path sets. If we compute the means across all the
900 simulated bounds and compare them with the corresponding means from
the 300 bounds with our path set, they are qualitatively the same and
quantitatively statistically no different in most cases. This striking
result demonstrates the robustness of the results across paths. The main
reason is that there is a core set of path variables that are constant
in most of the 900 simulated path sets. Only a few path variables
change.
References
Anderson, J. E., and J. P. Neary. 1996. A new approach to
evaluating trade policy. Review of Economic Studies 63:107-25.
Baldwin, Richard E. 1990. Optimal tariff retaliation rules. In The
political economy of international trade: Essays in honor of Robert E.
Baldwin, edited by R. W. Jones and A. Krueger. Cambridge, MA: Basil
Blackwell, pp. 108-21.
Baldwin, Robert E. 1985. The political economy of U.S. import
policy. Cambridge, MA: MIT Press.
Baron, D. P. 1997. Integrated strategy and international trade
disputes: The Kodak-Fujifilm case. Journal of Economics and Management
Strategy 6:291-346.
Bekker, P., A. Kapteyn, and T. Wansbeek. 1987. Consistent sets of
estimates for regressions with correlated or uncorrelated measurement
errors in arbitrary subsets of all variables. Econometrica 55:1223-30.
Bollinger, C. R. 1996. Bounding mean regressions when a binary regressor is mismeasured. Journal of Econometrics 73:387-99.
Brock, W. P., and S. P. Magee. 1978. The economics of special
interest politics: The case of tariffs. American Economic Review
68:246-50.
Caves, R. E. 1976. Economic models of political choice:
Canada's tariff structure. Canadian Journal of Economics 9:278-300.
Ceglowski, J. 1989. Dollar depreciation and U.S. industry
performance. Journal of International Money and Finance 8:233-51.
Corden, W. M. 1974. Trade policy and welfare. Oxford, UK: Oxford
University Press.
Erickson, T. 1989. Proper posteriors from improper priors for an
unidentified errors-in-variables model. Econometrica 57:1299-316.
Erickson, T. 1993. Restricting regression slopes in the
errors-in-variables model by bounding the error correlation.
Econometrica 61:959-69.
Findlay, R., and S. Wellisz. 1982. Endogenous tariffs and the
political economy of trade restrictions and welfare. In Import
competition and response, edited by J. Bhagwati. Chicago: University of
Chicago Press.
Goldberg, P., and G. Maggi. 1997. Protection for sale: An empirical
investigation. Princeton University. Mimeographed.
Grossman, G. M., and E. Helpman, 1994. Protection for sale.
American Economic Review 4:833-50.
Grossman, G. M., and E. Helpman, 1995. Trade wars and trade talks.
Journal of Political Economy 103:675-708.
Iwata, S. 1992. Instrumental variables estimation in
errors-in-variables model's when instruments are correlated.
Journal of Econometrics 53 1-3:297-322.
Klepper, S. 1988. Regressor diagnostics for the classical
errors-in-variables model Journal of Econometrics 37:225-50.
Klepper, S., M. S. Kamlet, and R. G. Frank. 1993. Regressor
diagnostics for the errors-in-variables model-An application to the
health effects of pollution. Journal of Environmental Economics and
Management 24:190-211.
Klepper, S., and E. E. Learner. 1984. Consistent sets of estimates
for regressions with errors in all variables. Econometrica 52:163-83.
Kokkelenberg, B. C., and D. R. Sockell. 1985. Union membership in
the United States, 1973-1981. Industrial and Labor Relations Review 38:497-543.
Krasker, W. S., and J. W. Pratt. 1986. Bounding the effects of
proxy variables on regression coefficients. Economtrica 54:641-55.
Krinsky, I., and A. L. Robb. 1986. On approximating the statistical
properties of elasticities. Review of Economics and Statistics 68:715-9.
Lavergne, R. 1983. The Political economy of U.S. tariffs. Toronto:
Academic Press.
Leamer, E. E. 1978. Specification searches. New York: Wiley.
Leamer, E. E. 1983. Let's take the con Out of econometrics.
American Economic Review 73:31-43.
Leamer, E. E. 1985. Sensitivity analysis would help. American
Economic Review 753:308-13.
Leamer, E. E. 1990. The structure and effects of tariff and
nontariff barriers in 1983. In The political economy of international
trade: Essays in honor of Robert E. Baldwin, edited by R. W. Jones and
A. Krueger. Cambridge, MA: Basil Blackwell, pp. 224-60.
Levine, D. K. 1986. Reverse regressions for latent-variable models.
Journal of Econometrics 32:291-2.
Mussa, M. 1974. Tariffs and the distribution of income: The
importance of factor specificity, substitutability, intensity in the
short and long run. Journal of Political Economy 1191-1203.
Olson, M. 1965. The logic of collective action. Cambridge, MA:
Harvard University Press.
Pincus, J. J. 1975. Pressure groups and the pattern of tariffs.
Journal of Political Economy 83:775-8.
Ray, E. J. 1981. The determinants of tariff and nontariff trade
restrictions in the United States. Journal of Political Economy
89:105-21.
Trefler, D. 1993. Trade liberalization and the theory of endogenous
protection: An econometric study of U.S. import policy. Journal of
Political Economy 101:138-60.
Vousden, N. 1990. The economics of trade protection. Cambridge, UK:
Cambridge University Press.
Weinberger, M. I., and D. U. Greavey. 1984. The PAC directory: A
complete guide to political action committees. Cambridge, MA: Ballinger.
Variable Definitions, Political Economy Theories, and Expected Signs

Theory                          Variable             Sign  Description
Dependent variable              [N.sub.ij]                 U.S. all-NTB (nontariff barrier) coverage of imports of good i from partner j (ratio)
Retaliation, strategic policy   [[N.sup.*].sub.ij]   +     Partner j's all-NTB coverage of its imports of good i from the United States (ratio)
Pressure groups                 PACCVA83             +     Corporate PAC spending by the industry, 1977-1984, scaled by value added ($100 Mn/$Bn)
                                SCALE                +     Measure of industry scale: value added per firm, 1982 ($Bn/firm)
                                CONC4                +, -  Four-firm concentration ratio, 1982
Adding machine                  NE82                 +     Number of employees, 1982 (Mn persons)
                                UNION                +     Fraction of employees unionized, 1981
                                LABINT82             +     Labor intensity: share of labor in value added, 1982
                                REPRST               +     Number of states in which production is located, 1982 (scaled by 100)
Comparative costs-comparative   [M.sub.ij]/CONS      +     Penetration of U.S. consumption of good i by imports from partner j
  advantage                     [X.sub.ij]/CONS      +     U.S. exports of good i to partner j, scaled by consumption
                                DPEN7982             +     IMP/CONS(1979) - IMP/CONS(1982), IMP = total industry imports
                                P_SCI                -     Fraction of employees classified as scientists and engineers, 1982
                                P_MAN                -     Fraction of employees classified as managerial, 1982
Public interest (status quo,    AVEARN               -     Average earnings per employee, 1982 ($Mn/year)
  equity)                       TAR                  +, -  Ad valorem tariff rate
                                P_UNSK               +     Fraction of employees classified as unskilled, 1982
                                NEGR82               -     Growth in employment, 1981-1982
Other control variables         MELAST               -     Real exchange rate elasticity of imports
                                XELAST               +     Real exchange rate elasticity of exports
                                [D_C.sub.j], j = 1, [ldots], 9   Nine country dummies
                                [D_I.sub.j], j = 1, [ldots], 4   Four industry group dummies: food, resource-based, general manufacturing, and capital-intensive

Cross-industry four-digit SIC level data pooled across the nine partners j: Belgium, Finland, France, Germany, Italy, Japan, the Netherlands, Norway, and the United Kingdom. Number of observations = 3915.
Pooled Runs: Tobit MLEs; [a] Dependent Variable: All U.S. NTBs ([N.sub.ij])

Theory                          Rhs Variable          Model 1       Model 2
Retaliation, strategic policy   [[N.sup.*].sub.ij]    0.337 [**]    0.234 [**]
                                                      (0.047)       (0.050)
Pressure groups                 PACCVA83              1.332 [**]    1.077 [**]
                                                      (0.267)       (0.267)
                                SCALE                 0.079         -0.061
                                                      (0.191)       (0.196)
                                CONC4                 -0.037        -0.041
                                                      (0.084)       (0.085)
Adding machine                  NE82                  1.047 [**]    0.916 [**]
                                                      (0.346)       (0.340)
                                UNION                 -0.136        -0.103
                                                      (0.087)       (0.086)
                                LABINT82              -1.286 [**]   -0.685 [**]
                                                      (0.157)       (0.168)
                                REPRST                6.110 [*]     6.089 [*]
                                                      (3.212)       (3.15)
Comparative costs, comparative  [M.sub.ij]/CONS       4.427 [**]    5.072 [**]
  advantage                                           (1.508)       (1.486)
                                [X.sub.ij]/CONS       -19.44 [**]   -13.598 [**]
                                                      (4.732)       (4.653)
                                DPEN7982              0.039         0.033
                                                      (0.058)       (0.584)
                                P_SCI                 -1.056 [**]   -0.691
                                                      (0.376)       (0.411)
                                P_MAN                 0.249         -0.327
                                                      (0.436)       (0.440)
Public interest (status quo,    AVEARN                20.42 [**]    22.95 [**]
  equity)                                             (4.040)       (4.045)
                                TAR                   1.909 [**]    1.923 [**]
                                                      (0.273)       (0.271)
                                P_UNSK                0.417         -1.165
                                                      (0.357)       (0.425)
                                NEGR82                0.141         0.156
                                                      (0.105)       (0.103)
Control variables               MELAST                0.190 [**]    0.245 [**]
                                                      (0.028)       (0.030)
                                XELAST                0.055 [**]    0.032
                                                      (0.022)       (0.022)
                                [D_C.sub.j], j = 1, [ldots], 9   See note [b]   See note [b]
                                [D_I.sub.j], j = 1, [ldots], 4   --             See note [c]

N = 3915, k = 28, degree of truncation = 84.3%. Four-digit SIC cross-industry data pooled across nine countries: Belgium, Finland, France, Germany, Italy, Japan, the Netherlands, Norway, and the United Kingdom. Goodness of fit: Model 1: likelihood-ratio statistic = 685.6, Maddala's [R.sup.2] = 0.160, McFadden's [R.sup.2] = 0.213, Cragg-Uhler's [R.sup.2] = 0.286. Model 2: likelihood-ratio statistic = 741.2, Maddala's [R.sup.2] = 0.173, McFadden's [R.sup.2] = 0.230, Cragg-Uhler's [R.sup.2] = 0.309.
Standard errors in parentheses.
(**.) and (*.) indicate, respectively, that |t| [greater than] 1.98 and |t| [greater than] 1.66.
(a.) MLEs, maximum likelihood estimates.
(b.) All country dummies have negative MLEs with t-values in excess of 2.
(c.) Of the four industry group dummies--food, resources, manufacturing, and capital intensive--the food and the manufacturing dummies are positive and statistically significant.
Bounds from Direct and Reverse Regressions; Dependent Variable: All U.S. NTBs ([N.sub.ij])

Rhs Variable                     Bounds
[[N.sup.*].sub.ij]               [-19.30, 13.37]
PACCVA83                         [-73.5, 203.8]
SCALE                            [-22.1, 1225.2]
CONC4                            [-279.5, 34.3]
NE82                             [-292.1, 242.8]
UNION                            [-90.1, 45.8]
LABINT82                         [-69.91, 29.87]
REPRST                           [-1563.5, 3033.1]
[M.sub.ij]/CONS                  [-30.2, 1213.3]
[X.sub.ij]/CONS                  [-1673.0, 1276.6]
DPEN7982                         [-11.1, 5.32]
P_SCI                            [-161.1, 100.3]
P_MAN                            [-229.3, 1016.0]
AVEARN                           [-2349, 5607]
TAR                              [-243.8, 110.1]
P_UNSK                           [-71.1, 439.4]
NEGR82                           [-15.04, 112.6]
MELAST                           [-22.7, 7.06]
XELAST                           [-3.89, 11.44]
[D_C.sub.j], j = 1, [ldots], 9   See note [a]

N = 3915, k = 28, degree of truncation = 84.3%. Four-digit SIC cross-industry data pooled across nine countries: Belgium, Finland, France, Germany, Italy, Japan, the Netherlands, Norway, and the United Kingdom.
NE82, LABINT82, AVEARN, and NEGR82 are presumed to be accurately measured. See Appendix for construction of their bounds. The 20 two-digit SIC dummies are presumed to be accurately measured.
(a.) All country dummies have negative maximum likelihood estimates with t-values in excess of 2.
f-value: The Fraction of Variation in Each Variable Attributable to Measurement Error

Variable              Degree of Mismeasurement   f      Cause/Source
[[N.sup.*].sub.ij]    Moderate                   0.25   Trade-industry mismatch; SITC to ISIC to SIC
PACCVA83              Serious                    0.40   Higher aggregation level (three-digit SIC)
SCALE                 Serious                    0.40   Use of industry-level data to proxy firm-level data
CONC4                 Serious                    0.40   Census, but from firm-level data; relative standard error mostly reported to be [greater than] 0.15
NE82                  None                       0      Census of Manufactures
UNION                 Serious                    0.40   Higher aggregation level (three-digit SIC); estimates from Kokkelenberg and Sockell (1985)
LABINT82              None                       0      Census of Manufactures
REPRST                Serious                    0.40   State-level data at three-digit SIC
[M.sub.ij]/CONS       Moderate                   0.25   Trade-industry mismatch: SITC-SIC
[X.sub.ij]/CONS       Moderate                   0.25   Trade-industry mismatch: SITC-SIC
DPEN7982              Moderate                   0.25   Trade-industry mismatch: SITC-SIC
P_SCI                 Serious                    0.40   Higher aggregation level (three-digit SIC)
P_MAN                 Serious                    0.40   Higher aggregation level (three-digit SIC)
AVEARN                None                       0      Census of Manufactures
TAR                   Moderate                   0.25   Trade-industry mismatch: TSUSA-SIC
P_UNSK                Serious                    0.40   Higher aggregation level (three-digit SIC)
NEGR82                None                       0      Census of Manufactures
MELAST                Serious                    0.50   Higher aggregation level (two-digit SIC); estimates from Ceglowski (1989)
XELAST                Serious                    0.50   Higher aggregation level (two-digit SIC); estimates from Ceglowski (1989)
[D.sub.j], j = 1, [ldots], 9   None              0      --
Errors-in-Variables Diagnostics for Stage I Iterations: M([cdotp]) and d([cdotp]) Values (All NTBs ([N.sub.ij]) as Dependent Variable)

                Key
                Variable                d([cdotp])
Iteration       Chosen      M([cdotp])  [[N.sup.*].sub.ij]  PAC   SCA   CON   UNI   REP   TAR   P_UN  MCON  XCON  DPEN  P_SC  P_MA  MEL   XEL
0               --          0.3375      0.03                0.04  0.77  0.55  0.30  0.09  0.02  0.36  0.15  0.05  0.78  0.09  0.56  0.02  0.14
1               CONC4       0.3381      0.04                0.05  0.74  --    0.37  0.13  0.03  0.44  0.19  0.07  0.81  0.13  0.60  0.04  0.20
2               SCALE       0.3383      0.05                0.06  --    --    0.39  0.13  0.04  0.46  0.21  0.08  0.82  0.14  0.60  0.04  0.22
3               P_MAN       0.3402      0.12                0.14  --    --    0.55  0.18  0.09  0.51  0.36  0.18  0.90  0.28  --    0.10  0.37
4               UNION       0.3428      0.16                0.18  --    --    --    0.20  0.13  0.54  0.42  0.21  0.93  0.33  --    0.12  0.44
5               DPEN7982    0.3435      0.20                0.22  --    --    --    0.22  0.16  0.62  0.46  0.28  --    0.39  --    0.17  0.53
6               P_UNSK      0.3706      0.47                0.43  --    --    --    0.24  0.40  --    0.55  0.59  --    0.50  --    0.42  0.69
M([cdotp]) is the upper bound below which [R.sup.*2] (the R-squared of the regression if measurement error were completely removed from all variables) must lie in order for the coefficients to be bounded. The R-squared (and therefore M) in the Tobit model is computed as the pseudo-R-squared given by 1 - [[hat{[sigma]}].sup.2]/([[hat{[sigma]}].sup.2] + b'Nb).
d([cdotp]) is the upper bound on f([cdotp]), the fraction of variation in each variable attributable to measurement error. f([cdotp]) [less than] d([cdotp]) elementwise is a necessary condition for [R.sup.*2] [less than] M.
The bold [d.sub.i] values denote the key variables (at the initial iteration). If either of their [f.sub.i] is bounded below its corresponding [d.sub.i] value, then M increases, relaxing the upper bound on [R.sup.*2].
The perfectly measured variables NE82, LABINT82, AVEARN, and NEGR82 do not appear here. Variable names are abbreviated but are in the same order as in Tables 1-4.
In stage I, iterations continue until the M bound becomes acceptable (conditional on all [d.sub.i] bounds being acceptable); if the M bound is already acceptable at an early step, say the first iteration, iterations continue until all [d.sub.i] bounds are acceptable. Here, after six iterations the d bounds on REPRST and MELAST were both unacceptably low, and hence the stage I iterations were terminated.
Appendix
I. Data
Data on NTBs (nontariff barriers) were taken from the United Nations Commission for Trade and Development database on trade control measures. Data were aggregated to the four-digit SIC level, which required concordance among disparate systems of data keeping. The sample accounts for over 98% of manufacturing sales. In the following, COMTAP refers to the Compatible Trade and Production Database, 1968-1986, CM to the 1982 Census of Manufactures, ASM to the 1983 Annual Survey of Manufactures, and CPS to the 1983 Current Population Survey. Bilateral trade and production (the latter required to obtain domestic consumption) were constructed using 1983 figures from COMTAP. These data were at the ISIC level and were concorded into the SITC (r1) level and then into the four-digit SIC level. However, aggregate (across partners) trade data for the United States were aggregated up from the tariff-line (TSUSA) data (as is TAR). Political Action Committee (PAC) campaign contribution data are from Federal Election Commission (FEC) tapes for the four election cycles between 1977 and 1984. Since PACs are associated with individual firms, PACCVA83 was constructed as follows. Using COMPUSTAT tapes, firms were classified into three- or four-digit SIC industries. Where firm coverage was incomplete in COMPUSTAT, PACs were classified into two-digit SIC industries using Weinberger and Greavey (1984) and replicated at the four-digit level. Because the classification of PACs to SIC industries is one to many, we use per-firm contributions as our measure of PAC spending. This is scaled by value added to get PACCVA83. Geographic concentration (GEOG) is defined for industry i as [[[sigma].sup.50].sub.j=1] \(V[A.sub.ij]/[[[sigma].sup.50].sub.j=1] V[A.sub.ij]) - ([POP.sub.j]/[[[sigma].sup.50].sub.j=1] [POP.sub.j])\, where V[A.sub.ij] is value added in industry i and state j and [POP.sub.j] is population in state j. Value-added data are from ASM. REPRST is constructed from the county data in the Geographic Area Series of the CM. Earnings and employment (AVEARN, SH_L) are also from ASM, as are capital stock figures. Number of firms (used in SCALE) and CONC4 are taken from CM. SCHOOL, P_SCI, P_MAN, and P_UNSK are from CPS. UNION is from Kokkelenberg and Sockell (1985). MELAST and XELAST are replicated at the four-digit level from the two-digit estimates of Ceglowski (1989).
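The GEOG index defined above is a simple sum of absolute gaps between value-added shares and population shares. As a numerical sketch (the four-state value-added and population figures below are made up, not the paper's data):

```python
import numpy as np

def geog_index(va, pop):
    """Geographic concentration: the sum over states of the absolute
    gap between the industry's value-added share in a state and that
    state's population share."""
    va = np.asarray(va, dtype=float)
    pop = np.asarray(pop, dtype=float)
    return np.abs(va / va.sum() - pop / pop.sum()).sum()

# Illustrative 4-state example (invented figures): production heavily
# concentrated relative to population gives a large index.
va = [90.0, 5.0, 3.0, 2.0]
pop = [10.0, 30.0, 30.0, 30.0]
geog = geog_index(va, pop)   # 0.8 + 0.25 + 0.27 + 0.28 = 1.6
```

An industry whose value-added shares exactly match the population shares would score 0; the index rises toward 2 as production concentrates in states with small population shares.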
II. Technical Appendix: EIV Bounds Using Klepper's (1988)
Diagnostics
Following Klepper (1988), consider the classical
errors-in-variables model in which the observed variable y is generated
by
y = [beta]'[x.sup.*] + [mu] (A1)
where [x.sup.*] is a K x 1 vector of true unobservable regressors with mean 0 and covariance matrix [Sigma], [mu] is a classical disturbance with mean 0 and variance [[sigma].sup.2], and [beta] is a K x 1 vector of coefficients on which interest centers. A K x 1 vector of proxy variables x is observed, which is related to [x.sup.*] by
x = [x.sup.*] + [varepsilon] (A2)
where [varepsilon] is a K x 1 vector of measurement errors with mean 0 and covariance matrix V = diag([[nu].sub.1], [[nu].sub.2], [ldots], [[nu].sub.K]), assumed to be distributed independently of [x.sup.*] and [mu]. [16] Without any further distributional assumptions about the unobservables, the parameters of the model are not identified. Klepper and Leamer (1984) show that it is possible to bound the parameters using the fact that the second moments of the unobservables, [[sigma].sup.2], [Sigma], and V, must be positive semidefinite (p.s.d.). Let [s.sup.2] denote the sample variance of y, r the vector of sample covariances between y and x, and N the sample covariance matrix of x. Klepper and Leamer derive the following equations, which solve for [Sigma], [beta], and [[sigma].sup.2] in terms of the sample moments of (y, x')':
[Sigma] = N - V, (A3)
[beta] = [(N - V).sup.-1]r (A4)
[[sigma].sup.2] = [s.sup.2] - r'[(N - V).sup.-1]r, (A5)
where the solution is unique if (N - V) is nonsingular. If x were perfectly measured, V = 0 and [beta] is the usual least squares (LS) estimator. Where V is not a zero matrix, it is used to adjust N so that the solution for [beta] is an adjusted LS estimator. Even though V is not usually known, leading to the identification problem, a set of values for [beta] that consistently bounds the true value of [beta] can be constructed by appeal to the fact that V and the solutions for [Sigma] and [[sigma].sup.2] in Equations A3 and A5 must be p.s.d. A result by Levine (1986) allows an extension of Klepper and Leamer (1984) and Klepper (1988) to a Tobit model. The main result is that if the matrix
[b'Nb + [[hat{[sigma]}].sup.2]    b'N]
[Nb                               N  ]    (A6)
where b is the Tobit MLE of the coefficients and [[hat{[sigma]}].sup.2] is the Tobit MLE of the error variance, is used in place of the matrix of sample moments of (y, x')', the results from the linear model apply.
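Equations A3-A5 and Levine's substitution can be sketched numerically; the Tobit estimates b and sigma-hat squared and the covariance matrix N below are illustrative placeholders, not estimates from the paper:

```python
import numpy as np

# Hypothetical Tobit MLEs for K = 2 regressors (illustrative values only)
b = np.array([1.0, 0.5])          # Tobit coefficient MLEs
sigma2_hat = 2.0                  # Tobit error-variance MLE
N = np.array([[1.0, 0.3],         # sample covariance matrix of x
              [0.3, 1.0]])

# Levine (1986): replace the sample moments of (y, x')' with the moment
# matrix implied by the Tobit estimates (Eqn A6).
s2 = b @ N @ b + sigma2_hat       # implied variance of the latent y*
r = N @ b                         # implied covariances of y* with x

# Adjusted least squares (Eqn A4) for an assumed measurement-error
# variance matrix V; V = 0 recovers the unadjusted estimator.
def adjusted_ls(N, r, V):
    return np.linalg.solve(N - V, r)

beta_direct = adjusted_ls(N, r, np.zeros((2, 2)))   # equals b when V = 0
V = np.diag([0.2, 0.0])           # suppose only the first proxy is noisy
beta_adj = adjusted_ls(N, r, V)   # adjustment moves the estimate away from b

# Eqn A5 with V = 0 recovers the Tobit error variance,
sigma2_direct = s2 - r @ np.linalg.solve(N, r)
# and the pseudo R-squared used in place of R^2 in the Tobit setting:
pseudo_R2 = 1.0 - sigma2_hat / (sigma2_hat + b @ N @ b)
```

With these numbers, `beta_direct` reproduces b exactly, and attributing error variance to the first proxy pushes its adjusted coefficient above the direct estimate, the usual attenuation correction.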
Consider the direct regression estimate plus the K reverse regressions, in which each of the K regressors [x.sub.i], i = 1, [ldots], K, is regressed on y and the remaining regressors and the reverse regression estimates are then computed (by solving for y in terms of the x's). Klepper and Leamer show that if, for every coefficient, the sign of the direct and all K reverse regression estimates is the same, then the set of feasible values of [beta] can be bounded. However, if the signs of any coefficient differ across the direct and reverse regressions, none of the coefficients can be bounded. [17] It is then necessary to invoke additional prior information to bound the feasible set. Klepper (1988) describes the use of reasonable prior information to bound the feasible values of [beta], and even to narrow these bounds for individual coefficients. Two types of information (they are not independent of one another) used by Klepper (1988) in constructing EIV diagnostics are as follows: (i) bounds on [f.sub.i], the fraction of the variation in each regressor that is attributable to measurement error, and (ii) a bound on [R.sup.*2], the (hypothetical) R-squared of the regression of y on x if all the measurement error in the [x.sub.i]'s were completely removed.
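The direct-plus-reverse-regression computation can be illustrated on simulated data. This is a sketch, not the paper's code: the true coefficients (1.0 and 0.5) and the measurement-error scales are invented, and plain OLS stands in for the Tobit step.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 5000, 2
x_true = rng.normal(size=(n, K))
y = x_true @ np.array([1.0, 0.5]) + rng.normal(scale=0.5, size=n)
x = x_true + rng.normal(scale=0.3, size=(n, K))     # noisy proxies

def ols(dep, regs):
    return np.linalg.lstsq(regs, dep, rcond=None)[0]

# Direct regression of y on x
estimates = [ols(y, x)]

# K reverse regressions: regress x_i on (y, x_-i), then solve for y
for i in range(K):
    others = np.delete(np.arange(K), i)
    coefs = ols(x[:, i], np.column_stack([y, x[:, others]]))
    a, c = coefs[0], coefs[1:]
    beta = np.empty(K)
    beta[i] = 1.0 / a          # implied coefficient on x_i
    beta[others] = -c / a      # implied coefficients on the other x's
    estimates.append(beta)

estimates = np.array(estimates)        # (K + 1) x K array
# Klepper-Leamer: if every column has a uniform sign, the feasible set
# is bounded by the min and max across the K + 1 estimates.
same_sign = np.all(np.sign(estimates) == np.sign(estimates[0]), axis=0)
lo, hi = estimates.min(axis=0), estimates.max(axis=0)
```

In this setup all signs agree, and the true coefficients fall inside [lo, hi]: the direct estimates are attenuated toward zero while the reverse estimates overshoot.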
A value, denoted M, can be computed such that if the true [R.sup.2]
of the regression can be bounded below this value, the feasible set is
bounded. Bounding the true [R.sup.2] below M renders infeasible all the
combinations of the measurement error variances that imply the true
regressors are collinear (Klepper 1988). Klepper shows that the
combination of measurement error variances that imply a true [R.sup.2]
equal to M is one for which exactly two of the measurement error
variances are nonzero. If either of these two measurement error
variances can be bounded below its respective value upon the basis of
prior information, then this combination is rendered infeasible and it
is no longer necessary to bound the true [R.sup.2] below M to bound the
feasible set. Instead, the feasible set of estimates can be bounded by
bounding the true [R.sup.2] below a larger M value, which is computed
from the diagnostics in Klepper (1988). Now, corresponding to this new upper bound on the true [R.sup.2] is yet another two-measurement-error-variance combination. If this combination can be rendered infeasible as above, then once again the upper bound on the true [R.sup.2] required to bound the feasible set is further relaxed. This process can then be repeated. Because a Tobit model is analyzed in this paper, the [R.sup.2] measure is a pseudo-[R.sup.2] computed as 1 - [[hat{[sigma]}].sup.2]/([[hat{[sigma]}].sup.2] + b'Nb), where [[hat{[sigma]}].sup.2] is the maximum likelihood estimate (MLE) of the error variance in the Tobit model, b is the MLE of the Tobit coefficients, and N is the covariance matrix of the rhs variables x. Levine's (1986) result stated earlier motivates its use.
III. Technical Appendix: Simulated Standard Errors on EIV Bounds
The standard errors on the EIV bounds reported in Table 6 are
computed by simulation (see, e.g., Krinsky and Robb 1986) as follows.
The Tobit ML estimates and their covariance matrix are used to generate
m samples of [beta] assuming a multivariate normal distribution as
[[tilde{[beta]}].sub.i] = [hat{[beta]}] + C'[rndn.sub.i](k, 1), i = 1, [ldots], m, (A7)
where C is the Cholesky decomposition of the covariance matrix of the Tobit MLE [hat{[beta]}], and [rndn.sub.i](k, 1) is a randomly drawn standard normal vector of dimension k. Hence, [[tilde{[beta]}].sub.i] is a sample value of [beta] generated according to the Tobit estimates of the coefficients and their covariance matrix.
Each sample i then defines a new sample moment matrix by replacing
[hat{[beta]}] with [[tilde{[beta]}].sub.i] in Equation A6. This
covariance matrix is the main input into producing the EIV intervals.
Thus, m sets of EIV intervals are computed. The standard errors reported in Table 6 are estimated as the standard deviations across these m upper and lower bounds, respectively.
To generate the m samples, we discard all generated samples whose path set does not coincide with our path set. In Table 6, the path set for the stage I bounds is {CONC4, SCALE, P_MAN, UNION, DPEN7982, P_UNSK}. Note that the bounds depend only on the path set, not on its sequence. The standard errors are computed conditionally on this path set. The standard errors reported in Table 6 are based on m = 300. Generating m samples with this specific path set required roughly 3 x m unconditional samples; that is, around 33% of the unconditional samples contained this specific path set. Regardless, the means across the full unconditional sample were not qualitatively or quantitatively different from what we have reported in the tables. This is significant because it demonstrates that the EIV bounds are robust to the choice of paths, on average. This is mainly because, across all samples, only one or two variables in the path set usually differ from our path set. Hence, the path set itself is not very volatile.
The standard errors on the stage II bounds are based on one
iteration beyond the stage I iterations, that is, for stage I, J = 6,
and for stage II, J = 7. The standard errors on the stage II bounds are
based on the unconditional sample, not just those samples that follow
the exact path for each stage II bound. This is done mainly for
computational convenience so that a separate set of simulations for each
individual coefficient is not required. If anything, this overstates
their standard errors, but we do not believe it qualitatively alters any
of the conclusions.
Finally, for the perfectly measured variables NE82, LABINT82, AVEARN, and NEGR82 (plus the country dummies), the computation of standard errors uses Bollinger's (1996) formula. Write the regression model as a partitioned model, y = [x.sub.1][[beta].sub.1] + [x.sub.2][[beta].sub.2] + [mu], with [[beta].sub.1]: [k.sub.1] x 1 and [[beta].sub.2]: [k.sub.2] x 1, where [x.sub.1] is the set of mismeasured variables and [x.sub.2] is the set of perfectly measured variables. Let G be the [k.sub.2] x [k.sub.1] matrix of coefficients from the regression of [x.sub.1] on [x.sub.2]. The EIV bounds for [[beta].sub.2] are derived, given any bounds on [[beta].sub.1], from the formula [[beta].sub.2] = [b.sub.2] - G[[beta].sub.1], where we use the Tobit MLE of the regression of y on [x.sub.2] in place of [b.sub.2]. The computation of standard errors now follows easily. G is fixed in all simulations. The simulations proceed along the lines described above, generating m sets of upper and lower bounds on [[beta].sub.1] (as well as m values for the vector [b.sub.2]). From these, we compute m upper and lower bounds on [[beta].sub.2] from the formula. The standard deviations across the m bounds are estimates of the corresponding standard errors. We use m = 300.
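A minimal sketch of Bollinger's mapping, with an invented partitioned design (two mismeasured regressors, one well-measured regressor, and OLS standing in for the Tobit MLE purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000

# Invented design: x1 mismeasured (k1 = 2), x2 perfectly measured
# (k2 = 1), with x1 correlated with x2 (loading 0.5 on each column).
x2 = rng.normal(size=(n, 1))
x1 = 0.5 * x2 + rng.normal(size=(n, 2))
y = x1 @ np.array([1.0, 0.5]) + x2 @ np.array([0.3]) + rng.normal(size=n)

def ols(dep, regs):
    return np.linalg.lstsq(regs, dep, rcond=None)[0]

# G: the k2 x k1 coefficients from regressing each column of x1 on x2
G = np.column_stack([ols(x1[:, i], x2) for i in range(2)])

# Short regression of y on x2 alone (stand-in for the Tobit MLE b2)
b2 = ols(y, x2)

# Bollinger (1996): beta2 = b2 - G @ beta1 maps any bounds on the
# mismeasured coefficients beta1 into bounds on beta2. Enumerate the
# corners of an assumed beta1 box around the true values.
beta1_corners = [np.array([a, c]) for a in (0.9, 1.2) for c in (0.4, 0.7)]
images = np.array([G @ corner for corner in beta1_corners])
beta2_lo = b2 - images.max(axis=0)
beta2_hi = b2 - images.min(axis=0)
```

Since G is fixed across simulations, only b2 and the beta1 bounds vary from draw to draw, which is why the standard errors for the well-measured coefficients follow with no extra estimation.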
Note that in Table 6 we report the bounds as determined by the
original data and the standard errors from the simulations described
here. Alternatively, we could have reported the means from the
simulations in place of the bounds. Although they are not reported here
for brevity, the means are qualitatively similar to what we report.