Interpreting the regression coefficient in semilogarithmic functions: a note.
Krautmann, Anthony C. ; Ciecka, James
Abstract
Prior studies have warned us against interpreting a coefficient in
semilogarithmic models as being the proportional change in the dependent
variable associated with a unit change in the independent variable. In
this paper, we argue that this critique regarding the regression
coefficients is itself misplaced, and show that such coefficients can
reasonably be interpreted as the proportional change in the dependent
variable.
Concern about the proper interpretation of a regression coefficient in a semilogarithmic functional form was originally expressed by
Halvorsen and Palmquist (1980), who warned against interpreting the
coefficient of a dummy variable as being the proportional change in the
dependent variable. Halvorsen and Palmquist (hereafter, H&P) argued
that the proportionate change in the dependent variable, g, is related
to the regression coefficient, [alpha], on a dummy variable by the
equation g = exp([alpha]) - 1.
Given the popularity of the semilogarithmic functional form,
especially in the estimation of Mincer-type wage equations, it is
important to fully understand the meaning of regression coefficients. Our
survey of studies that estimated semilogarithmic models indicates that
H&P's interpretation of the regression coefficient continues to be
widely held (Kennedy, 1981;
Thornton and Innes, 1989; Lassibille, 1998; Asgary, et al, 1997; Benson,
et al, 1998; Curington, et al, 1997; Dor and Farley, 1996; Malpezzi, et
al, 1998; Cancio, et al, 1996; Macdonald and Cavalluzzo, 1996; Boulding
and Purohit, 1996; Levy and Miller, 1996; Curington, 1994; Even and
Macpherson, 1993; Rummery, 1992; Baimbridge, 1998; Baimbridge and
Whyman, 1997) (1).
In this note, we argue that H&P's interpretation of the
regression coefficient is misleading. We contend, on the contrary, that
it is quite reasonable to interpret [alpha] as the proportional change in the
dependent variable. This conclusion arises from the recognition that g =
([Y.sub.1] - [Y.sub.0])/Y necessarily entails a reference point, given
by the value of the dependent variable in the denominator. In H&P,
the authors use Y = [Y.sub.0] in the denominator of g (i.e., where
[Y.sub.0] is the value of Y when the dummy variable equals zero). We
argue that there is no logical reason for using [Y.sub.0] over [Y.sub.1]
(where [Y.sub.1] is the value of Y when the characteristic is present).
How one defines g is important, for it determines whether the
regression coefficient overestimates or underestimates g. When using
[Y.sub.0] as the reference point, the regression coefficient does indeed
underestimate g (especially as the coefficient deviates further from
zero). But using [Y.sub.1] as the reference point reverses this
conclusion: the regression coefficient overestimates g.
To see this, consider the following semilogarithmic function:
ln Y = [delta] + [beta]X + [alpha]D ...(1)
where X is a continuous variable, and D is a dummy variable
representing some qualitative characteristic. One notion of the
proportional change, and that which was used by H&P, is defined
relative to [Y.sub.0], given by:
[g.sub.0] = ([Y.sub.1] - [Y.sub.0])/[Y.sub.0] ...(2)
But an equally plausible notion of the proportional change can be
defined relative to [Y.sub.1] instead of [Y.sub.0]. Let [g.sub.1] be the
definition of the proportional change in Y when using [Y.sub.1] as the
reference point, given by:
[g.sub.1] = ([Y.sub.1] - [Y.sub.0])/[Y.sub.1] ...(3)
Because neither reference point is more appropriate than the other,
we face the same ambiguity we confront when teaching Principles students
about elasticities. The typical solution to this ambiguity problem is
obtained by calculating an arc elasticity, where the percentage change
is taken relative to the average of the "beginning" and
"ending" values. This proposed approach is even more relevant
in regression models where a qualitative characteristic is being
measured using a dummy variable. Given the arbitrary assignment of one
and zero in defining a dummy variable, one cannot argue that the
proportional change is any more relevant in comparison to [Y.sub.0]
(when the dummy is equal to zero) than [Y.sub.1] (when it is equal to
one). Thus, there exists another equally plausible definition of the
proportional change, one based on the point of reference being the
average of [Y.sub.0] and [Y.sub.1], given by:
[g.sub.2] = ([Y.sub.1] - [Y.sub.0])/[bar.Y] ...(4)
where [bar.Y] = ([Y.sub.0] + [Y.sub.1])/2
Using the semilogarithmic function given in (1), these three
alternative definitions of g are related to the regression coefficient
[alpha] in the following manner:
[g.sub.0] = exp([alpha]) - 1 ...(2')
[g.sub.1] = [exp([alpha]) - 1]/exp([alpha]) ...(3')
[g.sub.2] = 2[exp([alpha]) - 1]/[exp([alpha]) + 1] ...(4')
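The step from definitions (2)-(4) to (2')-(4') can be sketched as follows, holding X fixed so that (1) implies ln [Y.sub.1] - ln [Y.sub.0] = [alpha]. The hyperbolic-tangent form and the series expansions are included only as an illustration of why [g.sub.2] tracks [alpha] so closely; they are standard identities, not results stated elsewhere in this note.

```latex
\begin{align*}
\frac{Y_1}{Y_0} &= e^{\alpha}, \\
g_0 &= \frac{Y_1 - Y_0}{Y_0} = e^{\alpha} - 1
     = \alpha + \frac{\alpha^2}{2} + O(\alpha^3), \\
g_1 &= \frac{Y_1 - Y_0}{Y_1} = 1 - e^{-\alpha}
     = \alpha - \frac{\alpha^2}{2} + O(\alpha^3), \\
g_2 &= \frac{Y_1 - Y_0}{(Y_0 + Y_1)/2}
     = \frac{2\,(e^{\alpha} - 1)}{e^{\alpha} + 1}
     = 2\tanh\!\left(\frac{\alpha}{2}\right)
     = \alpha - \frac{\alpha^3}{12} + O(\alpha^5).
\end{align*}
```

The factor of 2 in the last line comes from the average in the denominator of (4). Unlike [g.sub.0] and [g.sub.1], [g.sub.2] has no second-order term in [alpha], so its departure from [alpha] is of third order only.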
Table One reports the theoretical values of [g.sub.0],
[g.sub.1], and [g.sub.2] for a wide range of values of the regression
coefficient [alpha].
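The entries of Table One can be recomputed directly from the three definitions of g. The following is a minimal Python sketch; the function name proportional_changes is purely illustrative, and [g.sub.2] is computed from definition (4), with [bar.Y] taken as the average of [Y.sub.0] and [Y.sub.1].

```python
import math


def proportional_changes(alpha):
    """Return (g0, g1, g2) implied by a dummy coefficient alpha in ln Y = ... + alpha*D."""
    ratio = math.exp(alpha)                    # Y1 / Y0, holding X fixed
    g0 = ratio - 1.0                           # change relative to Y0
    g1 = (ratio - 1.0) / ratio                 # change relative to Y1
    g2 = 2.0 * (ratio - 1.0) / (ratio + 1.0)   # change relative to (Y0 + Y1) / 2
    return g0, g1, g2


# Print one row per coefficient value, rounded to three decimals.
alphas = (1.00, 0.75, 0.50, 0.25, 0.10, 0.05, 0.0,
          -0.05, -0.10, -0.25, -0.50, -0.75, -1.00)
print(f"{'alpha':>6} {'g0':>8} {'g1':>8} {'g2':>8}")
for alpha in alphas:
    g0, g1, g2 = proportional_changes(alpha)
    print(f"{alpha:6.2f} {g0:8.3f} {g1:8.3f} {g2:8.3f}")
```

For every nonzero [alpha] in this range the computed [g.sub.1] lies below [alpha] and [g.sub.0] lies above it, with [g.sub.2] between the two.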
As Table One shows, [alpha] is bounded by [g.sub.0] and [g.sub.1],
and the divergence of [alpha] from g worsens as [alpha] deviates from zero. Since
neither [g.sub.0] nor [g.sub.1] provides an unambiguously correct
standard against which to compare [alpha], the definition given by (4)
is equally appealing. As seen in Table One, when one compares [alpha] to
[g.sub.2], the differences are very small, even for relatively large
values of [alpha]. To illustrate, consider a value of [alpha] = 0.50,
leading the analyst to infer a 50 percent difference in Y due to the
existence of the qualitative characteristic. The definition of the
proportional change using [Y.sub.0] as the reference point would suggest
the true proportional change is about 65% (i.e., [g.sub.0] = 0.649), a
nearly 25 percent understatement arising from using the regression
coefficient. Yet when this proportional change is defined relative to
[bar.Y], we get a value of [g.sub.2] = 0.490, meaning the regression
coefficient overstates g by only about 2 percent. While [alpha] is not
exactly the same as [g.sub.2], this numerical example suggests that the
misinterpretation problem raised by H&P is severely overstated (even
when [alpha] is relatively large).
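The two discrepancies quoted for [alpha] = 0.50 can be checked with a few lines of Python. This is only an illustrative sketch: the helper name relative_gaps is hypothetical, and each gap is expressed as a signed fraction of the corresponding g.

```python
import math


def relative_gaps(alpha):
    """Return the signed gaps (alpha - g) / g for g0 and g2."""
    ratio = math.exp(alpha)                    # Y1 / Y0
    g0 = ratio - 1.0                           # reference point Y0
    g2 = 2.0 * (ratio - 1.0) / (ratio + 1.0)   # reference point (Y0 + Y1) / 2
    gap_vs_g0 = (alpha - g0) / g0              # negative: alpha lies below g0
    gap_vs_g2 = (alpha - g2) / g2              # positive: alpha lies above g2
    return gap_vs_g0, gap_vs_g2


gap_vs_g0, gap_vs_g2 = relative_gaps(0.50)
print(f"gap relative to g0: {gap_vs_g0:+.1%}")
print(f"gap relative to g2: {gap_vs_g2:+.1%}")
```

The first gap is roughly eleven times the size of the second, which is the sense in which the choice of reference point drives the apparent misinterpretation.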
To illustrate the appeal of using [g.sub.2] as the standard,
consider the following gender wage gap example. Suppose Y denotes
earnings and D is a dummy variable equal to 1 for males and 0 for
females. If [alpha] = 0.50 in (1), then we would infer that males earn
50 percent more than females after controlling for other variables
affecting earnings. If the claim were made that [g.sub.0] is the correct
measure of the percentage change in Y, then H&P would argue that
males actually earn 64.9% more than females; that is, [alpha]
underestimates the true proportional change. Since the assignment of
zeroes and ones is arbitrary when defining a dummy variable, suppose the
zero-one value assignments had been reversed. Then (1) becomes
ln Y = ([delta] + [alpha]) + [beta]X - [alpha]D ...(5)
where D now equals 1 for females and 0 for males (2). In this case,
the regression coefficient on D in (5) will be -[alpha] = -0.50, implying
that [g.sub.0] = -0.393 or -39.3 percent. Thus, while the
analyst would infer that females earn 50 percent less than males,
H&P would propose that the correct proportion is actually 39.3
percent less. Ignoring negative signs, this suggests that [alpha]
overestimates the 39.3% lower earnings of females. Hence, using
[g.sub.0] as the standard, the regression coefficient either
underestimates or overestimates the impact on Y depending on the
arbitrary assignment of zeroes and ones in defining D. This
inconsistency is eliminated when [g.sub.2] is used as the standard:
here [g.sub.2] = -0.490, or minus 49 percent.
Since D's coefficient in (5) is -0.50, the analyst would infer that
females earn 50% less than males, when the "correct" answer is
49% lower earnings. This is exactly the same conclusion we would have
reached had D been defined equal to 1 if male and 0 if female as in
equation (1). In this case, [alpha] = 0.50 (i.e., a 50% higher salary
for males) when the proportion is [g.sub.2] = 0.49.
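The effect of reversing the dummy coding can also be illustrated numerically. In the Python sketch below (the function names g0 and g2 are illustrative, not taken from any library), switching the sign of [alpha] changes the magnitude of [g.sub.0] but only the sign of [g.sub.2].

```python
import math


def g0(alpha):
    """Proportional change relative to Y0 (the D = 0 group)."""
    return math.exp(alpha) - 1.0


def g2(alpha):
    """Proportional change relative to the average of Y0 and Y1."""
    ratio = math.exp(alpha)
    return 2.0 * (ratio - 1.0) / (ratio + 1.0)


alpha = 0.50  # coefficient when D = 1 for males, as in equation (1)

# Recoding D (1 for females) turns the coefficient into -alpha.
print(f"g0: males coded 1 -> {g0(alpha):+.3f}, females coded 1 -> {g0(-alpha):+.3f}")
print(f"g2: males coded 1 -> {g2(alpha):+.3f}, females coded 1 -> {g2(-alpha):+.3f}")

# g2 is an odd function of alpha, so its magnitude does not depend on the coding.
print("same magnitude under g2:", math.isclose(g2(alpha), -g2(-alpha)))
print("same magnitude under g0:", math.isclose(g0(alpha), -g0(-alpha)))
```

Under [g.sub.0] the two codings give magnitudes of about 0.649 and 0.393, while under [g.sub.2] both give about 0.490.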
While the journals surveyed above suggest that many economists
accept the interpretation proposed by H&P, we find a marked contrast
when it comes to the discussion appearing in many of the major
econometrics textbooks. We found that nearly all texts continue to
interpret the coefficient as the proportional change (see Greene, 2000;
Studenmund, 2001; Gujarati, 1995; Wooldridge, 2000; Berndt, 1991). Given
the discussion above, it is obvious that these textbook authors are
technically correct only if we accept the definition of a proportional
change given by [g.sub.2].
In sum, misinterpretation arguments regarding regression
coefficients in semilogarithmic functions are themselves misplaced given
the lack of a clear reference point for calculating changes in the
dependent variable. We propose an equally appealing definition of the
proportional change, one which resembles the method often used in
calculating arc elasticities. Using that definition, we show that the
regression coefficient in a semilogarithmic function is extremely close
to the theoretical value of the proportional change. This leads us to
suggest that there is no need to transform the regression coefficients
as prescribed by H&P in order to get the correct proportional
changes.
REFERENCES
Asgary, N., P. Gregory, and M. Mokhtari (1997), "Money Demand
and Quantity Constraints: Evidence from the Soviet Interview
Project", Economic Inquiry, 35: 365-77.
Baimbridge, M. (1998), "Academic and Private Sector Salaries: Chalk
and Cheese?", Applied Economics Letters, 5: 211-14.
--and P. Whyman (1997), "Demand for Religion in the British
Isles", Applied Economics Letters, 4: 79-82.
Benson, E., J. Hansen, A. Schwartz, and G. Smersh (1998),
"Pricing Residential Amenities: The Value of View", Journal of
Real Estate Finance and Economics, 16: 1, 55-73.
Berndt, E. (1991), "Analyzing the Determinants of Wages and
Measuring Wage Discrimination: Dummy Variables in Regression
Models", in The Practice of Econometrics, Reading, MA:
Addison-Wesley Publishing (Page 173).
Boulding, W. and D. Purohit (1996), "The Price of
Safety", Journal of Consumer Research, 23: 12-15.
Cancio, A., T. Evens, and D. Maume (1996), "Reconsidering the
Declining Significance of Race: Racial Differences in Early Career
Wages", American Sociological Review, 61: 541-56.
Curington, W. (1994), "Compensation for Permanent Impairment
and the Duration of Work Absence", Journal of Human Resources, 29:
3, 888-910.
--, A. Farmer, and W. Allen (1997), "Retroactive Benefits in
Income Replacement Programs: Results from a Modified Natural
Experiment", Southern Economic Journal, 64: 1, 255-67.
Dor, A. and D. Farley (1996), "Payment Sources and the Cost of
Hospital Care: Evidence from a Multiproduct Cost Function with Multiple
Payers", Journal of Health Economics, 15: 1-21.
Even, W. and D. Macpherson (1993), "The Decline of
Private-Sector Unionism and the Gender Wage Gap", Journal of Human
Resources, 28: 2, 279-95.
Greene, William (2000), Econometric Analysis (4th edition), New
Jersey: Prentice-Hall, Inc. (Page 215).
Gujarati, Damodar (1995), Basic Econometrics (3rd edition), New York:
McGraw-Hill (Page 169).
Halvorsen, Robert and Raymond Palmquist (1980), "The
Interpretation of Dummy Variables in Semilogarithmic Equations",
American Economic Review, 70: 3, 474-75.
Kennedy, P. (1981), "Estimation with Correctly Interpreted
Dummy Variables in Semilogarithmic Equations", American Economic
Review, 71: 4, 801.
Lassibille, G. (1998), "Wage Gaps Between the Public and
Private Sectors in Spain", Economics of Education Review, 17: 1,
83-92.
Levy, D. and T. Miller (1996), "Hospital Rate Regulations, Fee
Schedules, and Workers' Compensation Medical Payments",
Journal of Risk and Insurance, 63: 1, 35-47.
Malpezzi, S., G. Chun, and R. Green (1998), "New
Place-to-place Housing Price Indexes for U.S. Metropolitan Areas, and
their Determinants", Real Estate Economics, 26: 2, 235-51.
Macdonald, J. and L. Cavalluzzo (1996), "Railroad
Deregulation: Pricing Reforms, Shipper Responses, and the Effects on
Labor", Industrial and Labor Relations Review, 50: 1, 80-91.
Rummery, S. (1992), "The Contribution of Intermittent Labour
Force Participation to the Gender Wage Differential", Economic
Record, 68: 202, 351-64.
Studenmund, A. H. (2001), Using Econometrics: A Practical Guide (4th
edition), Boston: Addison Wesley Longman, p. 209.
Thornton, R. and J. Innes (1989), "Interpreting
Semilogarithmic Regression Coefficients in Labor Research", Journal
of Labor Research, 10: 4, 443-47.
Wooldridge, Jeffrey (2000), Introductory Econometrics: A Modern
Approach, Australia: South-Western College Publishing, p. 184.
Table 1
[alpha]    [g.sub.0]    [g.sub.1]    [g.sub.2]
  1.00       1.718        0.632        0.924
  0.75       1.117        0.528        0.717
  0.50       0.649        0.393        0.490
  0.25       0.284        0.221        0.249
  0.10       0.105        0.095        0.100
  0.05       0.051        0.049        0.050
  0          0            0            0
 -0.05      -0.049       -0.051       -0.050
 -0.10      -0.095       -0.105       -0.100
 -0.25      -0.221       -0.284       -0.249
 -0.50      -0.393       -0.649       -0.490
 -0.75      -0.528       -1.117       -0.717
 -1.00      -0.632       -1.718       -0.924