Quantile regression for sports economics.
Leeds, Michael A.
Introduction
Quantile regression (QR) represents an important alternative to the
standard least squares approach to regression analysis. Unlike least
squares, QR allows researchers to study the distribution of incomes or
profits, not just the mean values of these variables. It also provides a
way to check for heteroskedastic errors and a useful method for avoiding
censored variable bias. For these and other reasons, sports economists
should make frequent use of QR techniques. However, despite being
available as a research tool since the work of Koenker and Bassett
(1978), QR remains a rarely-used tool in sports economics. This failure
has two sources. First, many researchers do not exploit QR because they
do not fully realize what a powerful tool QR can be, and they shy away
from asking the sort of questions that QR can help answer. Second, many
researchers misapply QR or misinterpret their results.
In this article, I provide a user-friendly introduction to QR. A
certain amount of econometric theory, however, is necessary to
understand when QR is needed, how it differs from ordinary least squares
regressions, and what it can add to our understanding of reality. In the
next section, I develop the QR estimator and show how it differs from
the OLS estimator. I then explain how to interpret QR estimates. In
particular, I warn against a common misinterpretation of QR results.
Specific applications of QR are then provided. The following section
shows how one can use QR to test for heteroskedasticity and to eliminate
censored variable bias, while the next section shows how to use QR to
create counterfactual distributions of the dependent variable.
Deriving the Quantile Regression Estimators
Most econometrics focuses on means. Consider the standard
estimation of the determinants of the dependent variable, [y.sub.i]:
[y.sub.i] = [beta]'[X.sub.i] + [[epsilon].sub.i]. (1)
where [X.sub.i] is a vector of independent variables and the error
term is iid N(0, [[sigma].sup.2]) and uncorrelated with [X.sub.i]. The
least squares regression estimate [beta] minimizes the sum of squared
error terms [[SIGMA].sup.N.sub.i=1][([y.sub.i] - [[??].sub.i]).sup.2].
The estimate fits a line to the data, which can be expressed as the
expected value of [y.sub.i] conditional on the values of the [X.sub.i]:
E([y.sub.i]|[X.sub.i]) = [beta]'[X.sub.i] (2)
We can think of the regression line as passing through the peak of
a sequence of normal distributions. Figure 1 illustrates the impact of
goals that a hockey player scores on the natural logarithm of his
salary, holding all other variables constant.
The least squares estimator leads naturally to a discussion of
means. The normal equations show that the regression line in Figure 1
passes through the sample mean of both salary ([bar.S]) and goals scored
([bar.G]). More generally, they indicate that the least squares
regression line passes through the mean value of salary given the number
of goals scored. Specifically:
[bar.y] = [??]'[bar.X] (3)
Under the assumptions laid out above, we know that the regression
line also passes through the median of the conditional distribution
F([y.sub.i]|[X.sub.i]). If, however, the distribution is not normal--if,
for example, it is skewed--the mean and the median will not coincide. In
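some cases the mean is a poor or even undefined measure of central
tendency: the Cauchy distribution has no mean at all, and for the Laplace
distribution the median is the more efficient estimator of location. In
these cases, it is useful to have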
an estimator that provides the conditional median of [y.sub.i] under
more general conditions. One of the first applications of median--and
later quantile--regression was to deal with settings in which the error
term is skewed. Kahane (2010) provides just such a justification for
using QR to evaluate earnings on the PGA tour.
[FIGURE 1 OMITTED]
Median regression estimates minimize the sum of absolute deviations
$S = \sum_{i=1}^{N} \left| y_i - \beta' X_i \right|$ (4)
By summing up the vertical distances, Equation (4) places equal
weights on distances above and below the regression line. Minimizing S
thus results in a special case of QR estimation--it estimates the 50th
quantile, where a quantile ([Q.sub.[theta]]) of a random variable (z) is
the minimum value of z that has a share [theta] of the distribution
lying at or beneath it:
$Q_\theta(z) = F_z^{-1}(\theta) = \inf\{ z : F_z(z) \geq \theta \}$, (5)
where [F.sub.z] is the cumulative distribution function of z. Thus, if
[theta] = 0.5 and z~N(0, [[sigma].sup.2]) in Equation (5),
[Q.sub.[theta]](z) is the median, which coincides with the mean. In the
case of median regression, half the observed values
of [y.sub.i] for any given value of [X.sub.i] lie above the fitted line,
and half lie below it.
Because minimizing equation (4) yields the conditional median of
[y.sub.i], the median regression line is unchanged if the distribution
of the residual is skewed up or down. This does not hold for the least
squares estimator, which yields the conditional mean. The standard
regression line moves up or down if the distribution of the residual is
skewed up or down.
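The contrast between the two criteria can be checked on a toy sample. The following is a minimal sketch, assuming two small sets of hypothetical residuals (the numbers and function names are illustrative, not taken from the hockey data): stretching the upper tail moves the value that minimizes the sum of squared deviations (the mean) but leaves the value that minimizes the sum of absolute deviations (the median) where it was.

```python
from statistics import mean, median

# Hypothetical residuals: a symmetric sample and a copy whose upper tail is stretched.
symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
skewed_up = [-2.0, -1.0, 0.0, 1.0, 12.0]

def sum_abs_dev(center, values):
    """Sum of absolute deviations: the median-regression criterion for a constant."""
    return sum(abs(v - center) for v in values)

def sum_sq_dev(center, values):
    """Sum of squared deviations: the least squares criterion for a constant."""
    return sum((v - center) ** 2 for v in values)

# How far does each measure of location move when the upper tail is stretched?
mean_shift = mean(skewed_up) - mean(symmetric)
median_shift = median(skewed_up) - median(symmetric)

# Minimize each criterion over a grid of candidate centers (-3.0 to 13.0 in steps of 0.1).
grid = [i / 10 for i in range(-30, 131)]
best_abs = min(grid, key=lambda c: sum_abs_dev(c, skewed_up))
best_sq = min(grid, key=lambda c: sum_sq_dev(c, skewed_up))

print("shift in mean:", mean_shift, "shift in median:", median_shift)
print("minimizer of absolute deviations:", best_abs)
print("minimizer of squared deviations:", best_sq)
```

The grid search is only a device for showing which point each criterion selects; actual median regression solves the same problem for a full coefficient vector.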
Returning to the hockey example, while the median represents the
"typical" player in the NHL, we may be more interested in
other parts of the distribution. We may, for example, want to know what
affects the pay of players who are unusually well-paid or unusually
poorly paid, given the number of goals they score. A general quantile
regression fits a line so that a given proportion of the [y.sub.i]
conditional on an observed [X.sub.i] lies below the regression line with
the remainder lying above it. Thus, the 90th quantile specifies a line
that passes through the upper tail of the density function
f([y.sub.i]|[X.sub.i]), while the 10th quantile specifies a line that
passes through the lower tail. The QR estimator for a general quantile,
[theta], minimizes the weighted sum
$S = \sum_{i:\, y_i \geq \beta_\theta' X_i} \theta \left| y_i - \beta_\theta' X_i \right| + \sum_{i:\, y_i < \beta_\theta' X_i} (1 - \theta) \left| y_i - \beta_\theta' X_i \right|$ (6)
When [theta] = 0.5, minimizing S becomes equivalent to the procedure
outlined above.
Equation (6) is often rewritten as
S = 1/n [[SIGMA].sup.n.sub.i=1][[rho].sub.[theta]]([[epsilon].sub.[theta]i]) (7)
Pre-multiplying the summation by (1/n) has no impact on the
minimization, as it is a simple, monotonic transformation of the
summation, while [[rho].sub.[theta]]([[epsilon].sub.[theta]i]) restates
the weighted sum. Consistent with Equation (5), [rho] imposes a weight
of [theta] on an observation if [[epsilon].sub.[theta]i] > 0 and a
weight of 1-[theta] if [[epsilon].sub.[theta]i] < 0. We call
[[rho].sub.[theta]](x) a check function, as the weighting system looks
like a check mark. More precisely, the weights are line segments that
are anchored at the given percentile of the distribution and form a
90[degrees] angle (see Koenker and Hallock (2001) and Angrist and
Pischke (2009) for a more complete discussion).
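The check function can be written out in a few lines. The sketch below is illustrative only: the function names and the eleven "salaries" are hypothetical, and the problem is reduced to choosing a single constant rather than a regression line. It shows that the value minimizing the average check loss is the corresponding sample quantile.

```python
def check_loss(residual, theta):
    """Check function rho_theta: weight theta on residuals at or above zero,
    weight (1 - theta) on residuals below zero."""
    return theta * residual if residual >= 0 else (theta - 1) * residual

def quantile_by_minimization(values, theta):
    """Return the sample value that minimizes the average check loss."""
    def average_loss(candidate):
        return sum(check_loss(v - candidate, theta) for v in values) / len(values)
    return min(sorted(values), key=average_loss)

# Hypothetical salaries (in millions), used only to exercise the functions.
salaries = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]

q10 = quantile_by_minimization(salaries, 0.1)
q50 = quantile_by_minimization(salaries, 0.5)
q90 = quantile_by_minimization(salaries, 0.9)
print(q10, q50, q90)
```

With [theta] = 0.9, a positive residual costs nine times as much as a negative residual of the same size, which is what pushes the minimizer toward the upper tail.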
Performing QR and Interpreting the Output
Using Stata, one can estimate a quantile regression with the
command
qreg depvar indepvars, quantile(Q), (8)
where "depvar" is the dependent variable,
"indepvars" are the explanatory variables (separated by
spaces), and "Q" is the quantile one wishes to estimate. For
example, to estimate the 25th quantile for the regression of salary on
goals one types the command
qreg salary goals, quantile(0.25) (9)
To run a sequence of quantile regressions at once (e.g., for the
10th, 25th, 50th, 75th, and 90th), one simply replaces Expression 9 with
sqreg salary goals, quantiles(0.1 0.25 0.5 0.75 0.9) (10)
As is common when error terms are not normally distributed,
standard errors might be difficult or impossible to compute. In that
case, bootstrapping may be necessary. Bootstrapping starts by estimating
the model in Equation (1), computing the residual for all N observations
in the sample, and saving both the predicted values, $\hat{y}_i$, and the
residuals, $\hat{\epsilon}_i$. One then resamples the residuals (with
replacement), drawing N new values, $\hat{\epsilon}_j$. Pairing the re-sampled
$\hat{\epsilon}_j$ with the predicted values of y enables one to recalculate
all N values of the dependent variable as
$y_i^* = \hat{y}_i + \hat{\epsilon}_j$. Finally, one re-fits the model using
the newly computed $y_i^*$. Repeating this procedure yields increasingly
reliable standard errors. One can easily obtain bootstrapped estimates
for quantile regression by replacing Expression (9) with
bsqreg salary goals, quantile(0.25) reps(R), (11)
where the command bsqreg computes the standard errors via
bootstrapping with R repetitions. Because the estimates eventually
converge as the number of repetitions rises, one should specify a large
value for R. Using a low R (Stata's default is 20) is useful when
performing preliminary estimation, but one should experiment with larger
numbers of repetitions to see how many are required for convergence.
Typically, 250 repetitions are sufficient, but the appropriate number of
repetitions could exceed 1000.
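The resampling steps described above can be sketched in code. This is a minimal illustration under stated assumptions, not a reproduction of what bsqreg does internally: it applies the residual-resampling steps to a simple OLS fit of Equation (1), the goals and ln(salary) values are hypothetical, and the helper names are invented for the example.

```python
import random
from statistics import mean, stdev

def ols_fit(x, y):
    """Simple OLS of y on a constant and x; returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

def residual_bootstrap_se(x, y, reps, seed=0):
    """Bootstrap standard error of the slope by resampling residuals."""
    rng = random.Random(seed)              # fixed seed so the sketch is repeatable
    intercept, slope = ols_fit(x, y)
    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    slopes = []
    for _ in range(reps):
        # Draw N residuals with replacement and rebuild the dependent variable.
        drawn = rng.choices(residuals, k=len(residuals))
        y_star = [fi + ej for fi, ej in zip(fitted, drawn)]
        slopes.append(ols_fit(x, y_star)[1])
    return stdev(slopes), slopes

# Hypothetical data: goals scored and ln(salary) for ten players.
goals = [0, 2, 5, 8, 10, 15, 20, 25, 30, 40]
ln_salary = [13.1, 13.4, 13.3, 13.9, 13.6, 14.2, 14.0, 14.6, 14.9, 15.1]

se, slopes = residual_bootstrap_se(goals, ln_salary, reps=250)
print("bootstrap standard error of the slope:", se)
```

Rerunning the function with a larger value of reps and comparing the resulting standard errors is one informal way to judge whether the number of repetitions is large enough.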
Table 1 shows the results for select variables of a series of
quantile regressions of the natural logarithm of salary on player and
team characteristics. (1) When estimating several different
specifications, the presentation of results can become unwieldy. As a
result many researchers avoid large, hard-to-read tables by presenting
their results in graphical form. Figures 2-4 illustrate the QR and OLS
results for the impact of goals, plus/minus rating, and hits. (2)
In each figure, the OLS estimate appears as a horizontal line. This
reflects the fact that the OLS estimate for each variable is the same at
all levels of the conditional distribution. In contrast, the QR results
in Figures 2-4 show three different patterns of impact. Figure 2 shows
that the impact of goals falls when one goes from the 10th quantile to
the 25th quantile but then steadily rises. While this movement is
interesting, one should note that the differences in the coefficients
from each other or from the OLS estimate are not statistically
significant. This is illustrated by the fact that the 95-percent
confidence intervals--illustrated by the shaded area--overlap all the
estimates. (3) When reading studies that use QR, one should be careful
to note whether the author accounts for such significance measures.
Failing to do so can lead to misguided conclusions.
[FIGURE 2 OMITTED]
[FIGURE 3 OMITTED]
[FIGURE 4 OMITTED]
The coefficients for plus/minus rating in Figure 3 rise as one goes
from the 10th to the 50th quantile but decline thereafter, forming an
"inverse-U" shape. Figure 4 shows that the impact of hits
steadily falls.
As with OLS estimates, one interprets QR coefficients as rates of
change in conditional expected values. The QR expectations, however, are
taken of quantiles, rather than of the variables themselves. In our
hockey example, the OLS coefficient on goals is the impact of goals on
the conditional mean of salary.
$\hat{\beta} = \partial E(S|G) / \partial G$, (12)
where S is salary and G is goals. Because the slope of the expectation in
Equation (2) is a constant, it is independent of the distribution's
quantile. This results in the single coefficient estimate in column 1 of
Table 1, rather than an estimate that changes with the quantile, as seen
in columns 2 through 6. It also corresponds to the horizontal lines in
Figures 2-4.
The QR estimate refers to the expectation of the [theta]th
conditional quantile of the dependent variable. The estimate shows how
the expected conditional quantile changes
$\hat{\beta}_\theta = \partial Q_\theta(S|G) / \partial G$ (13)
Thus, if the error term of the regression equation is iid N(0,
[[sigma].sup.2]), Equations (12) and (13) are equivalent statements.
Equation (13) provides the basis for interpreting QR output. The
coefficient [[beta].sub.[theta]] tells us the impact of a small change
in the explanatory variable on the value of [y.sub.i] that corresponds
to the [theta]th quantile of the conditional distribution of the
dependent variable. Returning to our hockey example, the estimated
coefficients for the impact of goals rise from the statistically insignificant
$\hat{\beta}_{0.1}$ = 0.0114 to $\hat{\beta}_{0.9}$ = 0.0197. These results mean
that scoring one more goal has no impact on the 10th percentile of the
conditional distribution of ln(salary) but increases the salary of a
player in the 90th percentile of the conditional distribution by two
percent. In other words, goals do not add to the salaries of players who
are unusually low-paid but have a large impact on the salaries of
players who are unusually well-paid.
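Because the dependent variable is ln(salary), a coefficient is strictly a change in log points. For coefficients this small, the exact percentage change is almost identical to the coefficient itself, as the following worked conversion for the 90th quantile estimate suggests:

```latex
\[
\%\Delta S \;=\; 100\,\bigl(e^{\hat{\beta}_{0.9}} - 1\bigr)
           \;=\; 100\,\bigl(e^{0.0197} - 1\bigr)
           \;\approx\; 1.99\%
\]
```

The approximation becomes less accurate as coefficients grow, so for large estimates the exponential form is the safer one to report.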
In contrast, the impact of hits falls from $\hat{\beta}_{0.1}$ = 0.0016
to the statistically insignificant $\hat{\beta}_{0.9}$ = -0.0011. Thus, one
more hit increases the salary of a player in the 10th percentile of the
conditional distribution of ln(salary) by 0.16 percent, while it
has no impact on a player in the 90th percentile. Thus an added hit has
a greater impact for a player who is unusually low-paid than for an
unusually well-paid player.
It is important to emphasize what the quantile coefficient does not
mean, as QR results are often misinterpreted in one of two ways. The
coefficient does not tell us that scoring one more goal increases the
salaries of the top 10% of goal scorers or of players in the top 10
percent of the NHL's salary distribution by 1.97%. Each of these
interpretations applies to a regression run on a subset of the
population of NHL players, one containing only the 10% highest scorers
and the other containing the 10% earning the highest salaries. Neither
interpretation relates to the results of a QR regression.
[FIGURE 5 OMITTED]
Vincent and Eastman (2009) fall into the trap of interpreting
quantiles as subsets of the data. This leads them to claim that high
quantiles reflect the pay of star players, while low quantiles refer to
low-paid "diggers." In fact, the quantile coefficients have a
more subtle interpretation. In our example, QR uses the full sample of
NHL players to generate the estimates. It plots a line through the
scatter of all points so that observed values of player salaries have a
given probability of lying above the fitted line at each value of the
number of goals a player scores. Thus, the 90th quantile regression is
not based on the richest players or the most prolific scorers. Instead,
it shows the impact of scoring one more goal on the conditional
distribution of salaries. That is, $\hat{\beta}_{0.9}$ shows how scoring
another goal affects the salaries of players who are highly-paid
relative to other players who have scored a given number of goals. A
coefficient of 0.0197 means that a player who falls in the 90th
percentile of the error distribution at his current number of goals
receives on average 1.97% more in salary if he scores another goal. This
result applies equally to low-paid, marginal players who score 2 goals a
season and highly-paid stars who score 52 goals in a season, as long as
their salary falls in the 90th percentile of salaries among players who
score that many goals.
Figure 5 shows the difference between the correct and incorrect
interpretation of QR results. The QR regression line fitting the 90th
quantile, [theta] = 0.9, passes through the upper tail of the two distributions
illustrated. The figure indicates that the 90th quantile can correspond
to a relatively low salary. In Figure 5, a player in the 90th percentile
of salaries among those who score few goals still earns less than a
middling player who scores many goals.
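A small numerical illustration may help fix the idea. The sketch below assumes two hypothetical groups of five players, one scoring 2 goals and one scoring 52, with made-up ln(salary) values; the function names are invented for the example. It compares the 90th conditional quantile among low scorers with the conditional median among high scorers.

```python
import math

def nearest_rank_quantile(values, theta):
    """Smallest value with at least a share theta of the sample at or below it."""
    ordered = sorted(values)
    k = max(1, math.ceil(theta * len(ordered)))
    return ordered[k - 1]

# Hypothetical spread of ln(salary) around a base that rises with goals.
offsets = [-0.4, -0.2, 0.0, 0.2, 0.4]

def ln_salaries(goals):
    base = 13.0 + 0.03 * goals
    return [base + o for o in offsets]

low_scorers = ln_salaries(2)     # players who score 2 goals
high_scorers = ln_salaries(52)   # players who score 52 goals

q90_low = nearest_rank_quantile(low_scorers, 0.9)
median_high = nearest_rank_quantile(high_scorers, 0.5)

print("90th conditional quantile, 2 goals:", q90_low)
print("conditional median, 52 goals:", median_high)
```

In this constructed example the best-paid low scorer still earns less than the typical high scorer, so membership in a high conditional quantile says nothing by itself about a player's place in the overall salary distribution.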
The 90th quantile identifies players who are unusually well-paid
conditional on the number of goals they score. It does not refer to
players who score many goals or earn large salaries. Thus, a player who
is in the middle of the conditional distribution for players who score
52 goals (which would make him one of the league leaders) would be among
the highest-paid players in the league. However, as Table 1 shows, an
additional goal brings him only a 1.55% increase in salary. A player who
scores only 2 goals but is in the 90th quantile earns much less overall,
but he receives an additional 1.97% in salary from each additional goal.
The 90th quantile is not the same as being in the 90th percentile
of salaries. That consists of all players whose salaries lie above the
horizontal dotted line in Figure 5. To study the impact of an additional
goal on players in the 90th percentile, one should run a regression
using only points that lie above the horizontal line.
The distinction between unusually highly-paid players and unusually
low-paid players provides a way to interpret QR results that should be
particularly useful for sports economists. Specifically, the estimates
may give us a way to specify players' bargaining power. Players
whose pay is "unusually" high could have particularly strong
bargaining power, whether because of popularity with fans (as might have
been the case for players like Derek Jeter) or because of outside
opportunities (as might have been true for Oscar de la Hoya). Our
inability to measure bargaining power has long limited our ability to
estimate salary equations. QR analysis provides one way out of this box.
(4)
Statistical Applications of Quantile Regression
The fact that QR acts on specific parts of the conditional
distribution enables researchers to use it to test for or resolve a
number of econometric problems. For example, one can use QR estimates at
different points in the conditional distribution of the dependent
variable to test for heteroskedasticity. The impact of the hypothetical
coefficients in our hockey example appears in Figure 6. (For more on the
subject, see Angrist and Pischke, 2009.)
[FIGURE 6 OMITTED]
Using the results for hits presented in Table 1, $\hat{\beta}_{0.1}$ =
0.0016 implies that the lower tail of the distribution is pushed upward
by 0.16% with each additional hit. Ignoring for now the statistical
insignificance of the coefficient, the estimate $\hat{\beta}_{0.9}$ = -0.0011
implies that, if a player in the 90th percentile makes one more hit, his
salary falls by 0.11%. This pulls the upper tail of the conditional
distribution down as the number of hits increases. Both the upper
extreme and the lower extreme are pulled toward the median, which
rises at a rate of 0.0008 ($\hat{\beta}_{0.5}$ = 0.0008). With the lower tail
rising faster than the median and the upper tail falling toward the
median, the result is a reduction in the variance of the error term as
the number of hits rises.
If the coefficients had been the same (say, $\hat{\beta}_{0.1}$ =
$\hat{\beta}_{0.5}$ = $\hat{\beta}_{0.9}$ = 0.0008), then salaries at all three
quantiles would have increased by 0.08% with each additional hit. In
this case, the lines through all quantiles of the conditional
distributions would be parallel. This, in turn, means that the
conditional distribution of the dependent variable shifts upward at a
constant rate as the independent variable increases, keeping the
variance of the conditional distribution constant. The picture of the
distributions would look like Figure 1. Thus, one way to test for
heteroskedasticity is to test whether the slope coefficients are equal
across quantiles, for example $\beta_{0.1} = \beta_{0.5} = \beta_{0.9}$.
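The link between non-parallel quantile lines and heteroskedasticity can be illustrated with constructed data. The following sketch rests on several assumptions: the data are hypothetical and built so that the spread of y grows with x, and the fitting routine is a brute-force enumeration chosen for transparency (with one regressor and a constant, some optimal quantile line passes through two data points), not the algorithm a statistical package would use.

```python
from itertools import combinations

def check_loss(u, theta):
    """Check function: theta on residuals at or above zero, (1 - theta) below."""
    return theta * u if u >= 0 else (theta - 1) * u

def fit_quantile_line(x, y, theta):
    """Simple quantile regression by enumeration: try the line through every
    pair of points and keep the one with the smallest total check loss."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line is not a candidate
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = sum(check_loss(yk - intercept - slope * xk, theta)
                   for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]

# Hypothetical heteroskedastic data: the shock is scaled by the level of x.
x, y = [], []
for level in [1, 2, 3, 4, 5]:
    for shock in [-2, -1, 0, 1, 2]:
        x.append(level)
        y.append(1 + level * (1 + 0.5 * shock))

slopes = {theta: fit_quantile_line(x, y, theta)[1] for theta in (0.1, 0.5, 0.9)}
print(slopes)
```

In this example the slope rises with the quantile, so the fitted lines fan out and the conditional variance grows with x. Comparing point estimates in this way is only suggestive; a formal test also requires the standard errors of the differences between coefficients.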
QR also provides an alternative way to deal with censored
variables. Censoring occurs when all values of the dependent variable
that lie below a given threshold, c, are assigned the value c. Censoring
is common in professional sports, where players' salaries are often
limited by league-wide minima that are specified by collective
bargaining agreements. A player whose marginal revenue product lies
below this minimum value either does not earn a roster spot or receives
the mandated minimum salary. Because many values are pushed up from
their "actual" levels, OLS estimation results in biased
estimates, and researchers must resort to difficult-to-interpret methods
such as Tobit analysis.
QR provides a new way around the problem of censored variables. If,
for example, all the ideal assumptions hold, except for the fact that
the dependent variable is censored, median regression can provide an
unbiased version of the OLS estimates. Recall that the median of a
distribution is unaffected by how far above or below the median the
distribution lies. As long as the affected observations stay on one side
of the median, the median is unaffected. Thus, if the conditional median
salaries lie above the league-mandated minimum, the median regression
estimates are unaffected by censoring.
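The robustness of the median to censoring from below can be seen in a small example. This is a sketch with hypothetical numbers (the "actual" values, the league minimum, and the function name are all assumptions made for illustration): censoring raises the mean, but the median is unchanged as long as the threshold stays below it.

```python
from statistics import mean, median

def censor_from_below(values, floor):
    """Assign the floor to every value that lies below it."""
    return [max(v, floor) for v in values]

# Hypothetical "actual" values of players (in millions of dollars).
latent = [0.4, 0.6, 0.8, 1.0, 1.5, 2.0, 4.0]

# Case 1: the league minimum lies below the median of the latent values.
observed = censor_from_below(latent, 0.7)
print("mean:", mean(latent), "->", mean(observed))
print("median:", median(latent), "->", median(observed))

# Case 2: a minimum above the latent median moves the observed median as well.
observed_high_floor = censor_from_below(latent, 1.2)
print("median with higher floor:", median(observed_high_floor))
```

The second case shows the limit of the argument: once the threshold exceeds the median, the median itself is censored.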
The above result holds only as long as censoring does not push the
conditional median below the threshold. If it does, then censoring
affects median regression as well as OLS. For this reason, QR results
are less likely to be affected by league-wide minima for estimates in
the upper quantiles and are more likely to be affected for estimates in
the lower quantiles.
QR and Counterfactual Distributions
Economists frequently ask what the world would have been like if
events had taken a different turn. For example, labor economists wonder
what the distribution of pay would be if unions were as powerful today
as they had been in the 1970s (Buchinsky, 1994; DiNardo et al., 1996).
Sports economists similarly ask what salary distributions would be today
if free agency did not exist (Leeds & Kowalewski, 2001). QR allows
us to specify such counterfactual distributions. Similarly, QR provides
a generalization of the "Oaxaca decomposition" (Oaxaca, 1973a,
1973b) by showing how discrimination affects the distribution of wages
and not just the mean wage.
To appreciate Oaxaca's key insight, let's modify the
hockey example to ask whether European hockey players are the victims of
discrimination. (5) The most obvious way to test for discrimination is
to add a dummy variable to the right-hand side variables. While dummy
variables and interaction effects can capture the impact of
discrimination if there are only a few explanatory variables, they
quickly become unwieldy and difficult to interpret. In contrast, the
Oaxaca decomposition can deal with an arbitrary number of explanatory
variables. In addition, it allows us to simulate what the mean pay of
Europeans (denoted by the subscript E) would be if they were treated
like North Americans (N). To perform the decomposition, one runs
separate Mincer wage equations for North Americans and Europeans. The
predicted average wage for each group is given by
$\hat{y}_N = \hat{\beta}_N' \bar{X}_N$ (14a)
for North Americans and
$\hat{y}_E = \hat{\beta}_E' \bar{X}_E$ (14b)
for Europeans. I define $\hat{y}_j$ as the predicted average value
of the (natural logarithm of) a North American or European player's
salary and $\bar{X}_j$ as a vector of the mean values of explanatory
variables for each group, j = N, E. One can use the above equations to
predict what the mean value of Europeans' pay would be if they were
treated the same as North Americans. (I call this simulated pay
$\hat{y}_{N,E}$.)
To simulate it, one applies the means of the explanatory variables for
European players to the parameters that determine the pay of North
American players. This simply replaces [[bar.X].sub.N] in Equation (14a)
with [[bar.X].sub.E]. The result is the prediction
$\hat{y}_{N,E} = \hat{\beta}_N' \bar{X}_E$. (15)
The difference $\hat{y}_N - \hat{y}_{N,E} = \hat{\beta}_N' (\bar{X}_N - \bar{X}_E)$
results from what Oaxaca calls "differences in the data,"
differences in the abilities of Europeans and North Americans. The
difference $\hat{y}_{N,E} - \hat{y}_E = (\hat{\beta}_N - \hat{\beta}_E)' \bar{X}_E$ reflects
"differences in the coefficients," differences in how teams
reward the performances of Europeans and North Americans. Oaxaca
attributes this latter difference to discrimination.
In the sample underlying our example, the average salary of North
American players (about $980,000) is about half that of European players
($1.90M). However, as the last two columns of Table 2 show, European
players generally outperform North American players. To see how much of
this is due to performance, one can simulate how much Europeans would be
paid if they were treated like North Americans. To form the
counterfactual wage given by Equation (15), apply the coefficients in
the first column of Table 2 to the means in the third column. The result
is a simulated average pay of about $1.52M. Thus, about half the
differential can be explained by performance. (6)
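The mechanics of the decomposition can be sketched with two explanatory variables. Everything in the block below is hypothetical: the coefficients and means are invented for illustration and are not the estimates in Table 2, and the function name is an assumption. The point is only that the gap in predicted mean log pay splits exactly into a part due to the data and a part due to the coefficients.

```python
def predict_mean(coefs, means):
    """Predicted mean of ln(salary): coefficients applied to a vector of means."""
    return sum(coefs[name] * means[name] for name in coefs)

# Hypothetical coefficients from separate regressions for each group.
beta_N = {"const": 12.0, "goals": 0.02}   # North Americans
beta_E = {"const": 12.1, "goals": 0.03}   # Europeans

# Hypothetical means of the explanatory variables for each group.
xbar_N = {"const": 1.0, "goals": 6.0}
xbar_E = {"const": 1.0, "goals": 9.0}

y_N = predict_mean(beta_N, xbar_N)    # actual North American prediction
y_E = predict_mean(beta_E, xbar_E)    # actual European prediction
y_NE = predict_mean(beta_N, xbar_E)   # Europeans paid like North Americans

data_part = y_N - y_NE    # "differences in the data"
coef_part = y_NE - y_E    # "differences in the coefficients"

print("total gap:", y_N - y_E)
print("due to data:", data_part, "due to coefficients:", coef_part)
```

Because the counterfactual term is added and subtracted, the two parts sum to the total gap by construction, whatever values the coefficients and means take.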
The Oaxaca decomposition was a significant advance over detecting
discrimination with a dummy variable. However, the reliance on means
leaves unanswered how means might change. As noted above, means can
change because the entire distribution moves or because a portion of the
distribution is distended. The average wages of Europeans could thus
approach those of North Americans for any number of reasons. For
example, the pay of highly-paid Europeans might rise relative to those
of otherwise equivalent North Americans, while those of most European
players may remain far below the pay of their North American
counterparts (for an example of how this applies to sex discrimination
in the labor market, see Albrecht et al., 2003).
Machado and Mata (2005) simulate the entire counterfactual
distribution of wages by combining QR techniques with Oaxaca's
simulation. To see how Machado and Mata's extension applies to our
example, begin by running a separate regression for the subsample of
North Americans. However, instead of running an OLS equation, as in
Equations (14) and (15), they run a series of quantile regressions for
randomly selected quantiles, [[theta].sub.l], to generate the quantile
estimates ($\hat{\beta}_{N,\theta_l}$):
$Q_{\theta_l}(y_N | X_N) = \hat{\beta}_{N,\theta_l}' X_N$ (16)
For each quantile regression, they randomly select, with
replacement, an observation from the subsample of Europeans and apply
its values to the coefficients in Equation (16) to form the predicted
value
$\hat{y}_{N,Ei}(\theta_l) = \hat{\beta}_{N,\theta_l}' X_{Ei}$ (17)
This prediction gives one data point in the simulated distribution
of (the natural logarithm of) wages that would have been earned by the
ith European hockey player ([X.sub.Ei]) if his salary were in quantile
[[theta].sub.l] and he had been treated like a North American (that is,
paid according to $\hat{\beta}_{N,\theta_l}$). Once one has forecast enough
data points, one can then graph the actual distribution of the pay of
European players and compare it to the distribution of pay that would
have existed if Europeans were treated like North Americans. Figure 7
compares the actual distribution of salaries for European free agents in
the NHL in 2010 and 2011 with the counterfactual distribution that would
have prevailed had the European players been treated like North American
players. Both distributions are bimodal with both modes in roughly the
same place. However, the counterfactual distribution shows that, if
Europeans had been treated like North Americans, the lower mode would
have been more prominent, while the higher mode would have been less
prominent. By comparing the counterfactual and actual distributions, one
can state more clearly the impact of discrimination on the pay of
European hockey players.
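The simulation steps can be sketched as follows. This is a rough illustration under several assumptions rather than Machado and Mata's own implementation: the data are hypothetical, there is a single regressor (goals), the quantile fit uses brute-force enumeration of lines through pairs of points, and the random quantiles are restricted to the range 0.05 to 0.95 to keep the small-sample fits away from the extremes.

```python
import random
from itertools import combinations

def check_loss(u, theta):
    """Check function: theta on residuals at or above zero, (1 - theta) below."""
    return theta * u if u >= 0 else (theta - 1) * u

def fit_quantile_line(x, y, theta):
    """Simple quantile regression by trying the line through every pair of points."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = sum(check_loss(yk - intercept - slope * xk, theta)
                   for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]

def counterfactual_draws(x_N, y_N, x_E, draws, seed=0):
    """Simulate ln(salary) for Europeans paid according to North American
    quantile coefficients."""
    rng = random.Random(seed)
    simulated = []
    for _ in range(draws):
        theta = rng.uniform(0.05, 0.95)                        # random quantile
        intercept, slope = fit_quantile_line(x_N, y_N, theta)  # North American fit
        x_draw = rng.choice(x_E)                               # European observation
        simulated.append(intercept + slope * x_draw)
    return simulated

# Hypothetical samples: goals and ln(salary) for North Americans, goals for Europeans.
goals_N = [2, 5, 8, 12, 15, 20, 25, 30, 35, 40]
ln_salary_N = [13.2, 13.1, 13.6, 13.5, 14.0, 13.9, 14.4, 14.3, 14.9, 15.0]
goals_E = [5, 10, 20, 30, 45]

simulated = counterfactual_draws(goals_N, ln_salary_N, goals_E, draws=200)
print("number of simulated values:", len(simulated))
```

A histogram or kernel density of the simulated values could then be compared with the actual distribution of European pay, as Figure 7 does.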
[FIGURE 7 OMITTED]
Conclusion
Quantile regression is a highly underutilized tool in sports
economics. Sports economists frequently encounter settings that call for
QR techniques. For example, superstar effects and thin labor markets
suggest that bargaining power frequently affects salary negotiations. As
a result, there is often strong reason to believe that error terms are
not normally distributed. As another example, the presence of a minimum
salary (and, in the case of the NBA, a maximal salary) frequently
censors the dependent variable in salary regressions. Under such
conditions, OLS could yield biased estimates. By allowing exogenous
variables to have different impacts on "unusually" highly paid
players than they have on players with "unusually" low pay, QR
also provides a way of accounting for the relative bargaining power of
individual players.
Sports economists also frequently ask questions that are best
answered using QR techniques. Discrimination, or such institutional
changes as the appearance of free agency, often affects the entire
distribution of salaries or of playing time. OLS estimates allow us to
see how these factors affect the mean salary. Changes in the mean,
however, might mask broader movements of the distribution. QR allows
researchers to see how free agency affects all players. Anecdotal
evidence, for example, suggests that free agency might have rewarded
superstars while placing downward pressure on the pay of marginal
players. By allowing researchers to simulate an entire counterfactual
distribution, QR techniques allow for a more nuanced analysis of such
institutional changes.
Despite the clear value of QR, it remains seldom used for two basic
reasons. First, many sports economists do not appreciate the power of QR
estimation and how it can improve upon standard techniques. Second, even
if they appreciate QR, they might not understand how to use QR or how to
interpret their results. This has led many economists to misinterpret
the results of QR estimation. Failing to interpret QR results properly
will cause leading journals to reject papers that ask important
questions and use the appropriate techniques.
The goal of this essay has been to help sports economists to
overcome these two stumbling blocks. Properly used, QR can expand the
variety of questions that sports economists ask. It can also improve
upon the quality of answers that they provide.
Michael A. Leeds is a professor in the Department of Economics. His
research interests include the economics of baseball in Japan and gender
differences in economic contests.
References
Albrecht, J., Bjorklund, A., & Vroman, S. (2003). Is there a
glass ceiling in Sweden? Journal of Labor Economics, 21, 145-176.
Angrist, J. D., & Pischke, J.-S. (2009). Mostly harmless
econometrics: An empiricist's companion. Princeton, NJ: Princeton
University Press.
Buchinsky, M. (1994). Changes in the US wage structure, 1963-1987:
Application of quantile regression. Econometrica, 62, 405-458.
DiNardo, J., Fortin, N. M., & Lemieux, T. (1996). Labor market
institutions and the distribution of wages, 1973-1992: A semiparametric
approach. Econometrica, 64, 1001-1044.
Kahane, L. (2010). Returns to skill in professional golf: A
quantile regression approach. International Journal of Sport Finance,
5, 167-180.
Kahane, L., Longley, N., & Simmons, R. (2013). The effects of
co-worker heterogeneity on firm-level output: Assessing the impacts of
cultural and language diversity in the National Hockey League. Review of
Economics and Statistics, 95, 302-314.
Koenker, R., & Bassett, G. (1978). Regression quantiles.
Econometrica, 46(1), 33-50.
Koenker, R., & Hallock, K. F. (2001). Quantile regression. Journal
of Economic Perspectives, 15(4), 143-156.
Kowalewski, S. (2010). Salary determination in the National
Football League. Unpublished doctoral dissertation, Temple University.
Leeds, M., & Kowalewski, S. (2001). Winner-take-all in the NFL:
The effect of the salary cap and free agency on the composition of skill
position players. Journal of Sports Economics, 2, 244-256.
Machado, J., & Mata, J. (2005). Counterfactual decomposition of
changes in wage distributions using quantile regression. Journal of
Applied Econometrics, 20, 445-465.
Oaxaca, R. (1973a). Male-female wage differentials in urban labor
markets. International Economic Review, 14, 693-709.
Oaxaca, R. (1973b). Sex discrimination in wages. In O. Ashenfelter
& A. Rees (Eds.), Discrimination in labor markets (pp. 124-151).
Princeton, NJ: Princeton University Press.
Vincent, C., & Eastman, B. (2009). Determinants of pay in the NHL:
A quantile regression approach. Journal of Sports Economics, 10,
256-277.
Von Allmen, P., Leeds, M. A., & Malakorn, J. (2014). Victims or
beneficiaries? Wage premia and national origin in the National Hockey
League. Unpublished manuscript.
Endnotes
(1) For more on the underlying data, see von Allmen, Leeds, and
Malakorn (2014). Full regression results are available upon request.
(2) The Plus/minus statistic awards a player a point for each goal
scored by his team when he is on the ice and deducts a point for each
goal scored by an opponent when he is on the ice. A hit occurs when a
player initiates contact with an opposing player that causes the player
to lose control of the puck.
(3) A graphing feature for QR can be added to STATA with the
command: ssc install grqreg.
(4) To my knowledge, Kowalewski (2010) was the first to apply this
interpretation of QR.
(5) Oaxaca's initial estimation measured pay differentials
between men and women. While there is a sizable literature that tests
for discrimination against French-Canadians, discrimination against
Europeans has received scant attention. One exception is Kahane,
Longley, and Simmons (2013).
(6) Von Allmen et al. (2014) attribute the remaining differential
to differences in bargaining power rather than to discrimination.
Author's Note
An early version of this paper was presented for the North American
Association of Sports Economists at the 2013 Western Economic
Association International meetings in Seattle. I thank Gary Solon for
introducing me to the theory behind quantile regression and providing me
with notes on the subject that have become worn with use. I also thank
Eva Marikova Leeds and an anonymous referee for their many helpful
comments and suggestions on this manuscript.
Michael A. Leeds
Temple University
Table 1: Selected OLS and QR Results
Coefficient   OLS          10th         25th         50th         75th         90th
                           Quantile     Quantile     Quantile     Quantile     Quantile
Defenseman    0.1585 ***   0.0557       0.0204       0.1033 *     0.1648 ***   0.1131
              (3.06)       (0.85)       (0.41)       (1.90)       (3.53)       (0.74)
Goals         0.0195 ***   0.0134       0.0078       0.0155 ***   0.0178 ***   0.0197 ***
              (4.20)       (1.31)       (1.05)       (3.05)       (3.74)       (2.72)
Plus/Minus    0.0112 ***   0.0131 ***   0.0143 ***   0.0146 ***   0.0109 ***   0.0096 **
              (4.72)       (2.72)       (4.41)       (4.48)       (3.83)       (2.12)
Hits          0.0007       0.0016 **    0.0014 **    0.0008       0.0003       -0.0011
              (1.24)       (2.03)       (2.02)       (1.34)       (0.45)       (1.18)
Other variables were: Player weight, team revenue,
time on ice, time on ice while team was short-handed,
time on ice during a power play, career games, career
games squared, draft position, and the number of
previous All Star Game appearances. The regression
also contained dummy variables indicating whether the
player was European and whether the player was a free
agent in 2010 or 2011.
Table 2: Coefficients and Means for the Oaxaca Decomposition
Variable                   OLS for North    Means for North   Means for
                           Americans        Americans         Europeans
Salary                     --               $978,494          $1.90 MM
Defenseman                 0.1106           0.3118            0.4074
Weight                     0.0028           204.6379          208.0864
Team Revenue (MM)          0.0006           101.5337          104.1355
Total Minutes on Ice       0.0001           703.3261          1109.617
Power Play Minutes         0.0036           53.2782           114.5679
Short-handed Minutes       0.0016           56.6667           80.3704
Career Games               0.0011           250.4029          422.7531
Career Games Squared       1.12 x 10^-6     1,330,336         299,688
Goals in prior year        0.0159           5.8897            9.3827
Plus/Minus in prior year   0.0120           -1.1439           1.7778
Hits in prior year         0.0005           61.4628           71.1111
Draft Position             -0.0103          4.7434            3.7778
All-Star Appearances       0.1860           0.0791            0.6420
Free Agent in 2011         0.01050          0.5204            0.6049
Constant                   12.3578          --                --