Article information

Title: Quantile regression for sports economics.
Author: Leeds, Michael A.
Journal: International Journal of Sport Finance
Print ISSN: 1558-6235
Year of publication: 2014
Issue: November
Language: English
Publisher: Fitness Information Technology Inc.
Abstract: Quantile regression (QR) represents an important alternative to the standard least squares approach to regression analysis. Unlike least squares, QR allows researchers to study the distribution of incomes or profits, not just the mean values of these variables. It also provides a way to check for heteroskedastic errors and a useful method for avoiding censored variable bias. For these and other reasons, sports economists should make frequent use of QR techniques. However, despite being available as a research tool since the work of Koenker and Bassett (1978), QR remains a rarely-used tool in sports economics. This failure has two sources. First, many researchers do not exploit QR because they do not fully realize what a powerful tool QR can be, and they shy away from asking the sort of questions that QR can help answer. Second, many researchers misapply QR or misinterpret their results.
Keywords: Heteroscedasticity; Quantile regression; Sports clubs

Quantile regression for sports economics.

Leeds, Michael A.

Introduction

Quantile regression (QR) represents an important alternative to the standard least squares approach to regression analysis. Unlike least squares, QR allows researchers to study the distribution of incomes or profits, not just the mean values of these variables. It also provides a way to check for heteroskedastic errors and a useful method for avoiding censored variable bias. For these and other reasons, sports economists should make frequent use of QR techniques. However, despite being available as a research tool since the work of Koenker and Bassett (1978), QR remains a rarely-used tool in sports economics. This failure has two sources. First, many researchers do not exploit QR because they do not fully realize what a powerful tool QR can be, and they shy away from asking the sort of questions that QR can help answer. Second, many researchers misapply QR or misinterpret their results.

In this article, I provide a user-friendly introduction to QR. A certain amount of econometric theory, however, is necessary to understand when QR is needed, how it differs from ordinary least squares regressions, and what it can add to our understanding of reality. In the next section, I develop the QR estimator and show how it differs from the OLS estimator. I then explain how to interpret QR estimates. In particular, I warn against a common misinterpretation of QR results. Specific applications of QR are then provided. The following section shows how one can use QR to test for heteroskedasticity and to eliminate censored variable bias, while the next section shows how to use QR to create counterfactual distributions of the dependent variable.

Deriving the Quantile Regression Estimators

Most econometrics focuses on means. Consider the standard estimation of the determinants of the dependent variable, $y_i$:

$y_i = \beta' X_i + \epsilon_i$, (1)

where $X_i$ is a vector of independent variables and the error term is iid $N(0, \sigma^2)$ and uncorrelated with $X_i$. The least squares regression estimate $\hat{\beta}$ minimizes the sum of squared error terms $\sum_{i=1}^{N} (y_i - \hat{y}_i)^2$. The estimate fits a line to the data, which can be expressed as the expected value of $y_i$ conditional on the values of the $X_i$:

$E(y_i | X_i) = \beta' X_i$ (2)

We can think of the regression line as passing through the peak of a sequence of normal distributions. Figure 1 illustrates the impact of goals that a hockey player scores on the natural logarithm of his salary, holding all other variables constant.

The least squares estimator leads naturally to a discussion of means. The normal equations show that the regression line in Figure 1 passes through the sample mean of both salary ($\bar{S}$) and goals scored ($\bar{G}$). More generally, they indicate that the least squares regression line passes through the mean value of salary given the number of goals scored. Specifically:

$\bar{y} = \hat{\beta}' \bar{X}$ (3)

Under the assumptions laid out above, we know that the regression line also passes through the median of the conditional distribution $F(y_i | X_i)$. If, however, the distribution is not normal--if, for example, it is skewed--the mean and the median will not coincide. In some cases, such as the Laplace or Cauchy distributions, the mean is difficult or impossible to compute. In these cases, it is useful to have an estimator that provides the conditional median of $y_i$ under more general conditions. One of the first applications of median--and later quantile--regression was to deal with settings in which the error term is skewed. Kahane (2010) provides just such a justification for using QR to evaluate earnings on the PGA tour.

[FIGURE 1 OMITTED]

Median regression estimates minimize the sum

$S = \sum_{i=1}^{N} |y_i - \beta' X_i|$ (4)

By summing up the vertical distances, Equation (4) places equal weights on distances above and below the regression line. Minimizing S thus results in a special case of QR estimation--it estimates the 50th quantile, where a quantile ($Q_\theta$) of a random variable (z) is the minimum value of z that has a proportion $\theta$ of the distribution lying at or beneath it:

$Q_\theta(z) = F_z^{-1}(\theta) = \inf\{z : F_z(z) \geq \theta\}$, (5)

where $F_z(z)$ is the cumulative distribution function. Thus, if $\theta = 0.5$ and $z \sim N(0, \sigma^2)$ in Equation (5), $Q_\theta(z)$ is the median, which for a normal distribution coincides with the mean. In the case of median regression, half the observed values of $y_i$ for any given value of $X_i$ lie above the fitted line, and half lie below it.

Because minimizing Equation (4) yields the conditional median of $y_i$, the median regression line is unchanged if the distribution of the residual is skewed up or down. This does not hold for the least squares estimator, which yields the conditional mean. The standard regression line moves up or down if the distribution of the residual is skewed up or down.
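The contrast between the mean and the median under skewness can be seen in a small numerical sketch. The figures below are hypothetical values chosen only for illustration; they are not drawn from the NHL data.

```python
import statistics

# Hypothetical residual-like values: the second list stretches only the
# largest observation upward, which skews the distribution to the right.
symmetric = [2, 4, 6, 8, 10]
skewed = [2, 4, 6, 8, 100]

mean_shift = statistics.mean(skewed) - statistics.mean(symmetric)
median_shift = statistics.median(skewed) - statistics.median(symmetric)

print("change in mean:  ", mean_shift)    # the mean follows the stretched tail
print("change in median:", median_shift)  # the median stays where it was
```

Only the position of the observations relative to the median matters for the median, which is why the stretched upper tail moves the mean but not the median.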

Returning to the hockey example, while the median represents the "typical" player in the NHL, we may be more interested in other parts of the distribution. We may, for example, want to know what affects the pay of players who are unusually well-paid or unusually poorly paid, given the number of goals they score. A general quantile regression fits a line so that a given proportion of the $y_i$ conditional on an observed $X_i$ lies below the regression line with the remainder lying above it. Thus, the 90th quantile specifies a line that passes through the upper tail of the density function $f(y_i | X_i)$, while the 10th quantile specifies a line that passes through the lower tail. The QR estimator for a general quantile, $\theta$, minimizes the weighted sum

$S = \sum_{i: y_i \geq \beta_\theta' X_i} \theta |y_i - \beta_\theta' X_i| + \sum_{i: y_i < \beta_\theta' X_i} (1 - \theta) |y_i - \beta_\theta' X_i|$ (6)

When $\theta = 0.5$, minimizing S becomes equivalent to the median regression procedure outlined above.

Equation (6) is often rewritten as

$S = \frac{1}{n} \sum_{i=1}^{n} \rho_\theta(\epsilon_{\theta i})$ (7)

Pre-multiplying the summation by (1/n) has no impact on the minimization, as it is a simple, monotonic transformation of the summation, while $\rho_\theta(\epsilon_{\theta i})$ restates the weighted sum. Consistent with Equation (6), $\rho_\theta$ imposes a weight of $\theta$ on an observation if $\epsilon_{\theta i} > 0$ and a weight of $1 - \theta$ if $\epsilon_{\theta i} < 0$. We call $\rho_\theta(x)$ a check function, as the weighting system looks like a check mark. More precisely, the weights are line segments that are anchored at the given percentile of the distribution and form a 90° angle (see Koenker and Hallock (2001) and Angrist and Pischke (2009) for a more complete discussion).
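The check function can be sketched in a few lines. The example below is a minimal illustration, not the estimator used for the results in this article: it drops the regressors and looks only for the constant that minimizes the summed check loss, which turns out to be the corresponding sample quantile. The function names and the data are hypothetical.

```python
def check_loss(residual, theta):
    """Check function: weight theta on positive residuals, 1 - theta on negative ones."""
    if residual >= 0:
        return theta * residual
    return (theta - 1.0) * residual


def best_constant(values, theta):
    """Constant c that minimizes the summed check loss of (value - c).

    The objective is piecewise linear and convex, so a minimizer can always
    be found among the observed values themselves.
    """
    return min(values, key=lambda c: sum(check_loss(v - c, theta) for v in values))


# Hypothetical data: eleven evenly spaced values, 1 through 11.
values = list(range(1, 12))
median_fit = best_constant(values, 0.5)  # theta = 0.5 recovers the median
upper_fit = best_constant(values, 0.9)   # theta = 0.9 sits in the upper tail

print(median_fit, upper_fit)
```

With theta = 0.5 the two weights are equal, so the loss is simply half the absolute residual and the fit reduces to the median case.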

Performing QR and Interpreting the Output

Using STATA, one can estimate a quantile regression with the command

qreg depvar indepvars, quant(Q) (8)

where "depvar" is the dependent variable, "indepvars" are the explanatory variables (separated by spaces), and "Q" is the quantile one wishes to estimate. For example, to estimate the 25th quantile for the regression of salary on goals one types the command

qreg salary goals, quant(0.25) (9)

To run a sequence of quantile regressions at once (e.g., for the 10th, 25th, 50th, 75th, and 90th), one simply replaces Expression 9 with

sqreg salary goals, quant(0.1 0.25 0.5 0.75 0.9) (10)

As is common when error terms are not normally distributed, standard errors might be difficult or impossible to compute analytically. In that case, bootstrapping may be necessary. Bootstrapping starts by estimating the model in Equation (1), computing the residual for all N observations in the sample, and saving both the predicted values, $\hat{y}_i$, and the residuals, $\hat{\epsilon}_i$. One then resamples the data set (with replacement), drawing N new observations of the residual. Pairing the resampled $\hat{\epsilon}_j$ with the predicted values of y enables one to recalculate all N values of the dependent variable as $y_i^* = \hat{y}_i + \hat{\epsilon}_j$. Finally, one re-fits the model using the newly computed $y_i^*$. Repeating this procedure many times yields increasingly reliable standard errors. One can easily obtain bootstrapped estimates for quantile regression by replacing Expression (9) with

bsqreg salary goals, quant(0.25) reps(R) (11)

where the command bsqreg computes the standard errors via bootstrapping with R repetitions. Because the estimates eventually converge as the number of repetitions rises, one should specify a large value for R. Using a low R (STATA's default is 20) is useful when performing preliminary estimation, but one should experiment with larger numbers of repetitions to see how many are required for convergence. Typically, 250 repetitions are sufficient, but the appropriate number of repetitions could exceed 1000.
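The resampling steps described above can be sketched as follows. This is a minimal illustration of a residual bootstrap for a one-regressor least squares fit, under the assumption that the procedure works as the text describes; it is not a reproduction of what bsqreg does internally. The function names and the goals and ln(salary) figures are hypothetical.

```python
import random
import statistics


def ols_fit(x, y):
    """Return (intercept, slope) of the least squares line for one regressor."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residual_bootstrap_se(x, y, reps=250, seed=0):
    """Bootstrap standard error of the slope by resampling residuals."""
    rng = random.Random(seed)
    intercept, slope = ols_fit(x, y)
    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    slopes = []
    for _ in range(reps):
        drawn = rng.choices(residuals, k=len(residuals))  # with replacement
        y_star = [fi + e for fi, e in zip(fitted, drawn)]  # rebuilt outcomes
        slopes.append(ols_fit(x, y_star)[1])               # re-fit the model
    return statistics.stdev(slopes)


# Hypothetical sample: goals scored and ln(salary) for ten players.
goals = [2, 5, 8, 12, 15, 20, 25, 30, 40, 52]
ln_salary = [13.1, 13.4, 13.3, 13.8, 13.9, 14.2, 14.1, 14.6, 14.9, 15.3]

for reps in (25, 250, 1000):
    print(reps, residual_bootstrap_se(goals, ln_salary, reps=reps))
```

Running the loop with increasing numbers of repetitions is one way to see whether the standard error has settled down.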

Table 1 shows the results for select variables of a series of quantile regressions of the natural logarithm of salary on player and team characteristics. (1) When estimating several different specifications, the presentation of results can become unwieldy. As a result, many researchers avoid large, hard-to-read tables by presenting their results in graphical form. Figures 2-4 illustrate the QR and OLS results for the impact of goals, plus/minus rating, and hits. (2)

In each figure, the OLS estimate appears as a horizontal line. This reflects the fact that the OLS estimate for each variable is the same at all levels of the conditional distribution. In contrast, the QR results in Figures 2-4 show three different patterns of impact. Figure 2 shows that the impact of goals falls when one goes from the 10th quantile to the 25th quantile but then steadily rises. While this movement is interesting, one should note that the differences in the coefficients from each other or from the OLS estimate are not statistically significant. This is illustrated by the fact that the 95-percent confidence intervals--illustrated by the shaded area--overlap all the estimates. (3) When reading studies that use QR, one should be careful to note whether the author accounts for such significance measures. Failing to do so can lead to misguided conclusions.

[FIGURE 2 OMITTED]

[FIGURE 3 OMITTED]

[FIGURE 4 OMITTED]

The coefficients for plus/minus rating in Figure 3 rise as one goes from the 10th to the 50th quantile but decline thereafter, forming an "inverse-U" shape. Figure 4 shows that the impact of hits steadily falls.

As with OLS estimates, one interprets QR coefficients as rates of change in conditional expected values. The QR expectations, however, are taken of quantiles, rather than of the variables themselves. In our hockey example, the OLS coefficient on goals is the impact of goals on the conditional mean of salary.

$\hat{\beta} = \frac{\partial E(S|G)}{\partial G}$, (12)

where S is salary and G is goals. Because the expectation in Equation (2) is a constant, it is independent of the distribution's quantile. This results in the single coefficient estimate in column 1 of Table 1, rather than an estimate that changes with the quantile, as seen in columns 2 through 6. It also corresponds to the horizontal lines in Figures 2-4.

The QR estimate refers to the expectation of the $\theta$th conditional quantile of the dependent variable. The estimate shows how the expected conditional quantile changes:

$\hat{\beta}_\theta = \frac{\partial Q_\theta(S|G)}{\partial G}$ (13)

Thus, if the error term of the regression equation is iid $N(0, \sigma^2)$, Equations (12) and (13) are equivalent statements.

Equation (13) provides the basis for interpreting QR output. The coefficient $\beta_\theta$ tells us the impact of a small change in the explanatory variable on the value of $y_i$ that corresponds to the $\theta$th quantile in the conditional distribution of the residual. Returning to our hockey example, the estimated coefficients for the impact of goals rise from the statistically insignificant $\hat{\beta}_{0.1} = 0.0114$ to $\hat{\beta}_{0.9} = 0.0197$. These results mean that scoring one more goal has no impact on the 10th percentile of the conditional distribution of ln(salary) but increases the salary of a player in the 90th percentile of the conditional distribution by two percent. In other words, goals do not add to the salaries of players who are unusually low-paid but have a large impact on the salaries of players who are unusually well-paid.

In contrast, the impact of hits falls from $\hat{\beta}_{0.1} = 0.0016$ to the statistically insignificant $\hat{\beta}_{0.9} = -0.0011$. Thus, one more hit increases the salary of a player in the 10th percentile of the conditional distribution of ln(salary) by 0.16 percent, while it has no impact on a player in the 90th percentile. Thus an added hit has a greater impact for a player who is unusually low-paid than for an unusually well-paid player.
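Because the dependent variable is ln(salary), reading a coefficient as a percentage change is an approximation that works well for small coefficients. The sketch below compares the approximation with the exact conversion for the two coefficients quoted above; the function name is a hypothetical helper.

```python
import math


def pct_change_from_log_coef(beta):
    """Exact percentage change in salary implied by a coefficient on ln(salary)."""
    return 100.0 * (math.exp(beta) - 1.0)


for label, beta in (("goals, 90th quantile", 0.0197), ("hits, 10th quantile", 0.0016)):
    approx = 100.0 * beta
    exact = pct_change_from_log_coef(beta)
    print(f"{label}: approx {approx:.2f}%, exact {exact:.2f}%")
```

For coefficients of this size the two readings differ only in the second decimal place, so the simpler "coefficient times 100" interpretation used in the text is adequate.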

It is important to emphasize what the quantile coefficient does not mean, as QR results are often misinterpreted in one of two ways. The coefficient does not tell us that scoring one more goal increases the salaries of the top 10% of goal scorers or of players in the top 10 percent of the NHL's salary distribution by 1.97%. Each of these interpretations applies to a regression run on a subset of the population of NHL players, one containing only the 10% highest scorers and the other containing the 10% earning the highest salaries. Neither interpretation relates to the results of a QR regression.

[FIGURE 5 OMITTED]

Vincent and Eastman (2009) fall into the trap of interpreting quantiles as subsets of the data. This leads them to claim that high quantiles reflect the pay of star players, while low quantiles refer to low-paid "diggers." In fact, the quantile coefficients have a more subtle interpretation. In our example, QR uses the full sample of NHL players to generate the estimates. It plots a line through the scatter of all points so that observed values of player salaries have a given probability of lying above the fitted line at each value of the number of goals a player scores. Thus, the 90th quantile regression is not based on the richest players or the most prolific scorers. Instead, it shows the impact of scoring one more goal on the conditional distribution of salaries. That is, $\hat{\beta}_{0.9}$ shows how scoring another goal affects the salaries of players who are highly-paid relative to other players who have scored a given number of goals. A coefficient of 0.0197 means that a player who falls in the 90th percentile of the error distribution at his current number of goals receives on average 1.97% more in salary if he scores another goal. This result applies equally to low-paid, marginal players who score 2 goals a season and highly-paid stars who score 52 goals in a season, as long as their salary falls in the 90th percentile of salaries among players who score that many goals.

Figure 5 shows the difference between the correct and incorrect interpretation of QR results. The QR regression line fitting the 90th quantile, $\theta = 0.9$, passes through the upper tail of the two distributions illustrated. The figure indicates that the 90th quantile can correspond to a relatively low salary. In Figure 5, a player in the 90th percentile of salaries among those who score few goals still earns less than a middling player who scores many goals.

The 90th quantile identifies players who are unusually well-paid conditional on the number of goals they score. It does not refer to players who score many goals or earn large salaries. Thus, a player who is in the middle of the conditional distribution for players who score 52 goals (which would make him one of the league leaders) would be among the highest-paid players in the league. However, as Table 1 shows, an additional goal brings him only a 1.55% increase in salary. A player who scores only 2 goals but is in the 90th quantile earns much less overall, but he receives an additional 1.97% in salary from each additional goal.

The 90th quantile is not the same as being in the 90th percentile of salaries. That consists of all players whose salaries lie above the horizontal dotted line in Figure 5. To study the impact of an additional goal on players in the 90th percentile, one should run a regression using only points that lie above the horizontal line.
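The difference between a conditional quantile and a subset of the salary distribution can be made concrete with a toy data set. The salaries below are hypothetical figures (in thousands of dollars) for two groups of players, those who score 2 goals and those who score 52; the helper function is likewise an assumption made for illustration.

```python
import math


def empirical_quantile(values, theta):
    """Smallest observed value with at least a share theta of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(theta * len(ordered)))
    return ordered[rank - 1]


# Hypothetical salaries (in $ thousands), grouped by goals scored.
salaries_by_goals = {
    2: [500, 550, 600, 650, 700, 750, 800, 850, 900, 1000, 1200],
    52: [3000, 3500, 4000, 4500, 5000, 5500, 6000, 6500, 7000, 8000, 9500],
}

# 90th percentile among players who score 2 goals (a conditional quantile).
q90_low_scorers = empirical_quantile(salaries_by_goals[2], 0.9)
# Median among players who score 52 goals.
median_high_scorers = empirical_quantile(salaries_by_goals[52], 0.5)
# Cutoff for the top 10% of ALL salaries (an unconditional quantile).
all_salaries = [s for group in salaries_by_goals.values() for s in group]
top_decile_cutoff = empirical_quantile(all_salaries, 0.9)

print(q90_low_scorers, median_high_scorers, top_decile_cutoff)
```

In this toy sample the player at the 90th percentile among 2-goal scorers earns far less than the median 52-goal scorer and falls well short of the league-wide top decile, which is the pattern the text describes for Figure 5.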

The distinction between unusually highly-paid players and unusually low-paid players provides a way to interpret QR results that should be particularly useful for sports economists. Specifically, the estimates may give us a way to specify players' bargaining power. Players whose pay is "unusually" high could have particularly strong bargaining power, whether because of popularity with fans (as might have been the case for players like Derek Jeter) or because of outside opportunities (as might have been true for Oscar de la Hoya). Our inability to measure bargaining power has long limited our ability to estimate salary equations. QR analysis provides one way out of this box. (4)

Statistical Applications of Quantile Regression

The fact that QR acts on specific parts of the conditional distribution enables researchers to use it to test for or resolve a number of econometric problems. For example, one can use QR estimates at different points in the conditional distribution of the dependent variable to test for heteroskedasticity. The impact of the hypothetical coefficients in our hockey example appears in Figure 6. (For more on the subject, see Angrist and Pischke, 2009.)

[FIGURE 6 OMITTED]

Using the results for hits presented in Table 1, $\hat{\beta}_{0.1} = 0.0016$ implies that the lower tail of the distribution is pushed upward by 0.16% with each additional hit. Ignoring for now the statistical insignificance of the coefficient, the estimate $\hat{\beta}_{0.9} = -0.0011$ implies that, if a player in the 90th percentile makes one more hit, his salary falls by 0.11%. This pulls the upper tail of the conditional distribution down as the number of hits increases. Both the upper extreme and the lower extreme are pulled toward the median, which rises at a rate of 0.0008 ($\hat{\beta}_{0.5} = 0.0008$). With the lower tail pulled up toward the median and the upper tail pulled down toward the median, the result is a reduction in the variance of the error term as the number of hits rises.

If the coefficients had been the same (say, $\hat{\beta}_{0.1} = \hat{\beta}_{0.5} = \hat{\beta}_{0.9} = 0.0008$), then salaries at all three quantiles would have increased by 0.08% with each additional hit. In this case, the lines through all quantiles of the conditional distributions would be parallel. This, in turn, means that the conditional distribution of the dependent variable shifts upward at a constant rate as the independent variable increases, keeping the variance of the conditional distribution constant. The picture of the distributions would look like Figure 1. Thus, one way to test for heteroskedasticity is to test whether the slope coefficients are equal across quantiles, for example $\beta_{0.1} = \beta_{0.5} = \beta_{0.9}$.
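The link between unequal quantile slopes and a changing spread can be sketched numerically. The slopes for hits below are the 10th- and 90th-quantile estimates quoted in the text; the intercept gap of 1.0 log points between the two fitted lines is a hypothetical value chosen only for illustration, as is the function name.

```python
def quantile_gap(slope_low, slope_high, intercept_gap, x):
    """Vertical distance between the upper and lower fitted quantile lines at x."""
    return intercept_gap + (slope_high - slope_low) * x


# Estimated slopes from the text; the intercept gap is an assumption.
gap_at_0 = quantile_gap(0.0016, -0.0011, 1.0, 0)
gap_at_100 = quantile_gap(0.0016, -0.0011, 1.0, 100)

# With equal slopes the quantile lines are parallel and the gap never changes.
parallel_gap = quantile_gap(0.0008, 0.0008, 1.0, 100)

print(gap_at_0, gap_at_100, parallel_gap)
```

A gap that narrows or widens with the regressor signals heteroskedasticity, while a constant gap is what one would expect under homoskedastic errors. A formal test would also account for the sampling error in the estimated slopes.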

QR also provides an alternative way to deal with censored variables. Censoring occurs when all values of the dependent variable that lie below a given threshold, c, are assigned the value c. Censoring is common in professional sports, where players' salaries are often limited by league-wide minima that are specified by collective bargaining agreements. A player whose marginal revenue product lies below this minimum value either does not earn a roster spot or receives the mandated minimum salary. Because many values are pushed up from their "actual" levels, OLS estimation results in biased estimates, and researchers must resort to difficult-to-interpret methods such as Tobit analysis.

QR provides a new way around the problem of censored variables. If, for example, all the ideal assumptions hold, except for the fact that the dependent variable is censored, median regression can provide an unbiased version of the OLS estimates. Recall that the median of a distribution is unaffected by how far above or below it the other observations lie. As long as the affected observations stay on one side of the median, the median is unaffected. Thus, if the conditional median salaries lie above the league-mandated minimum, the median regression estimates are unaffected by censoring.

The above result holds only as long as the conditional median lies above the censoring threshold. If it does not, then censoring affects median regression as well as OLS. For this reason, QR results are less likely to be affected by league-wide minima for estimates in the upper quantiles and are more likely to be affected for estimates in the lower quantiles.
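A small sketch shows why censoring from below leaves the median alone only while the threshold stays beneath it. The "latent" values stand in for what players would earn without a league minimum; they, the thresholds, and the function name are hypothetical.

```python
import statistics


def censor_from_below(values, floor):
    """Replace every value below the floor with the floor itself."""
    return [max(v, floor) for v in values]


# Hypothetical uncensored salaries (in $ thousands).
latent = [300, 400, 600, 900, 1200, 2000, 5000]

low_floor = censor_from_below(latent, 500)    # floor below the median of 900
high_floor = censor_from_below(latent, 1000)  # floor above the median of 900

print("median:", statistics.median(latent),
      statistics.median(low_floor), statistics.median(high_floor))
print("mean:  ", statistics.mean(latent),
      statistics.mean(low_floor), statistics.mean(high_floor))
```

The mean rises under either floor, whereas the median changes only once the floor passes it. The same logic explains why upper quantiles are the least exposed to a league minimum.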

QR and Counterfactual Distributions

Economists frequently ask what the world would have been like if events had taken a different turn. For example, labor economists wonder what the distribution of pay would be if unions were as powerful today as they had been in the 1970s (Buchinsky, 1994; DiNardo et al., 1996). Sports economists similarly ask what salary distributions would be today if free agency did not exist (Leeds & Kowalewski, 2001). QR allows us to specify such counterfactual distributions. Similarly, QR provides a generalization of the "Oaxaca decomposition" (Oaxaca, 1973a, 1973b) by showing how discrimination affects the distribution of wages and not just the mean wage.

To appreciate Oaxaca's key insight, let's modify the hockey example to ask whether European hockey players are the victims of discrimination. (5) The most obvious way to test for discrimination is to add a dummy variable to the right-hand side variables. While dummy variables and interaction effects can capture the impact of discrimination if there are only a few explanatory variables, they quickly become unwieldy and difficult to interpret. In contrast, the Oaxaca decomposition can deal with an arbitrary number of explanatory variables. In addition, it allows us to simulate what the mean pay of Europeans (denoted by the subscript E) would be if they were treated like North Americans (N). To perform the decomposition, one runs separate Mincer wage equations for North Americans and Europeans. The predicted average wage for each group is given by

$\hat{y}_N = \hat{\beta}_N' \bar{X}_N$ (14a)

for North Americans and

$\hat{y}_E = \hat{\beta}_E' \bar{X}_E$ (14b)

for Europeans. I define $\hat{y}_j$ as the predicted average value of the (natural logarithm of) a North American or European player's salary and $\bar{X}_j$ as a vector of the mean values of explanatory variables for each group (j = N, E). One can use the above equations to predict what the mean value of Europeans' pay would be if they were treated the same as North Americans. (I call this simulated pay $\hat{y}_{N,E}$.) To simulate it, one applies the means of the explanatory variables for European players to the parameters that determine the pay of North American players. This simply replaces $\bar{X}_N$ in Equation (14a) with $\bar{X}_E$. The result is the prediction

$\hat{y}_{N,E} = \hat{\beta}_N' \bar{X}_E$ (15)

The difference $\hat{y}_N - \hat{y}_{N,E}$ results from what Oaxaca calls "differences in the data," differences in the abilities of Europeans and North Americans. The difference $\hat{y}_{N,E} - \hat{y}_E$ reflects "differences in the coefficients," differences in how teams reward the performances of Europeans and North Americans. Oaxaca attributes this latter difference to discrimination.
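The decomposition can be sketched with a two-parameter wage equation (an intercept and a coefficient on goals). All coefficients and means below are hypothetical numbers chosen for illustration, and the function names are assumptions; the calculation simply applies one group's coefficients to the other group's mean characteristics.

```python
def predict_mean(coefs, means):
    """Predicted mean ln(salary): coefficients applied to mean characteristics."""
    return sum(b * x for b, x in zip(coefs, means))


def oaxaca(coefs_n, coefs_e, means_n, means_e):
    """Split the N-minus-E gap into a data part and a coefficient part."""
    y_n = predict_mean(coefs_n, means_n)
    y_e = predict_mean(coefs_e, means_e)
    y_ne = predict_mean(coefs_n, means_e)  # Europeans under North American coefficients
    return {
        "gap": y_n - y_e,
        "data": y_n - y_ne,          # differences in characteristics
        "coefficients": y_ne - y_e,  # differences in how characteristics are rewarded
    }


# Hypothetical inputs: [intercept, coefficient on goals] and [1, mean goals].
result = oaxaca(
    coefs_n=[13.0, 0.020], coefs_e=[13.2, 0.015],
    means_n=[1.0, 10.0], means_e=[1.0, 14.0],
)
print(result)
```

By construction the two parts add up to the total gap, so the decomposition assigns every part of the mean difference either to characteristics or to coefficients.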

In the sample underlying our example, the average salary of North American players (about $980,000) is about half that of European players ($1.90M). However, as the last two columns of Table 2 show, European players generally outperform North American players. To see how much of this is due to performance, one can simulate how much Europeans would be paid if they were treated like North Americans. To form the counterfactual wage given by Equation (15), apply the coefficients in the first column of Table 2 to the means in the third column. The result is a simulated average pay of about $1.52M. Thus, about half the differential can be explained by performance. (6)
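The share of the differential attributed to performance can be checked roughly from the rounded dollar figures quoted above. The regressions themselves are run in logarithms, so this back-of-the-envelope version is only an approximation:

```latex
\frac{\text{simulated pay} - \text{North American pay}}{\text{European pay} - \text{North American pay}}
  = \frac{1.52 - 0.98}{1.90 - 0.98}
  = \frac{0.54}{0.92}
  \approx 0.59
```

On these figures, a little over half of the gap is accounted for by differences in performance, with the remainder left to differences in how performance is rewarded.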

The Oaxaca decomposition was a significant advance over detecting discrimination with a dummy variable. However, the reliance on means leaves unanswered how means might change. As noted above, means can change because the entire distribution moves or because a portion of the distribution is distended. The average wages of Europeans could thus approach those of North Americans for any number of reasons. For example, the pay of highly-paid Europeans might rise relative to that of otherwise equivalent North Americans, while the pay of most European players may remain far below that of their North American counterparts (for an example of how this applies to sex discrimination in the labor market, see Albrecht et al., 2003).

Machado and Mata (2005) simulate the entire counterfactual distribution of wages by combining QR techniques with Oaxaca's simulation. To see how Machado and Mata's extension applies to our example, begin by running a separate regression for the subsample of North Americans. However, instead of running an OLS equation, as in Equations (14) and (15), they run a series of quantile regressions for randomly selected quantiles, $\theta_l$, to generate the quantile estimates ($\hat{\beta}_{N,\theta_l}$):

$Q_{\theta_l}(y_N | X_N) = \hat{\beta}_{N,\theta_l}' X_N$ (16)

For each quantile regression, they randomly select, with replacement, an observation from the subsample of Europeans and apply its values to the coefficients from Equation (16) to form the predicted value

$\hat{y}_{N,Ei} = \hat{\beta}_{N,\theta_l}' X_{Ei}$ (17)

This prediction gives one data point in the simulated distribution of (the natural logarithm of) wages that would have been earned by the ith European hockey player ($X_{Ei}$) if his salary were in quantile $\theta$ and he had been treated like a North American ($\hat{\beta}_{N,\theta}$). Once one has forecasted enough data points, one can then graph the actual distribution of the pay of European players and compare it to the distribution of pay that would have existed if Europeans were treated like North Americans. Figure 7 compares the actual distribution of salaries for European free agents in the NHL in 2010 and 2011 with the counterfactual distribution that would have prevailed had the European players been treated like North American players. Both distributions are bimodal with both modes in roughly the same place. However, the counterfactual distribution shows that, if Europeans had been treated like North Americans, the lower mode would have been more prominent, while the higher mode would have been less prominent. By comparing the counterfactual and actual distributions, one can state more clearly the impact of discrimination on the pay of European hockey players.
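The simulation steps described above can be sketched for the simplest case of a single regressor. This is an illustrative sketch under stated assumptions, not Machado and Mata's implementation: the quantile line is found by exhaustive search over pairs of observations (feasible only for small samples), quantiles are drawn uniformly between 0 and 1, and all data and function names are hypothetical.

```python
import itertools
import random


def check_loss(u, theta):
    """Check-function loss for residual u at quantile theta."""
    return u * (theta - (1.0 if u < 0 else 0.0))


def fit_quantile_line(x, y, theta):
    """One-regressor quantile fit by exhaustive search.

    Some optimal quantile line passes through two observations with distinct
    x values, so it is enough to score the line through every such pair.
    """
    best = None
    for i, j in itertools.combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = sum(check_loss(yk - intercept - slope * xk, theta)
                   for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


def counterfactual_draws(x_ref, y_ref, x_target, draws=200, seed=0):
    """Simulate target-group outcomes under the reference group's coefficients."""
    rng = random.Random(seed)
    simulated = []
    for _ in range(draws):
        theta = rng.random()                       # randomly selected quantile
        intercept, slope = fit_quantile_line(x_ref, y_ref, theta)
        x_i = rng.choice(x_target)                 # draw one target observation
        simulated.append(intercept + slope * x_i)  # one simulated data point
    return simulated


# Hypothetical data: goals and ln(salary) for the reference group,
# goals only for the target group.
goals_ref = [2, 5, 9, 12, 18, 22, 27, 31, 40, 52]
ln_salary_ref = [13.2, 13.1, 13.6, 13.5, 14.0, 13.9, 14.4, 14.3, 14.9, 15.2]
goals_target = [4, 10, 15, 21, 30, 44]

simulated_ln_salary = counterfactual_draws(goals_ref, ln_salary_ref, goals_target, draws=100)
print(min(simulated_ln_salary), max(simulated_ln_salary))
```

The simulated values can then be plotted as a histogram or density and set beside the target group's actual salaries, which is the comparison Figure 7 makes.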

[FIGURE 7 OMITTED]

Conclusion

Quantile regression is a highly underutilized tool in sports economics. Sports economists frequently encounter settings that call for QR techniques. For example, superstar effects and thin labor markets suggest that bargaining power frequently affects salary negotiations. As a result, there is often strong reason to believe that error terms are not normally distributed. As another example, the presence of a minimum salary (and, in the case of the NBA, a maximal salary) frequently censors the dependent variable in salary regressions. Under such conditions, OLS could yield biased estimates. By allowing exogenous variables to have different impacts on "unusually" highly paid players than they have on players with "unusually" low pay, QR also provides a way of accounting for the relative bargaining power of individual players.

Sports economists also frequently ask questions that are best answered using QR techniques. Discrimination, or such institutional changes as the appearance of free agency, often affects the entire distribution of salaries or of playing time. OLS estimates allow us to see how these factors affect the mean salary. Changes in the mean, however, might mask broader movements of the distribution. QR allows researchers to see how free agency affects all players. Anecdotal evidence, for example, suggests that free agency might have rewarded superstars while placing downward pressure on the pay of marginal players. By allowing researchers to simulate an entire counterfactual distribution, QR techniques allow for a more nuanced analysis of such institutional changes.

Despite the clear value of QR, it remains seldom used for two basic reasons. First, many sports economists do not appreciate the power of QR estimation and how it can improve upon standard techniques. Second, even if they appreciate QR, they might not understand how to use QR or how to interpret their results. This has led many economists to misinterpret the results of QR estimation. Failing to interpret QR results properly will cause leading journals to reject papers that ask important questions and use the appropriate techniques.

The goal of this essay has been to help sports economists overcome these two stumbling blocks. Properly used, QR can expand the variety of questions that sports economists ask. It can also improve upon the quality of answers that they provide.

Michael A. Leeds is a professor in the Department of Economics. His research interests include the economics of baseball in Japan and gender differences in economic contests.

References

Albrecht, J., Bjorklund, A., & Vroman, S. (2003). Is there a glass ceiling in Sweden? Journal of Labor Economics, 21,145-176.

Angrist, J. D., & Pischke, J.-S. (2009). Mostly harmless econometrics: An empricist's companion. Princeton, NJ: Princeton University Press.

Buchinsky, M. (1994). Changes in the US wage structure, 1963-1987: Application of quantile regression. Econometrica, 62, 405-458.

DiNardo, J., Fortin, N. M., & Lemieux, T. (1996). Labor market institutions and the distribution of wages, 1973-1992: A wemiparametric approach. Econometrica, 64,1001-1044.

Kahane, L. (2010). Returns to skill in professional golf: A quantile regression approach. International Journal of Sport Finance, 5,167-180

Kahane, L., Longley, N., & Simmons, R. (2013). The effects of co-worker heterogeneity on firm-level output: Assessing the impacts of cultural and language diversity in the National Hockey League. Review of Economics and Statistics, 95, 302-314.

Koenker, R., 8c Bassett, G. (1978). Regression Quantiles. Econometrica, 46(1), 33-50.

Koenker, R., 8c Hallock, K. F. (2001). Quantile regression. Journal of Economic Perspectives, 15(4), 143-156.

Kowalewski, S. (2010). Salary determination in the National Football League. Unpublished doctoral dissertation, Temple University.

Leeds, M., & Kowalewski, S. (2001). Winner-take-all in the NFL: The effect of the salary cap and free agency on the composition of skill position players. Journal of Sports Economics, 2, 244-256.

Machado, J., & Mata, J. (2005). Counterfactual decomposition of changes in wage distributions using quantile regression. Journal of Applied Econometrics, 20, 445-465.

Oaxaca, R. (1973a). Male-female wage differentials in urban labor markets. International Economic Review, 14, 693-709.

Oaxaca, R. (1973b). Sex discrimination in wages. In O. Ashenfelter & A. Rees (Eds.), Discrimination in labor markets (pp. 124-151). Princeton, NJ: Princeton University Press.

Vincent, C., & Eastman, B. (2009). Determinants of pay in the NHL: A quantile regression approach. Journal of Sports Economics, 10, 256-277.

Von Allmen, P., Leeds, M. A., & Malakorn, J. (2014). Victims or beneficiaries? Wage premia and national origin in the National Hockey League. Unpublished manuscript.

Endnotes

(1) For more on the underlying data, see von Allmen, Leeds, and Malakorn (2014). Full regression results are available upon request.

(2) The plus/minus statistic awards a player a point for each goal scored by his team while he is on the ice and deducts a point for each goal scored by an opponent while he is on the ice. A hit occurs when a player initiates contact with an opposing player that causes the opposing player to lose control of the puck.

(3) A graphing feature for QR can be added to Stata with the command: ssc install grqreg.

(4) To my knowledge, Kowalewski (2010) was the first to apply this interpretation of QR.

(5) Oaxaca's initial estimation measured pay differentials between men and women. While there is a sizable literature that tests for discrimination against French-Canadians, discrimination against Europeans has received scant attention. One exception is Kahane, Longley, and Simmons (2013).

(6) Von Allmen et al. (2014) attribute the remaining differential to differences in bargaining power rather than to discrimination.

Author's Note

An early version of this paper was presented for the North American Association of Sports Economists at the 2013 Western Economic Association International meetings in Seattle. I thank Gary Solon for introducing me to the theory behind quantile regression and providing me with notes on the subject that have become worn with use. I also thank Eva Marikova Leeds and an anonymous referee for their many helpful comments and suggestions on this manuscript.

Michael A. Leeds

Temple University

Table 1: Selected OLS and QR Results

Coefficient      OLS          10th         25th
                            Quantile     Quantile

Defenseman    0.1585 ***     0.0557       0.0204
                (3.06)       (0.85)       (0.41)
Goals         0.0195 ***     0.0134       0.0078
                (4.20)       (1.31)       (1.05)
Plus/Minus    0.0112 ***   0.0131 ***   0.0143 ***
                (4.72)       (2.72)       (4.41)
Hits            0.0007     0.0016 **    0.0014 **
                (1.24)       (2.03)       (2.02)

Coefficient      50th         75th         90th
               Quantile     Quantile     Quantile

Defenseman     0.1033 *    0.1648 ***     0.1131
                (1.90)       (3.53)       (0.74)
Goals         0.0155 ***   0.0178 ***   0.0197 ***
                (3.05)       (3.74)       (2.72)
Plus/Minus    0.0146 ***   0.0109 ***   0.0096 **
                (4.48)       (3.83)       (2.12)
Hits            0.0008       0.0003      -0.0011
                (1.34)       (0.45)       (1.18)

Other variables were: Player weight, team revenue,
time on ice, time on ice while team was short-handed,
time on ice during a power play, career games, career
games squared, draft position, and the number of
previous All Star Game appearances. The regression
also contained dummy variables indicating whether the
player was European and whether the player was a free
agent in 2010 or 2011.

Table 2: Coefficients and means for the Oaxaca Decomposition

Variable                   OLS for North       Means       Means for
                           Americans           for North   Europeans
                                               Americans

Salary                     --                  $978,494    $1.90 MM
Defenseman                 0.1106              0.3118      0.4074
Weight                     0.0028              204.6379    208.0864
Team Revenue (MM)          0.0006              101.5337    104.1355
Total Minutes on Ice       0.0001              703.3261    1109.617
Power Play Minutes         0.0036              53.2782     114.5679
Short-handed Minutes       0.0016              56.6667     80.3704
Career Games               0.0011              250.4029    422.7531
Career Games Squared       1.12 × 10^-6        1,330,336   299,688
Goals in prior year        0.0159              5.8897      9.3827
Plus/Minus in prior year   0.0120              -1.1439     1.7778
Hits in prior year         0.0005              61.4628     71.1111
Draft Position             -0.0103             4.7434      3.7778
All-Star Appearances       0.1860              0.0791      0.6420
Free Agent in 2011         0.01050             0.5204      0.6049
Constant                   12.3578             --          --
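
To make the use of Table 2 concrete, the following is a minimal Python sketch of the "endowment" (explained) part of a standard Oaxaca decomposition, in which each North American coefficient is multiplied by the difference between the European and North American means of that variable. It assumes the decomposition takes this textbook form and that the contributions are in the units of the dependent variable (apparently log salary, given the size of the constant). The dictionary name, the function name, and the choice of only four rows from Table 2 are illustrative assumptions, so the total shown is a partial sum and not the article's reported result.

```python
# Illustrative subset of Table 2. Each entry holds:
# (coefficient from the OLS for North Americans,
#  mean for North Americans, mean for Europeans).
# Only four rows are used; the names below are hypothetical helpers.
TABLE2_SUBSET = {
    "Goals in prior year":      (0.0159, 5.8897, 9.3827),
    "Plus/Minus in prior year": (0.0120, -1.1439, 1.7778),
    "Hits in prior year":       (0.0005, 61.4628, 71.1111),
    "All-Star Appearances":     (0.1860, 0.0791, 0.6420),
}


def endowment_contributions(rows):
    """Return beta_NA * (European mean - North American mean) per variable."""
    return {
        name: beta * (mean_eu - mean_na)
        for name, (beta, mean_na, mean_eu) in rows.items()
    }


contributions = endowment_contributions(TABLE2_SUBSET)

# Partial sum over the four illustrative rows only, not the full decomposition.
explained_subset = sum(contributions.values())

if __name__ == "__main__":
    for name, value in contributions.items():
        print(f"{name:<26} {value:8.4f}")
    print(f"{'Subset total':<26} {explained_subset:8.4f}")
```

Extending the dictionary to every row of Table 2 would give the full explained differential under the same assumption. The part of the pay gap not accounted for by these differences in means is the remaining differential discussed in endnote 6.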