Using the Pareto distribution to improve estimates of topcoded earnings.
Armour, Philip; Burkhauser, Richard V.; Larrimore, Jeff
I. INTRODUCTION
The public-use March Current Population Survey (CPS) is the primary
source of data for tracking levels and trends in U.S. labor earnings and
for understanding the factors which influence these trends. Literatures
that have used CPS earnings data are vast and include research on the
changing returns to education or college major in the labor market
(Autor 2014; Gemici and Wiswall 2014), earnings differentials across
occupations (Glied, Ma, and Pearlstein 2015), earnings volatility (Hardy
and Ziliak 2014), elasticity of taxable income (Burns and Ziliak
Forthcoming), and more generally the causes of changes in wage
structures and earnings inequality (see, e.g., Autor, Katz, and Kearney
2008; Card and DiNardo 2002; Goldin and Katz 2007; Juhn, Murphy, and
Pierce 1993. Acemoglu 2002; Altonji and Blank 1999; and Katz and Autor
1999 provide reviews of this literature). However, these literatures
that use the public-use CPS data have been hampered by an attenuated
view of the right tail of the labor earnings distribution due to the
topcoding of high earnings in the public-use CPS data. (1) Importantly,
the failure to properly correct for inconsistent topcoding has been
found to result in biased estimates for a range of outcomes including
returns to education (Hubbard 2011), the size of the earnings gaps
between gender and race groups (Burkhauser and Larrimore 2009a), the
relative economic resources of individuals with and without disabilities
(Burkhauser and Larrimore 2009b), and population-level income inequality
(Burkhauser et al. 2012).
To correct for topcoding biases, public-use CPS-based researchers
have generally pursued one of four paths: (1) ignoring the topcoding
problem; (2) making an ad hoc adjustment to topcoded earnings values;
(3) using a Pareto distribution to estimate earnings at the top of the
distribution; or (4) using cell means or rank-proximity swapped data
that is based on the still-censored internal CPS data. For example, a
common ad hoc technique, based on estimates from Pareto imputations of
top earnings, is to replace topcoded earnings with a multiple of the
topcode threshold so all individuals with topcoded earnings in a year
are assumed to have earnings at 1.3, 1.4, or 1.5 times the topcode
threshold (Autor, Katz, and Kearney 2008; Juhn, Murphy, and Pierce 1993;
Katz and Murphy 1992; Lemieux 2006). However, such an approach may
misstate top earnings if the wrong multiple is used or if the
appropriate multiple changes over time. Similarly, researchers using a
Pareto imputation of top earnings may misstate those earnings if they
are unable to obtain a reasonable fit for the Pareto distribution when
using available public-use data.
Making use of internal March CPS files with their much higher
censoring levels, we show that previous ad hoc estimates and Pareto
estimations of top earnings based on public-use data understate mean
earnings at the top of the earnings distribution and hence also
understate earnings inequality. However, because these internal data are also
censored, albeit at higher levels, any results based purely on the
internal data will also fail to capture a portion of the income at the
very top of the distribution and therefore will also understate
inequality. Hence, while cell means based on internal data, such as
those produced by Larrimore et al. (2008), and the rank-proximity
swapped data series that the Census Bureau began providing in 2010 each allow
researchers to replicate results from the internal census data, findings
using these series will be subject to the same limitations and
understatement of true top incomes as the internal values.
Recognizing the limitations of the existing options to correct for
topcoding, we proceed by using a continuous maximum likelihood estimator
along with internal CPS data to produce a series of more accurate
estimates of top earnings in the CPS data. Our estimates start with
actual top earnings from the internal CPS combined with a Pareto
estimate using these data for internally censored observations. With
this hybrid approach, we create an enhanced cell-mean series that allows
researchers who have access only to the public-use data to more
accurately capture top earnings levels and trends, including estimated
incomes for those above the internal censoring threshold.
To show the value of our new measure, we use it together with the
public-use CPS to replicate the level and trend in labor earnings
inequality from 1963 to 2004 that Kopczuk, Saez, and Song (2010) find
using social security (SSA) administrative records for the subsample of
U.S. workers who paid social security taxes in the commerce and industry
sector of the labor market. Having done so, we then extend our analysis
to 2013 and consider all workers. While the level of earnings inequality
is higher when considering all workers rather than just commerce and
industry workers, its growth is more modest.
II. DATA
The March CPS survey contains a comprehensive set of questions on
sources of household earnings, including labor earnings which are the
focus of this study. (2) These data are collected annually by the Census
Bureau, and the CPS is one of the primary sources of data for research
on income and earnings trends in the United States (see, e.g., Autor,
Katz, and Kearney 2008; Burkhauser et al. 2012; Card and DiNardo 2002;
Feng, Burkhauser, and Butler 2006; Gottschalk and Danziger 2005).
A known limitation of the March CPS data is that incomes are
topcoded in the public-use data and censored at higher thresholds in the
internal data. These topcoding and censoring thresholds change on an ad
hoc basis. Figure 1 provides an overview of these changes for annual
wage earnings from 1967 to 1986 and for primary labor earnings, which
are primarily wages, from 1987 to 2013. (3) Internal topcoding
thresholds, with the exception of 1984, have always been higher than
those in the public-use data but became substantially so after 1984. As
a result, while the number of individuals who are topcoded in the
internal data has risen somewhat since then, the number topcoded in the
public-use data has risen much more. Figure 1 shows this growth, as
measured by percent of individuals with earnings above the public
topcode (right axis), is erratic, rising when the Census Bureau holds
topcodes nominally constant and quickly falling when it raises the
topcodes.
We use both the public-use and internal CPS data to illustrate the
impact of different correction techniques for topcoded earnings on
earnings trends. Our preferred technique is derived from the internal
CPS data, but researchers without access to the internal data can use it
with the public version of the CPS data.
III. ESTIMATING TOP EARNINGS
Most researchers measuring long-term trends in earnings with
public-use CPS data use ad hoc techniques to correct for topcoding, such
as imputing topcoded earnings as a fixed multiple above the topcode
point, with most researchers using a multiple between 1.3 and 1.5
(Autor, Katz, and Kearney 2008; Juhn, Murphy, and Pierce 1993; Lemieux
2006). Implicit in this approach, regardless of the multiplier, is an
assumption that the multiple is constant across years and across changes
in the threshold level.
The multiples in this approach are partially derived from attempts
to fit top earnings to a Pareto distribution. In particular, following
the long-standing assumption that top earnings can be described by the
Pareto distribution, numerous researchers impute the top of the earnings
distribution based on those fit by a Pareto distribution (Bishop, Chiou,
and Formby 1994; Fichtenbaum and Shahidi 1988; Heathcote, Perri, and
Violante 2010; Hubbard 2011; Mishel, Bernstein, and Shierholz 2013;
Piketty and Saez 2003; Schmitt 2003). (4)
The Pareto distribution is defined by the cumulative distribution
function (CDF):
(1) $P(X < x) = 1 - (x_c / x)^{\alpha}$
where x is a given value of earnings (weakly) larger than
$x_c$, $x_c$ is the scale or cutoff parameter, and $\alpha$ is the
shape parameter of the distribution. Because the Pareto distribution is
scale-free, the mean above any threshold y is given as:
(2) $M(y) = \frac{\alpha}{\alpha - 1} y$.
This provides a simple link to the fixed multiple concept. By
setting y as the topcode threshold, M(y) is the Pareto-imputed mean
income above the threshold.
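As a rough illustration of this link (a sketch, not part of the original analysis), the fixed multiples of 1.3 to 1.5 can be mapped to the Pareto shape parameters they would imply by inverting Equation (2). The function names below are hypothetical and chosen only for this example.

```python
def pareto_mean_multiple(alpha: float) -> float:
    """Return M(y) / y = alpha / (alpha - 1), the Pareto mean above y as a multiple of y."""
    if alpha <= 1.0:
        raise ValueError("the mean above the threshold is finite only when alpha > 1")
    return alpha / (alpha - 1.0)


def implied_alpha(multiple: float) -> float:
    """Invert Equation (2): the shape parameter implied by a fixed multiple m > 1."""
    if multiple <= 1.0:
        raise ValueError("a multiple must exceed 1 to imply a finite, positive alpha")
    return multiple / (multiple - 1.0)


if __name__ == "__main__":
    # Multiples commonly used in the literature cited above.
    for m in (1.3, 1.4, 1.5):
        print(f"multiple {m:.1f} -> implied alpha {implied_alpha(m):.2f}")
```

Under these assumptions, a multiple of 1.5 corresponds to a shape parameter of 3, and smaller multiples correspond to larger shape parameters, that is, to thinner tails.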
To use the Pareto distribution to estimate top earnings, one must
first estimate the appropriate shape parameter. The most common approach
is to assume that the distribution is Pareto above some lower cutoff
point ($x_c$) and choose a second cutoff point above that
point--typically the topcode threshold itself ($x_t$) (Parker and
Fenwick 1983; Quandt 1966; Saez 2000; Shyrock and Siegel 1975). The
Pareto shape parameter is then:
(3) $\alpha = \ln(C/T) / \ln(x_t / x_c)$
where C represents the number of individuals with earnings above
the lower cutoff and T represents the number of individuals with
earnings above the topcode threshold. Juhn, Murphy, and Pierce (1993)
report that their choice of cutoff points in the public-use CPS did not
substantially impact their results. However, Schmitt (2003) using more
recent public-use CPS data found that the choice of cutoff point could
matter greatly, depending on the frequency of topcoding in the empirical
distribution.
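The two-point calculation in Equation (3) can be sketched in a few lines. This is a minimal illustration under the assumption that C and T are (possibly weighted) counts of earners above each cutoff; the function and argument names are hypothetical.

```python
import math


def two_point_alpha(count_above_cutoff: float, count_above_topcode: float,
                    cutoff: float, topcode: float) -> float:
    """Equation (3): alpha = ln(C / T) / ln(x_t / x_c).

    count_above_cutoff (C) and count_above_topcode (T) may be weighted counts.
    """
    if not 0 < cutoff < topcode:
        raise ValueError("require 0 < cutoff < topcode")
    if not 0 < count_above_topcode < count_above_cutoff:
        raise ValueError("require 0 < T < C")
    return math.log(count_above_cutoff / count_above_topcode) / math.log(topcode / cutoff)


if __name__ == "__main__":
    # Illustrative numbers only: if a quarter of earners above the cutoff
    # are also above a topcode set at twice the cutoff, alpha is ln(4)/ln(2).
    print(two_point_alpha(4000, 1000, 100_000, 200_000))
```

Because only two points of the distribution enter the formula, small changes in either count can move the estimate noticeably, which is one way to see the sensitivity to cutoff choice noted above.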
As we will illustrate below, this approach fails to provide
reasonable estimates of top earnings in more recent public-use CPS data.
This is partially because the earnings distribution may not be Pareto
far enough below the public-use topcode threshold (if at all) to obtain
reasonable estimates of the shape parameter and because using only two
distribution points may poorly measure the parameter.
We address the first of these concerns by estimating the shape of
the Pareto distribution using the internal data with its less
restrictive censoring through 2007 and using the rank-proximity swap
data (which offers the earnings values from the internal data but
assigned to random individuals) for years since 2008. (5) This allows us to
reduce the portion of the distribution over which earnings must fit the
Pareto distribution--1%-2% rather than the 10% or 20% with the
public-use CPS data (Mishel, Bernstein, and Shierholz 2013, for example,
assume that 20% of the earnings distribution fits the Pareto
distribution).
To address the second concern, we adapt an alternate, but rarely
used, approach to estimating the Pareto shape parameter--applying a
maximum likelihood formula to the empirical distribution. We modify the
widely used maximum likelihood Hill estimator (Hill 1975) such that
earners under the topcode contribute an observation indicating their
reported earnings to the likelihood function, while earners above the
topcode contribute an observation indicating that they earn at least the
topcoded amount. This differential contribution utilizes all available
information to estimate the Pareto parameter, and it has been used to
fit other distributions, such as the generalized beta of the second
kind, to topcoded earnings data (Jenkins et al. 2011). While Polivka
(2000) uses this modified Hill estimator to analyze categorical weekly
earnings data, to our knowledge it has not been applied to continuous
annual earnings data. Under this approach, the continuous, closed-form
solution for estimating the Pareto parameter is:
(4) $\alpha = \frac{M}{\sum_{i=1}^{M} \ln(x_i / x_c) + T \ln(x_t / x_c)}$
where M is the number of individuals with earnings between the
lower cutoff ($x_c$) and censoring point ($x_t$), T is the number of
individuals with earnings at or above the topcode or censoring point,
and $x_i$ is the earnings of an individual. Using this formula allows
individuals between the cutoff and censoring points to contribute to the
likelihood with their actual earnings, while those at the censoring point
contribute only the information that they have earnings at least as high as
the censoring point.
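A minimal sketch of this censored maximum likelihood calculation follows. It assumes the standard right-censored Pareto likelihood, in which uncensored earners contribute the Pareto density and censored earners contribute the probability of exceeding the censoring point; maximizing that likelihood gives M divided by the sum of ln(x_i / x_c) over uncensored earners plus T ln(x_t / x_c). The function names and the simulated data are hypothetical, and the sketch is unweighted, whereas survey data would normally call for weights.

```python
import math
import random


def censored_pareto_alpha(earnings, cutoff, censor_point):
    """Closed-form ML estimate of the Pareto shape parameter under right-censoring.

    Values in [cutoff, censor_point) enter with their actual amount; values at or
    above censor_point enter only as "at least censor_point". Values below the
    cutoff are ignored.
    """
    if not 0 < cutoff < censor_point:
        raise ValueError("require 0 < cutoff < censor_point")
    uncensored = [x for x in earnings if cutoff <= x < censor_point]
    n_censored = sum(1 for x in earnings if x >= censor_point)
    denominator = sum(math.log(x / cutoff) for x in uncensored)
    denominator += n_censored * math.log(censor_point / cutoff)
    if not uncensored or denominator <= 0.0:
        raise ValueError("need at least one uncensored observation and a positive log-sum")
    return len(uncensored) / denominator


def simulate_censored_pareto(alpha, cutoff, censor_point, n, seed=0):
    """Draw n Pareto(alpha, cutoff) values by inverse CDF and censor them from above."""
    rng = random.Random(seed)
    draws = (cutoff * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n))
    return [min(x, censor_point) for x in draws]


if __name__ == "__main__":
    sample = simulate_censored_pareto(alpha=2.0, cutoff=100.0, censor_point=300.0, n=200_000)
    print(censored_pareto_alpha(sample, cutoff=100.0, censor_point=300.0))
```

With simulated data of this size the estimate should land close to the shape parameter used to generate the sample, even though roughly one in nine simulated earners is censored.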
To improve the estimate of top earnings further, rather than
imputing all values the Census Bureau censors in the public-use data, we
use actual internal data when available for estimating top earnings and
only use the Pareto imputation for internally censored observations
where the true value is unknown. In order to facilitate the use of these
better estimates of top incomes, which combine actual internal data with
imputations of censored observations, we create an enhanced cell-mean
series consisting of the mean earnings of publicly topcoded individuals
based on our combined set of internal data and Pareto estimates of
censored observations. Researchers can use this series, available in
Table A1, in conjunction with the public-use March CPS to obtain the
best available estimate of top earnings based on these publicly
available data.
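The hybrid calculation can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually run on the internal files: it treats anyone at or above the public topcode as publicly topcoded, keeps the internal value when it lies below the internal censoring point, and otherwise substitutes the Pareto mean above the censoring point, alpha / (alpha - 1) times that point. All names and numbers are hypothetical.

```python
def enhanced_cell_mean(internal_earnings, weights, public_topcode,
                       internal_censor_point, alpha):
    """Weighted mean earnings of publicly topcoded individuals.

    Internal values below the internal censoring point are used as reported;
    internally censored values are replaced by the Pareto-imputed mean above
    the censoring point (finite only when alpha > 1).
    """
    if alpha <= 1.0:
        raise ValueError("alpha must exceed 1 for a finite imputed mean")
    imputed = alpha / (alpha - 1.0) * internal_censor_point
    total, weight_sum = 0.0, 0.0
    for x, w in zip(internal_earnings, weights):
        if x < public_topcode:
            continue  # not topcoded in the public-use file
        value = imputed if x >= internal_censor_point else x
        total += w * value
        weight_sum += w
    if weight_sum <= 0.0:
        raise ValueError("no publicly topcoded observations")
    return total / weight_sum


if __name__ == "__main__":
    # Illustrative values only, not CPS data.
    earnings = [50_000, 120_000, 150_000, 300_000, 300_000]
    weights = [1.0] * len(earnings)
    print(enhanced_cell_mean(earnings, weights, 100_000, 300_000, alpha=3.0))
```

A researcher with only the public-use file would then assign the resulting cell mean to every publicly topcoded record for that year.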
In Figure 2, we compare the relative accuracy of the standard
proportional and our maximum likelihood Pareto imputation approaches,
along with the fixed multiple approach from Lemieux (2006) and Katz and
Murphy (1992) in capturing the top part of the earnings distribution
censored in the public-use CPS. Because the Pareto cutoff point matters
for both approaches, when using the public-use data, we follow the
approach of Mishel, Bernstein, and Shierholz (2013) and assume that the
distribution is Pareto above the 80th percentile of the distribution.
(6) Because we are using internal CPS data for the estimation using our
maximum likelihood technique, we can use a much higher cutoff, and
assume that the distribution is Pareto above the 99th percentile. (7)
To compare the accuracy of the various series, we compare the mean
annual earnings of the top 5% of the distribution for each with those in
the Larrimore et al. (2008) cell-mean series based on the internal CPS
data. The Larrimore et al. (2008) cell-mean series uses the internal CPS
data to provide the mean value for each source of income for any
individual whose income from that source is topcoded. Because, as seen
in Figure 1, fewer than 5% of individuals have topcoded earnings in any
given year (and even fewer are censored internally), the mean earnings
of the top 5% from the Larrimore et al. (2008) series will perfectly
match the mean earnings of the top 5% from the actual internal data and
the Census Bureau's rank-proximity swap series. But it is not
designed to correct for internal censoring, and it treats each source of
income at or above the internal censoring point as if it were equal to
the censoring point. As a result, the Larrimore et al. (2008) series,
the rank-proximity swap data, and the official Census Bureau statistics
are known to represent underestimates of the true top earnings of the
population.
While the top earnings using the Pareto imputation based on
public-use data and those using the fixed multiple series each slightly
exceed the top earnings from the Larrimore et al. (2008) cell-mean
series in early years, neither does so after 1993 when changes in Census
Bureau collection procedures greatly improved the reporting of earnings
by top earners (see Jones and Weinberg 2000 and Ryscavage 1995 for
details on this change). Because the cell-mean series is a lower bound
for top earnings, it is clear that these previous efforts to capture the
top part of the earnings distribution based solely on public-use CPS
data understate their level at the upper tail since at least 1993.
In contrast to these earlier techniques, our maximum likelihood
Pareto estimation of internally censored observations, in conjunction
with the internal data when available, produces mean earnings of the top
5% which exceed those of Larrimore et al. (2008). In years before 1993,
the impact of this adjustment is small. However, in more recent years,
the addition of an imputation of censored earnings increases the average
earnings of the top 5% by as much as 10% over the values from Larrimore
et al. (2008). (8)
In comparing the series, it may appear counterintuitive that
imputing censored earnings using a Pareto distribution increased top
earnings by more after 1993 relative to the Larrimore et al. (2008)
cell-mean series, when the Census Bureau increased its censoring
threshold, than it did prior to that year. However, in addition to
increasing the censoring threshold, the Census Bureau implemented other
survey design changes in that year--such as electronic data
collection--that fundamentally changed the shape of the upper tail of
the observed income distribution. This can be observed in Table 1, which
shows the percent of respondents in the data with earnings in
high-income ranges (not adjusted for inflation). In each year from 1990
through 1992, between 0.18% and 0.24% of respondents reported earnings
of $199,999 or more--and 0.06% of respondents reported income at or
above the $299,999 internal censoring threshold for those years. In
1993, the fraction of respondents reporting an income of at least
$199,999 nearly doubled to 0.43% of respondents. Similarly, the fraction
with incomes of at least $299,999 tripled to 0.19% of respondents.
If the top of the income distribution does follow the Pareto
distribution, this increase in the number of individuals with earnings
near the censoring threshold suggests that there is also a longer right
tail of earnings above the threshold. Thus, the improvements in data
collection in 1993 increased information about both the observed and
unobserved portions of the distribution. The results in Figure 2, where
we use the internal data with a Pareto imputation for censored values,
demonstrate that the break in the data series in the raw internal data
(Jones and Weinberg 2000 and Ryscavage 1995) may, in fact, underestimate
the improvements in capturing top incomes occurring in that year.
Recognizing that this trend break is the result of new collection
procedures and not changes to topcoding, we correct for the break using
the standard approach from Atkinson, Piketty, and Saez (2011) and
Burkhauser et al. (2012) and upwardly adjust inequality measures from
all years before 1993, thus assuming no inequality change in the
1992-1993 trend break year.
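One way to picture this splice is sketched below. The text does not say whether the upward adjustment is proportional or additive, so the sketch offers both as explicit assumptions; the function name and the example values are hypothetical and are not taken from the paper's series.

```python
def splice_trend_break(series, break_year, proportional=True):
    """Shift all years before break_year so the last pre-break year equals the break year.

    series maps year -> inequality measure. With proportional=True the earlier
    values are scaled by a common ratio; otherwise a common constant is added.
    Both variants are assumptions made for illustration.
    """
    before = series[break_year - 1]
    after = series[break_year]
    adjusted = {}
    for year, value in series.items():
        if year < break_year:
            adjusted[year] = value * after / before if proportional else value + (after - before)
        else:
            adjusted[year] = value
    return adjusted


if __name__ == "__main__":
    # Made-up Gini values used only to show the mechanics.
    gini = {1991: 0.40, 1992: 0.41, 1993: 0.45, 1994: 0.46}
    print(splice_trend_break(gini, 1993))
```

Either variant leaves values from the break year onward untouched and removes the measured change between the two years around the break.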
IV. COMPARISON TO SSA RECORDS
Kopczuk, Saez, and Song (2010) provide the first research using
administrative records data to analyze long-run earnings inequality.
Their study uses SSA earnings data from 1937 to 2004 to examine earnings
inequality of commerce and industry workers between the ages of 25 and
60 with wages over $2,575 in 2004, indexed by nominal average wage
growth for earlier years. (9) This minimum earnings restriction
represents one-fourth of the earnings an individual working full time
for a year (2,000 hours) at the federal minimum wage would receive each
year. This study is the current gold standard of annual earnings
inequality trends and hence an excellent benchmark for testing the
validity of our CPS-based results. If results from Kopczuk, Saez, and
Song (2010) can be replicated in the CPS data, then it validates the use
of CPS data for analyzing earnings trends. To this end, we limit our
data sample to commerce and industry workers and impose the same age and
minimum earnings restriction so that we can compare Gini coefficient
results across the two datasets.
In Figure 3, we compare the earnings Gini for this subsample of
workers from Kopczuk, Saez, and Song (2010) to our Pareto-adjusted
income series as well as to estimates using the Larrimore et al. (2008)
series, which were previously the best estimates of top earnings in the
CPS data. (10) For each series, these Gini coefficients are estimated
directly from the data after imposing the specified topcode correction.
While we do not have access to internal CPS data before 1967, to extend
the comparison we go back to 1963 using public-use CPS data. (11) Over
these earlier years from 1963 to 1966, topcoding was so rare that no
additional topcode corrections are required. (12)
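To make the estimation step concrete, the sketch below applies a cell-mean correction to public-use earnings and then computes a weighted Gini coefficient from the corrected values. It is an illustration only: the helper names are hypothetical, records at or above the public topcode are assumed to be the topcoded ones, and the Gini formula is the usual trapezoid approximation to the Lorenz curve.

```python
def apply_cell_mean(earnings, public_topcode, cell_mean):
    """Replace each publicly topcoded value with the cell mean for that year."""
    return [cell_mean if x >= public_topcode else x for x in earnings]


def weighted_gini(values, weights):
    """Weighted Gini coefficient via the trapezoid rule under the Lorenz curve."""
    pairs = sorted(zip(values, weights))
    total_weight = sum(w for _, w in pairs)
    total_income = sum(x * w for x, w in pairs)
    if total_weight <= 0 or total_income <= 0:
        raise ValueError("weights and total earnings must be positive")
    cumulative = 0.0
    area_sum = 0.0
    for x, w in pairs:
        previous = cumulative
        cumulative += x * w
        area_sum += w * (previous + cumulative)
    return 1.0 - area_sum / (total_weight * total_income)


if __name__ == "__main__":
    # Illustrative values only, not CPS data.
    public = [20_000, 45_000, 60_000, 100_000, 100_000]
    corrected = apply_cell_mean(public, public_topcode=100_000, cell_mean=160_000)
    print(weighted_gini(corrected, [1.0] * len(corrected)))
```

Assigning a single cell mean to all topcoded records preserves mean top earnings but not the dispersion among topcoded earners, which is worth keeping in mind when reading Gini values computed this way.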
Between 1967 and 1994, the inequality trends in the CPS data
with our Pareto correction and in the Kopczuk, Saez, and Song (2010) series
using social security records are remarkably similar. In 1995, top
earnings in the CPS series fall, resulting in a level of inequality
that is approximately two Gini points below the Kopczuk, Saez, and Song
series. However, after that divergence, inequality continues to grow at a
similar pace across the two series between 1995 and 2004.
Despite this divergence, our new series using the Pareto correction more
closely matches the estimates from Kopczuk, Saez, and Song (2010) than
does the Larrimore et al. (2008) cell-mean series.
This provides evidence that our correction improves the ability of
the public-use CPS data to accurately measure and analyze U.S. earnings
levels and trends.
V. IMPACT OF AGE AND EARNINGS RESTRICTIONS
After largely matching the earnings inequality trends from Kopczuk,
Saez, and Song (2010), we now focus on the extent to which limiting the
sample to commerce and industry workers and imposing age and earnings
restrictions influences observed inequality trends. In Figure 4, while
still excluding self-employment earnings, we compare the Gini
coefficient for labor earnings that we get using our enhanced cell-mean
series for all workers with any earnings to the Gini coefficient for
labor earnings we got using the sample restrictions imposed by Kopczuk,
Saez, and Song in Figure 3. In the restricted sample, earnings
inequality increases by 16.7% (from 0.378 to 0.441) between 1963 and 2013.
When looking at workers in all industries, the level of inequality was
similar, but the growth slowed from 16.7% to 11.1%. When we remove the
restriction of considering only workers aged 25-60, and consider workers
of all ages with earnings above the $2,575 minimum earnings restriction,
the level of inequality increases (in 2013 the Gini coefficient for our
all-age group is 1.0% higher than the initial commerce and industry
workers sample, 0.445 compared to 0.441), but its growth since 1963 is
even slower. Without the age restriction, earnings inequality in 2013
was 5.2% higher than in 1963.
Finally, we remove the $2,575 minimum earnings restriction and
include all workers with earnings of at least $1 in the sample.
Inequality, in 2013, in this fuller sample of workers is 10.2% higher
than it is in the initial commerce and industry workers sample. But
rather than increasing since 1963, earnings inequality is 2% lower in
2013 than it was in 1963. In contrast to the levels and trends in
earnings inequality that Kopczuk, Saez, and Song (2010) and we observe in
their subsample of workers, in our full sample of workers we find that the
level of inequality is higher but its growth is lower.
VI. CONCLUSION
Inconsistent censoring in the public-use March CPS limits its
usefulness in measuring labor earnings levels and trends. We find that
previous approaches for imputing topcoded earnings systematically
understate top earnings. In particular, both the fixed multiple approach
and Pareto estimates based solely on public-use CPS data understate the
level of top earnings in the internal CPS data--which is also subject to
censoring and thus represents a lower bound. Our hybrid approach of
internal data and Pareto imputations provides better estimates of top
earnings in the CPS data. Using our hybrid approach, we create an
enhanced cell-mean series for use with the public-use data that will
allow researchers to more closely approximate the actual level of top
earnings in CPS data. Using public-use CPS data together with our
enhanced cell-mean series and mimicking Kopczuk, Saez, and Song (2010)
sample restrictions, we observe labor earnings inequality levels that
are more consistent with those Kopczuk, Saez, and Song (2010) report for
the subsample of U.S. workers in commerce and industry captured by
administrative social security records. As a result, we believe that our
series represents the best available means of estimating top earnings
in the CPS data and demonstrates that the CPS data can provide
reasonable estimates of U.S. labor earnings trends.
doi: 10.1111/ecin.12299
Online Early publication November 23, 2015
ABBREVIATIONS
CDF: Cumulative Distribution Function
CPS: Current Population Survey
ORG: Outgoing Rotation Group
SSA: Social Security Administration
APPENDIX
TABLE A1
Enhanced Cell Means for Wage and Salary Earnings
(1967-1986) and for Primary Earnings (1987-2013)
Income Year    Mean Wage and Salary Earnings above Public Topcode
1967            68,718.88
1968            67,672.02
1969            70,602.84
1970            72,338.20
1971            69,964.24
1972            72,067.52
1973            72,276.09
1974            69,694.40
1975            68,484.37
1976            69,622.58
1977            70,377.94
1978            72,473.37
1979            77,877.91
1980            76,067.81
1981           116,517.60
1982           108,677.47
1983           110,527.71
1984           152,540.90
1985           147,726.89
1986           151,170.99

Income Year    Mean Primary Earnings above Public Topcode
1987           155,167.85
1988           153,957.31
1989           161,368.84
1990           161,071.86
1991           149,446.92
1992           157,823.42
1993           240,177.96
1994           240,310.44
1995           362,741.41
1996           374,699.39
1997           398,231.55
1998           387,378.22
1999           347,774.63
2000           419,886.77
2001           390,670.08
2002           470,904.67
2003           445,997.33
2004           477,597.05
2005           474,259.17
2006           538,416.97
2007           467,984.50
2008           464,928.55
2009           501,245.62
2010           522,694.47
2011           572,896.45
2012           626,923.39
2013           558,843.72
Notes: Figures based on authors' calculation using internal
CPS data and maximum likelihood Pareto fit at the 99th
percentile of the earnings distribution. "Income Year" records
income in the year prior to the year of the March CPS survey.
Enhanced cell means were not calculated for years before
1967 due to the lack of topcoding on earnings, when one individual
or fewer was topcoded each year. Enhanced cell means
for years since 2008 use the same hybrid procedure as used in
earlier years but base the Pareto imputation off of the rank-proximity
swap data from the Census Bureau rather than the
raw internal data. Because the rank-proximity swap data provide
the values in the internal data (albeit not matched to the
right people), this procedure allows our enhanced cell-mean
series to continue without a break up to the most recent
years of data.
Source: Authors' calculation using internal March CPS
Data.
REFERENCES
Acemoglu, D. "Technical Change, Inequality, and the Labor
Market." Journal of Economic Literature, 40(1), 2002, 7-72.
Acemoglu, D., and D. H. Autor. "Skills, Tasks and
Technologies: Implications for Employment and Earnings," in
Handbook of Labor Economics, Vol. 4B, edited by O. Ashenfelter and D.
Card. Amsterdam, The Netherlands: Elsevier, 2010, 1043-172.
Altonji, J. G., and R. M. Blank. "Race and Gender in the Labor
Market," in Handbook of Labor Economics, Vol. 3C, edited by O.
Ashenfelter and D. Card. Amsterdam, The Netherlands: Elsevier, 1999,
3143-259.
Atkinson, A. B., T. Piketty, and E. Saez. "Top Incomes in the
Long Run of History." Journal of Economic Literature, 49(1),
2011, 3-71.
Autor, D. H. "Skills, Education and the Rise of Earnings
Inequality among the 'Other 99 Percent'." Science,
344(6186), 2014, 843-51.
Autor, D. H., L. F. Katz, and M. S. Kearney. "Trends in U.S.
Wage Inequality: Revising the Revisionists." Review of Economics
and Statistics, 90(2), 2008, 300-23.
Bandourian, R., J. B. McDonald, and R. S. Turley. "A
Comparison of Parametric Models of Income Distribution Across Countries
and Over Time." Estadistica, 55(164-165), 2003, 135-52.
Bishop, J. A., J. R. Chiou, and J. P. Formby. "Truncation Bias
and the Ordinal Evaluation of Income Inequality." Journal of
Business and Economic Statistics, 12, 1994, 123-27.
Bordley, R. F., J. B. McDonald, and A. Mantrala. "Something
New, Something Old: Parametric Models for the Size Distribution of
Income." Journal of Income Distribution, 6(1), 1996, 91-103.
Burkhauser, R. V., S. Feng, S. Jenkins, and J. Larrimore.
"Trends in United States Income Inequality Using the Internal March
Current Population Survey: The Importance of Controlling for
Censoring." Journal of Economic Inequality, 9(3), 2011, 393-415.
--. "Recent Trends in Top Income Shares in the United States:
Reconciling Estimates from March CPS and IRS Tax Return Data."
Review of Economics and Statistics, 94(2), 2012, 371-88.
Burkhauser, R. V., and J. Larrimore. "Using Internal CPS Data
to Reevaluate Trends in Labor-Earnings Gaps." Monthly Labor Review,
132(8), 2009a, 3-18.
--. "Trends in the Relative Household Income of Working-Age
Men with Work Limitations: Correcting the Record Using Internal Current
Population Survey Data." Journal of Disability Policy Studies,
20(3), 2009b, 162-69.
Burns, S. K., and J. P. Ziliak. Forthcoming. "Identifying the
Elasticity of Taxable Income." The Economic Journal, doi:
10.1111/ecoj.12299.
Card, D., and J. E. DiNardo. "Skill-Biased Technological
Change and Rising Wage Inequality: Some Problems and Puzzles."
Journal of Labor Economics, 20(4), 2002, 733-82.
Feng, S., R. V. Burkhauser, and J. S. Butler. "Levels and
Long-Term Trends in Earnings Inequality: Overcoming Current Population
Survey Censoring Problems Using the GB2 Distribution." Journal of
Business and Economic Statistics, 24(1), 2006, 57-62.
Fichtenbaum, R., and H. Shahidi. "Truncation Bias and the
Measurement of Income Inequality." Journal of Business and Economic
Statistics, 6, 1988, 335-37.
Gemici, A., and M. Wiswall. "Evolution of Gender Differences
in Post-Secondary Human Capital Investments: College Majors."
International Economic Review, 55(1), 2014, 23-56.
Glied, S. A., S. Ma, and I. Pearlstein. "Understanding Pay
Differentials among Health Professionals, Nonprofessionals, and Their
Counterparts in Other Sectors." Health Affairs, 34(6), 2015,
929-35.
Goldin, C., and L. F. Katz. "Long-Run Changes in the Wage
Structure." Brookings Papers on Economic Activity, 2, 2007, 135-67.
Gottschalk, P., and S. Danziger. "Inequality of Wage Rates,
Earnings and Family Income in the United States, 1975-2002." Review
of Income and Wealth, 51(2), 2005, 231-54.
Hardy, B., and J. P. Ziliak. "Decomposing Trends in Income
Volatility: The 'Wild Ride' at the Top and Bottom."
Economic Inquiry, 52(1), 2014, 459-76.
Heathcote, J., F. Perri, and G. L. Violante. "Unequal We
Stand: An Empirical Analysis of Economic Inequality in the United
States: 1967-2006." Review of Economic Dynamics, 13(1), 2010,
15-51.
Hill, B. M. "A Simple General Approach to Inference about the
Tail of a Distribution." The Annals of Statistics, 3(5), 1975,
1163-74.
Hubbard, W. H. J. "The Phantom Gender Difference in the
College Wage Premium." Journal of Human Resources, 46(3),
2011, 568-86.
Jenkins, S. P., R. V. Burkhauser, S. Feng, and J. Larrimore.
"Measuring Inequality Using Censored Data: A Multiple-Imputation
Approach to Estimation and Inference." Journal of the Royal
Statistical Society, Series A, 174(1), 2011, 63-81.
Jones, A. F., and D. H. Weinberg. The Changing Shape of the
Nation's Income Distribution. Washington, DC: U.S. Census Bureau,
2000.
Juhn, C., K. M. Murphy, and B. Pierce. "Wage Inequality and
the Rise in Returns to Skill." Journal of Political Economy,
101(3), 1993, 410-42.
Katz, L. F., and D. H. Autor. "Changes in the Wage Structure
and Earnings Inequality," in Handbook of Labor Economics, Vol. 3A,
edited by O. Ashenfelter and D. Card. Amsterdam, The Netherlands:
Elsevier, 1999, 1463-555.
Katz, L. F., and K. M. Murphy. "Changes in Relative Wages,
1963-87: Supply and Demand Factors." Quarterly Journal of
Economics, 107, 1992, 35-78.
Kopczuk, W., E. Saez, and J. Song. "Earnings Inequality and
Mobility in the United States: Evidence from Social Security Data since
1937." Quarterly Journal of Economics, 125, 2010, 91-128.
Larrimore, J., R. V. Burkhauser, S. Feng, and L. Zayatz.
"Consistent Cell Means for Topcoded Incomes in the Public Use March
CPS (1976-2007)." Journal of Economic and Social Measurement,
33(2-3), 2008, 89-128.
Lemieux, T. "Increased Residual Wage Inequality: Composition
Effects, Noisy Data, or Rising Demand for Skill." American Economic
Review, 96(2), 2006, 461-98.
McDonald, J. B. "Some Generalized Functions for the Size
Distribution of Income." Econometrica, 52(3), 1984, 647-63.
Mishel, L., J. Bernstein, and H. Shierholz. The State of Working
America. 12th ed. Ithaca, NY: Cornell University Press, 2013.
Parker, R., and R. Fenwick. "The Pareto Curve and Its Utility
for Open-Ended Income Distributions in Survey Research." Social
Forces, 61, 1983, 872-85.
Piketty, T., and E. Saez. "Income Inequality in the United
States, 1913-1998." Quarterly Journal of Economics, 118(1), 2003,
1-39.
Polivka, A. "Using Earnings Data from the Monthly Current
Population Survey." Unpublished Manuscript, 2000.
Quandt, R. "Old and New Methods of Estimation and the Pareto
Distribution." Metrika, 10, 1966, 55-82.
Ryscavage, P. "A Surge in Growing Income Inequality?"
Monthly Labor Review, 118(8), 1995, 51-61.
Saez, E. "Using Elasticities to Derive Optimal Income Tax
Rates." Review of Economic Studies, 68, 2000, 205-29.
Schmitt, J. Creating a Consistent Hourly Wage Series from the
Current Population Survey's Outgoing Rotation Group, 1979-2002.
Washington, DC: Center for Economic and Policy Research, 2003.
Shyrock, H., and H. Siegel. The Methods and Materials of
Demography. Washington, DC: U.S. Government Printing Office, 1975.
PHILIP ARMOUR, RICHARD V. BURKHAUSER and JEFF LARRIMORE *
* The research in this paper was conducted, in part, while the
authors were Special Sworn Status researchers of the U.S. Census Bureau
at the New York Census Research Data Center at Cornell University. This
paper has been screened to ensure that no confidential data are
disclosed. All opinions are those of the authors and should not be
attributed to the Census Bureau, the Federal Reserve Board, the Federal
Reserve Banks, or their staff. We thank Stephen Jenkins and participants
at the Census Bureau's Center for Economic Studies Research
Conference for their helpful comments on earlier drafts of this paper.
Support for this research from the National Science Foundation (award
nos. SES-0427889, SES-0322902, and SES-0339191) and the Employment
Policy and Measurement Rehabilitation Research and Training Center at
the University of New Hampshire, which is funded by the National
Institute for Disability and Rehabilitation Research (NIDRR, grant no.
H133B100030) is cordially acknowledged.
Armour: Associate Economist, RAND Corporation, Santa Monica, CA
90407. Phone 310-393-0411, Fax 310-393-4818, E-mail parmour@rand.org
Burkhauser: Sarah Gibson Blanding Professor of Policy Analysis,
Cornell University, Ithaca, NY 14853; Research Fellow, University of
Melbourne, Parkville, Australia. Phone 607-255-2097, Fax 607-255-4071,
E-mail rvb1@cornell.edu
Larrimore: Economist, Federal Reserve Board, Washington, DC 20551.
Phone 202-973-7315, Fax 202-452-3849, E-mail jeff.larrimore@frb.gov
(1.) Some wage inequality research focuses on the wage questions in
the May outgoing rotation group (ORG) sample of the CPS, which is also
subject to topcoding. Similar techniques to those used in the March CPS
data have been employed in the May ORG sample to correct for topcoding,
including replacing topcoded earnings with a fixed multiple of the
topcode threshold (see, e.g., Acemoglu and Autor 2010).
(2.) The March CPS asks about income in the previous year, so the
income year is always 1 year prior to the survey year. All references to
years in this paper refer to the income year.
(3.) Because of Census Bureau changes in their aggregation
techniques, we use wage and salary earnings for years prior to income
year 1987 and all primary labor earnings thereafter. Because the vast
majority of primary earnings are from wages and salaries, this break
does not appear to have a noticeable impact on our results.
(4.) A smaller literature has used more flexible functional forms,
such as the four-parameter generalized beta of the second kind (GB2)
distribution or its special cases such as the Singh-Maddala (Burr type
12) distribution and Dagum (Burr type 3) distribution as alternatives to
the Pareto (see, e.g., Bandourian, McDonald, and Turley 2003; Bordley,
McDonald, and Mantrala 1996; Burkhauser et al. 2011; Feng, Burkhauser,
and Butler 2006; and McDonald 1984). However, while such approaches
offer additional flexibility in fitting the distribution, they have been
criticized as being less easily interpretable than the single-parameter
Pareto distribution which has gained wide usage including in the top
income share literature using tax return data (see Atkinson, Piketty,
and Saez 2011 for a detailed discussion of the use of the Pareto
distribution in this literature).
(5.) Our access to internal CPS data extends through 2007. However,
when the Census Bureau began providing rank-proximity swapped incomes in
the public-use data for topcoded incomes in 2010, they did so for
earlier years retroactively. This approach is intended to yield the same
distribution for each earnings source as in the internal data, but
randomly swaps earnings values among topcoded individuals to protect
confidentiality. Because these data contain the same values as the
internal data, but randomly reassigned among individuals, they provide the
same base of information as the internal data for our Pareto procedure.
Hence, using these data with the same Pareto imputation technique for
observations above the internal censoring point allows us to extend our
series to include years after 2007. To ensure consistency, we used both
the internal data and the rank-proximity swap data as a base for the
Pareto estimation for overlapping years when we have access to both
files. When doing so, we found consistent results. Results of this
consistency check are available upon request from the authors.
(6.) We also used cutoffs at the 85th, 90th, and 95th percentiles.
In general, increasing the income cutoff for the lower bound of the
estimation lowered the estimated mean earnings of the top 5%.
(7.) We also used cutoffs at the 95th, 97th, and 98th percentiles
and produced largely consistent results for the mean earnings of the top
5%.
(8.) As a further test of the validity of the Pareto at this income
level, we compare the Pareto scale parameter for the 95th, 97th, 98th,
and 99th percentiles. The Pareto parameters are generally stable: on
average, the maximum and minimum scale parameters in this range differ
by just 16%. Pareto scale parameters are
available upon request from the authors.
(9.) Commerce and industry workers are all nonfarm,
nonself-employment wage and salary workers not working in agriculture,
forestry, fishing, hospitals, educational services, social services,
religious organizations, private households, and public administration.
(10.) While the cell-mean procedures used both by Larrimore et al.
(2008) and in our enhanced cell-mean series remove the variance of
topcoded earnings, Larrimore et al. (2008) demonstrated that the
fraction of respondents who are topcoded is small enough that obtaining
the level of their income is sufficient to correct for the trend in the
Gini coefficient even without their variance. Larrimore et al. (2008)
observed that the Gini coefficient for household income computed from
the cell-mean series nearly perfectly matches that from the internal CPS
data in both levels and trends. Because the Census Bureau's
rank-proximity swap data also are based purely on the internal data, the
Gini coefficient using that series would similarly match the results
using the Larrimore et al. (2008) series presented here.
(11.) CPS data from 1961 are also available; however, the survey
format changed between 1961 and 1963, which makes the data incomparable
between these years. Hence, we start our series in 1963, which is the
earliest year for which we can create a consistent CPS series.
(12.) No more than one worker was topcoded on wage and salary
earnings each year over this period.
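The cutoff sensitivity checks described in notes 6 to 8 can be illustrated with a minimal sketch. It assumes the standard maximum-likelihood (Hill 1975) estimator of the Pareto tail parameter, which may differ from the estimator used in the paper, and it runs on synthetic earnings drawn from an exact Pareto distribution rather than on CPS data. The function names (hill_alpha, percentile, pareto_tail_mean) and all numeric inputs are hypothetical and chosen only for illustration. The parameter called alpha here is the one that note 8 refers to as the Pareto scale parameter.

```python
import math
import random


def hill_alpha(values, cutoff):
    """Maximum-likelihood (Hill) estimate of the Pareto tail parameter,
    using only observations strictly above the chosen lower cutoff."""
    tail = [v for v in values if v > cutoff]
    if not tail:
        raise ValueError("no observations above the cutoff")
    return len(tail) / sum(math.log(v / cutoff) for v in tail)


def percentile(sorted_values, p):
    """Nearest-rank percentile of an ascending-sorted list."""
    k = max(0, math.ceil(p / 100 * len(sorted_values)) - 1)
    return sorted_values[k]


def pareto_tail_mean(alpha, threshold):
    """Mean of a Pareto variable conditional on exceeding the threshold.
    The mean is finite only when alpha is greater than 1."""
    if alpha <= 1:
        raise ValueError("mean is undefined for alpha <= 1")
    return alpha / (alpha - 1) * threshold


# Synthetic earnings (assumption): exact Pareto with tail parameter 2
# and a minimum of 30,000. Real earnings data are Pareto only in the tail.
rng = random.Random(0)
earnings = sorted(30000 * rng.paretovariate(2.0) for _ in range(200000))

# Re-estimate alpha with the lower cutoff at several percentiles,
# mirroring the stability comparison in note 8.
alphas = {p: hill_alpha(earnings, percentile(earnings, p))
          for p in (95, 97, 98, 99)}

# Relative gap between the largest and smallest estimate.
spread = (max(alphas.values()) - min(alphas.values())) / min(alphas.values())

for p, a in alphas.items():
    print(f"cutoff at {p}th percentile: alpha = {a:.3f}")
print(f"relative spread between max and min alpha: {spread:.1%}")
```

Because the synthetic data follow a Pareto distribution throughout, the estimates should be close to one another at every cutoff. With actual earnings data, a large spread across cutoffs would suggest that the Pareto form fits poorly over that range, which is the concern the stability check in note 8 addresses.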
TABLE 1
Percent of Earners in High Earnings Brackets (in Nominal Dollars) in
Years Before and After Census's 1993 Changes to Data Collection
Procedures

                         Before Changes to Data    After Changes to Data
                         Collection Procedures     Collection Procedures
High earnings bracket    1990    1991    1992      1993    1994    1995
$199,999-$299,998        0.16    0.12    0.18      0.24    0.26    0.24
$299,999 or above        0.06    0.06    0.06      0.19    0.20    0.21

Note: Public-use CPS data with rank-proximity swap data.
COPYRIGHT 2016 Western Economic Association International