Statistical inferences for testing marginal rank and (generalized) Lorenz dominances.
Zheng, Buhong
1. Introduction
Rank dominance, Lorenz dominance, and generalized Lorenz dominance
are the three most commonly used tools in ranking income distributions;
rank dominance and generalized Lorenz dominance yield social welfare
rankings of income distributions, while Lorenz dominance provides
inequality rankings. In their important contributions, Kolm (1969) and
Atkinson (1970) establish that Lorenz dominance implies and is implied
by all inequality measures satisfying the Pigou-Dalton principle of
transfers; Saposnik (1981) proves that rank dominance is equivalent to
welfare dominance by all increasing welfare functions; Shorrocks (1983)
shows that generalized Lorenz dominance is equivalent to welfare
dominance by all increasing and concave welfare functions.
The empirical applications of these dominance methods have been
greatly enhanced by the important contributions of Beach and Davidson
(1983), Sendler (1979), and Gail and Gastwirth (1978), who provide the
Lorenz curve with (asymptotically) distribution-free statistical
inference procedures. Beach and Davidson's results also lead
directly to the statistical inference of the generalized Lorenz curve,
which was formally stated by Bishop, Chakraborti, and Thistle (1989).
Although the asymptotic distribution of sample quantiles was well known
in the statistical literature (e.g., Cramer 1946), Bishop, Chow, and
Formby (1991) were the first to formally test rank dominance.
The applicability of these inference procedures, however, is
limited by the requirement that the samples drawn from different
distributions must be independent.(1) Although this requirement is not
very restrictive in many cross-sectional or cross-time studies, it
certainly cannot be fulfilled in addressing marginal changes in income
quantiles and in Lorenz and generalized Lorenz curves. Marginal changes
in, say, a Lorenz curve refer to the changes in the Lorenz curve of the
same distribution after an exogenous shock or an endogenous change has
occurred to the distribution. The dominance methods applied to the
comparison of the distributions before and after the marginal change are
referred to as marginal dominances. An example of interest is the impact
of wives' participation in the labor force on family income
inequality. It is commonly believed that wives' participation in
the labor force reduced family income inequality during the 1950s and
1960s in the U.S. but has increased inequality in recent years. Many
recent empirical studies, however, have revealed that working wives
still reduce family income inequality (Cancian, Danziger, and Gottschalk
[1993] and Treas [1987] provide surveys on these studies). All of these
empirical works employ samples to estimate marginal changes, but none of
them applies statistical inference tests. It is also worth noting that
none of them uses Lorenz curve dominance.
The present paper extends the existing statistical inferences of
rank dominance and (generalized) Lorenz dominance to testing marginal
dominances. It advances upon Beach and Davidson (1983) by deriving the
full (asymptotic) joint variance-covariance structure for marginal
changes in the ordinates of Lorenz and generalized Lorenz curves. It
also provides inference for testing marginal changes in income
quantiles. In proving the major results, I adopt a different and more
tractable approach (the Bahadur representation) than those used in either
Sendler (1979) or Beach and Davidson (1983). As a consequence, the
covariance structure can be derived in a straightforward manner and the
property that the structure can be consistently estimated can be seen
immediately.
The rest of the paper is organized as follows. The next section
defines marginal rank and (generalized) Lorenz dominances. Section 3
provides large sample properties of the estimates of the marginal
changes. The full (asymptotic) variance-covariance structures are also
provided. Section 4 illustrates the inference procedures by examining
the issue of working wives and income distribution in the U.S. Section
5 shows that the developed inferences can be modified and applied to
more general cases where samples are partially dependent.
2. Marginal Changes and Marginal Dominances
Consider a joint distribution of two variables x [element of]
[0, [infinity]) and y [element of] [0, [infinity]) with a continuous
cumulative distribution function (c.d.f.) F(x, y). Without loss of
generality, we may interpret x as family income before wives'
participation in the labor force and y as family income after
wives' participation in the labor force. The marginal distributions of x and y are denoted as H(x) and K(y), that is, H(x) [equivalent to]
F(x, [infinity]) and K(y) [equivalent to] F([infinity], y). For
convenience, we further assume that functions H and K are strictly
monotonic and the first two moments of x and y exist and are finite.
Thus, for a given population share p, which is the same for both x and
y, there exist unique and finite income quantiles [Xi](p) and [Zeta](p)
such that H([Xi](p)) = p and K([Zeta](p)) = p.
The Lorenz and generalized Lorenz curve ordinates of H(x) and K(y)
corresponding to p are usually defined as
[Phi](p) [equivalent to] 1/[[Mu].sub.x] [integral of] xdH(x)
between limits [Xi](p) and 0 and [Psi](p) [equivalent to] 1/[[Mu].sub.y]
[integral of] ydK(y) between limits [Zeta](p) and 0 (2.1)
and
[Theta](p) [equivalent to] [integral of] xdH(x) between limits
[Xi](p) and 0 = [[Mu].sub.x][Phi](p) and [Theta](p) [equivalent to]
[integral of] ydK(y) between limits [Zeta](p) and 0 =
[[Mu].sub.y][Psi](p), (2.2)
where [[Mu].sub.x] and [[Mu].sub.y] are the mean incomes of x and
y, respectively.
With these notations, we can formally define marginal changes and
marginal dominances.
DEFINITION 2.1. Given a joint distribution F (x, y) and a
population share p, the marginal change in the quantile is defined as
the difference between [Xi](p) and [Zeta](p), that is,
[[Delta].sup.Q](p) = [Zeta](p) - [Xi](p); the marginal change in the
Lorenz ordinate is [[Delta].sup.L](p) = [Psi](p) - [Phi](p); and the
marginal change in the generalized Lorenz ordinate is [[Delta].sup.G](p)
= [[Mu].sub.y][Psi](p) - [[Mu].sub.x][Phi](p), the generalized Lorenz
ordinate of K(y) minus that of H(x). Marginal rank dominance holds if
[[Delta].sup.Q](p) does not change sign for all p [element of] [0, 1]
and is nonzero for some p [element of] [0, 1]; marginal Lorenz dominance
holds if [[Delta].sup.L](p) does not change sign for all p [element of]
[0, 1] and is nonzero for some p [element of] (0, 1); marginal
generalized Lorenz dominance holds if [[Delta].sup.G](p) does not change
sign for all p [element of] [0, 1] and is nonzero for some p [element
of] [0, 1].
In empirical studies, population quantiles and Lorenz and
generalized Lorenz curves are usually characterized by a set of
ordinates corresponding to the abscissae {[p.sub.i] [where] i = 1, 2, .
. ., K} and [p.sub.K+1] = 1. Assuming 0 [less than] [p.sub.1] [less
than] [p.sub.2] [less than] . . . [less than] [p.sub.K] [less than] 1,
we have two sets of (K + 1) population quantiles {[[Xi].sub.i]} and
{[[Zeta].sub.i]}, two sets of K population Lorenz curve ordinates
{[[Phi].sub.i]} and {[[Psi].sub.i]}, and two sets of (K + 1) population
generalized Lorenz curve ordinates {[[Theta].sub.i]} and
{[[Theta].sub.i]}. For each i, i = 1, 2, . . ., K, these ordinates
(quantiles) are related as shown in Equation 2.2; also, at [p.sub.K+1] = 1,
the generalized Lorenz ordinates of H(x) and K(y) equal [[Mu].sub.x] and
[[Mu].sub.y], respectively.
Assume a paired sample of size n, ([x.sub.1], [y.sub.1]),
([x.sub.2], [y.sub.2]), . . ., ([x.sub.n], [y.sub.n]), is independently
and identically drawn from a population with c.d.f. F(x, y). Then for each
[p.sub.i], consistent sample estimates of [[Xi].sub.i] and
[[Zeta].sub.i] are [x.sub.([r.sub.i])] and [y.sub.([r.sub.i])] (Serfling
1980, Theorem 2.3.1), where [x.sub.(l)] and [y.sub.(l)] are the lth
order statistics of {[x.sub.i]} and {[y.sub.i]} and [r.sub.i] =
[n[p.sub.i]]. The sample estimators of generalized Lorenz and Lorenz
ordinates are
[Mathematical Expression Omitted], (2.3)
and
[Mathematical Expression Omitted]. (2.4)
Thus, marginal changes in quantiles [Mathematical Expression
Omitted], Lorenz ordinates [Mathematical Expression Omitted], and
generalized Lorenz ordinates [Mathematical Expression Omitted] can be
obtained.
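Equations 2.3 and 2.4 are not reproduced above, so the following is only a minimal sketch of how the sample marginal changes might be computed. It assumes the usual Beach and Davidson form of the estimators: the generalized Lorenz ordinate is the sum of the [r.sub.i] smallest incomes divided by n, and the Lorenz ordinate is that quantity divided by the sample mean. The function names `sample_ordinates` and `marginal_changes` are illustrative and do not come from the paper.

```python
import numpy as np

def sample_ordinates(values, shares):
    """Sample quantiles, generalized Lorenz and Lorenz ordinates at the given shares."""
    v = np.sort(np.asarray(values, dtype=float))
    n = v.size
    # r_i = [n p_i] (integer part); the small tolerance guards against float error,
    # and clipping keeps at least one observation in every partial sum.
    r = np.clip(np.floor(n * np.asarray(shares, dtype=float) + 1e-9).astype(int), 1, n)
    quantiles = v[r - 1]              # r_i-th order statistic
    gl = np.cumsum(v)[r - 1] / n      # assumed form: (1/n) * sum of the r_i smallest values
    lorenz = gl / v.mean()            # assumed form: generalized Lorenz ordinate / mean
    return quantiles, gl, lorenz

def marginal_changes(x, y, shares):
    """Marginal changes (after minus before) in quantiles, Lorenz and GL ordinates."""
    qx, gx, lx = sample_ordinates(x, shares)
    qy, gy, ly = sample_ordinates(y, shares)
    return qy - qx, ly - lx, gy - gx
```

Here `x` and `y` must be the paired before and after incomes of the same n units, although each series is sorted separately when its own ordinates are computed.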
3. Asymptotic Distributions of Marginal Changes
This section provides asymptotic distributions for the following
three vectors of marginal changes: [Mathematical Expression Omitted],
[Mathematical Expression Omitted], and [Mathematical Expression
Omitted], [Mathematical Expression Omitted]. Our derivation is different
from those used in Gail and Gastwirth (1978), Sendler (1979), and Beach
and Davidson (1983). The new method involves the use of the Bahadur
representation (Bahadur 1966; Ghosh 1971), which makes the derivation
more tractable and more accessible to economists. In this paper,
however, I only report the main results; the detailed proofs can be
found in Zheng (1996).
The Bahadur representation establishes the relationship between
population quantiles and sample quantiles. By introducing the indicator
variable
[Mathematical Expression Omitted], (3.1)
the elegant Bahadur representation can be stated as follows (e.g.,
David 1981, p. 255):
[Mathematical Expression Omitted], (3.2)
where h(x) is the density function of H(x) and [o.sub.p] denotes
"small in probability."
Now first consider the marginal changes in income quantiles,
[Mathematical Expression Omitted]. Clearly, for each i, we have
[Mathematical Expression Omitted], (3.3)
where k(y) is the density function of K(y).
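Because Equations 3.1 through 3.3 are not reproduced here, the block below restates the Bahadur representation in its standard textbook form as a reading aid. The indicator notation and the sign convention are assumptions and may differ from those of the omitted equations; the remainder is written as o_p(n^{-1/2}), the order given by Ghosh (1971).

```latex
x_{(r_i)} - \xi_i
  = \frac{p_i - \frac{1}{n}\sum_{l=1}^{n} \mathbf{1}\{x_l \le \xi_i\}}{h(\xi_i)}
    + o_p\!\left(n^{-1/2}\right),
\qquad
y_{(r_i)} - \zeta_i
  = \frac{p_i - \frac{1}{n}\sum_{l=1}^{n} \mathbf{1}\{y_l \le \zeta_i\}}{k(\zeta_i)}
    + o_p\!\left(n^{-1/2}\right).

% Subtracting the first expression from the second gives the sampling error
% of the marginal change in the i-th quantile:
\left(y_{(r_i)} - x_{(r_i)}\right) - \left(\zeta_i - \xi_i\right)
  = \frac{1}{n}\sum_{l=1}^{n}
    \left[
      \frac{\mathbf{1}\{x_l \le \xi_i\} - p_i}{h(\xi_i)}
      - \frac{\mathbf{1}\{y_l \le \zeta_i\} - p_i}{k(\zeta_i)}
    \right]
    + o_p\!\left(n^{-1/2}\right).
```

Written this way, the error is an average of i.i.d. bounded terms, so the central limit theorem applies directly and the covariances depend only on the marginal shares, the joint c.d.f. F, and the two densities.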
Using Equation 3.3 and through direct calculation, one can easily
establish the following result.
THEOREM 1. Under the conditions that H and K are strictly monotonic
and differentiable and that the first two moments of x and y exist and
are finite, the (K + 1)-random vector of marginal changes in sample
quantiles, [Mathematical Expression Omitted], is asymptotically normal
in that [Mathematical Expression Omitted] has a (K + 1)-variate normal
distribution with mean zero and covariance matrix [Lambda] =
{[[Delta].sub.ij]} with
[[Delta].sub.ij] = [p.sub.i](1 - [p.sub.j]) / {h([[Xi].sub.i])h([[Xi].sub.j])}
+ [p.sub.i](1 - [p.sub.j]) / {k([[Zeta].sub.i])k([[Zeta].sub.j])}
- {F([[Xi].sub.i], [[Zeta].sub.j]) - [p.sub.i][p.sub.j]} / {h([[Xi].sub.i])k([[Zeta].sub.j])}
- {F([[Xi].sub.j], [[Zeta].sub.i]) - [p.sub.i][p.sub.j]} / {h([[Xi].sub.j])k([[Zeta].sub.i])}
(3.4)
for i [less than or equal to] j. In particular, the variance of
[Mathematical Expression Omitted] is
[Mathematical Expression Omitted]. (3.4a)
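Equation 3.4a is not reproduced here. As a worked check, and not necessarily in the form the paper prints, setting i = j in Equation 3.4 (reading each difference F - [p.sub.i][p.sub.j] as a numerator) gives the following variance.

```latex
\Delta_{ii}
  = \frac{p_i(1 - p_i)}{h(\xi_i)^2}
  + \frac{p_i(1 - p_i)}{k(\zeta_i)^2}
  - \frac{2\left[F(\xi_i, \zeta_i) - p_i^2\right]}{h(\xi_i)\,k(\zeta_i)}
```

The first two terms are the familiar asymptotic variances of the two sample quantiles taken separately. The last term is the correction for pairing: it vanishes when F([[Xi].sub.i], [[Zeta].sub.i]) = [p.sub.i] squared, as it would if x and y were independent, and it reduces the variance when low values of x tend to occur together with low values of y.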
To establish the asymptotic distributions for [Mathematical
Expression Omitted] and [Mathematical Expression Omitted], first note
that they are both functions of [Mathematical Expression Omitted],
[Mathematical Expression Omitted], [Mathematical Expression Omitted] and
[Mathematical Expression Omitted]. Hence, it is necessary to derive the
joint asymptotic distribution of [Mathematical Expression Omitted],
which is a consistent estimator of [Beta] = ([[Theta].sub.1],
[[Theta].sub.2], . . ., [[Theta].sub.K], [[Theta].sub.K+1],
[[Theta].sub.1], [[Theta].sub.2], . . ., [[Theta].sub.K],
[[Theta].sub.K+1])[prime]. Zheng (1996) shows that [Mathematical
Expression Omitted] can be expressed as
[Mathematical Expression Omitted], (3.5)
where [u.sub.n](x) [similar to] [v.sub.n](x) denotes that
[u.sub.n](x) - [v.sub.n](x) converges in probability to zero. It follows
from Slutsky's theorem (Theorem 1.5.4 of Serfling 1980) that both
sides of Equation 3.5 have the same limiting distribution. Replacing
([x.sub.([r.sub.i])] - [[Xi].sub.i]) with the Bahadur representation, we
have
[Mathematical Expression Omitted]. (3.6)
Similarly,
[Mathematical Expression Omitted]. (3.7)
Thus, the asymptotic distribution of [Mathematical Expression
Omitted] can be derived by considering Equations 3.6 and 3.7 jointly for
i = 1, 2, . . ., K + 1.
The following theorem also generalizes Theorem 1 of Beach and
Davidson (1983).
THEOREM 2. Under the conditions of Theorem 1, the 2(K + 1)-random
vector of generalized Lorenz curve ordinates, [Mathematical Expression
Omitted], is asymptotically normal in that [Mathematical Expression
Omitted] has a 2(K + 1)-variate normal distribution with mean zero and
covariance matrix
[Mathematical Expression Omitted], (3.8)
where
[[Omega].sub.ij] = [integral of] (x - [[Xi].sub.i])(x - [[Xi].sub.j]) dH(x)
between limits [[Xi].sub.i] and 0
- {[integral of] (x - [[Xi].sub.i]) dH(x) between limits [[Xi].sub.i] and 0}
{[integral of] (x - [[Xi].sub.j]) dH(x) between limits [[Xi].sub.j] and 0}
for i [less than or equal to] j, (3.9)
[v.sub.ij] = [integral of] (y - [[Zeta].sub.i])(y - [[Zeta].sub.j]) dK(y)
between limits [[Zeta].sub.i] and 0
- {[integral of] (y - [[Zeta].sub.i]) dK(y) between limits [[Zeta].sub.i] and 0}
{[integral of] (y - [[Zeta].sub.j]) dK(y) between limits [[Zeta].sub.j] and 0}
for i [less than or equal to] j, (3.10)
and(2)
[Mathematical Expression Omitted]. (3.11)
Note that the covariance terms [[Omega].sub.ij] and [v.sub.ij]
given in Theorem 2 are expressed in a different form than that given in
Beach and Davidson (1983). However, it is straightforward to verify that
[[Omega].sub.ij] is equivalent to equation 8 of Beach and Davidson by
utilizing [[Theta].sub.i] = [integral of] x dH(x) between limits
[[Xi].sub.i] and 0 = [p.sub.i][[Gamma].sub.i] and [Mathematical
Expression Omitted], where [[Gamma].sub.i] is the conditional mean and
[Mathematical Expression Omitted] is the conditional variance of income
less than or equal to [[Xi].sub.i].
Based on Beach and Davidson (1983), Bishop, Formby and Thistle
(1989) construct the inference procedure for testing generalized Lorenz
curves with independent samples. Using the results in Theorem 2, we can
derive the asymptotic distribution of the sample marginal changes in
generalized Lorenz curve ordinates and, hence, extend Bishop, Formby,
and Thistle's inference procedures to the cases of paired samples.
THEOREM 3. Under the conditions of Theorem 1, the vector of sample
marginal changes in generalized Lorenz curve ordinates, [Mathematical
Expression Omitted], is asymptotically normal in that [n.sup.1/2] times the
difference between the sample and population values of [[Delta].sup.G] tends to a (K + 1)-variate normal
distribution with mean zero and covariance matrix [Sigma] =
{[[Epsilon].sub.ij]} with
[[Epsilon].sub.ij] = [[Omega].sub.ij] + [v.sub.ij] -
([[Tau].sub.ij] + [[Tau].sub.ji]), i, j = 1, 2, . . ., K + 1. (3.12)
Thus, the asymptotic variance of [Mathematical Expression Omitted]
is [[Epsilon].sub.ii] = [[Omega].sub.ii] + [v.sub.ii] - 2[[Tau].sub.ii].
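The structure of Equation 3.12 can be sketched numerically. The code below is only an illustration under a stated assumption: since Equation 3.11 is not reproduced here, it takes [[Tau].sub.ij] to be the covariance between (x - [[Xi].sub.i])I(x [less than or equal to] [[Xi].sub.i]) and (y - [[Zeta].sub.j])I(y [less than or equal to] [[Zeta].sub.j]), which is the cross-sample analogue of the forms of [[Omega].sub.ij] and [v.sub.ij] in Equations 3.9 and 3.10. Population quantiles are replaced by sample quantiles, and the function name is hypothetical.

```python
import numpy as np

def gl_marginal_change_cov(x, y, shares):
    """Estimate Omega, V, T and Sigma = Omega + V - (T + T') for paired samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    r = np.clip(np.floor(n * np.asarray(shares, dtype=float) + 1e-9).astype(int), 1, n)
    xq = np.sort(x)[r - 1]            # sample quantiles of x
    yq = np.sort(y)[r - 1]            # sample quantiles of y
    # Column i holds (x_l - xq_i) * 1{x_l <= xq_i}; likewise for y.
    u = (x[:, None] - xq[None, :]) * (x[:, None] <= xq[None, :])
    w = (y[:, None] - yq[None, :]) * (y[:, None] <= yq[None, :])
    k = r.size
    joint = np.cov(np.hstack([u, w]), rowvar=False, bias=True)   # 2k x 2k matrix
    omega = joint[:k, :k]             # within-x block (Equation 3.9 analogue)
    v = joint[k:, k:]                 # within-y block (Equation 3.10 analogue)
    tau = joint[:k, k:]               # cross block (assumed form of Equation 3.11)
    sigma = omega + v - (tau + tau.T)
    return omega, v, tau, sigma
```

Under these assumptions, the standard error of the i-th sample marginal change would be `np.sqrt(sigma[i, i] / n)`. Setting `tau` to zero recovers the independent-sample case, which shows how ignoring the pairing would misstate the variance.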
Theorem 2 can be further used to derive the asymptotic distribution
of the sample marginal changes in the Lorenz curve ordinates. The
following result comes directly from the use of the well-known delta
method (e.g., Rao 1965, p. 321) on limiting distributions of
differentiable functions of random variables.
THEOREM 4. Under the conditions of Theorem 1, the vector of sample
marginal changes in Lorenz curve ordinates, [Mathematical Expression
Omitted], is asymptotically normal in that [n.sup.1/2] times the difference
between the sample and population values of [[Delta].sup.L] tends to a K-variate normal distribution with mean
zero and covariance matrix [Pi] = J[Xi]J[prime] with
[Mathematical Expression Omitted]. (3.13)
Hence, the variance of [Mathematical Expression Omitted] is
[[Pi].sub.ii] = [[Phi].sub.ii] + [[Phi].sub.ii] -
2/[[Mu].sub.x][[Mu].sub.y] [[[Tau].sub.ii] -
[[Phi].sub.i][[Tau].sub.(K+1)i] - [[Psi].sub.i][[Tau].sub.i(K+1)] +
[[Phi].sub.i][[Psi].sub.i][[Epsilon].sub.xy]], (3.14)
where
[Mathematical Expression Omitted] (3.15)
and
[Mathematical Expression Omitted] (3.16)
are the asymptotic variances of [Mathematical Expression Omitted]
and [Mathematical Expression Omitted], respectively. Here [Mathematical
Expression Omitted], [Mathematical Expression Omitted], and
[[Epsilon].sub.xy] denote the variances of x and y and the covariance
between x and y.
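The delta-method step behind Theorem 4 can be illustrated in the same way. Because Equations 3.13, 3.15, and 3.16 are not reproduced here, the sketch below does not evaluate them term by term. Instead it linearizes each Lorenz ordinate as (change in generalized Lorenz ordinate minus Lorenz ordinate times change in mean) divided by the mean, and takes the sample covariance of the resulting per-observation terms. This is an assumed, equivalent route to an estimate of [Pi], not the paper's stated estimator, and the function name is hypothetical.

```python
import numpy as np

def lorenz_marginal_change_cov(x, y, shares):
    """Estimate the covariance matrix of sqrt(n) * (sample Lorenz marginal changes)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    r = np.clip(np.floor(n * np.asarray(shares, dtype=float) + 1e-9).astype(int), 1, n)

    def linear_terms(v):
        s = np.sort(v)
        q = s[r - 1]                                  # sample quantiles
        lorenz = np.cumsum(s)[r - 1] / s.sum()        # sample Lorenz ordinates
        gl_term = (v[:, None] - q) * (v[:, None] <= q)
        # Delta method: d(theta / mu) = (d theta - (theta / mu) d mu) / mu
        return (gl_term - lorenz * v[:, None]) / v.mean()

    diff = linear_terms(y) - linear_terms(x)          # one row per paired observation
    return np.atleast_2d(np.cov(diff, rowvar=False, bias=True))
```

Two sanity checks follow from the construction: the entry for p = 1 is zero because both Lorenz curves end at one, and rescaling every income in y by a constant leaves the result unchanged because Lorenz curves are scale invariant.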
Theorem 4 can in turn be used to establish statistical inferences
for testing marginal changes in population quantile shares and quantile
means (Zheng 1996).(3) Having derived various asymptotic distributions
of marginal changes, we can perform conventional statistical inferences
to test marginal rank dominances. The variance-covariance structures
that we derived enable us to construct consistent estimators in a
straightforward manner.
To estimate the variances of marginal changes in income quantiles,
one needs to estimate F([[Xi].sub.i], [[Zeta].sub.j]), h([[Xi].sub.i])
and k([[Zeta].sub.j]). Clearly, F([[Xi].sub.i], [[Zeta].sub.j]) can be
consistently estimated as
1/n [summation of] I{([x.sub.l], [y.sub.l]) [less than or equal to]
([[Xi].sub.i], [[Zeta].sub.j])} where l = 1 to n,
where ([x.sub.l], [y.sub.l]) [less than or equal to] ([[Xi].sub.i],
[[Zeta].sub.j]) stands for the condition that the observation
([x.sub.l], [y.sub.l]) must satisfy [x.sub.l] [less than or equal to]
[[Xi].sub.i] and [y.sub.l] [less than or equal to] [[Zeta].sub.j]
simultaneously. In the literature, there exist several nonparametric
approaches to density estimation. Silverman (1986) provides a
comprehensive survey on various methods of estimation. In this paper, I
adopt the kernel method because the consistency of the estimation has
been well established in the literature. Procedurally, the kernel
estimator of h([Xi]) is given by
[Mathematical Expression Omitted], (3.20)
where K is a kernel function and g is a "window width"
that depends on the sample size n. Under certain conditions on K and g,
[Mathematical Expression Omitted] is a consistent estimator of h([Xi]).
The kernel function and window width function used in this paper are
(Silverman 1986)
[Mathematical Expression Omitted], otherwise, (3.21)
and
g = 0.9A[n.sup.-1/5], (3.22)
where A = min (standard deviation, interquartile range/1.34).
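The estimation steps in this passage can be put together in a short sketch. Equations 3.20 and 3.21 are not reproduced here, so two assumptions are made: the kernel estimator takes its standard form, the sum of K((point - [x.sub.l])/g) divided by ng, and the Epanechnikov kernel stands in for the omitted kernel function. The window width follows Equation 3.22. The function names are illustrative only.

```python
import numpy as np

def window_width(sample):
    """g = 0.9 * A * n^(-1/5), with A = min(std. dev., interquartile range / 1.34)."""
    s = np.asarray(sample, dtype=float)
    q75, q25 = np.percentile(s, [75, 25])
    a = min(s.std(ddof=1), (q75 - q25) / 1.34)
    return 0.9 * a * s.size ** (-0.2)

def kernel_density(sample, point, g=None):
    """Kernel density estimate at a single point."""
    s = np.asarray(sample, dtype=float)
    g = window_width(s) if g is None else g
    t = (point - s) / g
    # Epanechnikov kernel: an assumed stand-in for the omitted Equation 3.21.
    weights = np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t ** 2), 0.0)
    return weights.sum() / (s.size * g)

def joint_cdf(x, y, a, b):
    """Share of paired observations with x <= a and y <= b simultaneously."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.mean((x <= a) & (y <= b)))

def quantile_change_variance(x, y, p):
    """Plug-in estimate of the i = j case of Equation 3.4 at population share p."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    r = min(max(int(np.floor(n * p + 1e-9)), 1), n)
    xq, yq = np.sort(x)[r - 1], np.sort(y)[r - 1]
    h, k = kernel_density(x, xq), kernel_density(y, yq)
    f = joint_cdf(x, y, xq, yq)
    return p * (1 - p) / h ** 2 + p * (1 - p) / k ** 2 - 2.0 * (f - p * p) / (h * k)
```

Under these assumptions, the standard error of the sample marginal change in the quantile at share p would be the square root of `quantile_change_variance(x, y, p) / n`. A different kernel would change the density estimates in finite samples but leave the rest of the calculation unchanged.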
The estimation of the covariance matrix [Xi] of Theorem 2 is
straightforward. It is easy to verify that the following estimators are
all consistent and asymptotically unbiased:
[Mathematical Expression Omitted], (3.23)
[Mathematical Expression Omitted], (3.24)
and
[Mathematical Expression Omitted]. (3.25)
where [x.sub.([r.sub.i])] and [y.sub.([r.sub.j])] are sample
quantiles corresponding to [p.sub.i] and [p.sub.j].
In carrying out the inference tests, we follow the suggestion of
Bishop, Formby, and Thistle (1989) and use the union-intersection
approach. Specifically, this approach considers a joint multiple
comparison of K marginal changes and compares the test statistics with
the critical studentized maximum modulus (SMM) value. If all marginal
changes are nonpositive (nonnegative) and some are significantly
different from zero, then marginal dominance follows; if some changes
are significantly positive and some are significantly negative, then we
have marginal crossing; if all changes are insignificantly different
from zero, then the two distributions, before and after the event, are
regarded as the same. This method has been successfully applied in
addressing distributional changes (see, e.g., Bishop, Formby, and
Thistle 1992).
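The decision rule just described can be written out as a small sketch. It assumes that each test statistic is the estimated marginal change divided by its standard error, and it treats changes that are not significant as zero, so that dominance requires at least one significant change and none of the opposite sign. The critical SMM value is supplied by the user from published tables; the function name and the returned labels are illustrative.

```python
def classify_marginal_change(changes, std_errors, critical_value):
    """Apply the union-intersection rule to a set of joint comparisons."""
    stats = [c / s for c, s in zip(changes, std_errors)]
    any_positive = any(t > critical_value for t in stats)
    any_negative = any(t < -critical_value for t in stats)
    if any_positive and any_negative:
        return "crossing"
    if any_positive:
        return "dominance (positive changes)"
    if any_negative:
        return "dominance (negative changes)"
    return "no significant change"
```

For example, with the critical value of 2.560 quoted below for ten joint comparisons, changes of 0.02 and 0.03 with standard errors of 0.005 and 0.02 give statistics of 4.0 and 1.5, so the rule reports dominance in the positive direction.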
4. An Illustration: Working Wives and U.S. Family Income
Distribution
Over the decades since World War II, the participation of women,
particularly married women, in the labor force has risen rapidly. In
1951, about 23% of married women were in the paid labor force (Danziger
1980). By 1989, the participation rate had increased to about 70%
(Bishop, Chiou, and Formby 1997). As wives' earnings have become a
more important source of family income over time, considerable concern
and interest have developed regarding the impact of the increasing
number of working wives on family income distribution.
Mincer (1974) was probably the first to address this issue
rigorously; he suggested that working wives improve the income
distribution. The fact that the most rapid increases in female labor
force participation rates have occurred among women from high-income
families, however, led Thurow (1975) to speculate that working wives are
now "becoming a source of family inequality." Although most
empirical studies in the literature (e.g., earlier works by Bergmann et
al. [1980], Danziger [1980], and Horvath [1980] and recent ones by
Blackburn and Bloom [1987], Treas [1987], Cancian, Danziger, and
Gottschalk [1993], Bishop, Chiou, and Formby [1997], and Cancian and
Reed [1998]) do not confirm Thurow's speculation, many people still
believe that increasing wives' participation in the labor force is
enlarging the income gap among U.S. families.
As an application of the statistical inferences developed in this
paper, I reinvestigate the issue of working wives and U.S. family income
distribution. In contrast to most previous studies, I do not rely on any
summary measures of inequality such as the Gini coefficient; I use
Lorenz curves as a measure of inequality. I also calculate the standard
errors for the estimates of Lorenz curve ordinates and statistically
test the marginal impact of working wives on family income distribution.
The data I use are four subsamples of the 1% Public Use Microdata Sample
(PUMS) of the 1990 Census of Population and Housing. Following Cancian,
Danziger, and Gottschalk (1993), I limit the sample to those
observations with positive total family incomes excluding wives'
earnings and where both husband and wife are younger than 65. I also
follow most previous studies by focusing my investigation on married
families and define variable x as the total family income less
wife's earnings and variable y as the total family income
(including wife's earnings) as defined by the Census Bureau.
Table 1 reports marginal changes in the family Lorenz curve due to
wives' participation in the labor force. It also examines the
marginal impacts of working wives in two subgroups: whites and
nonwhites. Columns (1) and (2) are the estimated family income Lorenz
curve ordinates before and after wives' participation in the labor
force. Columns (3), (4), and (5) provide the estimates of marginal
changes in Lorenz curves of the whole population, whites, and nonwhites.
[TABULAR DATA FOR TABLE 1 OMITTED]
The sample marginal changes of the
whole population and whites are all positive and significant, and the
sample marginal changes of nonwhites are significant at the first and
fifth through eighth deciles (the SMM critical value is 2.515 at the 10%
level). This implies that working wives have significantly reduced
family inequality of both whites and nonwhites as well as the whole
population. Column (6) is the difference between the marginal changes of
whites and nonwhites. An inspection of this column indicates that the
(absolute) marginal impacts of working wives on the Lorenz curves of
whites and nonwhites are statistically different at the second, third,
and fourth deciles.
Since working wives with positive incomes enhance social welfare
for all symmetric and increasing welfare functions, one can expect that
the after-participation income distribution always rank dominates, hence
generalized Lorenz dominates, the before-participation distribution.
Thus, we cannot use this example to test marginal rank and generalized
Lorenz dominances by asking whether or not working wives have improved
the social welfare of the before-participation distribution. We could,
nevertheless, test rank and generalized Lorenz dominances by comparing
the marginal impacts of whites with those of nonwhites, that is, whether
working wives have more impact on the income quantiles (generalized
Lorenz curve) of whites than nonwhites. The results of these comparisons
are summarized in Table 2 and are graphically illustrated in Figures 1
and 2.
Columns (1), (2), and (3) of Table 2 report the comparison on
quantiles. Columns (1) and (2) are marginal changes in income quantiles
of whites and nonwhites; column (3) reflects the difference between the
two marginal changes. For example, at the first decile, working wives
increase family income by $6,000 in the white subgroup and by $3,643 in
the nonwhite subgroup, and the difference ($2,357) is significant at the
10% level.
[TABULAR DATA FOR TABLE 2 OMITTED]
The marginal changes
for the last decile (p = 1.0) are not computed because of the top-coding
problem. By inspection, we can see the following pattern: Working wives
have more (absolute) impact on whites than nonwhites at lower deciles
and have less impact at higher deciles. Thus, the comparison is
inconclusive. However, the comparison of generalized Lorenz curves
reveals that, cumulatively, working wives have more impact on the family
income of whites than on that of nonwhites (the critical SMM value is
2.560 for the 10 joint comparisons). The average marginal impact on
whites ($11,333) is not significantly different from that on nonwhites
($11,586).(4)
5. An Extension
Rather than concluding the paper with a usual summary, this section
provides an important extension of the inferences developed above. I
will show that the results can be modified and applied to more general
cases where samples are partially dependent.
Although measuring marginal changes and testing marginal dominances
are important topics in income distribution studies, we encounter them
far more often with partially dependent samples than with completely
dependent (paired) samples. Many cross-time income samples (e.g.,
Current Population Survey [CPS] and Panel Study of Income Dynamics
[PSID]) are neither completely independent nor completely dependent by
design; some (but not all) individuals may be interviewed in several
consecutive years. Until now, the problem of sample dependence has been
either completely ignored or avoided by using the independent portions
of the samples or choosing data from several years apart. Although
researchers generally agree that it is very important to take the nature
of dependence into account in computing standard errors, an appropriate
method of dealing with this problem is lacking. In what follows, I
attempt to provide such a method, although it may not
completely solve the problem. I first illustrate the basic approach by
testing mean incomes from two samples of different sizes where parts of
the samples are overlapping; I then provide consistent estimators for
testing marginal dominances.(5)
Assume two samples of sizes m and n, {[x.sub.l]} and {[y.sub.s]},
are drawn from two adjacent years' income distributions with means
[[Mu].sub.x] and [[Mu].sub.y] and variances [Mathematical Expression
Omitted] and [Mathematical Expression Omitted]. Further assume that the
first q (q [less than or equal to] min {m, n}) observations of the two
samples are overlapping, that is, the first q individuals are present in
both samples and stand in the same order, and {[x.sub.q+1], . . .,
[x.sub.m]} is independent of {[y.sub.s]} and {[y.sub.q+1], . . .,
[y.sub.n]} is independent of {[x.sub.l]}. Generally speaking,
{[x.sub.q+1], . . ., [x.sub.m]} is not independent of {[x.sub.1], . . .,
[x.sub.q]} and {[y.sub.q+1], . . ., [y.sub.n]} is not independent of
{[y.sub.1], . . ., [y.sub.q]}. Thus, [Mathematical Expression Omitted]
may not equal [Mathematical Expression Omitted] and [Mathematical
Expression Omitted] may not equal [Mathematical Expression Omitted]. In
the absence of the precise information on the nature of this dependence,
however, it may not be unreasonable to assume that [Mathematical
Expression Omitted] and [Mathematical Expression Omitted]. Since
[Mathematical Expression Omitted], we only need to consider the
covariance term [Mathematical Expression Omitted].
Denoting
[Mathematical Expression Omitted],[Mathematical Expression
Omitted], [Mathematical Expression Omitted], and [Mathematical
Expression Omitted],
we can write [Mathematical Expression Omitted] as
[Mathematical Expression Omitted]. (4.1)
Since [[Rho].sub.x] is independent of {[y.sub.s]} (hence
[[Alpha].sub.y] and [[Rho].sub.y]) and [[Rho].sub.y] is independent of
{[x.sub.l]} (hence [[Alpha].sub.x] and [[Rho].sub.x]) by assumption, we
have cov([[Alpha].sub.y], [[Rho].sub.x]) = cov([[Rho].sub.y],
[[Alpha].sub.x]) = cov([[Rho].sub.y],[[Rho].sub.x]) = 0 and thus
[Mathematical Expression Omitted]. (4.2)
Noting that
[Mathematical Expression Omitted] and [Mathematical Expression
Omitted],
we further have
[Mathematical Expression Omitted]. (4.3)
Clearly,
cov(1/q [summation of] [y.sub.l] where l = 1 to q, 1/q [summation
of] [x.sub.s] where s = 1 to q)
can be directly calculated. Thus, [Mathematical Expression Omitted]
can be computed in the following two steps: First, calculate the
covariance of sample means of the overlapped samples as if they were the
complete samples; second, multiply the covariance calculated in the
first step by the percentages of the overlapped portions of the two
samples (q/m and q/n).
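The two-step calculation can be sketched as follows. The code assumes, as in the setup above, that the first q observations of each sample refer to the same units in the same order. The standard error of the difference in means additionally assumes that each sample mean has its usual variance (sample variance divided by sample size); this is my reading of the simplifying assumption made above, whose exact statement is not reproduced here. The function names are illustrative.

```python
import numpy as np

def overlap_mean_covariance(x, y, q):
    """Covariance between the two sample means when the first q units overlap."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m, n = x.size, y.size
    if q < 2:
        return 0.0                    # no usable overlap: treat the means as uncorrelated
    # Step 1: covariance of the sample means of the overlapped parts,
    # computed as if those parts were the complete samples.
    cov_overlap_means = np.cov(x[:q], y[:q], ddof=1)[0, 1] / q
    # Step 2: scale by the overlapped shares of the two samples.
    return (q / m) * (q / n) * cov_overlap_means

def mean_difference_std_error(x, y, q):
    """Standard error of mean(y) - mean(x) for partially dependent samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    variance = (x.var(ddof=1) / x.size + y.var(ddof=1) / y.size
                - 2.0 * overlap_mean_covariance(x, y, q))
    return float(np.sqrt(max(variance, 0.0)))
```

With q = m = n the result coincides with the paired-sample standard error, and with q = 0 it coincides with the independent-sample one, so the partially dependent case sits between the two familiar extremes.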
In the same manner, we can compute the covariance terms in
Equations 3.4 and 3.11 and, consequently, we can calculate the standard
errors for various sample marginal changes. Specifically, under the same
assumptions about the data structures of {[x.sub.l]} and {[y.sub.s]} as
described above, the estimate of the covariance term in Equation 3.4a is
[Mathematical Expression Omitted], (4.4)
where
[Mathematical Expression Omitted] and [Mathematical Expression
Omitted].
The covariance term [[Tau].sub.ij] of Equation 3.11 can be
estimated as
[Mathematical Expression Omitted], (4.5)
where [Mathematical Expression Omitted], [Mathematical Expression
Omitted], and [Mathematical Expression Omitted] and [Mathematical
Expression Omitted] are the lth-order statistics of {[x.sub.1], . . .,
[x.sub.q]} and {[y.sub.1], . . ., [y.sub.q]}, respectively.
I thank two anonymous referees and Professor Kathy Hayes, the
editor, for many helpful comments and suggestions. I also thank Brian J.
Cushing for providing me with ready-to-use 1990 census sample data. All
remaining errors are, of course, my own responsibility.
1 Bishop, Chow, and Formby (1994) provide inference procedures for
Lorenz and generalized Lorenz curves and their associated concentration
curves. Furthermore, while this paper was under review for publication,
it came to my attention that Davidson and Duclos (1997) also
independently developed a procedure that is closely related to the topic
of this paper. Yet the motivation and the proof of this paper are
different from those of Davidson and Duclos; I focus on testing marginal
changes and use the Bahadur representation in derivation.
2 Consistent with Beach and Davidson (1983), Equation 3.11 can also
be expressed as
[Mathematical Expression Omitted],
where
[p.sub.ij] = F([[Xi].sub.i], [[Zeta].sub.j]), [Mathematical
Expression Omitted], [Mathematical Expression Omitted], [Mathematical
Expression Omitted],
[Mathematical Expression Omitted], and [Mathematical Expression
Omitted].
I thank a referee for suggesting this expression.
3 To preserve space, these results are not listed here but are
available from the author on request.
4 Here, unlike in Bishop, Chiou, and Formby (1997), I do not
consider rerankings that can be analyzed by comparing the Lorenz curve
of before-participation distribution and the concentration curve of
after-participation distribution.
5 Zheng and Cushing (1997) provide statistical inferences for
testing summary inequality measures (the Theil measures and the Gini
coefficient) with dependent samples.
References
Atkinson, Anthony B. 1970. On the measurement of inequality.
Journal of Economic Theory 2:244-63.
Bahadur, R. 1966. A note on quantiles in large samples. Annals of
Mathematical Statistics 37:577-80.
Beach, Charles M., and Russell Davidson. 1983. Distribution-free
statistical inference with Lorenz curves and income shares. Review of
Economic Studies 50:723-35.
Bergmann, Barbara R., Judith R. Devine, Diane Reedy, Lewis Sage,
and Christine Wise. 1980. The effect of wives' labor force
participation on inequality in the distribution of family income.
Journal of Human Resources 15:452-6.
Bishop, John A., S. Chakraborti, and Paul D. Thistle. 1989.
Asymptotically distribution free statistical inference for generalized
Lorenz curves. Review of Economics and Statistics 71:725-7.
Bishop, John A., J. R. Chiou, and John P. Formby. 1997. Working
wives and earnings inequality. Working Paper, University of Alabama.
Bishop, John A., K. Victor Chow, and John P. Formby. 1991. A
stochastic dominance analysis of growth, recessions and the U.S. income
distribution, 1967-1986. Southern Economic Journal 57:936-46.
Bishop, John A., K. Victor Chow, and John P. Formby. 1994. Testing
for marginal changes in income distribution with Lorenz and
concentration curves. International Economic Review 35:479-88.
Bishop, John A., John P. Formby, and Paul D. Thistle. 1989.
Statistical inference, income distributions and social welfare. In
Research in economic inequality I, edited by D. J. Slottje. Greenwich,
CT: JAI Press, pp. 49-82.
Bishop, John A., John P. Formby, and Paul D. Thistle. 1992.
Convergence of the South and non-South income distribution, 1969-1979.
American Economic Review 82:262-72.
Blackburn, McKinley L., and David E. Bloom. 1987. Earnings and
income inequality in the United States. Population and Development
Review 13:575-609.
Cancian, M., S. Danziger, and P. Gottschalk. 1993. Working wives
and family income inequality among married couples. In Uneven tides,
edited by S. Danziger and P. Gottschalk. New York: Russell Sage Foundation, pp. 317-53.
Cancian, Maria, and Deborah Reed. 1998. Assessing the effects of
wives' earnings on family income inequality. Review of Economics
and Statistics 80:73-9.
Cramer, H. 1946. Mathematical methods of statistics. Princeton, NJ:
Princeton University Press.
Danziger, Sheldon. 1980. Do working wives increase family income
inequality? Journal of Human Resources 15:444-51.
David, H. 1981. Order statistics. 2nd edition. New York: Wiley and
Sons.
Davidson, Russell, and J. Duclos. 1997. Statistical inference for
the measurement of the incidence of taxes and transfers. Econometrica
65:1453-65.
Gail, M., and J. Gastwirth. 1978. A scale-free goodness-of-fit test
for the exponential distribution based on the Lorenz curve. Journal of
the American Statistical Association 73:787-93.
Ghosh, J. K. 1971. A new proof of the Bahadur representation of
quantiles and an application. Annals of Mathematical Statistics
42:1957-61.
Horvath, Francis W. 1980. Working wives reduce inequality in the
distribution of family earnings. Monthly Labor Review 103:51-3.
Kolm, Serge-C. 1969. The optimal production of social justice. In
Public economics, edited by J. Margolis and H. Guitton. London:
Macmillan, pp. 145-200.
Mincer, Jacob. 1974. Schooling, experience and earnings. New York:
Columbia University Press.
Rao, C. 1965. Linear statistical inference and its applications.
New York: Wiley and Sons.
Saposnik, Rubin. 1981. Rank dominance in income distributions.
Public Choice 36:147-51.
Sendler, W. 1979. On statistical inference in concentration
measurement. Metrika 26:109-22.
Serfling, Robert J. 1980. Approximation theorems of mathematical
statistics. New York: Wiley and Sons.
Shorrocks, Anthony. 1983. Ranking income distributions. Economica
50:3-17.
Silverman, B. W. 1986. Density estimation for statistics and data
analysis. London: Chapman and Hall.
Thurow, L. 1975. Lessening inequality in the distribution of
earnings and wealth. Princeton, NJ: Institute of Advanced Study.
Treas, Judith. 1987. The effect of women's labor force
participation on the distribution of income in the United States. Annual
Review of Sociology 13:259-88.
Zheng, Buhong. 1996. Statistical inferences for testing marginal
changes in Lorenz and generalized Lorenz curves. Working Paper,
University of Colorado at Denver.
Zheng, Buhong, and Brian J. Cushing. 1997. Large sample statistical
inferences for testing marginal changes in inequality indices. Working
Paper, University of Colorado at Denver.