Article information

Title: Statistical inferences for testing marginal rank and (generalized) Lorenz dominances.
Author: Zheng, Buhong
Journal: Southern Economic Journal
Print ISSN: 0038-4038
Publication year: 1999
Issue: January
Language: English
Publisher: Southern Economic Association
Abstract: Rank dominance, Lorenz dominance, and generalized Lorenz dominance are the three most commonly used tools in ranking income distributions; rank dominance and generalized Lorenz dominance yield social welfare rankings of income distributions, while Lorenz dominance provides inequality rankings. In their important contributions, Kolm (1969) and Atkinson (1970) establish that Lorenz dominance implies and is implied by all inequality measures satisfying the Pigou-Dalton principle of transfers; Saposnik (1981) proves that rank dominance is equivalent to welfare dominance by all increasing welfare functions; Shorrocks (1983) shows that generalized Lorenz dominance is equivalent to welfare dominance by all increasing and concave welfare functions.
Keywords: Income distribution; Mathematical statistics; Statistics (Mathematics)

1. Introduction

Rank dominance, Lorenz dominance, and generalized Lorenz dominance are the three most commonly used tools in ranking income distributions; rank dominance and generalized Lorenz dominance yield social welfare rankings of income distributions, while Lorenz dominance provides inequality rankings. In their important contributions, Kolm (1969) and Atkinson (1970) establish that Lorenz dominance implies and is implied by all inequality measures satisfying the Pigou-Dalton principle of transfers; Saposnik (1981) proves that rank dominance is equivalent to welfare dominance by all increasing welfare functions; Shorrocks (1983) shows that generalized Lorenz dominance is equivalent to welfare dominance by all increasing and concave welfare functions.

The empirical applications of these dominance methods have been greatly enhanced by the important contributions of Beach and Davidson (1983), Sendler (1979), and Gail and Gastwirth (1978), who provide the Lorenz curve with (asymptotically) distribution-free statistical inference procedures. Beach and Davidson's results also lead directly to the statistical inference of the generalized Lorenz curve, which was formally stated by Bishop, Chakraborti, and Thistle (1989). Although the asymptotic distribution of sample quantiles was well known in the statistical literature (e.g., Cramer 1946), Bishop, Chow, and Formby (1991) were the first to formally test rank dominance.

The applicability of these inference procedures, however, is limited by the requirement that the samples drawn from different distributions must be independent.(1) Although this requirement is not very restrictive in many cross-sectional or cross-time studies, it certainly cannot be fulfilled in addressing marginal changes in income quantiles and in Lorenz and generalized Lorenz curves. Marginal changes in, say, a Lorenz curve refer to the changes in the Lorenz curve of the same distribution after an exogenous shock or an endogenous change has occurred to the distribution. The dominance methods applied to the comparison of the distributions before and after the marginal change are referred to as marginal dominances. An example of interest is the impact of wives' participation in the labor force on family income inequality. It is commonly believed that wives' participation in the labor force reduced family income inequality during the 1950s and 1960s in the U.S. but has increased inequality in recent years. Many recent empirical studies, however, have revealed that working wives still reduce family income inequality (Cancian, Danziger, and Gottschalk [1993] and Treas [1987] provide surveys on these studies). All of these empirical works employ samples to estimate marginal changes, but none of them applies statistical inference tests. It is also worth noting that none of them uses Lorenz curve dominance.

The present paper extends the existing statistical inferences of rank dominance and (generalized) Lorenz dominance to testing marginal dominances. It advances upon Beach and Davidson (1983) by deriving the full (asymptotic) joint variance-covariance structure for marginal changes in the ordinates of Lorenz and generalized Lorenz curves. It also provides inference for testing marginal changes in income quantiles. In proving the major results, I adopt a different and more tractable approach (the Bahadur representation) than that used in either Sendler (1979) or Beach and Davidson (1983). As a consequence, the covariance structure can be derived in a straightforward manner, and it is immediately apparent that the structure can be consistently estimated.

The rest of the paper is organized as follows. The next section defines marginal rank and (generalized) Lorenz dominances. Section 3 provides large sample properties of the estimates of the marginal changes. The full (asymptotic) variance-covariance structures are also provided. Section 4 illustrates the inference procedures by examining the issue of working wives and income distribution in the U.S. Section 5 shows that the developed inferences can be modified and applied to more general cases where samples are partially dependent.

2. Marginal Changes and Marginal Dominances

Consider a joint distribution between two variables x [element of] [0, [infinity]) and y [element of] [0, [infinity]) with a continuous cumulative distribution function (c.d.f.) F(x, y). Without loss of generality, we may interpret x as family income before wives' participation in the labor force and y as family income after wives' participation in the labor force. The marginal distributions of x and y are denoted as H(x) and K(y), that is, H(x) [equivalent to] F(x, [infinity]) and K(y) [equivalent to] F([infinity], y). For convenience, we further assume that functions H and K are strictly monotonic and the first two moments of x and y exist and are finite. Thus, for a given population share p, which is the same for both x and y, there exist unique and finite income quantiles [Xi](p) and [Zeta](p) such that H([Xi](p)) = p and K([Zeta](p)) = p.

The Lorenz and generalized Lorenz curve ordinates of H(x) and K(y) corresponding to p are usually defined as

[Phi](p) [equivalent to] 1/[[Mu].sub.x] [integral of] xdH(x) between limits [Xi](p) and 0 and [Psi](p) [equivalent to] 1/[[Mu].sub.y] [integral of] ydK(y) between limits [Zeta](p) and 0 (2.1)

and

[Theta](p) [equivalent to] [integral of] xdH(x) between limits [Xi](p) and 0 = [[Mu].sub.x][Phi](p) and [Theta](p) [equivalent to] [integral of] ydK(y) between limits [Zeta](p) and 0 = [[Mu].sub.y][Psi](p), (2.2)

where [[Mu].sub.x] and [[Mu].sub.y] are the mean incomes of x and y, respectively.

With these notations, we can formally define marginal changes and marginal dominances.

DEFINITION 2.1. Given a joint distribution F(x, y) and a population share p, the marginal change in the quantile is defined as the difference between [Xi](p) and [Zeta](p), that is, [[Delta].sup.Q](p) = [Zeta](p) - [Xi](p); the marginal change in the Lorenz ordinate is [[Delta].sup.L](p) = [Psi](p) - [Phi](p); and the marginal change in the generalized Lorenz ordinate is [[Delta].sup.G](p) = [Theta](p) - [Theta](p). Marginal rank dominance holds if [[Delta].sup.Q](p) does not change sign for all p [element of] [0, 1] and is nonzero for some p [element of] [0, 1]; marginal Lorenz dominance holds if [[Delta].sup.L](p) does not change sign for all p [element of] [0, 1] and is nonzero for some p [element of] (0, 1); marginal generalized Lorenz dominance holds if [[Delta].sup.G](p) does not change sign for all p [element of] [0, 1] and is nonzero for some p [element of] [0, 1].

In empirical studies, population quantiles and Lorenz and generalized Lorenz curves are usually characterized by a set of ordinates corresponding to the abscissae {[p.sub.i] [where] i = 1, 2, . . ., K} and [p.sub.K+1] = 1. Assuming 0 [less than] [p.sub.1] [less than] [p.sub.2] [less than] . . . [less than] [p.sub.K] [less than] 1, we have two sets of (K + 1) population quantiles {[[Xi].sub.i]} and {[[Zeta].sub.i]}, two sets of K population Lorenz curve ordinates {[[Phi].sub.i]} and {[[Psi].sub.i]}, and two sets of (K + 1) population generalized Lorenz curve ordinates {[[Theta].sub.i]} and {[[Theta].sub.i]}. For each i, i = 1, 2, . . ., K, these ordinates (quantiles) are related as shown in Equation 2.2; also [[Phi].sub.K+1] = [[Mu].sub.x] and [[Psi].sub.K+1] = [[Mu].sub.y].

Assume a paired sample of size n, ([x.sub.1], [y.sub.1]), ([x.sub.2], [y.sub.2]), . . ., ([x.sub.n], [y.sub.n]), is independently and identically drawn from a population with c.d.f. F(x, y). Then for each [p.sub.i], consistent sample estimates of [[Xi].sub.i] and [[Zeta].sub.i] are [x.sub.([r.sub.i])] and [y.sub.([r.sub.i])] (Serfling 1980, Theorem 2.3.1), where [x.sub.(l)] and [y.sub.(l)] are the lth order statistics of {[x.sub.i]} and {[y.sub.i]} and [r.sub.i] = [n[p.sub.i]]. The sample estimators of generalized Lorenz and Lorenz ordinates are

[Mathematical Expression Omitted], (2.3)

and

[Mathematical Expression Omitted]. (2.4)

Thus, marginal changes in quantiles [Mathematical Expression Omitted], Lorenz ordinates [Mathematical Expression Omitted], and generalized Lorenz ordinates [Mathematical Expression Omitted] can be obtained.
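As an illustrative sketch only, the sample marginal changes can be computed from a paired sample as follows. Because Equations 2.3 and 2.4 are not reproduced above, the code assumes the usual forms: the generalized Lorenz ordinate is estimated by the sum of the r smallest observations divided by n, and the Lorenz ordinate is that quantity divided by the sample mean. The function names are hypothetical, and the small tolerance added before taking the integer part of np is an implementation assumption to guard against floating-point rounding.

```python
import math

def order_stat_rank(n, p):
    """1-based rank r = [n p] (integer part), clamped to the range 1..n."""
    r = int(math.floor(n * p + 1e-9))  # tolerance guards against float rounding
    return min(max(r, 1), n)

def curve_estimates(values, shares):
    """Return sample quantiles, generalized Lorenz and Lorenz ordinates."""
    n = len(values)
    ordered = sorted(values)
    mean = sum(ordered) / n
    quantiles, gen_lorenz, lorenz = [], [], []
    for p in shares:
        r = order_stat_rank(n, p)
        partial = sum(ordered[:r]) / n      # assumed form of Equation 2.3
        quantiles.append(ordered[r - 1])    # r-th order statistic
        gen_lorenz.append(partial)
        lorenz.append(partial / mean)       # assumed form of Equation 2.4
    return quantiles, gen_lorenz, lorenz

def marginal_changes(x, y, shares):
    """Marginal changes (after minus before) in quantiles, GL and Lorenz ordinates."""
    qx, gx, lx = curve_estimates(x, shares)
    qy, gy, ly = curve_estimates(y, shares)
    def diff(after, before):
        return [a - b for a, b in zip(after, before)]
    return diff(qy, qx), diff(gy, gx), diff(ly, lx)
```

For example, if every y observation is exactly twice the paired x observation, the quantile and generalized Lorenz changes are positive while the Lorenz changes are zero, since a proportional change leaves income shares unchanged.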

3. Asymptotic Distributions of Marginal Changes

This section provides asymptotic distributions for the following three vectors of marginal changes: [Mathematical Expression Omitted], [Mathematical Expression Omitted], and [Mathematical Expression Omitted], [Mathematical Expression Omitted]. Our derivation is different from those used in Gail and Gastwirth (1978), Sendler (1979), and Beach and Davidson (1983). The new method involves the use of the Bahadur representation (Bahadur 1966; Ghosh 1971), which makes the derivation more tractable and more accessible to economists. In this paper, however, I only report the main results; the detailed proofs can be found in Zheng (1996).

The Bahadur representation establishes the relationship between population quantiles and sample quantiles. By introducing the indicator variable

[Mathematical Expression Omitted], (3.1)

the elegant Bahadur representation can be stated as follows (e.g., David 1981, p. 255):

[Mathematical Expression Omitted], (3.2)

where h(x) is the density function of H(x) and [o.sub.p] denotes "small in probability."

Now first consider the marginal changes in income quantiles, [Mathematical Expression Omitted]. Clearly, for each i, we have

[Mathematical Expression Omitted], (3.3)

where k(y) is the density function of K(y).

Using Equation 3.3 and through direct calculation, one can easily establish the following result.

THEOREM 1. Under the conditions that H and K are strictly monotonic and differentiable and that the first two moments of x and y exist and are finite, the (K + 1)-random vector of marginal changes in sample quantiles, [Mathematical Expression Omitted], is asymptotically normal in that [Mathematical Expression Omitted] has a (K + 1)-variate normal distribution with mean zero and covariance matrix [Lambda] = {[[Delta].sub.ij]} with

[[Delta].sub.ij] = [p.sub.i](1 - [p.sub.j]) / {h([[Xi].sub.i])h([[Xi].sub.j])} + [p.sub.i](1 - [p.sub.j]) / {k([[Zeta].sub.i])k([[Zeta].sub.j])} - {F([[Xi].sub.i], [[Zeta].sub.j]) - [p.sub.i][p.sub.j]} / {h([[Xi].sub.i])k([[Zeta].sub.j])} - {F([[Xi].sub.j], [[Zeta].sub.i]) - [p.sub.i][p.sub.j]} / {h([[Xi].sub.j])k([[Zeta].sub.i])} (3.4)

for i [less than or equal to] j. In particular, the variance of [Mathematical Expression Omitted] is

[Mathematical Expression Omitted]. (3.4a)
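The covariance in Equation 3.4 is simple to evaluate once the densities and the joint c.d.f. at the quantiles are available. The following is a minimal sketch, not the author's code: the function and argument names are hypothetical, the inputs are assumed to be supplied (in practice they would be estimates), and the variance is taken to be the i = j case of Equation 3.4.

```python
import math

def quantile_change_cov(p_i, p_j, h_i, h_j, k_i, k_j, F_ij, F_ji):
    """Asymptotic covariance of Equation 3.4, valid for p_i <= p_j.

    h_i = h(xi_i) and k_i = k(zeta_i) are the marginal densities at the
    quantiles; F_ij = F(xi_i, zeta_j) and F_ji = F(xi_j, zeta_i).
    """
    if p_i > p_j:
        raise ValueError("Equation 3.4 is stated for i <= j")
    own_x = p_i * (1 - p_j) / (h_i * h_j)           # before-change quantiles
    own_y = p_i * (1 - p_j) / (k_i * k_j)           # after-change quantiles
    cross_ij = (F_ij - p_i * p_j) / (h_i * k_j)     # dependence between x and y
    cross_ji = (F_ji - p_i * p_j) / (h_j * k_i)
    return own_x + own_y - cross_ij - cross_ji

def quantile_change_se(p, h, k, F, n):
    """Standard error of one sample marginal quantile change (i = j case)."""
    variance = quantile_change_cov(p, p, h, h, k, k, F, F)
    return math.sqrt(max(variance, 0.0) / n)        # clip tiny negative rounding
```

Two limiting cases help check the formula. If x and y are independent, F([[Xi].sub.i], [[Zeta].sub.j]) = [p.sub.i][p.sub.j] and the cross terms vanish, which recovers the independent-sample variance. If y is identical to x, the cross terms cancel the own terms exactly and the variance is zero.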

To establish the asymptotic distributions for [Mathematical Expression Omitted] and [Mathematical Expression Omitted], first note that they are both functions of [Mathematical Expression Omitted], [Mathematical Expression Omitted], [Mathematical Expression Omitted] and [Mathematical Expression Omitted]. Hence, it is necessary to derive the joint asymptotic distribution of [Mathematical Expression Omitted], which is a consistent estimator of [Beta] = ([[Theta].sub.1], [[Theta].sub.2], . . ., [[Theta].sub.K], [[Theta].sub.K+1], [[Theta].sub.1], [[Theta].sub.2], . . ., [[Theta].sub.K], [[Theta].sub.K+1])[prime]. Zheng (1996) shows that [Mathematical Expression Omitted] can be expressed as

[Mathematical Expression Omitted], (3.5)

where [u.sub.n](x) [similar to] [v.sub.n](x) denotes that [u.sub.n](x) - [v.sub.n](x) converges in probability to zero. It follows from Slutsky's theorem (Theorem 1.5.4 of Serfling 1980) that both sides of Equation 3.5 have the same limiting distribution. Replacing ([x.sub.([r.sub.i])] - [[Xi].sub.i]) with the Bahadur representation, we have

[Mathematical Expression Omitted]. (3.6)

Similarly,

[Mathematical Expression Omitted]. (3.7)

Thus, the asymptotic distribution of [Mathematical Expression Omitted] can be derived by considering Equations 3.6 and 3.7 jointly for i = 1, 2, . . ., K + 1.

The following theorem also generalizes Theorem 1 of Beach and Davidson (1983).

THEOREM 2. Under the conditions of Theorem 1, the 2(K + 1)-random vector of generalized Lorenz curve ordinates, [Mathematical Expression Omitted], is asymptotically normal in that [Mathematical Expression Omitted] has a 2(K + 1)-variate normal distribution with mean zero and covariance matrix

[Mathematical Expression Omitted], (3.8)

where

[[Omega].sub.ij] = [integral of] (x - [[Xi].sub.i])(x - [[Xi].sub.j]) dH(x) between limits [[Xi].sub.i] and 0 - [integral of] (x - [[Xi].sub.i]) dH(x) between limits [[Xi].sub.i] and 0 [integral of] (x - [[Xi].sub.j]) dH(x) between limits [[Xi].sub.j] and 0 for i [less than or equal to] j, (3.9)

[v.sub.ij] = [integral of] (y - [[Zeta].sub.i])(y - [[Zeta].sub.j]) dK(y) between limits [[Zeta].sub.i] and 0 - [integral of] (y - [[Zeta].sub.i]) dK(y) between limits [[Zeta].sub.i] and 0 [integral of] (y - [[Zeta].sub.j]) dK(y) between limits [[Zeta].sub.j] and 0 for i [less than or equal to] j, (3.10)

and(2)

[Mathematical Expression Omitted]. (3.11)

Note that the covariance terms [[Omega].sub.ij] and [v.sub.ij] given in Theorem 2 are expressed in a different form than that given in Beach and Davidson (1983). However, it is straightforward to verify that [[Omega].sub.ij] is equivalent to equation 8 of Beach and Davidson by utilizing [[Theta].sub.i] = [integral of] x dH(x) between limits [[Xi].sub.i] and 0 = [p.sub.i][[Gamma].sub.i] and [Mathematical Expression Omitted], where [[Gamma].sub.i] is the conditional mean and [Mathematical Expression Omitted] is the conditional variance of income less than or equal to [[Xi].sub.i].

Based on Beach and Davidson (1983), Bishop, Formby, and Thistle (1989) construct the inference procedure for testing generalized Lorenz curves with independent samples. Using the results in Theorem 2, we can derive the asymptotic distribution of the sample marginal changes in generalized Lorenz curve ordinates and, hence, extend Bishop, Formby, and Thistle's inference procedures to the case of paired samples.

THEOREM 3. Under the conditions of Theorem 1, the vector of sample marginal changes in generalized Lorenz curve ordinates, [Mathematical Expression Omitted], is asymptotically normal in that [n.sup.1/2] [[[Delta].sup.G] - [[Delta].sup.G]] tends to a (K + 1)-variate normal distribution with mean zero and covariance matrix [Sigma] = {[[Epsilon].sub.ij]} with

[[Epsilon].sub.ij] = [[Omega].sub.ij] + [v.sub.ij] - ([[Tau].sub.ij] + [[Tau].sub.ji]), i, j = 1, 2, . . ., K + 1. (3.12)

Thus, the asymptotic variance of [Mathematical Expression Omitted] is [[Epsilon].sub.ii] = [[Omega].sub.ii] + [v.sub.ii] - 2[[Tau].sub.ii].
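A sample analogue of this variance can be sketched as follows. Equations 3.9 and 3.10 show that [[Omega].sub.ii] and [v.sub.ii] are the variances of the truncated deviations (x - [[Xi].sub.i]) and (y - [[Zeta].sub.i]), each set to zero above the quantile. Equation 3.11 is not reproduced above, so the code assumes that [[Tau].sub.ii] is the covariance between those two truncated deviations; that assumption, the function names, and the use of plug-in sample moments are illustrative and are not the estimators in Equations 3.23 to 3.25.

```python
import math

def _cov(a, b):
    """Plug-in (divide by n) sample covariance of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((u - mean_a) * (w - mean_b) for u, w in zip(a, b)) / n

def gl_change_variance(x, y, p):
    """Estimate epsilon_ii = omega_ii + v_ii - 2 tau_ii at population share p."""
    n = len(x)
    r = min(max(int(math.floor(n * p + 1e-9)), 1), n)  # rank [n p], clamped
    xi = sorted(x)[r - 1]        # sample quantile of x
    zeta = sorted(y)[r - 1]      # sample quantile of y
    # Truncated deviations: zero for observations above the quantile.
    u = [(xl - xi) if xl <= xi else 0.0 for xl in x]
    w = [(yl - zeta) if yl <= zeta else 0.0 for yl in y]
    omega = _cov(u, u)           # Equation 3.9 with i = j
    v = _cov(w, w)               # Equation 3.10 with i = j
    tau = _cov(u, w)             # ASSUMED form of Equation 3.11 with i = j
    return omega + v - 2.0 * tau

def gl_change_se(x, y, p):
    """Standard error of the sample marginal change in the GL ordinate."""
    return math.sqrt(max(gl_change_variance(x, y, p), 0.0) / len(x))
```

Under this assumption the estimate equals the sample variance of the paired difference between the two truncated deviations, so it is zero whenever y differs from x by a constant.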

Theorem 2 can be further used to derive the asymptotic distribution of the sample marginal changes in the Lorenz curve ordinates. The following result comes directly from the use of the well-known delta method (e.g., Rao 1965, p. 321) on limiting distributions of differentiable functions of random variables.

THEOREM 4. Under the conditions of Theorem 1, the vector of sample marginal changes in Lorenz curve ordinates, [Mathematical Expression Omitted], is asymptotically normal in that [n.sup.1/2] [[[Delta].sup.L] - [[Delta].sup.L]] tends to a K-variate normal distribution with mean zero and covariance matrix [Pi] = J[Xi]J[prime] with

[Mathematical Expression Omitted]. (3.13)

Hence, the variance of [Mathematical Expression Omitted] is

[[Pi].sub.ii] = [[Phi].sub.ii] + [[Phi].sub.ii] - 2/[[Mu].sub.x][[Mu].sub.y] [[[Tau].sub.ii] - [[Phi].sub.i][[Tau].sub.(K+1)i] - [[Psi].sub.i][[Tau].sub.i(K+1)] + [[Phi].sub.i][[Psi].sub.i][[Epsilon].sub.xy]], (3.14)

where

[Mathematical Expression Omitted] (3.15)

and

[Mathematical Expression Omitted] (3.16)

are the asymptotic variances of [Mathematical Expression Omitted] and [Mathematical Expression Omitted], respectively. Here [Mathematical Expression Omitted], [Mathematical Expression Omitted], and [[Epsilon].sub.xy] denote the variances of x and y and the covariance between x and y.

Theorem 4 can in turn be used to establish statistical inferences for testing marginal changes in population quantile shares and quantile means (Zheng 1996).(3) Having derived various asymptotic distributions of marginal changes, we can perform conventional statistical inferences to test marginal rank dominances. The variance-covariance structures that we derived enable us to construct consistent estimators in a straightforward manner.

To estimate the variances of marginal changes in income quantiles, one needs to estimate F([[Xi].sub.i], [[Zeta].sub.j]), h([[Xi].sub.i]) and k([[Zeta].sub.j]). Clearly, F([[Xi].sub.i], [[Zeta].sub.j]) can be consistently estimated as

1/n [summation of] I{([x.sub.l], [y.sub.l]) [less than or equal to] ([[Xi].sub.i], [[Zeta].sub.j])} where l = 1 to n,

where ([x.sub.l], [y.sub.l]) [less than or equal to] ([[Xi].sub.i], [[Zeta].sub.j]) stands for the condition that the observation ([x.sub.l], [y.sub.l]) must satisfy [x.sub.l] [less than or equal to] [[Xi].sub.i] and [y.sub.l] [less than or equal to] [[Zeta].sub.j] simultaneously. In the literature, there exist several nonparametric approaches to density estimation. Silverman (1986) provides a comprehensive survey on various methods of estimation. In this paper, I adopt the kernel method because the consistency of the estimation has been well established in the literature. Procedurally, the kernel estimator of h([Xi]) is given by

[Mathematical Expression Omitted], (3.20)

where K is a kernel function and g is a "window width" that depends on the sample size n. Under certain conditions on K and g, [Mathematical Expression Omitted] is a consistent estimator of h([Xi]). The kernel function and window width function used in this paper are (Silverman 1986)

[Mathematical Expression Omitted], otherwise, (3.21)

and

g = 0.9A[n.sup.-1/5], (3.22)

where A = min (standard deviation, interquartile range/1.34).
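The two ingredients just described, the empirical joint c.d.f. and a kernel density estimate with the window width of Equation 3.22, can be sketched as follows. The kernel of Equation 3.21 is not reproduced above, so this sketch substitutes a Gaussian kernel purely for illustration; that substitution, the function names, and the particular quartile convention used by Python's statistics module are assumptions rather than the paper's choices.

```python
import math
import statistics

def joint_cdf(x, y, xi, zeta):
    """Share of paired observations with x_l <= xi and y_l <= zeta."""
    hits = sum(1 for xl, yl in zip(x, y) if xl <= xi and yl <= zeta)
    return hits / len(x)

def window_width(sample):
    """g = 0.9 * A * n**(-1/5), A = min(std. deviation, interquartile range / 1.34)."""
    n = len(sample)
    sd = statistics.stdev(sample)                      # sample standard deviation
    q1, _, q3 = statistics.quantiles(sample, n=4)      # quartile cut points
    a = min(sd, (q3 - q1) / 1.34)
    return 0.9 * a * n ** (-0.2)

def kernel_density(sample, point, g=None):
    """Kernel density estimate at `point` (Gaussian kernel: an assumption)."""
    if g is None:
        g = window_width(sample)
    total = sum(math.exp(-0.5 * ((point - s) / g) ** 2) for s in sample)
    return total / (len(sample) * g * math.sqrt(2.0 * math.pi))
```

In use, kernel_density would be evaluated on the x sample at a sample quantile of x to estimate h, and on the y sample at a sample quantile of y to estimate k, with joint_cdf evaluated at the corresponding pair of sample quantiles.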

The estimation of the covariance matrix [Xi] of Theorem 2 is straightforward. It is easy to verify that the following estimators are all consistent and asymptotically unbiased:

[Mathematical Expression Omitted], (3.23)

[Mathematical Expression Omitted], (3.24)

and

[Mathematical Expression Omitted]. (3.25)

where [x.sub.([r.sub.i])] and [y.sub.([r.sub.j])] are sample quantiles corresponding to [p.sub.i] and [p.sub.j].

In carrying out the inference tests, we follow the suggestion of Bishop, Formby, and Thistle (1989) and use the union-intersection approach. Specifically, this approach considers a joint multiple comparison of K marginal changes and compares the test statistics with the critical studentized maximum modulus (SMM) value. If all marginal changes are nonpositive (nonnegative) and some are significantly different from zero, then marginal dominance follows; if some changes are significantly positive and some are significantly negative, then we have marginal crossing; if all changes are insignificantly different from zero, then the two distributions, before and after the event, are regarded as the same. This method has been successfully applied in addressing distributional changes (see, e.g., Bishop, Formby, and Thistle 1992).
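The decision rule can be sketched in a few lines. This is an illustration under one stated interpretation: a marginal change that is not significantly different from zero is treated as a tie, so dominance is declared when at least one change is significant in one direction and none is significant in the other. The function name and the returned labels are hypothetical, and the critical value must be supplied by the user from an SMM table.

```python
def classify_marginal_change(estimates, std_errors, critical_value):
    """Union-intersection decision from K marginal changes and an SMM critical value."""
    t_stats = [e / s for e, s in zip(estimates, std_errors)]
    significantly_positive = any(t > critical_value for t in t_stats)
    significantly_negative = any(t < -critical_value for t in t_stats)
    if significantly_positive and significantly_negative:
        return "crossing"
    if significantly_positive:
        return "positive dominance"
    if significantly_negative:
        return "negative dominance"
    return "no significant difference"
```

With Lorenz ordinates, for instance, "positive dominance" would mean that the after-change Lorenz curve lies significantly above the before-change curve at some ordinate and significantly below it at none.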

4. An Illustration: Working Wives and U.S. Family Income Distribution

Over the decades since World War II, the participation of women, particularly married women, in the labor force has risen rapidly. In 1951, about 23% of married women were in the paid labor force (Danziger 1980). By 1989, the participation rate had increased to about 70% (Bishop, Chiou, and Formby 1997). As wives' earnings have become a more important source of family income over time, considerable concern and interest have developed regarding the impact of the increasing number of working wives on family income distribution.

Mincer (1974) was probably the first to provide a rigorous attempt to address this issue and suggested that working wives improve income distribution. The fact that the most rapid increases in female labor force participation rates have occurred among women from high-income families, however, led Thurow (1975) to speculate that working wives are now "becoming a source of family inequality." Although most empirical studies in the literature (e.g., earlier works by Bergmann et al. [1980], Danziger [1980], and Horvath [1980] and recent ones by Blackburn and Bloom [1987], Treas [1987], Cancian, Danziger, and Gottschalk [1993], Bishop, Chiou, and Formby [1997], and Cancian and Reed [1998]) do not confirm Thurow's speculation, many people still believe that increasing wives' participation in the labor force is enlarging the income gap among U.S. families.

As an application of the statistical inferences developed in this paper, I reinvestigate the issue of working wives and U.S. family income distribution. In contrast to most previous studies, I do not rely on any summary measures of inequality such as the Gini coefficient; I use Lorenz curves as a measure of inequality. I also calculate the standard errors for the estimates of Lorenz curve ordinates and statistically test the marginal impact of working wives on family income distribution. The data I use are four subsamples of the 1% Public Use Microdata Sample (PUMS) of the 1990 Census of Population and Housing. Following Cancian, Danziger, and Gottschalk (1993), I limit the sample to those observations with positive total family incomes excluding wives' earnings and where both husband and wife are younger than 65. I also follow most previous studies by focusing my investigation on married families and define variable x as the total family income less wife's earnings and variable y as the total family income (including wife's earnings) as defined by the Census Bureau.

Table 1 reports marginal changes in the family Lorenz curve due to wives' participation in the labor force. It also examines the marginal impacts of working wives in two subgroups: whites and nonwhites. Columns (1) and (2) are the estimated family income Lorenz curve ordinates before and after wives' participation in the labor force. Columns (3), (4), and (5) provide the estimates of marginal changes in Lorenz curves of the whole population, whites, and nonwhites. The sample marginal changes of the whole population and whites are all positive and significant, and the sample marginal changes of nonwhites are significant at the first and fifth through eighth deciles (the SMM critical value is 2.515 at the 10% level). This implies that working wives have significantly reduced family inequality of both whites and nonwhites as well as the whole population. Column (6) is the difference between the marginal changes of whites and nonwhites. An inspection of this column indicates that the (absolute) marginal impacts of working wives on the Lorenz curves of whites and nonwhites are statistically different at the second, third, and fourth deciles.

[TABULAR DATA FOR TABLE 1 OMITTED]

Since working wives with positive incomes enhance social welfare for all symmetric and increasing welfare functions, one can expect that the after-participation income distribution always rank dominates, hence generalized Lorenz dominates, the before-participation distribution. Thus, we cannot use this example to test marginal rank and generalized Lorenz dominances by asking whether or not working wives have improved the social welfare of the before-participation distribution. We could, nevertheless, test rank and generalized Lorenz dominances by comparing the marginal impacts of whites with those of nonwhites, that is, whether working wives have more impact on the income quantiles (generalized Lorenz curve) of whites than nonwhites. The results of these comparisons are summarized in Table 2 and are graphically illustrated in Figures 1 and 2.

Columns (1), (2), and (3) of Table 2 report the comparison on quantiles. Columns (1) and (2) are marginal changes in income quantiles of whites and nonwhites; column (3) reflects the difference between the two marginal changes. For example, at the first decile, working wives increase family income by $6000 in the white subgroup and by $3643 in the nonwhite subgroup, and the difference ($2357) is significant at the 10% level. The marginal changes for the last decile (p = 1.0) are not computed because of the top-coding problem. By inspection, we can see the following pattern: Working wives have more (absolute) impact on whites than nonwhites at lower deciles and have less impact at higher deciles. Thus, the comparison is inconclusive. However, the comparison of generalized Lorenz curves reveals that, cumulatively, working wives have more impact on the family income of whites than on that of nonwhites (the critical SMM value is 2.560 for the 10 joint comparisons). The average marginal impact of whites ($11,333) is not significantly different from that of nonwhites ($11,586).(4)

[TABULAR DATA FOR TABLE 2 OMITTED]

5. An Extension

Rather than concluding the paper with a usual summary, this section provides an important extension of the inferences developed above. I will show that the results can be modified and applied to more general cases where samples are partially dependent.

Although measuring marginal changes and testing marginal dominances are important topics in income distribution studies, we encounter them far more often with partially dependent samples than with completely dependent (paired) samples. Many cross-time income samples (e.g., Current Population Survey [CPS] and Panel Study of Income Dynamics [PSID]) are neither completely independent nor completely dependent by design; some (but not all) individuals may be interviewed in several consecutive years. Until now, the problem of sample dependence has been either completely ignored or avoided by using the independent portions of the samples or choosing data from several years apart. Although researchers generally agree that it is very important to take the nature of dependence into account in computing standard errors, an appropriate method of dealing with this problem is lacking. In what follows, I attempt to provide such a method, though it may not completely solve the problem. I first illustrate the basic approach by testing mean incomes from two samples of different sizes where parts of the samples are overlapping; I then provide consistent estimators for testing marginal dominances.(5)

Assume two samples of sizes m and n, {[x.sub.l]} and {[y.sub.s]}, are drawn from two adjacent years' income distributions with means [[Mu].sub.x] and [[Mu].sub.y] and variances [Mathematical Expression Omitted] and [Mathematical Expression Omitted]. Further assume that the first q (q [less than or equal to] min {m, n}) observations of the two samples are overlapping, that is, the first q individuals are present in both samples and stand in the same order, and {[x.sub.q+1], . . ., [x.sub.m]} is independent of {[y.sub.s]} and {[y.sub.q+1], . . ., [y.sub.n]} is independent of {[x.sub.l]}. Generally speaking, {[x.sub.q+1], . . ., [x.sub.m]} is not independent of {[x.sub.1], . . ., [x.sub.q]} and {[y.sub.q+1], . . ., [y.sub.n]} is not independent of {[y.sub.1], . . ., [y.sub.q]}. Thus, [Mathematical Expression Omitted] may not equal [Mathematical Expression Omitted] and [Mathematical Expression Omitted] may not equal [Mathematical Expression Omitted]. In the absence of the precise information on the nature of this dependence, however, it may not be unreasonable to assume that [Mathematical Expression Omitted] and [Mathematical Expression Omitted]. Since [Mathematical Expression Omitted], we only need to consider the covariance term [Mathematical Expression Omitted].

Denoting

[Mathematical Expression Omitted],[Mathematical Expression Omitted], [Mathematical Expression Omitted], and [Mathematical Expression Omitted],

we can write [Mathematical Expression Omitted] as

[Mathematical Expression Omitted]. (4.1)

Since [[Rho].sub.x] is independent of {[y.sub.s]} (hence [[Alpha].sub.y] and [[Rho].sub.y]) and [[Rho].sub.y] is independent of {[x.sub.l]} (hence [[Alpha].sub.x] and [[Rho].sub.x]) by assumption, we have cov([[Alpha].sub.y], [[Rho].sub.x]) = cov([[Rho].sub.y], [[Alpha].sub.x]) = cov([[Rho].sub.y],[[Rho].sub.x]) = 0 and thus

[Mathematical Expression Omitted]. (4.2)

Noting that

[Mathematical Expression Omitted] and [Mathematical Expression Omitted],

we further have

[Mathematical Expression Omitted]. (4.3)

Clearly,

cov(1/q [summation of] [y.sub.l] where l = 1 to q, 1/q [summation of] [x.sub.s] where s = 1 to q)

can be directly calculated. Thus, [Mathematical Expression Omitted] can be computed in the following two steps: First, calculate the covariance of sample means of the overlapped samples as if they were the complete samples; second, multiply the covariance calculated in the first step by the percentages of the overlapped portions of the two samples (q/m and q/n).
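The two-step calculation can be sketched as follows, assuming the data layout described above: the first q entries of each list are the overlapping observations, stored in the same order. The function names are hypothetical. Taking the variance of the difference in means as the sum of the two variances of the means minus twice the covariance is the standard identity, and the use of unbiased (divide by q - 1) sample moments is an implementation choice.

```python
import math
import statistics

def overlap_mean_cov(x, y, q):
    """Covariance between the two sample means when the first q observations overlap."""
    if q < 2:
        raise ValueError("need at least two overlapping observations")
    m, n = len(x), len(y)
    xq, yq = x[:q], y[:q]
    mean_x, mean_y = sum(xq) / q, sum(yq) / q
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xq, yq)) / (q - 1)
    # Step 1: covariance of the means of the overlapped samples,
    # computed as if they were the complete samples.
    cov_overlap_means = s_xy / q
    # Step 2: scale by the overlapped shares q/m and q/n.
    return cov_overlap_means * (q / m) * (q / n)

def mean_difference_se(x, y, q):
    """Standard error of (mean of y) - (mean of x) with partially dependent samples."""
    var_mean_x = statistics.variance(x) / len(x)
    var_mean_y = statistics.variance(y) / len(y)
    variance = var_mean_x + var_mean_y - 2.0 * overlap_mean_cov(x, y, q)
    return math.sqrt(max(variance, 0.0))   # clip tiny negative rounding
```

When q = m = n the samples are fully paired and the covariance reduces to the sample covariance divided by n; as q shrinks relative to m and n the covariance term shrinks toward the independent-sample case.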

In the same manner, we can compute the covariance terms in Equations 3.4 and 3.11 and, consequently, we can calculate the standard errors for various sample marginal changes. Specifically, under the same assumptions about the data structures of {[x.sub.l]} and {[y.sub.s]} as described above, the estimate of the covariance term in Equation 3.4a is

[Mathematical Expression Omitted], (4.4)

where

[Mathematical Expression Omitted] and [Mathematical Expression Omitted].

The covariance term [[Tau].sub.ij] of Equation 3.11 can be estimated as

[Mathematical Expression Omitted], (4.5)

where [Mathematical Expression Omitted], [Mathematical Expression Omitted], and [Mathematical Expression Omitted] and [Mathematical Expression Omitted] are the lth-order statistics of {[x.sub.1], . . ., [x.sub.q]} and {[y.sub.1], . . ., [y.sub.q]}, respectively.

I thank two anonymous referees and Professor Kathy Hayes, the editor, for many helpful comments and suggestions. I also thank Brian J. Cushing for providing me with ready-to-use 1990 census sample data. All remaining errors are, of course, my own responsibility.

1 Bishop, Chow, and Formby (1994) provide inference procedures for Lorenz and generalized Lorenz curves and their associated concentration curves. Furthermore, while this paper was under review for publication, it came to my attention that Davidson and Duclos (1997) also independently developed a procedure that is closely related to the topic of this paper. Yet the motivation and the proof of this paper are different from those of Davidson and Duclos; I focus on testing marginal changes and use the Bahadur representation in derivation.

2 Consistent with Beach and Davidson (1983), Equation 3.11 can also be expressed as

[Mathematical Expression Omitted],

where

[p.sub.ij] = F([[Xi].sub.i], [[Zeta].sub.j]), [Mathematical Expression Omitted], [Mathematical Expression Omitted], [Mathematical Expression Omitted],

[Mathematical Expression Omitted], and [Mathematical Expression Omitted].

I thank a referee for suggesting this expression.

3 To preserve space, these results are not listed here but are available from the author on request.

4 Here, unlike Bishop, Chiou, and Formby (1997), I do not consider rerankings, which can be analyzed by comparing the Lorenz curve of the before-participation distribution with the concentration curve of the after-participation distribution.

5 Zheng and Cushing (1997) provide statistical inferences for testing summary inequality measures (the Theil measures and the Gini coefficient) with dependent samples.

References

Atkinson, Anthony B. 1970. On the measurement of inequality. Journal of Economic Theory 2:244-63.

Bahadur, R. 1966. A note on quantiles in large samples. Annals of Mathematical Statistics 37:577-80.

Beach, Charles M., and Russell Davidson. 1983. Distribution-free statistical inference with Lorenz curves and income shares. Review of Economic Studies 50:723-35.

Bergmann, Barbara R., Judith R. Devine, Diane Reedy, Lewis Sage, and Christine Wise. 1980. The effect of wives' labor force participation on inequality in the distribution of family income. Journal of Human Resources 15:452-6.

Bishop, John A., S. Chakraborti, and Paul D. Thistle. 1989. Asymptotically distribution free statistical inference for generalized Lorenz curves. Review of Economics and Statistics 71:725-7.

Bishop, John A., J. R. Chiou, and John P. Formby. 1997. Working wives and earnings inequality. Working Paper, University of Alabama.

Bishop, John A., K. Victor Chow, and John P. Formby. 1991. A stochastic dominance analysis of growth, recessions and the U.S. income distribution, 1967-1986. Southern Economic Journal 57:936-46.

Bishop, John A., K. Victor Chow, and John P. Formby. 1994. Testing for marginal changes in income distribution with Lorenz and concentration curves. International Economic Review 35:479-88.

Bishop, John A., John P. Formby, and Paul D. Thistle. 1989. Statistical inference, income distributions and social welfare. In Research in economic inequality I, edited by D. J. Slottje. Greenwich, CT: JAI Press, pp. 49-82.

Bishop, John A., John P. Formby, and Paul D. Thistle. 1992. Convergence of the South and non-South income distribution, 1969-1979. American Economic Review 82:262-72.

Blackburn, McKinley L., and David E. Bloom. 1987. Earnings and income inequality in the United States. Population and Development Review 13:575-609.

Cancian, M., S. Danziger, and P. Gottschalk. 1993. Working wives and family income inequality among married couples. In Uneven tides, edited by S. Danziger and P. Gottschalk. New York: Russell Sage Foundation, pp. 317-53.

Cancian, Maria, and Deborah Reed. 1998. Assessing the effects of wives' earnings on family income inequality. Review of Economics and Statistics 80:73-9.

Cramer, H. 1946. Mathematical methods of statistics. Princeton, NJ: Princeton University Press.

Danziger, Sheldon. 1980. Do working wives increase family income inequality? Journal of Human Resources 15:444-51.

David, H. 1981. Order statistics. 2nd edition. New York: Wiley and Sons.

Davidson, Russell, and J. Duclos. 1997. Statistical inference for the measurement of the incidence of taxes and transfers. Econometrica 65:1453-65.

Gail, M., and J. Gastwirth. 1978. A scale-free goodness-of-fit test for the exponential distribution based on the Lorenz curve. Journal of the American Statistical Association 73:787-93.

Ghosh, J. K. 1971. A new proof of the Bahadur representation of quantiles and an application. The Annals of Mathematical Statistics 42:1957-61.

Horvath, Francis W. 1980. Working wives reduce inequality in the distribution of family earnings. Monthly Labor Review 103:51-3.

Kolm, Serge-C. 1969. The optimal production of social justice. In Public economics, edited by J. Margolis and H. Guitton. London: Macmillan, pp. 145-200.

Mincer, Jacob. 1974. Schooling, experience and earnings. New York: Columbia University Press.

Rao, C. 1965. Linear statistical inference and its applications. New York: Wiley and Sons.

Saposnik, Rubin. 1981. Rank dominance in income distributions. Public Choice 36:147-51.

Sendler, W. 1979. On statistical inference in concentration measurement. Metrika 26:109-22.

Serfling, Robert J. 1980. Approximation theorems of mathematical statistics. New York: Wiley and Sons.

Shorrocks, Anthony. 1983. Ranking income distributions. Economica 50:3-17.

Silverman, B. W. 1986. Density estimation for statistics and data analysis. London: Chapman and Hall.

Thurow, L. 1975. Lessening inequality in the distribution of earnings and wealth. Princeton, NJ: Institute of Advanced Study.

Treas, Judith. 1987. The effect of women's labor force participation on the distribution of income in the United States. Annual Review of Sociology 13:259-88.

Zheng, Buhong. 1996. Statistical inferences for testing marginal changes in Lorenz and generalized Lorenz curves. Working Paper, University of Colorado at Denver.

Zheng, Buhong, and Brian J. Cushing. 1997. Large sample statistical inferences for testing marginal changes in inequality indices. Working Paper, University of Colorado at Denver.