Disproportionality at the "front end" of the child welfare services system: an analysis of rates of referrals, "hits," "misses," and "false alarms".
Mumpower, Jeryl L.
The disproportional representation of minority children in the
child welfare system has been a topic of concern for many years, dating
back at least to the work of Billingsley and Giovannoni (1972). Nearly
forty years after the issue was raised, neither scholars nor
practitioners have reached agreement about the precise nature, extent,
or causes of racial and ethnic disproportionality or the most
appropriate measures for addressing the problem, despite the substantial
body of research that has grappled, and continues to grapple, with the question
(e.g., Ards et al., 2003; Barth, 2005; Casey Family Programs, 2006;
Chapin Hall Center for Children, 2008; Courtney & Skyles, 2003;
Derezotes et al., 2005; Fluke et al., 2003; Hill, 2006; Needell et al.,
2003; Shaw et al., 2008; U.S. Government Accountability Office, 2007). Many
advocates and researchers attribute disproportionality to some form of
discrimination, either at the individual or institutional level. In
counterpoint, some, such as Bartholet (2009), have argued that if Black
children are disproportionately victimized by maltreatment then they
should appropriately be removed to foster care at rates proportionate to
their maltreatment rates, which will be disproportionate with respect to
the overall population.
Theories about the root causes of disproportionality have been
categorized into those that emphasize three types of factors (Chibnall
et al., 2003; Hill, 2006): parent and family risk factors (giving rise
to disproportionate needs), community risk factors (living in high-risk
neighborhoods that lead to increased surveillance), and organizational
and systemic factors (including biases in decision making, cultural
insensitivity, and structural racism). According to Barth (2005), four
dominant models have been proposed to explain racial disproportionality
in the child welfare system: the risk, incidence, and benefit model; the
child welfare services decision making model; the placement dynamics
model; and the multiplicative model. Courtney and Skyles (2003) observed
that two general types of mechanisms contribute to disproportionality. A
racial or ethnic group may enter the child welfare system at a rate that
is disproportionate to its presence in the overall population; this is
the "front end" (e.g., child maltreatment reporting and
substantiation) of the problem. Similarly, a racial or ethnic
group may exit the child welfare system at a slower rate than other
groups; this is the "back end" (e.g., family reunification and
adoption) of the problem.
The present paper focuses on decisions that are made at the front
end of the problem--disproportionality in reporting and substantiation.
It takes a molar perspective; the analyses make use of national level
data and data from a single large state, California.
The analysis relies on binary classification techniques based on
signal detection theory (Green & Swets, 1966) to address the
following questions, among others:
* What is the best estimate of the probability that instances of
child maltreatment will be detected by child protective services (CPS)
agencies? Are there differences among racial and ethnic groups in the
probability of detection?
* What is the overall accuracy of the child welfare screening
system? How accurate is the system with respect to maltreatment? How
accurate is the system with respect to non-maltreatment? Are there
differences among racial and ethnic groups in accuracy?
* What are the error rates in the system? What is the rate of false
negatives (failing to detect maltreatment when it is present)? What is
the rate of false positives (identifying cases as potentially involving
abuse or neglect when they do not in fact involve such maltreatment)?
Are there differences among racial and ethnic groups in the rates of
false positives and false negatives?
* What is the probability that allegations of child maltreatment
will be substantiated? Are there differences among racial and ethnic
groups in the probability that allegations will be substantiated?
This paper conceptualizes and analyzes the front end of the child
welfare system in a manner analogous to how medical or public health
studies analyze screening tests. If referrals to the child welfare
system represent a screening tool that is analogous to mammography with
relation to breast cancer or PSA tests with relation to prostate cancer,
the following questions can be addressed: How well does this screening
mechanism function? How accurate is the screening process? What
percentage of cases is detected? Does the screening process work the
same for all racial and ethnic groups and, if not, in what ways does it
contribute to disproportionality?
Aspects of all these questions have been addressed previously by
researchers concerned with disproportionality in the child welfare
system. The distinctive contribution of this work is to address all
these questions simultaneously within an integrated analytic framework
that makes explicit their linkages to one another.
METHOD
Simple binary classification analyses, as well as more
sophisticated versions of signal detection analysis, have been used in a
wide variety of psychological, social, and medical research contexts
(e.g., see Swets, 1996; Swets et al., 2000). (1) Such analytic techniques
have not been widely used in child welfare research, but Shlonsky and
Wagner (2005) have noted that they have been used in some analyses of
risk assessment instruments. Ruscio (1998) made use of a similar
conceptual framework to that proposed here in the context of attempts to
improve clinical decision making in child welfare cases.
The schema for the binary classification analyses presented in this
paper appears in Figure 1. Two key variables are included in all
analyses. The first is maltreatment, which is defined to have only two
possible states--either the presence of maltreatment or its absence. The
second variable is simply whether or not a referral--an allegation of
neglect or abuse--has been made. There are, then, four possible
exhaustive and mutually exclusive outcomes. (1) There are true positives
(TP), which are defined as children for whom a referral has been made
and for whom that allegation has been substantiated; a true positive is
sometimes called a "hit". (2) There are false positives (FP),
which are defined as children for whom a referral was made but was later
dismissed; false positives are sometimes called "false alarms"
(or Type I errors; Neyman & Pearson, 1933, 1936). (3) There are true
negatives (TN), which are defined as children for whom no referral was
made and who are not maltreated; true negatives are equivalent to
what is sometimes called a "correct rejection." (4) There are
false negatives (FN), which are defined as children for whom no referral
was made but who are in fact neglected or abused; a false negative is
sometimes called a "miss" (or Type II error; Neyman &
Pearson, 1933, 1936).
Seven measures of performance can be derived from this simple
binary classification schema.
1. The incidence rate is the rate of child maltreatment across the
entire population. It is equivalent to the sum of true positives (hits)
plus false negatives (misses), divided by the sum of the entire
population--(TP+FN)/(TP+TN+FP+FN).
2. The positive predictive value is the probability that a child
who is referred will be ascertained to have been mistreated. This is
computed by dividing the number of true positives (hits) by the total
number of referrals--TP/(TP+FP). The maximum positive predictive value
is one, which would signify that every referred child is found to have
been mistreated.
3. The negative predictive value is the probability that a child
who is not referred is not mistreated. This is computed by dividing the
number of true negatives (correct rejections) by the total number of
children who were not referred--TN/(TN+FN). The maximum negative
predictive value is one, which would mean that every child who was not
referred was also not mistreated.
4. Sensitivity is the proportion of maltreatment cases that are
referred and substantiated. Sensitivity is sometimes called the true
positive rate and is a critically important measure in most diagnostic
contexts. In medicine, sensitivity measures the proportion of breast
cancers that are detected by mammograms or the proportion of prostate
cancers that are detected by a PSA test. In the child welfare context,
sensitivity addresses the question of what proportion of child
maltreatment cases are referred. Sensitivity is computed by dividing the
number of true positives by the sum of true positives plus false
negatives--TP/(TP+FN). The maximum sensitivity value is one, which
would mean that every maltreated child was referred.
5. Specificity is the proportion of non-maltreated children who
were not referred. Specificity is sometimes described as the true
negative rate. This is also a critically important measure in most
diagnostic contexts. A measure with perfect sensitivity (that is, one
that gave a positive test result for all cases of breast cancer,
prostate cancer, child maltreatment, etc.) will not be very helpful if
it achieves a perfect record of predicting positive cases simply by
always predicting a positive value for each and every case. A good
predictor also needs to yield a negative result when the target
condition is not present. Specificity is the probability that children
who are not maltreated are not referred. It is computed by dividing the
number of true negatives (correct rejections) by the sum of true
negatives (correct rejections) plus false positives (false
alarms)--TN/(TN+FP). The maximum specificity value is one, which would
mean that no non-maltreated child was referred. The false positive rate
is the mirror image of specificity; it is simply one minus specificity.
The false positive rate is the probability of a "false alarm,"
or the probability that a child who is not maltreated will be referred.
6. Accuracy measures the proportion of correct diagnoses, weighting
both positive and negative diagnoses equally. It is computed by summing
the number of true positives (hits) and true negatives (correct
rejections) and dividing by the sum of the entire population--(TP +
TN)/(TP+TN+FP+FN). Accuracy is an indicator of overall performance. It
penalizes equally both types of errors--false positives (false alarms)
and false negatives (misses). In child welfare, however, it may not be
appropriate to equally weight the two types of errors. Failing to detect
a case of abuse or neglect (a "miss") might be regarded, for
instance, as a more serious error than making a referral that ends up
being unsubstantiated (a "false alarm").
7. Because false positives and false negatives may not be regarded
as equally serious, it is useful to compare the rate of each type of
error. The False Positive/False Negative ratio (i.e., the ratio of false
alarms and misses) is one way to make such a comparison. A ratio value
of one means that both types of errors occur with equal frequency. A
value of more than one means that false positives (false alarms) are
more frequent than false negatives (misses). A value of less than one
means that there are more false negatives (misses) than false positives
(false alarms).
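The seven measures follow mechanically from the four cells. The following is a minimal Python sketch, not code from the original study; the names Cells and performance_measures are illustrative assumptions. The cell values are the base-case national rates per 1,000 children that the paper reports for Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cells:
    """Rates per 1,000 children for the four mutually exclusive outcomes."""
    tp: float  # referred and substantiated ("hit")
    fp: float  # referred but not substantiated ("false alarm")
    tn: float  # not referred and not maltreated ("correct rejection")
    fn: float  # not referred but maltreated ("miss")


def performance_measures(c: Cells) -> dict:
    """Return the seven measures defined in the text."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "incidence": (c.tp + c.fn) / total,
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "accuracy": (c.tp + c.tn) / total,
        "fp_fn_ratio": c.fp / c.fn,
    }


# Base-case cell entries reported in the paper for Table 1.
base_case = Cells(tp=12.1, fp=31.6, tn=928.9, fn=27.4)
measures = performance_measures(base_case)

# The false alarm rate is one minus specificity.
false_alarm_rate = 1.0 - measures["specificity"]

for name, value in measures.items():
    print(f"{name}: {value:.3f}")
print(f"false_alarm_rate: {false_alarm_rate:.3f}")
```

Keeping the four cells in one immutable record makes it easy to rerun the same computation for another group or another set of assumed rates.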
DATA AND DEFINITIONS
Direct estimates based on empirical data are available for some
cell entries or marginal totals in the binary classification schema. By
combining data from several sources, making some unremarkable
assumptions, and using simple arithmetic, it is possible to derive
reasonable estimates of all remaining values for the complete matrix at
the national level and for the State of California. Doing so permits
analyses that address questions about disproportionality in a novel and,
hopefully, enlightening manner.
Data for the present analyses come from three sources.
Fourth National Incidence Study of Child Abuse and Neglect (NIS-4)
The first source is the Fourth National Incidence Study of Child
Abuse and Neglect (NIS-4), a report to Congress from the Administration
for Children and Families, U.S. Department of Health and Human Services
(Sedlak et al., 2010). NIS-4 is intended to provide estimates of the
incidence of child abuse and neglect in the United States, serving as
the nation's needs assessment on child abuse and neglect. NIS-4
included children who were investigated by CPS agencies, but also used a
sentinel survey methodology to obtain data on other children who were
recognized as maltreated by community professionals. NIS-4 estimates
therefore include both abused and neglected children who are in the
official CPS statistics and those who are not. NIS-4 is based on data
from a nationally representative sample collected during a three-month
study period that spanned 2005-2006. The NIS uses standard definitions
of abuse and neglect so that estimates of the numbers of maltreated
children and incidence rates have a calibrated, standard meaning across
various sites, sources, and cycles.
National Child Abuse and Neglect Data System (NCANDS)
The second data source is the National Child Abuse and Neglect Data
System (NCANDS). In particular, the analyses use data contained in Child
Maltreatment 2006 (U.S. Department of Health and Human Services, 2008),
which provides national and state statistics about child maltreatment
derived from data collected by CPS agencies. National statistics are
based primarily on case-level data. The present analysis used data from
Child Maltreatment 2006 rather than from more recent reports so that the
analyses combining data from NIS-4 and NCANDS would be based on the same
time period.
Child Welfare Services Reports for California
The third data source is the series of Child Welfare Services
Reports for California (Needell et al., 2010). The Child Welfare Dynamic
Report System is part of the California Child Welfare Performance
Indicators Project, reflecting a collaborative effort between the
California Department of Social Services and the University of
California at Berkeley. This data source is used for analyses at the
State of California level.
Endangerment and Harm Standards
The present analyses define maltreatment to include both abuse and
neglect and rely on the same standard definitions of maltreatment,
abuse, and neglect as used in the NIS-4, NCANDS, and California data
bases. NIS-4 uses two standards in estimating the incidence of child
maltreatment--the Harm Standard and the Endangerment standard. The Harm
Standard is relatively stringent in that it classifies a child as
maltreated only if he or she has already experienced demonstrable harm
as a result of maltreatment. Incidence estimates based on the
Endangerment Standard include all the Harm Standard children, but also
include children who were not yet harmed by maltreatment, but who
experienced abuse or neglect that placed them in danger of being harmed.
The two standards lead to substantially different estimates of the
incidence of child maltreatment.
Definitional and Data Issues
Definitions of maltreatment, abuse, and neglect are imprecise and
imperfect. The associated ambiguity is amplified because both policy
makers and case workers are forced frequently to dichotomize along a
continuous scale, drawing a line that distinguishes between behaviors
that are classified as abuse or neglect and those that fall just short
of that threshold. Adding further complication, some key parameter
values in the following analyses cannot be directly observed but must be
estimated or inferred. Despite the uncertainties, the analytic framework
used in this paper makes it possible to address certain significant
questions that would be difficult to assess in any other manner. Also
important, the framework makes it straightforward for interested parties
to re-do the analyses replacing estimated or inferred values with
estimates or inferences of their own choosing. Along these lines, the
present paper reports several sensitivity analyses in which certain key
assumptions were replaced with alternative plausible assumptions in
order to evaluate the degree to which the results are sensitive to such
changes in estimates of key parameter values.
Another significant factor related to data sources is that unlike
three previous NIS studies (NIS-1, with data from 1979-1980; NIS-2, with
data from 1986; and NIS-3, with data from 1993), the NIS-4 reported
statistically significant race differences in the incidence of
maltreatment, with higher rates in most cases for Black children than
for White or Hispanic children (Sedlak et al., 2010). Supplementary
analyses (Sedlak, McPherson, & Das, 2010) lead to the conclusion
that the statistically reliable race differences in rates of some
categories of child maltreatment found in NIS-4 are due at least partly
to (1) the greater precision of the NIS-4 estimates and
(2) the enlarged gap between Black and White children in economic
well-being.
Discussions about appropriate interpretation of the NIS-4 results
are sure to continue for some time. The integrity of the analyses
reported in this paper does not depend, however, on whether the NIS-4
data provide evidence of statistically reliable, independent effects for
race. The present analyses focus solely on the ability to diagnose or
detect instances of child maltreatment. Nothing in the analyses depends
on assumptions about the causes of child maltreatment. Because the
analyses are concerned exclusively with diagnoses, not causes, it is
immaterial whether there are significant effects for race and, if there
are, whether such effects are wholly or partially explained by poverty
or other socio-economic predictors.
A final question concerns the appropriateness of combining data
from NCANDS and NIS-4, as is done in the analyses of national sample
data reported below. In support of this procedure, data files from
NCANDS were used in the design of NIS-4 (Sedlak, 2010, Acknowledgements
page), the NIS-4 report specifically notes parallels between its results
and those of NCANDS (Sedlak, 2010, p. 20), NCANDS was used as a basis
for computing annualization multipliers for the NIS data (Sedlak, 2010,
p. 2-6; 2-17), NCANDS data were used in making other statistical
adjustments to the NIS data (Sedlak, 2010, p. A-4), and the NIS is
described in terms of its extension beyond (not its differences with)
NCANDS (e.g., https://www.nis4.org/faq.asp). Moreover, the data reported
in both NCANDS and NIS are collected from professionals who work in
child welfare or in other child services contexts. The biggest
difference between the two data sets is that NCANDS relies on an
administrative data extraction approach that makes use of the
definitions used by state-level child protective service agencies,
whereas NIS makes use of a professional survey methodology that employs
standardized definitions. A supplementary study comparing NIS-4 with
NCANDS is forthcoming from the Administration for Children and Families.
A study by Fallon et al. (2010) comparing NIS-3 and NCANDS found no
differences that would invalidate the analytic approach used in this
paper.
ANALYSES AND RESULTS
National Sample Statistics
A binary classification analysis of national data regarding
referrals and substantiations of child maltreatment for the Endangerment
Standard is given in Table 1. This serves as a base case analysis to
which subsequent analyses can be compared.
According to NIS-4 (Sedlak et al., 2010), the national incidence of
Endangerment Standard Maltreatment is 39.5 per 1,000 children, as shown
in the marginal entry for Total Maltreatment. According to Child
Maltreatment 2006 (U.S. Department of Health and Human Services, 2008),
the national incidence of referrals is 43.7 per 1,000 children, as shown
in the marginal entry for Total Referral. According to Child
Maltreatment 2006 (U.S. Department of Health and Human Services, 2008),
the national incidence of victimization (as indicated by substantiated
referrals) is 12.1 per 1,000 children, as shown in the cell entry for
Maltreatment/Referral.
All other cell values can be derived by simple arithmetic, based on
these three critical cell entries or marginal totals. Specifically, the
Total No Maltreatment marginal (960.5) is derived by subtracting the
Total Maltreatment marginal (39.5) from the Grand Total (1000). The No
Maltreatment/Referral cell (31.6) is derived by subtracting the
Maltreatment/Referral cell (12.1) from the Total Referral marginal
(43.7). The Maltreatment/No Referral cell (27.4) is derived by
subtracting the Maltreatment/Referral cell (12.1) from the Total
Maltreatment marginal (39.5). The No Maltreatment/No Referral cell
(928.9) is derived by subtracting the No Maltreatment/Referral cell
(31.6) from the Total No Maltreatment marginal (960.5). Finally, the
Total No Referral marginal (956.3) can be derived by subtracting the
Total Referral marginal (43.7) from the Grand Total (1000). The same
basic logic is used in constructing all tables used in subsequent
analyses.
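The subtraction steps described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the author's own procedure; the function name derive_table and its dictionary keys are assumptions introduced here, while the three inputs are the rates per 1,000 children quoted in the text.

```python
import math


def derive_table(maltreatment, referral, substantiated, total=1000.0):
    """Fill in a 2x2 table (rates per `total` children) from three known values."""
    tp = substantiated                      # Maltreatment/Referral cell
    fp = referral - tp                      # No Maltreatment/Referral cell
    fn = maltreatment - tp                  # Maltreatment/No Referral cell
    no_maltreatment = total - maltreatment  # Total No Maltreatment marginal
    tn = no_maltreatment - fp               # No Maltreatment/No Referral cell
    no_referral = total - referral          # Total No Referral marginal
    # Consistency check: the No Referral cells must sum to their marginal.
    if not math.isclose(tn + fn, no_referral):
        raise ValueError("cells do not sum to the Total No Referral marginal")
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        "tn": tn,
        "no_maltreatment": no_maltreatment,
        "no_referral": no_referral,
    }


# Inputs quoted in the text: incidence 39.5, referrals 43.7, substantiated 12.1.
raw = derive_table(maltreatment=39.5, referral=43.7, substantiated=12.1)
table1 = {name: round(value, 1) for name, value in raw.items()}
print(table1)
```

Because only three inputs are needed, replacing any one of them with an alternative estimate regenerates the whole table.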
The base case analysis yields the following values, shown in column
(1) of Table 2. The positive predictive value is .277; in other words,
27.7% of referrals are true positives, or "hits"--they were
substantiated as cases involving child maltreatment. (2) The negative
predictive value is .971; an estimated 97.1% of children who were not
referred were also not maltreated. These are correct rejections. At
the same time, the analysis implies that an estimated 2.9% of children who
were not referred were in fact victims of child abuse or neglect; these
are "misses".
The sensitivity is .306; in other words, the analysis estimates
that 30.6% of all child maltreatment cases during 2006 were
substantiated by CPS agencies (and 69.4% were not). The false alarm rate
(1 - specificity) is .033; an estimated 3.3% of all children who were not
victims of child abuse or neglect were nonetheless referred to a CPS
agency. Accuracy is .941; an estimated 94.1% of all cases are correctly
classified as true positives (both maltreated and referred)
or true negatives (both not maltreated and not referred). In terms of the
relative frequency of the two possible types of errors, the false
positive/false negative ratio is 1.15, indicating that the rate of false
positives is slightly higher than the rate of false negatives.
The other columns in Table 2 present the results of sensitivity
analyses which vary certain key assumptions. In column (2), the analysis
is re-done using the more restrictive Harm Standard, which estimates
that the national incidence of child maltreatment is about half that
implied by the Endangerment Standard. The major consequence of changing
this assumption is to increase the value of sensitivity to .708. When
the Harm Standard rate is used, the analysis estimates that
approximately 70.8% of all child maltreatment cases are investigated and
substantiated. Using the Harm Standard changes the estimated frequency
of false positives so that it becomes more than six times the estimated
frequency of false negatives. (For the sake of brevity all analyses in
subsequent sections will use the Endangerment Standard.)
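The effect of changing the assumed incidence can be illustrated with a short Python sketch. This section does not state the Harm Standard incidence rate, so the sketch backs it out from the reported sensitivity of .708; that inferred value is an assumption made here for illustration, not a figure quoted from NIS-4. The function name summarize is also illustrative.

```python
# Rates per 1,000 children quoted in the text.
SUBSTANTIATED = 12.1           # Maltreatment/Referral cell (true positives)
REFERRED = 43.7                # Total Referral marginal
ENDANGERMENT_INCIDENCE = 39.5  # Endangerment Standard incidence
REPORTED_HARM_SENSITIVITY = 0.708


def summarize(incidence):
    """Sensitivity and FP/FN ratio when only the assumed incidence changes."""
    tp = SUBSTANTIATED
    fp = REFERRED - tp   # unaffected by the choice of incidence standard
    fn = incidence - tp  # misses shrink as the assumed incidence falls
    return {"sensitivity": tp / incidence, "fp_fn_ratio": fp / fn}


# Assumption: infer the Harm Standard incidence from the reported sensitivity,
# using sensitivity = TP / incidence.
implied_harm_incidence = SUBSTANTIATED / REPORTED_HARM_SENSITIVITY

base = summarize(ENDANGERMENT_INCIDENCE)
harm = summarize(implied_harm_incidence)

print(f"implied Harm Standard incidence: {implied_harm_incidence:.1f} per 1,000")
print(f"Endangerment: {base}")
print(f"Harm:         {harm}")
```

Under this back-calculation the implied incidence is a little under half the Endangerment Standard rate, and the false positive/false negative ratio exceeds six, in line with the description in the text.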
Columns (3) and (4) of Table 2 re-do the analysis redefining
"referral" to limit it to those cases that were screened in by
CPS agencies. Restricting the predictor to screened-in referrals
improves the positive predictive value considerably, to .449 as compared
to .277 when the analysis was based on all referrals regardless of
whether or not they were screened-in, investigated, or substantiated.
All other performance indices are little changed. (For the sake of
brevity, all analyses in subsequent sections will use total referrals.)
National Sample Statistics, By Race And Ethnicity
Binary classification analysis of national data can be used to
examine the extent of disproportionality during the referral and
substantiation stages of the child welfare entry process. Data for
Black, Hispanic, and White populations for the Endangerment Standard
appear in Table 3. (3) Certain differences among these populations are
readily apparent. As previously discussed, the NIS-4 study estimates
different child maltreatment incidence rates for the three groups: 49.6
of 1,000 for Blacks, 30.2 for Hispanics, and 28.6 for Whites. As earlier
studies have found (e.g., Yaun et al., 2003), the total estimated rate
of referrals is considerably higher for Blacks than for the other two
groups (70.7 for Blacks, as compared to 38.6 for Hispanics and 38.2 for
Whites). The estimated rate of true positives, or hits,
(Maltreatment/Referral cell) is higher for Blacks (19.8) than for
Hispanics (10.8) and Whites (10.7), but so is the rate of errors. For
Blacks, there are an estimated 50.9 false positives (No
Maltreatment/Referral cell) as compared to 27.8 for Hispanics and 27.5
for Whites. There are also more false negatives (Maltreatment/No
Referral cell) for Blacks (29.8) than for Hispanics (19.4) and Whites
(17.9). In other words, Blacks are referred at a rate more than 80%
higher than are Hispanics or Whites, and they are about 85% more likely
to have that referral substantiated (a true positive, or "hit").
But Blacks are also about 80% more likely to be the subject of an
unsubstantiated allegation (a false positive, or false alarm) and
roughly 50% more likely not to be referred when abuse or neglect is
present (a false negative, or "miss").
Summary statistics for these data, presented in Table 4, clarify
the nature and extent of racial and ethnic differences. The negative
predictive value is lower for Blacks (.968) than for the other two
groups (.980 for Hispanics and .981 for Whites) because of the
comparatively higher rate of false negatives ("misses") for
Blacks. The sensitivity is higher for Blacks (.399) than for the other
two groups because a higher proportion of maltreatment cases are
referred and substantiated. The false alarm rate is also higher for
Blacks (.054), almost twice as high as the rate for Hispanics (.029) or
Whites (.028), indicating that it is more likely for Blacks to be
referred in a case that is not subsequently substantiated. The overall
accuracy measure is particularly instructive--it is lower for Blacks
(.919) than for Hispanics (.953) or Whites (.955)--the error rate is
roughly twice as high for Blacks as for the other groups, and the errors
are of both types. Finally, the False Positive/False Negative ratio (the
ratio of false alarms/misses) is higher for Blacks than for the other
two groups, which is to say that unsubstantiated referrals are
comparatively more frequent for Blacks than for the other two groups.
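The group comparison can be reproduced from the three rates quoted for each group (incidence, total referrals, and substantiated referrals). The Python sketch below is illustrative only; the names NATIONAL and summarize are assumptions introduced here. False negatives are derived as incidence minus substantiated referrals, following the same subtraction logic used for the base case.

```python
# Rates per 1,000 children quoted in the text:
# (maltreatment incidence, total referrals, substantiated referrals)
NATIONAL = {
    "Black": (49.6, 70.7, 19.8),
    "Hispanic": (30.2, 38.6, 10.8),
    "White": (28.6, 38.2, 10.7),
}


def summarize(incidence, referred, substantiated, total=1000.0):
    """Derive the four cells for one group and compute summary measures."""
    tp = substantiated
    fp = referred - tp            # unsubstantiated referrals
    fn = incidence - tp           # maltreated but not referred
    tn = total - incidence - fp   # neither maltreated nor referred
    return {
        "npv": tn / (tn + fn),
        "sensitivity": tp / (tp + fn),
        "false_alarm_rate": fp / (fp + tn),
        "accuracy": (tp + tn) / total,
        "fp_fn_ratio": fp / fn,
    }


summary = {group: summarize(*rates) for group, rates in NATIONAL.items()}

for group, stats in summary.items():
    line = ", ".join(f"{name}={value:.3f}" for name, value in stats.items())
    print(f"{group}: {line}")
```

Running the same function over each group keeps the comparison on a common footing, so differences in the output reflect only the differences in the three input rates.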
If conceptualized as a diagnostic system designed to detect child
abuse and neglect, the child welfare referral and substantiation system
clearly does not perform in the same manner for Blacks as for Hispanics
and Whites. Although a comparatively higher proportion of maltreatment
cases involving Blacks enters into the child welfare system, the system
is less accurate for Blacks than for the other groups, yielding a higher
rate of both false alarms and misses. These results provide support for
the conclusion that Black children are both over-reported and
under-reported in the child welfare system (Barth, 2005).
State Of California Statistics, By Race And Ethnicity
A similar analysis addressing the issue of disproportionality was
conducted using 2008 data for the State of California. Analysis at the
state level was conducted for three reasons: First, it was important to
see if a similar pattern of disproportionality was observed with the
most recent available data. This was particularly important because the
recent NIS-4 study suggested that significant changes might be occurring
in what heretofore has been a relatively stable picture regarding
patterns of child neglect and abuse with respect to race and ethnicity.
Second, it was important to evaluate whether a similar analytic approach
as used with national data could be applied equally well at a lower
level of geographic aggregation. Third, to test the robustness of the
findings, it seemed wise to perform additional analyses in which the key
elements requiring estimation differed. (4)
Binary classification analysis of State of California data for 2008
was used to examine the extent of disproportionality during the referral
and substantiation stages of the child welfare entry process. Overall
data as well as data for Black, Hispanic, and White populations appear
in Table 5. As in the previous analyses, the NIS-4 study is used as a
basis for estimates of child maltreatment incidence rates: an overall
rate of 39.5 per 1,000, 49.6 out of 1,000 for Blacks, 30.2 for
Hispanics, and 28.6 for Whites.
Certain differences among these populations are readily apparent.
For the 2008 California data, the total rate of referrals is higher for
Blacks than for the other two groups. The differences among groups are
even more pronounced in California than for the national sample. For
Black children the rate of referral is 115.1 per 1,000, roughly two and
a half times the rates of 48.4 per 1,000 for Hispanics, 40.2 per 1,000
for Whites, and 48.7 per 1,000 overall. The rate of true positives, or
hits (Maltreatment/Referral cell), is two to three times higher for
Blacks (25.0) than for the overall population
(9.7), Hispanics (10.1), or Whites (8.4). But, just as in the national
sample, the rate of errors associated with Black children is also
substantially higher than for the other groups.
The California data exhibit a different pattern of errors from that
in the national sample. The estimated rate of false negatives
(Maltreatment/No Referral cell) is somewhat lower for Blacks (24.6)
than the overall false negative rate (29.8) although still somewhat
higher than for Hispanics (20.1) and Whites (20.2). Differences among
groups in the false positive rates, however, are marked. For Blacks, the
false positive rate (No Maltreatment/Referral cell) is 90.1; this
compares to an overall rate of 39.0 and rates of 38.3 for Hispanics and
31.8 for Whites. In other words, Blacks are involved in unsubstantiated
referrals at a rate about 2.3 times the overall rate, about 2.4 times
the rate for Hispanics, and about 2.8 times the rate for Whites. In California, Blacks are more
likely than Hispanics or Whites to be referred, more likely to be
involved in a substantiated referral, and much more likely to be
involved in an unsubstantiated referral.
Summary statistics for these data, presented in Table 6, clarify
the nature and extent of racial and ethnic similarities and differences.
The analyses indicate that the positive predictive values are virtually
identical across all groups, ranging from a low of .199 to a high of
.217. This indicates that the percentage of referrals that are
substantiated--roughly 20%--is essentially the same for all groups. The
analysis thus provides support for the conclusion reached by Fluke and
colleagues (2003), who concluded that disproportionality appears to be
more pronounced at some decision making points in the process than
others. In this instance, while there are substantial differences
between groups in the rate of referral there is little difference in
terms of the percentage of referrals that are later substantiated.
The sensitivity for Blacks (.504) is much higher than for the
overall group (.246), Hispanics (.334), or Whites (.294). This indicates
that a much higher proportion of abused or neglected Black children
enter the child welfare system than do abused or neglected Hispanic or
White children. The analysis estimates that about half of the estimated
instances of child maltreatment among Black children resulted in entry
into the child welfare system, as compared to only a quarter of such
instances for the overall population and a third or fewer of the cases
involving Hispanic and White children. The comparatively high value for
sensitivity for Blacks is accounted for at least partially by the fact
that Black children are two to three times more likely to be referred
into the system. The high rate of referral is also a factor, however, in
the false alarm rate of .095 for Blacks in California during 2008. This
is almost twice as high as the false alarm rate for Blacks in the 2006
national sample. Further, it is two to three times higher than the false
alarm rate for the overall California population (.041), for Hispanics
(.039) and for Whites (.033). As with the national data, the overall
accuracy measure is instructive--it is substantially lower for Blacks
(.885) than for the overall population (.931), Hispanics (.942), or
Whites (.948). The error rate is roughly twice as high for Blacks as for
the other groups. These are primarily false positive errors--false
alarms in which Blacks are referred into the system but the allegations
are not substantiated. In California, as at the national level, if the
child welfare referral and substantiation system is thought of as a
diagnostic system designed to detect child abuse and neglect, the system
makes far more diagnostic errors for Blacks than for the other major
racial and ethnic groups. Finally, the False Positive/False Negative
ratio (the ratio of false alarms/misses) is much different for Blacks
than for other groups. The ratio for Blacks (3.66) shows that the system
is far more likely to produce false positive errors as opposed to false
negative ones, when compared to the overall population (1.31), Hispanics
(1.91), or Whites (1.57).
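The summary statistics discussed above can be recomputed from the cell rates in Table 5. The Python sketch below is illustrative only: the names `Cells` and `summary` are hypothetical, and the formulas are the standard two-by-two (confusion matrix) definitions, which appear to agree with the values reported in Table 6 to three decimal places.

```python
from dataclasses import dataclass


@dataclass
class Cells:
    """One group's two-by-two table, in rates per 1,000 children."""
    miss: float               # maltreatment, no referral (false negative)
    hit: float                # maltreatment, referral (true positive)
    correct_rejection: float  # no maltreatment, no referral (true negative)
    false_alarm: float        # no maltreatment, referral (false positive)


def summary(c: Cells) -> dict:
    """Standard diagnostic summary statistics for one two-by-two table."""
    total = c.miss + c.hit + c.correct_rejection + c.false_alarm
    return {
        "ppv": c.hit / (c.hit + c.false_alarm),
        "npv": c.correct_rejection / (c.correct_rejection + c.miss),
        "sensitivity": c.hit / (c.hit + c.miss),
        "false_alarm_rate": c.false_alarm / (c.false_alarm + c.correct_rejection),
        "accuracy": (c.hit + c.correct_rejection) / total,
        "fp_fn_ratio": c.false_alarm / c.miss,
    }


# Cell values copied from Table 5 (2008 State of California data).
table5 = {
    "Overall": Cells(29.8, 9.7, 921.5, 39.0),
    "Black": Cells(24.6, 25.0, 860.3, 90.1),
    "Hispanic": Cells(20.1, 10.1, 931.5, 38.3),
    "White": Cells(20.2, 8.4, 939.6, 31.8),
}

for group, cells in table5.items():
    stats = summary(cells)
    print(group, {name: round(value, 3) for name, value in stats.items()})
```

Keeping the four cells as the only inputs makes it explicit that every statistic in Table 6 is a function of the same two-by-two table.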
DISCUSSION: COULD RANDOM ERROR ACCOUNT FOR THE OBSERVED RESULTS?
In much the same way that mammograms help to detect and diagnose
breast cancer or PSA tests help to detect and diagnose prostate cancer,
the system of referrals, investigations, and substantiation that
constitute the front end of the child welfare system can be
conceptualized as a complex screening system for detecting and
diagnosing child maltreatment. If viewed in this manner, how well does
the system perform overall? Further, does it function equally well for
all racial and ethnic groups? And what role, if any, does it play in the
disproportional representation of minorities in the child welfare
system? Analyses of a 2006 national sample suggested that just over 30%
of all child abuse and neglect cases were identified and substantiated
by CPS agencies. This means, of course, that about 70% of all cases were
not detected. (5) A little over a quarter of all referrals were
substantiated. The false alarm rate was a little over 3%. The rate of
false positive errors was somewhat higher than the rate of false
negative errors. The question "how good is this level of
performance?" goes far beyond the scope of this paper. Clearly,
there is substantial room for improvement.
In any case, we can address questions about whether the referral
and substantiation components of the child welfare system operated in
the same manner for all racial and ethnic groups and whether these
components contributed to the disproportional representation of
minorities in the child welfare system. The short answer to the first
question is "No" and the short answer to the second question
is "Yes."
The analysis of both national and California data indicated that
Blacks are treated differently from other groups in a number of
respects:
The rate of referrals for Blacks is higher than for other groups.
This has been widely recognized but the present analysis underscores its
fundamental importance. The rate of substantiation of referrals (the
positive predictive value) was found to be approximately the same for
Blacks as for other groups, suggesting that disproportionality was not
appreciably amplified during the screening-in, investigation, or
substantiation stages of the process. (For a supporting conclusion
concerning the use of standardized risk assessment instruments, see
Baird et al., 1999.) This means that a primary driver of
disproportionality appears at the earliest stage of the process when
referrals are made by mandated reporters and other sources.
The rate of true positives is also higher for Blacks than for other
groups. Concretely speaking, this means that Blacks are
disproportionally represented in terms of substantiated referrals. This
is partially attributable to the fact that the estimated rate of
maltreated children is higher for Blacks than for other groups. From
this perspective, disproportional representation might be partially
attributable to greater levels of need. But the study also found that
sensitivity (the proportion of positives identified as true positives)
was higher for Blacks than for other groups. This difference is subject
to various interpretations. A simple interpretation could be that the
differences in sensitivity--the greater likelihood that instances of
maltreatment will be referred and substantiated--contribute to the
overrepresentation of Blacks in the front end of the child welfare
system. An equally plausible interpretation might be that it is a
symptom of the underrepresentation of Hispanics and Whites. Yet another
interpretation might be that all groups are underrepresented, although
Blacks somewhat less so, since the rate of substantiated cases
identified by CPS agencies is far lower than the incidence rate
suggested by the NIS-4 needs assessment.
The data analyses strongly support the conclusion that the process
is simply less accurate for Blacks than for other groups. The accuracy
statistic for Blacks was lower than for other groups at both national
and California levels. This implies a greater rate of errors for Blacks
than for Hispanics or Whites, which is precisely what was found. For the
national sample, more errors of both types were found--false negatives
(misses), or failures to detect and diagnose cases of maltreatment, as
well as false positives (false alarms), or cases involving referrals
that were not substantiated. For the California analysis, the level of
accuracy for Blacks was even lower than for the national sample and the
errors were more likely to be false positives than false negatives.
Researchers in child welfare have repeatedly found evidence of
racial and ethnic disproportionality in the child welfare system, just
as was found in the present study. Identifying the causes of
disproportionality has proved to be a difficult task, however. The
pattern of results from the present analysis suggests that one possible
and heretofore neglected explanation for disproportionality is random
error.
Systematic error in the form of prejudice or bias, leading to
discrimination based on either individual or institutional racism, has
often been cited as a possible cause of disproportionality or disparity.
It is not generally recognized, however, that disproportional
representation might arise from lack of accuracy or reliability stemming
from random errors, which may be thought of as simply "honest
mistakes." A few examples involving a hypothetical mandated
reporter help to illustrate this point. Imagine a population of 1000
cases of potential child abuse or neglect with a true incidence rate of
10%. If the reporter were perfectly valid and reliable, we would observe
the results in Table 7. The reporter would refer the 100 cases in which
maltreatment was present and would not refer the 900 cases in which
maltreatment was not present.
Systematic bias would result in a different pattern of results.
Imagine that the reporter exhibits systematic bias against minority
group families, but not majority group families, such that all positive
cases among minority group families are correctly diagnosed, but 10% of
the cases in which no maltreatment is present are incorrectly diagnosed
as positive, as shown in Table 8. This is precisely the situation that
discussions of measurement error relating to disproportionality commonly
focus on--a biased test that consistently overestimates pathology, or
underestimates positive attributes, in underrepresented minority groups.
In this hypothetical example, 190 referrals would be made, an increase
of 90 over the 100 referrals that would have been made by a perfectly
valid and reliable reporter. Ninety of these 190 referrals, however,
would be false positive errors--mistakes involving over-diagnosis of
maltreatment among minority group families.
Alternatively, imagine a reporter who is competent but human--in
other words, the reporter is valid (usually gets the right answer and is
not systematically biased), but is not perfectly reliable (i.e., this
reporter sometimes makes a mistake). Further, imagine that the reporter
is more unreliable for minority group families than for majority group
families. For example, suppose that for 1000 majority group families,
the reporter makes diagnostic misclassifications 10% of the time, with
symmetrical error rates for both false positive and false negative
errors. This would yield the pattern of results presented in Table 9.
The reporter would refer 180 cases and make 100 errors--referring 90
cases that should not have been referred (false positive errors) and
failing to refer 10 cases that should have been referred (false
negative errors).
Suppose that for minority group families, the reporter is also
reasonably accurate but makes mistakes more frequently.
Assume the reporter makes diagnostic misclassifications 20% of the time,
with symmetrical error rates for both false positives and false
negatives. For 1000 minority families, this would yield the pattern of
results presented in Table 10. The reporter would refer 260 cases and
make 200 errors. Note that more minority group families are referred
(26%) in this example than in the previous example (Table 9) involving
majority group families (18%). Further, twice as many errors would be
made for the minority group families as compared to majority group
families, and these would be of both types. There would be twice as many
false positive errors (making referrals when maltreatment was not
present) for minority group families (180) as for majority group
families (90). Similarly, there would be twice as many false negative
errors (not making referrals when maltreatment was present) for minority
group families (20) as for majority group families (10).
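The four hypothetical reporters described above can be generated from a single calculation. The Python sketch below is a minimal illustration, not part of the original analysis: the function `reporter_table` and its parameter names are hypothetical, and it simply applies a false negative rate to the cases in which maltreatment is present and a false positive rate to the cases in which it is absent.

```python
def reporter_table(n_cases, incidence, false_negative_rate, false_positive_rate):
    """Expected two-by-two counts for a hypothetical reporter."""
    present = n_cases * incidence   # cases in which maltreatment is present
    absent = n_cases - present      # cases in which maltreatment is absent
    false_negatives = present * false_negative_rate
    false_positives = absent * false_positive_rate
    true_positives = present - false_negatives
    true_negatives = absent - false_positives
    return {
        "true_positives": true_positives,
        "false_negatives": false_negatives,
        "false_positives": false_positives,
        "true_negatives": true_negatives,
        "referrals": true_positives + false_positives,
        "errors": false_negatives + false_positives,
    }


# (false negative rate, false positive rate) for each scenario in the text.
scenarios = {
    "Table 7 (perfect reporter)": (0.00, 0.00),
    "Table 8 (systematic bias)": (0.00, 0.10),
    "Table 9 (10% random error)": (0.10, 0.10),
    "Table 10 (20% random error)": (0.20, 0.20),
}

for label, (fn_rate, fp_rate) in scenarios.items():
    t = reporter_table(1000, 0.10, fn_rate, fp_rate)
    print(f"{label}: referrals={t['referrals']:.0f}, "
          f"false positives={t['false_positives']:.0f}, "
          f"false negatives={t['false_negatives']:.0f}")
```

Because the incidence is only 10%, a symmetric error rate applied to the much larger group of non-maltreatment cases produces many more false positives than false negatives, which is why referrals rise as reliability falls.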
These hypothetical analyses illustrate that disproportionality may
occur even without systematic bias (for an analysis of the same type of
phenomenon in the context of college admissions, see Mumpower, Nath,
& Stewart, 2002). More minority group members may be referred in
child maltreatment cases (or be arrested and arraigned, or assigned to
special education, and so forth) not as a result of systematic bias or
even treatment disparities, but because reporters' diagnostic
judgments for minority group members involve a greater degree of random
error than for majority group members.
At this point it is simply speculative whether random error could
help to explain the observed results for Blacks in the child welfare
system, as suggested by the above hypothetical example. Note, however,
the similarities between the observed data and the hypothetical examples
cited above. In comparison to the majority group, one would expect to
observe (1) lower rates of accuracy; (2) disproportionately many
referrals; (3) disproportionately more errors of both types; and (4) not
much difference in the number of true positives. This is just what was
found for Blacks in the data analyses reported in earlier sections of
this paper.
Further, the contribution of random error to disproportionality
could be amplified if reporters lower their threshold for referrals to
compensate for lower levels of accuracy for minority group members.
Suppose that reporters recognized that they are less accurate for
minority group members but wanted to avoid the false negative problem of
failing to refer possible cases of child maltreatment. Adopting such a
precautionary principle, they might then lower the threshold necessary
for referral. Increasing the rate of referrals in this manner will
generally lead to an increase in the rate of true positives, but will
also further increase the rate of false positives. (For a discussion of
the tradeoffs involved in raising or lowering admission thresholds in
emergency psychiatry, see Way et al., 1998).
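The threshold tradeoff described above can be illustrated with a textbook signal detection model. The Python sketch below assumes an equal-variance Gaussian model, which is an assumption made here for illustration and not something estimated from the child welfare data; the function name, the separation parameter `d_prime`, and the threshold values are all hypothetical.

```python
from statistics import NormalDist


def hit_and_false_alarm_rates(threshold, d_prime=1.5):
    """Hit and false alarm rates under an assumed equal-variance Gaussian model.

    Evidence for non-maltreatment cases is modeled as N(0, 1) and evidence
    for maltreatment cases as N(d_prime, 1). A case is referred when its
    evidence exceeds the threshold.
    """
    no_maltreatment = NormalDist(0.0, 1.0)
    maltreatment = NormalDist(d_prime, 1.0)
    hit_rate = 1.0 - maltreatment.cdf(threshold)
    false_alarm_rate = 1.0 - no_maltreatment.cdf(threshold)
    return hit_rate, false_alarm_rate


# Lowering the referral threshold raises both rates together.
for threshold in (2.0, 1.5, 1.0):
    hits, false_alarms = hit_and_false_alarm_rates(threshold)
    print(f"threshold={threshold:.1f}: hit rate={hits:.3f}, "
          f"false alarm rate={false_alarms:.3f}")
```

In this sketch a smaller `d_prime` would correspond to a less accurate reporter, for whom any given gain in hits costs more false alarms.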
Clearly, the proposed mechanism does not account in a wholly
satisfactory fashion for all the observed data. In particular, the
proposed mechanism does not provide a satisfactory account for why the
data for Hispanics appear much more similar to those for Whites than
to those for Blacks, an outcome that is reminiscent of the so-called
Hispanic paradox in public health (Franzini et al., 2001) in which
Hispanics have been found generally to have substantially better health
than would be predicted on the basis of socioeconomic risk factors. On
the other hand, despite repeated efforts to do so, research has yet to
uncover clear evidence to support the proposition that
disproportionality results largely from systematic discrimination at
either the individual or institutional level. Perhaps both systematic
and random errors play a role in the disproportional representation of
Black children in the child welfare system.
CONCLUSION
The present study demonstrated that the front end of the child
welfare services system--the referral and substantiation
components--does not function the same for Blacks as it does for other
racial and ethnic groups in terms of diagnosing and detecting instances
of child maltreatment. Blacks are disproportionately represented in
terms of their referral rate into the system. Further, the system is
less accurate for Blacks--the rate of correct diagnoses is lower and the
rate of errors, especially false positive errors, is higher than for
other groups. Instances of child maltreatment for Blacks are generally
detected at a proportionally higher rate than for other groups but this
is attributable largely to the higher rate of referral. In short, the
system does not perform in the same manner for Blacks as it does for
other racial and ethnic groups. A series of hypothetical examples was
used to demonstrate that random error could produce a pattern of results
much like that observed for Black children in the present study.
If random error plays an important role in accounting for the
observed results, what can be done to change the situation for the
better? From an analytic standpoint the answer is easy--the level of
accuracy for Blacks needs to be improved so that it is at least as good
as it is for Hispanics, Whites, and other racial and ethnic groups. (For
analogous results demonstrating the key role of accuracy in this regard,
see the analysis by Mumpower et al. (2002) on affirmative action
policies in college admissions and Way et al. (1998) on admissions to
psychiatric emergency rooms.) Of course, this raises the question of how
to go about improving accuracy. Clearly, this supports the critical
importance of education and training, but discussion about how to
accomplish this goal goes beyond the scope of the present paper.
Finally, several caveats should be issued about the limitations of
the present study. Most notably, there are four.
First, the paper ignored altogether the "back end" of the
child welfare system, which is an important contributor to
disproportional representation of minorities in the child welfare system
(e.g., Courtney & Skyles, 2003; Derezotes et al., 2005). This is not
to say that the types of placements that children of different racial
and ethnic groups go into, the likelihood of reunification and the
likelihood of timely adoption or guardianship are not important
contributors to the phenomenon of disproportionality. Clearly, they are,
but they lie beyond the scope of the present paper.
Second, the present analysis does not address the question of
whether Blacks and other groups are reported for child maltreatment at
similar or different rates when controlling for poverty. Using statewide
data from Missouri, Drake et al. (2009) did not find high levels of
racial disproportionality once poverty was controlled for. Likewise the
supplementary analyses of race differences in child maltreatment rates
in NIS-4 (Sedlak, McPherson, & Das, 2010) found that race did not
have significant independent predictive power for most (but not all)
measures of child maltreatment after taking into account poverty and
other correlated predictors. Nothing in the present analyses, however,
relies on any assumptions about the causes underlying child
maltreatment. If race has absolutely no independent predictive power
after controlling for poverty and other risk factors, the major
conclusion of the present paper would be unchanged: the system makes
more diagnostic errors of both types--false positives and false
negatives--for Blacks than for Whites or Hispanics. It would likely be
quite informative to reanalyze the data in terms of class rather than
race, if that were possible, but to date the required data are not
available.
Third, the present paper implicitly assumes that the conceptual and
operational definitions of child maltreatment used by NCANDS, NIS and
similar sources are appropriate for all racial and ethnic groups and
that their incidence estimates are accurate and stable. Each of these
assumptions is probably on shakier ground than we might hope. The
present analysis treats unsubstantiated referrals as if they signified
"no maltreatment" and thus classifies them as false alarms.
Some in the child welfare community (e.g., Besharov, 1993, 2000)
have argued that there is a substantial problem with over-reporting of
child abuse such that cases of inadequate cognitive and social nurturing
are inappropriately labeled child neglect or child abuse. Such false
alarms, it is argued, lead to inappropriate disruption of families who
would have benefited more from supportive intervention. Others have
concluded that the empirical evidence demonstrates few, if any,
significant clinical differences between substantiated and
unsubstantiated referrals in terms of the clinical services that they
require or receive (Drake, 1996; Hussey et al., 2005; Kohl et al.,
2009). Based on an analysis of data from the National Survey of Child and
Adolescent Well-Being, the Administration for Children and Families
(U.S. Department of Health and Human Services, N.D.) concluded that
children with substantiated cases of maltreatment do not appear to fare
more poorly than children in unsubstantiated cases and that children in
unsubstantiated maltreatment cases may have as many social service needs
as those in substantiated cases. They report, however, that caseworkers
perceived greater social service needs among those with substantiated
cases than among those with unsubstantiated cases.
The implications of this debate for the present analysis are not
clear. If one accepts the point of view that unsubstantiated referrals
simply represent mistakes, then it is clearly appropriate to classify
these as false alarms. But, even if differences between substantiated
and unsubstantiated cases are negligible in terms of service provision,
the primary conclusion stands: the front end of the child welfare
services system--the referral and substantiation components--does not
function the same for Blacks as it does for other racial and ethnic
groups. Substantiation is an imperfect proxy for the variable that we
are truly interested in--child maltreatment--and it dichotomizes a
continuously distributed variable with attendant problems for analysis.
Despite its imperfections, substantiation remains a widely reported and
analyzed variable in child welfare and the present analysis reveals
distinct differences among Blacks and other racial and ethnic groups in
terms of typical patterns of referrals and substantiation.
Fourth, and finally, the present analyses have made use of the best
available point estimates of relevant rates of referral and
substantiation, for both the overall population and for ethnic and
racial subgroups. It is important to remember that the sample data from
which those point estimates are derived are of imperfect validity and
reliability. Moreover, in the type of two-by-two table used for most of
the analyses the error terms within cells are necessarily not
independent. The present analyses represent a good effort based on the
best available data to estimate key parameters relating to hits, misses,
and false alarms at the front end of the child welfare system but the
results should be interpreted with appropriate caution given the
fallibility of the data upon which they are based.
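The dependence among cells noted in this caveat can be made concrete. In Table 1, only three quantities come from external sources (the incidence rate, the referral rate, and the rate of substantiated referrals); the remaining cells follow by subtraction. The Python sketch below illustrates this under that reading of the table; the function and parameter names are hypothetical.

```python
def cells_from_margins(incidence, referral_rate, substantiated_rate, base=1000.0):
    """Fill in a two-by-two table (rates per `base` children) from three inputs."""
    miss = incidence - substantiated_rate            # maltreatment, no referral
    false_alarm = referral_rate - substantiated_rate  # no maltreatment, referral
    correct_rejection = base - incidence - false_alarm
    return {
        "hit": substantiated_rate,
        "miss": miss,
        "false_alarm": false_alarm,
        "correct_rejection": correct_rejection,
    }


# Inputs taken from Table 1 (2006 national data, Endangerment Standard).
table1 = cells_from_margins(incidence=39.5, referral_rate=43.7,
                            substantiated_rate=12.1)
print({name: round(value, 1) for name, value in table1.items()})

# A one-unit error in a single input shifts three of the four cells at once.
shifted = cells_from_margins(incidence=39.5, referral_rate=43.7,
                             substantiated_rate=13.1)
print({name: round(shifted[name] - table1[name], 1) for name in table1})
```

Because every cell is tied to the same three inputs, measurement error in any one of them propagates to several cells and to every summary statistic computed from them.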
Thanks are due to Prof. Edwina L. Dorch, Prof. Leroy H. Pelton, and
an anonymous reviewer for constructive criticism and comments on this
paper. Any remaining shortcomings of the paper are the sole
responsibility of the author.
REFERENCES
Ards, S.D., Meyers, S.L., Malkis, A., Sugrue, E., & Zhou, L.
(2003). Racial disproportionality in reported and substantiated child
maltreatment and neglect: An examination of systematic bias. Children and
Youth Services Review, 25, 375-392.
Baird, C., Ereth, J., & Wagner, D. (1999). Research-based risk
assessment: Adding equity to CPS decision making. Madison, WI:
Children's Research Center.
Barth, R.P. (2005). Child welfare and race: Models of
disproportionality. In D.M. Derezotes et al. (Eds.) Race matters in
child welfare: The overrepresentation of African American children in
the system. Washington, D.C.: CWLA Press.
Bartholet, E. (2009). The racial disproportionality movement in
child welfare: False facts and dangerous directions. Arizona Law Review,
51, 873-932.
Besharov, D. (1993). Overreporting and underreporting are twin
problems. In R. J. Gelles & D. R. Loseke (Eds.), Current
controversies on family violence. Newbury Park, CA: Sage, 257-272.
Besharov, D. J. (2000). Child abuse realities: Over-reporting and
poverty. Virginia Journal of Social Policy and the Law, 8, 165-203.
Billingsley, A., & Giovannoni, J. M. (1972). Children of the
storm: Black children and American child welfare. New York: Harcourt,
Brace, Jovanovich.
Casey Family Programs. (2006). Disproportionality in the child
welfare system: The disproportionate representation of children of color
in foster care. Retrieved on March 2, 2010, from
http://www.ncsl.org/print/cyf/fostercarecolor.pdf
Chapin Hall Center for Children. (2008). Understanding racial and
ethnic disparity in child welfare and juvenile justice. Chicago: Chapin
Hall Center for Children at the University of Chicago.
Chibnall, S., Dutch, N. M., Jones-Harden, B., Brown, A., Gourdine,
R., Smith, J., Boone, A., & Snyder, S. (2003). Children of color in
the child welfare system: Perspectives from the child welfare community.
Washington, DC: U.S. Department of Health and Human Services,
Children's Bureau.
Courtney, M., & Skyles, A. (2003). Racial disproportionality in
the child welfare system. Children and Youth Services Review, 25, 355-358.
Derezotes, D. M., Poertner, J. & Testa, M.F. (2005). Race
matters in child welfare: The overrepresentation of African American
children in the system. Washington, D.C.: CWLA Press.
Drake, B. (1996). Unraveling "unsubstantiated." Child
Maltreatment, 1, 168-175.
Drake, B., Lee, S. M., & Jonson-Reid, M. (2009). Race and child
maltreatment reporting: Are Blacks overrepresented? Children and Youth
Services Review, 31, 309-316.
Fallon, B., Trocme, N., Fluke, J., MacLaurin, B., Tonmyr, L., &
Yuan, Y.Y. (2009). Methodological challenges in measuring child
maltreatment. Child Abuse and Neglect, 34, 70-79.
Franzini, L., Ribble, J. C., & Keddie, A. M. (2001).
Understanding the Hispanic paradox. Ethnicity and Disease, 11, 496-518.
Green, D.M., & Swets, J.A. (1966). Signal detection theory and
psychophysics. New York: Wiley.
Hill, R. B. (2006). Synthesis of research on disproportionality in
child welfare: An update. Casey-CSSP Alliance for Racial Equity in the
Child Welfare System. Retrieved on March 2, 2010 from
http://www.racemattersconsortium.org/docs/BobHillPaper_FINAL.pdf
Hussey, J.M., Marshall, J.M., English, D.J., Knight, E.D., Lau,
A.S., Dubowitz, H., et al. (2005). Defining maltreatment according to
substantiation: Distinction without a difference? Child Abuse and
Neglect, 29, 479-492.
Kohavi, R., & Provost, F. (1998). Guest editor's
introduction: On applied research in machine learning. Machine Learning,
30, 127-32.
Kohl, P. L., Jonson-Reid, M., & Drake, B. (2009). Time to leave
substantiation behind: Findings from a national probability study. Child
Maltreatment, 14, 17-26.
Mumpower, J. L., Nath, R., & Stewart, T. R. (2002). Affirmative
action, duality of error, and the consequences of mispredicting the
academic performance of African-American college applicants. Journal of
Policy Analysis and Management, 21, 63-77.
Needell, B., Brookhart, M.A., & Lee, S. (2003). Black children
and foster care placement in California. Children and Youth Services
Review, 25, 375-392.
Needell, B., Webster, D., Armijo, M., Lee, S., Dawson, W.,
Magruder, J., Exel, M., Glasser, T., Williams, D., Zimmerman, K., Simon,
V., Putnam-Hornstein, E., Frerer, K., Cuccaro-Alamin, S., Lou, C., Peng,
C., Holmes, A. & Moore, M. (2010). Child Welfare Services Reports
for California. Retrieved 3/18/2010, from University of California at
Berkeley Center for Social Services Research website. URL:
http://cssr.berkeley.edu/ucb_childwelfare
Neyman, J., & Pearson, E.S. (1933). On the problem of the most
efficient tests of statistical hypotheses. Philosophical Transactions of
the Royal Society, Series A 231, 289-337.
Neyman, J., & Pearson, E.S. (1936). Sufficient statistics and
uniformly most powerful test of statistical hypotheses. Statistical
Research Memoirs 1936, 1, 113-137.
Ruscio, J. (1998). Information integration in child welfare cases:
An introduction to statistical decision making. Child Maltreatment, 3,
145-156.
Sedlak, A.J., McPherson, K., & Das, B. (2010). Supplementary
Analyses of Race Differences in Child Maltreatment Rates in the NIS-4.
Washington, DC: U.S. Department of Health and Human Services,
Administration for Children and Families.
Sedlak, A.J., Mettenburg, J., Basena, M., Petta, I., McPherson, K.,
Greene, A., & Li, S. (2010). Fourth national incidence study of
child maltreatment and neglect (NIS-4): Report to Congress. Washington,
DC: U.S. Department of Health and Human Services, Administration for
Children and Families.
Shaw, T. V., Putnam-Hornstein, E., Magruder, J., & Needell, B.
(2008). Measuring racial disparity in child welfare. Child Welfare, 87,
23-36.
Shlonsky, A., & Wagner, D. (2005). The next step: Integrating
actuarial risk assessment and clinical judgment into an evidence-based
practice framework in CPS case management. Children and Youth Services
Review, 27, 409-427.
Swets, J.A. (1996). Signal detection theory and ROC analysis in
psychology and diagnostics: Collected papers. Hillsdale, NJ: Lawrence
Erlbaum Associates, Inc.
Swets, J.A., Dawes, R.M., & Monahan, J. (2000). Psychological
science can improve diagnostic decisions. Psychological Science in the
Public Interest, 1, 1-26.
Taylor, H.C., & Russell, J.T. (1939). The relationship of
validity coefficients to the practical applications of tests in
selection. Journal of Applied Psychology, 23, 565-578.
U.S. Department of Health and Human Services, Administration on
Children, Youth and Families. (2008). Child Maltreatment 2006.
Washington, DC: U.S. Government Printing Office.
U.S. Department of Health and Human Services, Administration on
Children, Youth and Families. (ND). Does substantiation of child
maltreatment relate to child well-being and service receipt? Findings
from the NSCAW Study: Research Brief No. 9. Washington, DC. Retrieved
on July 26, 2010 from
http://www.acf.hhs.gov/programs/opre/abuse_neglect/nscaw/reports/substan_child/substan_child.pdf
U.S. Government Accountability Office. (2007). African American
Children in Foster Care: Additional HHS Assistance Needed to Help States
Reduce the Proportion in Care. Washington, DC: U.S. Government Printing
Office. (GAO-07-816)
Way, B. B., Allen, M. H., Mumpower, J. L., Stewart, T. R., &
Banks, S. M. (1998). Interrater agreement among psychiatrists in
psychiatric emergency assessments. American Journal of Psychiatry, 155,
1423-8.
Yaun, J., Hedderson, J., & Curtis, P. (2003). Disproportionate
representation of race and ethnicity in child maltreatment:
Investigation and victimization, Children and Youth Services Review, 25,
359-373.
JERYL L. MUMPOWER
Texas A&M University
(1) The Taylor-Russell framework (Taylor & Russell, 1939)
specifies a similar analysis schema. Likewise, in computer science, a
similar approach is referred to as a confusion matrix (Kohavi &
Provost, 1998).
(2) There is substantial debate in the literature about the
validity of substantiation as an indicator of maltreatment, as discussed
further in the concluding section.
(3) NIS restricts its breakdowns by race and ethnicity to Blacks,
Hispanics, and Whites. Other groups are too small to permit
statistically reliable estimates. For this reason, the present analyses
are also restricted to these same three groups. Because NIS does not
provide data that breaks down substantiation rates by race, the analyses
presume that the overall rate is the same across groups. The validity of
this assumption is unverified, but data from the State of California,
presented in the subsequent section, suggests that it is not an
unreasonable one.
(4) Specifically, unlike the national data, the State of California
data provided a direct measure of the ratio of substantiated to
unsubstantiated referrals for each of the three major racial and ethnic
groups. On the other hand, estimates of the overall incidence of child
maltreatment in this analysis had to rely on national level 2006 data
from NIS-4.
(5) These estimates are based on the Endangerment Standard. If the
Harm Standard is used instead, the percentages are essentially reversed.
The analysis would estimate that approximately 71% of all cases are
detected and 29% are missed.
Table 1
2006 National Child Welfare Referral and Substantiation Data,
Endangerment Standard (Incidence Rates per 1,000 children)
                   No Referral   Referral    Total
Maltreatment       27.4          12.1 (3)    39.5 (1)
No Maltreatment    928.9         31.6        960.5
Total              956.3         43.7 (2)    1000.0
(1) Source: NIS-4, Table 3-3 (Sedlak et al., 2010)
(2) Source: Child Maltreatment 2006, Table 2-1 (U.S. Dept. of
Health and Human Services, 2008)
(3) Source: Child Maltreatment 2006, Table 3-3 (U.S. Dept. of
Health and Human Services, 2008)
Table 2
Summary Statistics for 2006 National Child Welfare
Referral and Substantiation Data
                                   All Referrals                 Screened-In Referrals Only
                                   (1) Base Case:   (2)          (3)            (4)
                                   Endangerment     Harm         Endangerment   Harm
                                   Standard         Standard     Standard       Standard
Incidence rate                     39.5             17.1         39.5           17.1
Positive predictive value          0.277            0.277        0.449          0.449
Negative predictive value          0.971            0.995        0.972          0.995
Sensitivity                        0.306            0.708        0.306          0.708
False Alarm rate (1-specificity)   0.033            0.032        0.015          0.016
Accuracy                           0.941            0.963        0.958          0.980
FP/FN ratio                        1.15             6.32         0.54           2.97
Table 3
2006 National Child Welfare Referral and Substantiation Data,
Endangerment Standard (Incidence Rates per 1,000 children), by
Race and Ethnicity
Black
                   No Referral   Referral    Total
Maltreatment       29.8          19.8 (2)    49.6 (1)
No Maltreatment    899.5         50.9 (3)    950.4
Total              929.3         70.7        1000.0
Hispanic
                   No Referral   Referral    Total
Maltreatment       19.4          10.8 (2)    30.2 (1)
No Maltreatment    942.0         27.8 (3)    969.8
Total              961.4         38.6        1000.0
White
                   No Referral   Referral    Total
Maltreatment       17.9          10.7 (2)    28.6 (1)
No Maltreatment    943.9         27.5 (3)    971.4
Total              961.8         38.2        1000.0
(1) Source: NIS-4, Table 4-4 (Sedlak et al., 2010)
(2) Source: Child Maltreatment 2006, Table 3-11 (U.S. Dept. of
Health and Human Services, 2008)
(3) Source: Estimate based on Child Maltreatment 2006, Tables
2-1 and 3-3 (U.S. Dept. of Health and Human Services, 2008).
Table 4
Summary Statistics for 2006 National Child Welfare Referral and
Substantiation Data, Endangerment Standard (Incidence Rates per
1,000 children), by Race and Ethnicity
                                  Black    Hispanic   White
Incidence rate                    49.6     30.2       28.6
Positive predictive value         0.280    0.280      0.280
Negative predictive value         0.968    0.980      0.981
Sensitivity                       0.399    0.358      0.374
False Alarm rate (1-specificity)  0.054    0.029      0.028
Accuracy                          0.919    0.953      0.955
FP/FN ratio                       1.71     1.43       1.54
Table 5
2008 State of California Child Welfare Referral and
Substantiation Data, Endangerment Standard (Incidence Rates per
1,000 children), by Race and Ethnicity (1)
Overall (n=10,003,896)
                   No Referral   Referral    Total
Maltreatment       29.8          9.7         39.5 (2)
No Maltreatment    921.5         39.0        960.5
Total              951.3         48.7        1000.0
Hispanic (n=4,891,254)
                   No Referral   Referral    Total
Maltreatment       20.1          10.1        30.2 (2)
No Maltreatment    931.5         38.3        969.8
Total              951.6         48.4        1000.0
Black (n=585,702)
                   No Referral   Referral    Total
Maltreatment       24.6          25.0        49.6 (2)
No Maltreatment    860.3         90.1        950.4
Total              884.9         115.1       1000.0
White (n=3,103,380)
                   No Referral   Referral    Total
Maltreatment       20.2          8.4         28.6 (2)
No Maltreatment    939.6         31.8        971.4
Total              959.8         40.2        1000.0
(1) All data from Needell et al. (2010) unless otherwise noted
(2) Source: NIS-4, Table 4-4 (Sedlak et al., 2010)
Table 6
Summary Statistics 2008 State of California Child Welfare
Referral and Substantiation Data, Endangerment Standard
(Incidence Rates per 1,000 children), by Race and Ethnicity
                                  Overall   Black    Hispanic   White
Incidence rate                    39.5      49.6     30.2       28.6
Positive predictive value         0.199     0.217    0.209      0.209
Negative predictive value         0.969     0.972    0.979      0.979
Sensitivity                       0.246     0.504    0.334      0.294
False Alarm rate (1-specificity)  0.041     0.095    0.039      0.033
Accuracy                          0.931     0.885    0.942      0.948
FP/FN ratio                       1.31      3.66     1.91       1.57
Table 7
Hypothetical Diagnostic Results for a Perfectly Valid and
Reliable Child Welfare Services Reporter
                          Not Referred            Referred               Totals
Maltreatment present      0 (False Negatives)     100 (True Positives)   100
No maltreatment present   900 (True Negatives)    0 (False Positives)    900
Referral Status           900                     100                    1000
Total Errors = False Negatives + False Positives = 0 + 0 = 0
Table 8
Hypothetical Diagnostic Results for a Systematically Biased
Reporter for Minority Group Families
                          Not Referred            Referred               Totals
Maltreatment present      0 (False Negatives)     100 (True Positives)   100
No maltreatment present   810 (True Negatives)    90 (False Positives)   900
Referral Status           810                     190                    1000
Total Errors = False Negatives + False Positives = 0 + 90 = 90
Table 9
Hypothetical Diagnostic Results for an Imperfectly Reliable
Reporter for Majority Group Families
                          Not Referred            Referred               Totals
Maltreatment present      10 (False Negatives)    90 (True Positives)    100
No maltreatment present   810 (True Negatives)    90 (False Positives)   900
Referrals                 820                     180                    1000
Total Errors = False Negatives + False Positives = 10 + 90 = 100
Table 10
Hypothetical Diagnostic Results for an Imperfectly Reliable
Reporter for Minority Group Families
                          Not Referred            Referred               Totals
Maltreatment present      20 (False Negatives)    80 (True Positives)    100
No maltreatment present   720 (True Negatives)    180 (False Positives)  900
Referrals                 740                     260                    1000
Total Errors = False Negatives + False Positives = 20 + 180 = 200
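The "Total Errors" lines under Tables 7 through 10 can be checked mechanically. The sketch below enters each hypothetical table as its four cells and recomputes the error count and the number referred. It also derives a miss rate and a false alarm rate for each table. Those two rates are not printed in the tables and are included only as an illustration. The function name `error_summary` is hypothetical.

```python
# Cells of hypothetical Tables 7-10 as (FN, TP, TN, FP) out of 1,000 cases.
tables = {
    "Table 7 (perfect reporter)": (0, 100, 900, 0),
    "Table 8 (biased, minority group)": (0, 100, 810, 90),
    "Table 9 (imperfectly reliable, majority group)": (10, 90, 810, 90),
    "Table 10 (imperfectly reliable, minority group)": (20, 80, 720, 180),
}

def error_summary(fn, tp, tn, fp):
    """Error counts and rates for one 2 x 2 table of referral outcomes."""
    return {
        "total_errors": fn + fp,              # misses plus false alarms
        "referred": tp + fp,                  # column total for "Referred"
        "miss_rate": fn / (fn + tp),          # derived here, not in the tables
        "false_alarm_rate": fp / (fp + tn),   # derived here, not in the tables
    }

summaries = {name: error_summary(*cells) for name, cells in tables.items()}

for name, summary in summaries.items():
    print(name, summary)
```

Read this way, the reporter in Table 8 makes no misses but refers 10% of non-maltreating cases, while the reporter in Table 10 makes twice as many errors of each kind as the one in Table 9.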
Figure 1
Binary Classification Analysis Schema for the Child Welfare System
                   No Referral                                Referral                              Total
Maltreatment       False Negatives/"Misses" (FN)              True Positives/"Hits" (TP)            Total, Maltreated Children (TP + FN)
No Maltreatment    True Negatives/"Correct Rejections" (TN)   False Positives/"False Alarms" (FP)   Total, Non-Maltreated Children (TN + FP)
Total              Total, Non-Referred Children (TN + FN)     Total, Referred Children (TP + FP)    Grand Total (TP + TN + FP + FN)
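The schema in Figure 1 amounts to a rule that assigns every child to one of four cells according to two yes/no facts: whether maltreatment occurred and whether a referral was made. A minimal sketch of that rule follows. The function name `classify` and the five example cases are made up for illustration and are not data from the study.

```python
from collections import Counter

def classify(maltreated: bool, referred: bool) -> str:
    """Map one case to its cell in the Figure 1 schema."""
    if maltreated and referred:
        return "TP"   # true positive, a "hit"
    if maltreated:
        return "FN"   # false negative, a "miss"
    if referred:
        return "FP"   # false positive, a "false alarm"
    return "TN"       # true negative, a "correct rejection"

# Hypothetical cases as (maltreated, referred) pairs, for illustration only.
cases = [(True, True), (True, False), (False, True), (False, False), (False, False)]
counts = Counter(classify(maltreated, referred) for maltreated, referred in cases)

# Marginal totals as defined in the last row and column of Figure 1.
marginals = {
    "maltreated": counts["TP"] + counts["FN"],
    "non_maltreated": counts["TN"] + counts["FP"],
    "referred": counts["TP"] + counts["FP"],
    "non_referred": counts["TN"] + counts["FN"],
    "grand_total": sum(counts.values()),
}
print(dict(counts), marginals)
```

Tallying real case records in this way would yield the cell counts from which every summary statistic in the preceding tables is computed.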