The Measure of Things.
Witkin, Stanley L.
While you and i have lips and voices which are for kissing and to
sing with who cares if some oneeyed son of a bitch invents an instrument
to measure Spring with?
--e. e. Cummings
Measurement is everywhere. Its myriad applications are encountered
every day. Think about this morning. Did you awaken to a sound triggered
by your time measuring device? Did you step on a weight-measuring device
when you stumbled into the bathroom? Did you fill your car with a
particular amount of fuel as measured by a fuel-dispensing machine?
Social work too is replete with measurement. Measurement
instruments inform us about a wide range of things--from a person's
psychological state to her or his aptitude for a particular occupation.
In fact, the pathway to clienthood almost always includes
measurement-based information. Often, this information is expressed as
numbers, giving it an air of objectivity and trustworthiness. Converting
concepts to numbers also generates a need for expert interpretation
giving test information a certain authority. These qualities have led
social workers to rely on measurement instruments for credible and
useful information. Yet, despite this reliance, an aura of mystery
surrounds measurement that keeps social workers uncharacteristically
silent about this influential source of information.
My own social work education taught me about the mystery and power
of measurement. My first-year MSW field placement was at a family
services agency. The director, who called herself a psychiatric social
worker, believed that psychological tests could be an important addition
to our social work practice. She hired a psychologist, someone named Dr.
Gibeau, to teach us how to administer and interpret some psychological
tests, most notably the Minnesota Multiphasic Personality Inventory
(MMPI). Developed around 1940 to measure psychopathology, the MMPI (at
the time I was learning about it) consisted of about 550 true-false
questions on a range of topics. Scores on the MMPI were graphed across
its 10 clinical subscales and three "validity" scales. To a
neophyte social worker, this scientific-looking display was impressive,
but not very informative. For Dr. Gibeau, however, interpreting an MMPI
profile was like reading a book. I remember watching in awe as he
effortlessly spun a narrative from the squiggly line that stretched
across the scales. He would say something like, "This 19-year-old
female has periodic suicidal ideation--probably a long-term problem
precipitated by being on her own. She has a tendency to run away from
problems or retreat into fantasy. The therapeutic task is to get her to
interpret reality more accurately." While telling this story, Dr.
Gibeau would point out the various scales and combinations of scales
that supported his interpretation.
We were believers. Using this instrument of psychological science,
we could gain access to the inner recesses of our clients' psyches,
revealing their fears and pathologies, even those of which they
themselves were unaware. There was no fooling the test. Attempts to
"fake good" would be detected by the validity scales, which
were designed for this purpose. Yes, this powerful tool would reveal a
truth of which we were previously unaware. We could not wait to begin
using it.
Administering the MMPI felt great. No longer was I merely one more
insecure social work student anxiously trying to deal with the range of
issues clients presented to me. I was an applied scientist, trained to
use a sensitive psychological probe. Reflective listening and empathy
were fine, but this went directly to the source. Others in the agency
felt similarly and soon MMPIs were being administered regularly to
clients. In fact, we hardly talked to people anymore until after they
took the test. Once that was done, and we knew the "real"
issues they were struggling with, we could begin meaningful therapeutic
work. We also learned, to our surprise, that more of our clients had
serious psychological problems than we previously believed. Never did we
suspect that psychopathology was so rampant! But we were prepared.
I look back on that time with a mixture of amusement, regret, and
embarrassment. Like most social work students, even today, I knew little
about measurement. Certainly I was not about to challenge the
knowledgeable and experienced doctor. But what if I had mustered the
courage to ask about the reliability and validity of the test? Would I
have known what to make of the answer? What if the good doctor said it
had "high reliability"; what exactly
did that mean? Or, perhaps he would have been more specific and said
it had an alpha coefficient of .87. OK, now what?
The problem is that social work education is woefully inadequate
when it comes to teaching students about measurement. Sure, we encourage
the proper reverence for this form of scientific expression. And the
mandatory research courses provide basic information about questionnaire
construction, reliability, and validity in the context of conducting or
interpreting research. But the context in which most social workers
confront measurement issues is not formal research, but practice: How to
evaluate the labels, predictions, and judgments generated by measurement
instruments such as those that "reveal" that a person is
"clinically depressed," a good or bad candidate for a program,
or of "average" intelligence. How are social workers to assess
this information? What should they be telling their clients who are
requested or required to take these tests?
Educational and psychological testing is a big business in this
country. Schools, industry, and human services all rely heavily on tests
to provide information that will assist them in evaluation and decision
making. This reliance is encouraged by the current political climate
that promotes tests, especially standardized tests, as the best--that
is, the most fair, objective, and accurate way to determine achievement
and effectiveness, and by an economic climate in which testing is
considered a way to reduce costs, limit liability, and increase
efficiency in hiring. In addition, the rhetoric of testing appeals to
Americans' belief in fairness and accountability. Words like
"objective," "scientific," and
"standardized" convey an image of impartiality and realness,
and the quantification of test results bespeaks its importance. As de
Saint-Exupery's Little Prince (1943) explained,
Grownups love numbers. When you tell them about a new friend, they
never ask you questions about the essential things. They never say to
you: What does his voice sound like? Which games does he prefer? Does he
collect butterflies? Instead they ask you: How old is he? How many
brothers has he got? How much does he weigh? How much does his father
make? Only in that way do they get the feeling they know him. (p. 14)
Never mind that most folks have little understanding of these tests
and their potential dangers. They have rapidly become a
taken-for-granted part of educational, personnel, and human services
practice.
Because social workers operate in the interstices between dominant
social, economic, and political interests and those on the margins of
society, our position in relation to this trend needs careful
consideration. As with other capitalist behemoths, our stance toward the
testing industry should be that of allies and resources for our clients.
Fulfilling these roles requires us to function as mediators,
interpreters, and advocates for those who are the subjects of testing.
To carry out these functions we need to understand enough about
measurement to engage in a meaningful dialogue about specific measures
to help clients decide whether to complete a test (if they have a
choice), and if they do, what the results mean. It also means knowing
enough to support or oppose the use of particular tests by human
services agencies or other institutions where such practices can affect
people's wellbeing, and to be able to take an informed position
about the growing use of tests for social decision making and
evaluation.
Measurement constitutes a powerful discourse that has important
consequences for the lives of the people with whom we work. To
participate in measurement discourse, we need to understand its
language--vocabulary, grammar, and syntax--if we are to be effective
interpreters, resources, and advocates. This understanding enables an
"insider" critique, an assessment of measurement applications
using traditional assumptions, definitions, and rules. In addition, we
need an alternative understanding to generate an "outsider"
critique, assessment from a position that does not presume the
assumptions and tenets of traditional measurement theory. Outsider
critiques often are necessary to effect systemic or structural change.
Brief illustrations of these two types of critique follow.
Insider Critique
Many measurements are developed rather poorly, even by traditional
criteria. Consider, for example, reliability, a basic and critical
psychometric property of a measuring instrument. Typically, reliability
is understood as something like "consistency"--that is, a
measure is said to be reliable to the extent that it will produce the
same score under different conditions of measurement. For example, if
you were to take an intelligence test, you would not want your score to
vary because of differences among test administrators or what day of the
week you took the test. Variations in your test score should reflect
changes in your intelligence (or whatever the test measures), not
differences in administrators or days.
Estimating changes in scores under different conditions of
measurement yields different types of reliability. If a group of people
completed a test on Monday and then completed the same test again on
Friday, correlating the two sets of scores provides an estimate of how
much they vary as a function of the passage of time. This estimate is
often called test--retest reliability. The same logic applies to any
condition of measurement, such as differences in test instructions, test
items, or test location. Thus, there can be numerous types of
reliability--many of which may be relevant to particular decisions or
judgments.
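The Monday--Friday logic above can be sketched in a few lines of Python: a test--retest reliability estimate is simply the Pearson correlation between the two sets of scores. The scores below are invented for illustration only.

```python
# Hypothetical sketch: estimating test-retest reliability as the Pearson
# correlation between two administrations of the same test. All scores
# below are invented for illustration.

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# the same eight people tested on Monday and again on Friday
monday_scores = [52, 47, 60, 55, 49, 63, 58, 50]
friday_scores = [54, 45, 59, 57, 48, 61, 60, 49]

r = pearson_r(monday_scores, friday_scores)
print(f"test-retest reliability estimate: r = {r:.2f}")
```

A coefficient near 1.0 would indicate that scores varied little with the passage of time; the same computation applies to any other condition of measurement (two administrators, two locations) by correlating the scores obtained under each condition.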
Many tests in use report only one or two types of reliability
estimates. A popular one is "internal consistency" measured by
a reliability coefficient called alpha (mentioned in the "Dr.
Gibeau" example). I am not sure why alpha is so popular, other than that
it is relatively convenient and, with computers, easy to calculate.
Basically, alpha estimates the extent to which the items on a test
correlate with one another. This may be useful for us to know, but it
certainly is not the only or even the most important type of reliability
that we might be interested in. For example, if my client were given a
measure of depression, I would certainly want to know whether the
particular way the test was presented to her, the fact that the test
administrator was a man, or the test's being given at a counseling
clinic would affect her score. All of these factors are conditions of
measurement, and
reliability can be estimated for each of them. Unfortunately, it is
difficult to find reliability information on conditions other than test
items (internal consistency) and time (test--retest), possibly because
it would be too costly or time consuming. Nevertheless, other conditions
may be important for the use of tests in practice, and the absence of
such information must be weighed in assessing the value and
interpretation of a test.
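For readers wondering what an alpha coefficient like the .87 in the "Dr. Gibeau" example actually computes, here is a minimal Python sketch of coefficient alpha using its standard formula, alpha = (k/(k-1)) x (1 - sum of item variances / variance of total scores). The item responses are invented for illustration.

```python
# Hypothetical sketch: coefficient alpha (internal consistency) computed
# from an invented item-response matrix. Each row is one respondent,
# each column one test item; population variances are used throughout.

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def cronbach_alpha(responses):
    k = len(responses[0])          # number of items
    items = list(zip(*responses))  # column-wise item scores
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# invented 5-point ratings from six respondents on four items
responses = [
    [3, 4, 3, 4],
    [5, 5, 4, 5],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
    [1, 2, 2, 1],
    [3, 3, 4, 3],
]
print(f"alpha = {cronbach_alpha(responses):.2f}")
```

Note what the computation does and does not use: only the item scores from a single administration. Nothing in it speaks to administrators, settings, or the passage of time, which is exactly why a high alpha alone cannot answer the practice questions raised above.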
Outsider Critique
Traditional measurement theory assumes a correspondence between
test scores and some existing reality. In contrast, the act of
measurement may generate reality. For example, suppose you took the MMPI
and it suggested that you had responded like people diagnosed with a
particular DSM psychiatric disorder. Did the test discover a
pre-existing condition about you or did it generate one (whose existence
is presupposed by the test)?
The concept of validity provides a good illustration of these
different (insider and outsider) perspectives. Generally, a test is
valid to the extent that it measures what it purports to measure. A test
that purports to measure parenting ability, for example, is valid to the
extent that the scores obtained are indicative of the construct
"parenting ability" (however defined) and not some other
construct. However, from an outsider perspective, validity might be
considered a claim that a test has a certain kind of authority--the
authority to reflect some aspect of reality. Such claims function
rhetorically to confer power on test owners and test interpreters. If,
as Morawski (1994) suggested, we consider validity functionally rather
than representationally, our interest shifts to how validity claims
construct--rather than reflect--what we take to be real. From this
perspective, measures do not possess validity; rather validity claims
are exercises of power. These claims are enabled by the very definition
of validity as test construct-reality correspondence, whereas topics
such as power and ideology, topics of interest to social workers, are
excluded (see Cherryholmes, 1989).
Social Work Critique
As social workers, our concerns go beyond the psychometric
properties of instruments. Therefore, social work commitments and
interests generate a type of outsider critique. We want to know how an
instrument applies to the people with whom we work. Because poor,
oppressed populations often are not well-represented during the
development of measurement instruments, social workers have reason to
question an instrument's relevance for their clients. Even
accepting the notion of validity does not entail its constancy across
different people, conditions, or times. In other words, a measure might
appear to measure one thing with a group of white, male, middle-class
college students, but something different (or nothing at all) with
African American, female, poor, single mothers.
Social workers also are interested in issues like social justice
and human rights, strengths, diversity, and human dignity. Thus, they
need to ask, to what extent does the use of a particular measure further
or thwart these ideals? They also need to consider questions like:
* To what extent are the cultural and life experiences of people of
color, gay and lesbian people, people with disabilities, and other
disadvantaged groups considered by the test?
* What are the practice implications of having clients complete
this test? For example, do they get categorized into psychiatric
syndromes?
* Of what theory is this test an expression?
* What can the test tell me beyond what I already know or could
know about this individual?
These questions, among others, express our concerns as professional
social workers. Keeping these concerns on par with measurement issues
will help us avoid unwittingly adopting the mindset of test developers
and administrators. As Rogers (1995) noted, "In an important way,
the very existence of tests creates a climate where it is tempting to
lapse into rigid and mechanistic ways of thinking about human
issues" (p. 529). By interrogating and participating in the
discourse surrounding the use and interpretation of tests, we can
counteract these temptations and better serve our clients.
References
Cherryholmes, C. C. (1989). Power and criticism: Poststructural
investigations in education (2nd ed.). New York: Teachers College Press.
cummings, e. e. (1991). Voices to voices, lip to lip. In G. J.
Firmage (Ed.), Complete poems 1904--1962 (p. 262). New York: Liveright.
de Saint-Exupery, A. (1943). The little prince. (K. Woods, Trans.).
San Diego: Harcourt Brace Jovanovich.
Morawski, J. G. (1994). Practicing feminisms, reconstructing
psychology. Ann Arbor: University of Michigan Press.
Rogers, T. B. (1995). The psychological testing enterprise: An
introduction. Pacific Grove, CA: Brooks/Cole.