The Measure of Things.
Witkin, Stanley L.
While you and i have lips and voices which are for kissing and to
sing with who cares if some oneeyed son of a bitch invents an instrument
to measure Spring with?
--e. e. Cummings
Measurement is everywhere. Its myriad applications are encountered
every day. Think about this morning. Did you awaken to a sound triggered
by your time measuring device? Did you step on a weight-measuring device
when you stumbled into the bathroom? Did you fill your car with a
particular amount of fuel as measured by a fuel-dispensing machine?
Social work too is replete with measurement. Measurement
instruments inform us about a wide range of things--from a person's
psychological state to her or his aptitude for a particular occupation.
In fact, the pathway to clienthood almost always includes
measurement-based information. Often, this information is expressed as
numbers, giving it an air of objectivity and trustworthiness. Converting
concepts to numbers also generates a need for expert interpretation
giving test information a certain authority. These qualities have led
social workers to rely on measurement instruments for credible and
useful information. Yet, despite this reliance, an aura of mystery
surrounds measurement that keeps social workers uncharacteristically
silent about this influential source of information.
My own social work education taught me about the mystery and power
of measurement. My first-year MSW field placement was at a family
services agency. The director, who called herself a psychiatric social
worker, believed that psychological tests could be an important addition
to our social work practice. She hired a psychologist, someone named Dr.
Gibeau, to teach us how to administer and interpret some psychological
tests, most notably the Minnesota Multiphasic Personality Inventory
(MMPI). Developed around 1940 to measure psychopathology, the MMPI (at
the time I was learning about it) consisted of about 550 true-false
questions on a range of topics. Scores on the MMPI were graphed across
its 10 clinical subscales and three "validity" scales. To a
neophyte social worker, this scientific-looking display was impressive,
but not very informative. For Dr. Gibeau, however, interpreting an MMPI
profile was like reading a book. I remember watching in awe as he
effortlessly spun a narrative from the squiggly line that stretched
across the scales. He would say something like, "This 19-year-old
female has periodic suicidal ideation--probably a long-term problem
precipitated by being on her own. She has a tendency to run away from
problems or retreat into fantasy. The therapeutic task is to get her to
interpret reality more accurately." While telling this story, Dr.
Gibeau would point out the various scales and combinations of scales
that supported his interpretation.
We were believers. Using this instrument of psychological science,
we could gain access to the inner recesses of our clients' psyches,
revealing their fears and pathologies, even those of which they
themselves were unaware. There was no fooling the test. Attempts to
"fake good" would be detected by the validity scales, which
were designed for this purpose. Yes, this powerful tool would reveal a
truth of which we were previously unaware. We could not wait to begin
using it.
Administering the MMPI felt great. No longer was I merely one more
insecure social work student anxiously trying to deal with the range of
issues clients presented to me. I was an applied scientist, trained to
use a sensitive psychological probe. Reflective listening and empathy
were fine, but this went directly to the source. Others in the agency
felt similarly and soon MMPIs were being administered regularly to
clients. In fact, we hardly talked to people anymore until after they
took the test. Once that was done, and we knew the "real"
issues they were struggling with, we could begin meaningful therapeutic
work. We also learned, to our surprise, that more of our clients had
serious psychological problems than we previously believed. Never did we
suspect that psychopathology was so rampant! But we were prepared.
I look back on that time with a mixture of amusement, regret, and
embarrassment. Like most social work students, even today, I knew little
about measurement. Certainly I was not about to challenge the
knowledgeable and experienced doctor. But what if I had mustered the
courage to ask about the reliability and validity of the test? Would I
have known what to make of the answer? What if the good doctor said it
had "high reliability"; what exactly
did that mean? Or, perhaps he would have been more specific and said
it had an alpha coefficient of .87. OK, now what?
The problem is that social work education is woefully inadequate
when it comes to teaching students about measurement. Sure, we encourage
the proper reverence for this form of scientific expression. And the
mandatory research courses provide basic information about questionnaire
construction, reliability, and validity in the context of conducting or
interpreting research. But the context in which most social workers
confront measurement issues is not formal research, but practice: How to
evaluate the labels, predictions, and judgments generated by measurement
instruments such as those that "reveal" that a person is
"clinically depressed," a good or bad candidate for a program,
or of "average" intelligence. How are social workers to assess
this information? What should they be telling their clients who are
requested or required to take these tests?
Educational and psychological testing is a big business in this
country. Schools, industry, and human services all rely heavily on tests
to provide information that will assist them in evaluation and decision
making. This reliance is encouraged by the current political climate
that promotes tests, especially standardized tests, as the best--that
is, the most fair, objective, and accurate way to determine achievement
and effectiveness, and by an economic climate in which testing is
considered a way to reduce costs, limit liability, and increase
efficiency in hiring. In addition, the rhetoric of testing appeals to
Americans' belief in fairness and accountability. Words like
"objective," "scientific," and
"standardized" convey an image of impartiality and realness,
and the quantification of test results bespeaks its importance. As de
Saint-Exupery's Little Prince (1943) explained,
Grownups love numbers. When you tell them about a new friend, they
never ask you questions about the essential things. They never say to
you: What does his voice sound like? Which games does he prefer? Does he
collect butterflies? Instead they ask you: How old is he? How many
brothers has he got? How much does he weigh? How much does his father
make? Only in that way do they get the feeling they know him. (p. 14)
Never mind that most folks have little understanding of these tests
and their potential dangers. They have rapidly become a
taken-for-granted part of educational, personnel, and human services
practice.
Because social workers operate in the interstices between dominant
social, economic, and political interests and those on the margins of
society, our position in relation to this trend needs careful
consideration. As with other capitalist behemoths, our stance toward the
testing industry should be that of allies and resources for our clients.
Fulfilling these roles requires us to function as mediators,
interpreters, and advocates for those who are the subjects of testing.
To carry out these functions we need to understand enough about
measurement to engage in a meaningful dialogue about specific measures
to help clients decide whether to complete a test (if they have a
choice), and if they do, what the results mean. It also means knowing
enough to support or oppose the use of particular tests by human
services agencies or other institutions where such practices can affect
people's wellbeing, and to be able to take an informed position
about the growing use of tests for social decision making and
evaluation.
Measurement constitutes a powerful discourse that has important
consequences for the lives of the people with whom we work. To
participate in measurement discourse, we need to understand its
language--vocabulary, grammar, and syntax--if we are to be effective
interpreters, resources, and advocates. This understanding enables an
"insider" critique, an assessment of measurement applications
using traditional assumptions, definitions, and rules. In addition, we
need an alternative understanding to generate an "outsider"
critique, assessment from a position that does not presume the
assumptions and tenets of traditional measurement theory. Outsider
critiques often are necessary to effect systemic or structural change.
Brief illustrations of these two types of critique follow.
Insider Critique
Many measurements are developed rather poorly, even by traditional
criteria. Consider, for example, reliability, a basic and critical
psychometric property of a measuring instrument. Typically, reliability
is understood as something like "consistency"--that is, a
measure is said to be reliable to the extent that it will produce the
same score under different conditions of measurement. For example, if
you were to take an intelligence test, you would not want your score to
vary because of differences among test administrators or what day of the
week you took the test. Variations in your test score should reflect
changes in your intelligence (or whatever the test measures), not
differences in administrators or days.
Estimating changes in scores under different conditions of
measurement yields different types of reliability. If a group of people
completed a test on Monday and then completed the same test again on
Friday, correlating the two sets of scores provides an estimate of how
much they vary as a function of the passage of time. This estimate is
often called test--retest reliability. The same logic applies to any
condition of measurement, such as differences in test instructions, test
items, or test location. Thus, there can be numerous types of
reliability--many of which may be relevant to particular decisions or
judgments.
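The Monday--Friday logic above can be sketched in a few lines of Python: a test--retest reliability estimate is simply the Pearson correlation between the two sets of scores. The scores below are invented for illustration only.

```python
# Hypothetical sketch: estimating test-retest reliability as the Pearson
# correlation between two administrations of the same test. All scores
# below are invented for illustration.

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# the same eight people tested on Monday and again on Friday
monday_scores = [52, 47, 60, 55, 49, 63, 58, 50]
friday_scores = [54, 45, 59, 57, 48, 61, 60, 49]

r = pearson_r(monday_scores, friday_scores)
print(f"test-retest reliability estimate: r = {r:.2f}")
```

A coefficient near 1.0 would indicate that scores varied little with the passage of time; the same computation applies to any other condition of measurement (two administrators, two locations) by correlating the scores obtained under each condition.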
Many tests in use report only one or two types of reliability
estimates. A popular one is "internal consistency" measured by
a reliability coefficient called alpha (mentioned in the "Dr.
Gibeau" example). I am not sure why alpha is so popular, other than that
it is relatively convenient and, with computers, easy to calculate.
Basically, alpha estimates the extent to which the items on a test
correlate with one another. This may be useful for us to know, but it
certainly is not the only or even the most important type of reliability
that we might be interested in. For example, if my client were given a
measure of depression, I would certainly want to know whether the
particular way the test was presented to her, the fact that the test
administrator was a man, or the test's being given at a counseling
clinic would affect her score. All of these factors are conditions of
measurement, and
reliability can be estimated for each of them. Unfortunately, it is
difficult to find reliability information on conditions other than test
items (internal consistency) and time (test--retest), possibly because
it would be too costly or time consuming. Nevertheless, other conditions
may be important for the use of tests in practice, and the absence of
such information must be weighed in assessing the value and
interpretation of a test.
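For readers wondering what an alpha coefficient like the .87 in the "Dr. Gibeau" example actually computes, here is a minimal Python sketch of coefficient alpha using its standard formula, alpha = (k/(k-1)) x (1 - sum of item variances / variance of total scores). The item responses are invented for illustration.

```python
# Hypothetical sketch: coefficient alpha (internal consistency) computed
# from an invented item-response matrix. Each row is one respondent,
# each column one test item; population variances are used throughout.

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def cronbach_alpha(responses):
    k = len(responses[0])          # number of items
    items = list(zip(*responses))  # column-wise item scores
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# invented 5-point ratings from six respondents on four items
responses = [
    [3, 4, 3, 4],
    [5, 5, 4, 5],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
    [1, 2, 2, 1],
    [3, 3, 4, 3],
]
print(f"alpha = {cronbach_alpha(responses):.2f}")
```

Note what the computation does and does not use: only the item scores from a single administration. Nothing in it speaks to administrators, settings, or the passage of time, which is exactly why a high alpha alone cannot answer the practice questions raised above.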
Outsider Critique
Traditional measurement theory assumes a correspondence between
test scores and some existing reality. In contrast, the act of
measurement may generate reality. For example, suppose you took the MMPI
and it suggested that you had responded like people diagnosed with a
particular DSM psychiatric disorder. Did the test discover a
pre-existing condition about you or did it generate one (whose existence
is presupposed by the test)?
The concept of validity provides a good illustration of these
different (insider and outsider) perspectives. Generally, a test is
valid to the extent that it measures what it purports to measure. A test
that purports to measure parenting ability, for example, is valid to the
extent that the scores obtained are indicative of the construct
"parenting ability" (however defined) and not some other
construct. However, from an outsider perspective, validity might be
considered a claim that a test has a certain kind of authority--the
authority to reflect some aspect of reality. Such claims function
rhetorically to confer power on test owners and test interpreters. If,
as Morawski (1994) suggested, we consider validity functionally rather
than representationally, our interest shifts to how validity claims
construct--rather than reflect--what we take to be real. From this
perspective, measures do not possess validity; rather validity claims
are exercises of power. These claims are enabled by the very definition
of validity as test construct-reality correspondence, whereas topics
such as power and ideology, topics of interest to social workers, are
excluded (see Cherryholmes, 1989).
Social Work Critique
As social workers, our concerns go beyond the psychometric
properties of instruments. Therefore, social work commitments and
interests generate a type of outsider critique. We want to know how an
instrument applies to the people with whom we work. Because poor,
oppressed populations often are not well-represented during the
development of measurement instruments, social workers have reason to
question an instrument's relevance for their clients. Even
accepting the notion of validity does not entail its constancy across
different people, conditions, or times. In other words, a measure might
appear to measure one thing with a group of white, male, middle-class
college students, but something different (or nothing at all) with
African American, female, poor, single mothers.
Social workers also are interested in issues like social justice
and human rights, strengths, diversity, and human dignity. Thus, they
need to ask, to what extent does the use of a particular measure further
or thwart these ideals? They also need to consider questions like:
* To what extent are the cultural and life experiences of people of
color, gay and lesbian people, people with disabilities, and other
disadvantaged groups considered by the test?
* What are the practice implications of having clients complete
this test? For example, do they get categorized into psychiatric
syndromes?
* Of what theory is this test an expression?
* What can the test tell me beyond what I already know or could
know about this individual?
These questions, among others, express our concerns as professional
social workers. Keeping these concerns on par with measurement issues
will help us avoid unwittingly adopting the mindset of test developers
and administrators. As Rogers (1995) noted, "In an important way,
the very existence of tests creates a climate where it is tempting to
lapse into rigid and mechanistic ways of thinking about human
issues" (p. 529). By interrogating and participating in the
discourse surrounding the use and interpretation of tests, we can
counteract these temptations and better serve our clients.
References
Cherryholmes, C. C. (1989). Power and criticism: Poststructural
investigations in education (2nd ed.). New York: Teachers College Press.
cummings, e. e. (1991). Voices to voices, lip to lip. In G. J.
Firmage (Ed.), Complete poems 1904--1962 (p. 262). New York: Liveright.
de Saint-Exupery, A. (1943). The little prince. (K. Woods, Trans.).
San Diego: Harcourt Brace Jovanovich.
Morawski, J. G. (1994). Practicing feminisms, reconstructing
psychology. Ann Arbor: University of Michigan Press.
Rogers, T. B. (1995). The psychological testing enterprise: An
introduction. Pacific Grove, CA: Brooks/Cole.