The sea change in nonprofit human services: A critical assessment of outcomes measurement
Robert L. Fischer

Abstract
Outcomes assessment or program evaluation has historically been an infrequent activity within nonprofit agencies. Direct-service agencies have routinely focused on measures of output as indicators of impact rather than on measures related to outcomes and effectiveness. Funders of program services (e.g., United Way, foundations, government agencies) exercise substantial influence over the use of evaluation by recipient programs and agencies. The essential task of identifying outcomes and implementing the processes associated with the outcome measurement enterprise, however, falls squarely on the nonprofits themselves. This paper discusses and critiques the outcome measurement approach from the perspective of a direct-service agency and suggests how the available methods can be adopted most successfully.
FOR THE LAST 3 DECADES, formal outcome evaluation in human services has routinely been the purview of well-funded programs and federal demonstrations. Large-scale and multi-site experiments have represented the flagship efforts of the evaluation movement in documenting the impact of social programs. The products of these well-designed efforts frequently have been pivotal in our understanding of the effectiveness of broad areas of social intervention (e.g., in areas such as job training, income maintenance, pregnancy prevention, and educational reform).
During the same time frame, many private and nonprofit agencies delivering services in local communities (frequently member agencies of the United Way) have strived to improve their programs based on the data available. The majority of agencies have traditionally focused their attention on measures of the quantity of services provided: numbers of participants, units of staff effort, and the like.
Relatively few programs have had the resources (or the justification to use limited resources) to fund evaluations of their services. Some agencies welcomed the assistance of university-based researchers in performing lower-cost, fairly short-term research projects. Other agencies reported what they could, based on the records kept by their staff (frequently kept by hand in paper files). Staff members who dealt with elementary numeric calculations such as the number of clients served and their average amount of services were often quickly dubbed the in-house "statistician."
While evaluations have been conducted on an ad hoc basis over this period, the use of ongoing data collection for this purpose has been less frequent. While the achievement of outcome measurement is of far less technical import than well-designed evaluations, its widespread use would represent a major advancement for the nonprofit sector.
In many ways the evaluation of social interventions is considered the frontier of applied social science research (e.g., Boruch, 1994; Jacobs & Weiss, 1988). Human service environments in which rigorous experimental (or quasi-experimental) approaches are unacceptable or infeasible are the norm for much of social work practice. Jacobs and Weiss (1988) highlighted this incompatibility in the statement that "evaluations often must trade off between neat scientific rigor and complex, but realistic, portrayal of programs" (p. 503). While the evaluation community has grown more sensitive to the myriad issues surrounding the misapplication of more rigorous designs in applied settings, additional attention is needed to avoid these pitfalls (Fetterman, 1994). Further, the struggle to find meaningful and obtainable measures of success continues to be particularly challenging in programs dealing with complex family problems (Barthel, 1992; Weiss & Jacobs, 1988). While there is a growing body of work on the evaluation of social work interventions in the nonprofit sector, most research in this area continues to rely on less rigorous research designs.
The Growth of Evaluation
To many human service professionals, the move to measure outcomes has seemed entirely new. This shift, however, can be seen as the continuation of a movement begun in the 1960s (Rossi, 1997). The field of evaluation has blossomed since the era of the "Great Society" and the various social programs launched as part of the "War on Poverty." The increased call for outcome-focused evidence on the effectiveness of social interventions comes not just from government funders and the research community (Courtney & Collins, 1994) but also from within the nonprofit (e.g., the United Way) and foundation communities and even among private donors (Bickel, Eichelberger, & Hattrup, 1994). These various influences have led many nonprofits to undertake a process of embedding outcome measurement activities into program operations.
Despite this broad movement toward the systematic evaluation of social interventions, the nonprofit sector, particularly agencies and programs funded through local United Ways, has been, until recently, only minimally affected. Up until the early 1990s, many United Ways concentrated principally on measures related to the quantity of services provided by funded agencies and programs.
In the main, the assessment of the quality or effectiveness of these services was frequently approached with an appeal to "common sense" or simply addressed through testimonials and vignettes from individual clients. The movement toward program evaluation has now filtered down heavily to local nonprofit direct-service agencies. A variety of forces have intersected to produce a unified emphasis in much of the nonprofit sector on measuring program outcomes and basing future funding decisions on data about key outcomes.
Toward Outcomes Measurement in the Nonprofit Sector
With the increased pressure on public and private human service agencies to demonstrate the effectiveness of services, a notable movement has occurred within the nonprofit sector among the network of local United Ways that fund programs in their respective communities. In 1996, United Way of America began to promote the use of outcome measurement as an aid in communicating results and making funding decisions within its network of member United Ways. It also produced a manual to serve as a guide in this task (Hatry, Van Houten, Plantz, & Greenway, 1996). Concurrently, United Way of America convened a Task Force on Impact, composed of representatives from 23 national health, human service, and youth and family agencies and associations.
In addition to working with other national agencies, United Way of America also approached local United Ways during this phase. Seven United Ways of varying sizes were selected in June 1996 to be studied for a 3-year period to examine how outcomes could be used to produce change in individual funded programs (United Way of America, 1997). A parallel movement focused on systems change using results-based accountability has emerged over this same period (Weiss, 1997).
This effort uses system-level measures of outcomes to track the progress of constellations of individual programs, agencies, or organizations. United Way of America (1999a & 1999b) has worked to educate local United Ways on the use of such results-focused efforts through the use of community status reports (or report cards). These efforts by United Way of America set the stage for a profound change in the environment facing local agencies and programs.
With outcome measurement activities proliferating among United Way agencies across the country, either independently or at the behest of funders, an examination of the tools available and the likely success of the shift is warranted. The rapid expansion and implementation of outcomes measurement, termed "outcome mania" by some, has been viewed with caution by many in the evaluation community (Bonnet, 1997; Evaluation Forum, 1995). A description of the outcomes measurement model developed by United Way of America is presented, followed by a discussion of the primary threats to the effective use of this model. The United Way outcomes measurement approach is used as a starting point for examining the wide-scale use of evaluation methods in nonprofit agencies.
The Process of Outcome Measurement
This section draws primarily on the outcomes measurement model developed under the auspices of the United Way of America and published as a stand-alone manual (Hatry et al., 1996). The model was also further described in a published journal article (Plantz, Greenway, & Hendricks, 1997). This author draws on his experience as an outcomes trainer for the United Way of Metropolitan Atlanta during 1997-1998. In this capacity, the United Way of America materials were used in the training of staff from funded agencies and United Way volunteers in the workings of the model and its application in a variety of human service settings. The paper also draws on the author's ongoing experience as an internal program evaluator with a large nonprofit multi-service agency implementing outcomes measurement.
The outcomes measurement model is a systematic approach to implementing program evaluation in applied or direct-service settings. It provides a framework for engaging the discussion of outcomes in agencies and programs that have likely not yet addressed this issue in a substantive way. The model lays out eight steps for taking an agency from start to finish in implementing outcomes measurement (see Table 1). Hatry et al. (1996) present these steps in a pyramid schematic, with the eighth step at the pyramid's peak, symbolically conveying the, at times, arduous task of implementing outcomes measurement (p. 6).
The first three steps of the model lay the groundwork for putting a measurement system in place. This groundwork covers involving the right people in the process and developing an elementary model of the program under consideration. A critical step for programs involves the clear identification of the program goals. The model for outcomes measurement demonstrates how to formulate a logic model for a human service intervention. The logic model is a schematic representation of the main elements of a program and includes the ultimate program goals. The model requires the specification of the inputs, activities, outputs, and outcomes of the individual program. Most direct-service agencies have routinely focused on describing the outputs of their work. Recognizing the differing conceptions of these terms, outputs are defined by the authors of the manual as "the direct products of the program activities and usually are measured in terms of the volume of work accomplished" (Hatry et al., 1996, p. 1).
The key difference in the move to outcomes measurement is the shift from counting outputs to measuring outcomes. In the model's terminology, outcomes are defined as "benefits or changes for individuals or populations during or after participating in program activities" and involve "behavior, skills, knowledge, attitudes, values, condition, or other attributes" (Hatry et al., 1996, p. 2). The identification of outcomes often forces the program staff to make explicit the operational theory of the program and present the essential elements of the model. This is one of the potentially far-reaching benefits of broader use of this model. The clearer description of program theories has been cited as an essential task in the development of knowledge about social interventions (Lipsey, 1993). The third step involves the selection of indicators, the data elements that will register change on the program's outcomes. The identification of indicators can prove to be an analytically challenging and politically charged process and must be tailored to the individual program.
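Agencies that keep program documentation electronically may find it helpful to record the logic-model elements in a simple, structured form. The sketch below is purely illustrative; the program, inputs, outputs, outcomes, and indicators shown are hypothetical examples, not drawn from the United Way materials.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Outcome:
    """A benefit or change for participants, with the indicators that will register it."""
    description: str
    indicators: List[str] = field(default_factory=list)

@dataclass
class LogicModel:
    """Minimal representation of the inputs -> activities -> outputs -> outcomes chain."""
    program: str
    inputs: List[str]
    activities: List[str]
    outputs: List[str]        # volume of work accomplished
    outcomes: List[Outcome]   # changes in participants' knowledge, behavior, or condition

# Hypothetical example for a parenting-education program
model = LogicModel(
    program="Parenting Education",
    inputs=["2 FTE staff", "curriculum", "meeting space"],
    activities=["weekly parenting classes", "home visits"],
    outputs=["number of classes held", "number of parents attending"],
    outcomes=[
        Outcome("Parents use positive discipline techniques",
                indicators=["% of parents reporting non-physical discipline at follow-up"]),
        Outcome("Improved parent-child interaction",
                indicators=["mean change on an observed-interaction rating scale"]),
    ],
)
```

Even a minimal structure such as this keeps the distinction between outputs and outcomes visible to staff as the measurement system is built.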
The fourth and fifth steps involve the "nuts and bolts" aspects of operationalizing the elements of the program logic model. This includes the development of data collection instruments and the design of processes for handling the data collection. A key feature is the testing of the measurement system before implementing it across the board. The model takes great care in communicating the importance of working out the bugs in data collection through an iterative process of testing and revising the system.
The remaining three steps of the model guide the reader through the analysis, reporting, and use of findings, and the improvement of the measurement system. These steps are necessarily understated due to their somewhat flexible nature. The heart of the process is the development of a program model, the identification of measures, and the collection of data. These final three steps are much more meaningful to program staff who have some data in hand, making the discussion relevant to their own experience.
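To illustrate how modest the analysis and reporting steps can be at the outset, the following sketch tabulates indicator results for a handful of client records; the indicator names and data are invented for the example and do not come from any particular program.

```python
# Hypothetical client records: each dict holds one participant's indicator results.
records = [
    {"completed_program": True,  "employed_at_90_days": True},
    {"completed_program": True,  "employed_at_90_days": False},
    {"completed_program": False, "employed_at_90_days": False},
    {"completed_program": True,  "employed_at_90_days": True},
]

def indicator_rate(records, indicator):
    """Percentage of all participants meeting an indicator (missing values count as not met)."""
    met = sum(1 for r in records if r.get(indicator))
    return 100.0 * met / len(records)

for indicator in ("completed_program", "employed_at_90_days"):
    print(f"{indicator}: {indicator_rate(records, indicator):.0f}% of {len(records)} participants")
```

The point is not the tooling but the habit: once indicators are defined, routine tabulation across the full caseload can begin with very simple means and be refined later.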
Threats to the Success of Outcomes Measurement
Given the understanding that the move to outcomes measurement by direct-service agencies is worthwhile, a discussion of the likely pitfalls that agencies face may be useful. These pitfalls are broken into two categories: (1) issues that arise as a result of the agency context for evaluation, and (2) issues that result from the structural limitations of the outcomes measurement model.
Agency Context Issues
Despite broad similarities, nonprofit entities are recognized as having bureaucratic and political environments distinctly different from those of both the private sector and the public sector. Faced with the need to report outcomes to funders, these agencies will have many decisions to make. The following items identify possible unintended results of the implementation of outcomes measurement in nonprofits. The negative influence of most of these can be minimized by securing agency buy-in to the outcomes effort, providing adequate training to the agency staff who will be responsible for implementing the processes, and conducting sufficient oversight of agency reporting. The success of these approaches rests heavily on the attention and expertise of the staff in local United Ways, as collaborators with agencies in the reorientation to outcomes measurement. Outcomes measurement remains largely an unfunded mandate for nonprofits; its cost, which may fall in the range of 5-15% of program costs, ideally would be built into program budgets and contractual agreements.
"Just give them some numbers. " While some agencies may have the resources to undertake and implement the processes of outcomes measurement, many more do not. If agencies lack the time, resources, and expertise to adequately undertake evaluation tasks, they may be forced to either curtail direct service or generate low quality outcome data. If faced with such a choice, many program administrators would opt for the latter. This is coupled with the sentiment among some nonprofit managers that this movement is simply the latest fad in nonprofit management. Under this thinking, the requirements of outcomes measurement are to be satisfied with the least inconvenience possible.
Creaming. Once outcome measures have been established against which the agency's success would be measured, there may be an incentive to alter the clients served in order to improve outcomes (e.g., serve "better-off" or less-troubled cases). This phenomenon was well documented in the job training programs funded under the Job Training Partnership Act (JTPA) legislation. In that instance, success was measured in part by the job placement rate for clients served by the program. In order to boost the placement rate, some programs began to enroll less-disadvantaged clients in the program.
Dollars driving outcomes. This scenario derives from an incentive for agencies to frame a program's outcomes according to the specified interests of the funder rather than the program theory. This is particularly harmful in outcomes measurement because the measures must come from within the program. This incentive can lead agencies to overreach by linking the program to outcomes that it could not plausibly influence in any meaningful way. Additionally, it may lead agencies to use finite resources to measure the wrong outcomes, a situation that clearly does not serve well the needs of the program, its participants, or its funders.
Selective reporting. Ultimately, the outcomes reported by a program or agency must be representative of the clients served in order to be meaningful. Some agencies will have an incentive to focus their reporting on specific subgroups of the caseload that will make the outcomes appear better than they otherwise would (e.g., by reporting on program graduates or completers or by excluding dropouts). Not only will this misrepresent actual program impacts, but it will also create a biased database that is of little use for decision making and program development.
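A small, hypothetical calculation illustrates how much selective reporting can distort a reported outcome rate; the figures below are invented for the example.

```python
# Hypothetical caseload: 100 clients enroll, 60 complete the program, 40 drop out early.
completers_successful = 45   # completers who achieve the target outcome
dropouts_successful = 5      # dropouts who achieve the target outcome anyway

# Rate reported only for completers (selective reporting)
completer_rate = 100 * completers_successful / 60                              # 75%
# Rate for everyone who enrolled (representative reporting)
all_enrolled_rate = 100 * (completers_successful + dropouts_successful) / 100  # 50%

print(f"Completers only: {completer_rate:.0f}% success")
print(f"All enrollees:   {all_enrolled_rate:.0f}% success")
```

In this invented case, the same program looks 25 percentage points more effective simply because dropouts were left out of the denominator.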
Moving evaluation into agencies without sufficiently building capacity for this new endeavor leaves mixed incentives to operate without professional evaluation assistance, which may lead to biased data collection and reporting procedures. Ultimately, staff may believe so strongly in the effectiveness of their programs that the elimination of "bad" data seems only an incidental concern.
Limitations of the Outcomes Measurement Model
As an elementary approach to measuring outcomes, the United Way model is somewhat mismatched with some program areas common to human service interventions. It should be noted that the authors of the approach discuss several areas of limitation in the training manual (Hatry et al., 1996). Following are several primary areas for which the model may be either ill-fitted or inappropriate.
Crisis-focused interventions. Many nonprofit agencies provide services in which the contact between the program staff and the client is focused around some crisis event and may be very limited. For example, agencies that offer "hot line" services (e.g., for victims of rape or domestic violence) are faced with a situation in which the client is anonymous and no follow-up is possible to trace the benefits of the contact in the client's life. Programs that serve the homeless (shelter and food assistance) or individuals needing immediate financial assistance (e.g., for rent, utilities, travel to return to a home city) have limited access to data on how the services and aid provided led to resolution of a client's problems or other positive outcomes. Frequently, in these instances the measure of output becomes the only plausible measure of outcome. These programs may be forced to rely on the things that are measurable, such as the number of meals served, the number of homeless persons housed, and the number of clients served.
Prevention-focused programs. Another category of programs also presents a structural difficulty when the outcomes model is applied to it. Programs that are based on a prevention model, especially those in which the expected benefits are very long-term, are difficult to handle in the outcome measurement context. Probably the largest subset is programs that work with youth, delivering messages that promote positive behaviors and discourage negative ones. These interventions are targeted at areas such as preventing academic failure, delinquency, substance abuse, early sexual activity, and violence. The difficulty for outcome measurement is that the outcomes the program is designed to influence often lie far in the future and beyond the program's ability to reasonably collect data. A default option is for programs to measure change that is plausibly connected (often based on existing basic social science research) to these distal outcomes. This approach would focus on change in the youths' knowledge and self-esteem and their intentions regarding their own future behavior. A subsequent difficulty, however, is the inability to measure these characteristics well among youth, especially disadvantaged youth who may have limited literacy (due both to their age and to academic difficulties).
Brief-service interventions. Frequently in human service organizations, clients discontinue service after a short time (e.g., in individual and family counseling). This could range from clients who engage the agency over the phone but then do not come for a scheduled appointment ("no-show") to clients who begin treatment and then simply do not follow through with the case plan. Often these levels of service receipt are not the recommended amount, nor would discontinuation of service be recommended by the program staff. It may be rather straightforward to construct a logic model for such programs, given an assumption of a recommended dose of service (e.g., six sessions). In these instances, however, there may be high attrition from the population of individuals who initiate service, due to unplanned discontinuation of service. Certainly, there is something to be learned from the experiences of these individuals. Clients who receive only one session or, arguably, even a phone consultation may have received meaningful benefit from this limited contact; however, the outcome measurement model may miss these benefits unless properly tailored to detect them.
Community-building efforts. Another growing category of work in nonprofits is community building or community organizing. These efforts seek to aid neighborhoods and other defined communities in improving the social environment for their residents, through increasing connections among residents and community groups, improving leadership within communities, and helping residents advocate on behalf of the community and themselves. The difficulty here, in regard to outcomes measurement, is that the intervention is diffuse and the target of the work is at the community level, not the individual level. Thus, data collection traditionally focused on individuals and families, the services they received, and how they fared is not well suited to a program directed at community change. Certainly, methods are available to gauge changes in communities, but there is an initial disconnect with the basic outcomes model.
The ultimate question: Compared to what? As formulated by some (e.g., Hatry et al., 1996, pp. 21-22), outcomes measurement makes no substantive claims about the impact of services; that is, the difference between what happened and what would have happened in the absence of services. In the end, these data will often simply document that changes, positive and negative, occurred in the lives of the clients served; simple pre/post studies will not support claims that a program produced client change, unlike more cumbersome research designs (e.g., randomized experiments). The reality, however, is that when communicated to funders, these data will be understood to suggest that the programs caused the changes observed, regardless of the rigor of the design or whether causal links have been established. The next step beyond elementary outcomes measurement is generating comparative data against which programmatic outcome data can be judged. These data may come from empirical work by the agency, using comparable groups of clients receiving alternative services or perhaps on a waiting list, or from external data sources or research studies that cast light on the program. Decision making based on outcome data will require such comparative data to provide the necessary context for interpretation.
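To make the distinction concrete, the brief sketch below contrasts a simple pre/post change with a rough impact estimate that subtracts the change observed in a comparison group. The scores and groups are hypothetical, and the calculation assumes such a comparison group is actually available, which for many agencies it will not be.

```python
from statistics import mean

# Hypothetical pre/post scores on some outcome scale
program_pre, program_post = [10, 12, 11, 13], [15, 16, 14, 17]
compare_pre, compare_post = [10, 11, 12, 12], [12, 13, 13, 14]  # similar clients not receiving the service

prepost_change = mean(program_post) - mean(program_pre)       # what simple outcome measurement reports
comparison_change = mean(compare_post) - mean(compare_pre)    # change that occurred without the service
impact_estimate = prepost_change - comparison_change          # change plausibly attributable to the program

print(f"Pre/post change for program clients: {prepost_change:.1f}")
print(f"Change for comparison group:         {comparison_change:.1f}")
print(f"Rough impact estimate:               {impact_estimate:.1f}")
```

In this invented example, more than a third of the observed pre/post gain would have occurred anyway, which is precisely the context that comparative data supply and that raw outcome figures cannot.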
Developing a Workable Approach
Nonprofit agencies moving to adopt outcomes measurement will need to adapt the available models and approaches to their own specific situations. It may be useful to briefly describe one application of the basic model in an agency setting. The example discussed here is based on the experience of Families First, a multi-service family and children's agency in metropolitan Atlanta. While as an agency Families First had been doing systematic evaluative work since the late 1980s, it was not until 1996 that the agency adopted clearly specified outcome models for its 12 service areas. In each substantive area (e.g., foster care, family counseling, transitional housing), a program logic model was developed in collaboration with the direct staff and supervisors of the programs. The agency elected to begin with noncomplex models, with the option of increasing the level of detail in the future as data collection was fully implemented. Most of the models have multiple outcome streams, reflecting the different domains of intended effects for each program. Figure 1 shows one such model, for a case management program for pregnant and parenting teens.
The model specifies four principal outcome areas for the program: (1) healthy pregnancy and birth, (2) avoidance of subsequent pregnancy, (3) academic achievement, and (4) parenting preparation. Each of these basic outcome categories was fleshed out in more detail for routine data collection, and summary reports have been released (Fischer, 1997a, 1997b). A primary message to be taken from this application is that while the outcomes measurement process may seem daunting at first, if the work is compartmentalized and simplified, an agency can make marked progress toward its goal. Once data are collected, the model will need to be revisited and revised, and plans made for the analysis and use of the outcome data produced.
Practical Advice for Agencies
In this final section a few key points will be highlighted that may prove useful to nonprofit agencies. Certainly, the appropriate outcome approach depends on the extent of prior evaluative work in the agency, the level of expectations of funders, the size of the agency, and the availability of resources. However, the following points are threads that tend to be useful across agency situations.
Begin small. It is neither feasible nor prudent to implement outcomes measurement overnight. The process can begin with one program service or even one site of a program. The agency staff working on the process will gain useful knowledge that will be transferable to other program areas within the agency as the effort progresses. It should be noted that success is not necessarily linked to expensive technology initially, but over time it may become a necessary component.
Focus on measures that are both meaningful and practical. Whatever outcome indicators are selected should be viewed by stakeholders as acceptable measures of program impact, and the collection of data should mesh well with the delivery of the program. Thus, including a variety of staff and other stakeholders in the process of measure selection improves the product by bringing these various perspectives to bear.
Do your homework (or get someone else to do it). Most programs currently operated by agencies are rooted in some subfield of human services. Undoubtedly, some literature exists on the theory underlying the program, its implementation in other settings, and, potentially, findings regarding its effectiveness. Begin to compile this type of information, both as a way of generating ideas for measuring the outcomes of the program and as a resource for providing context for your future findings. One of the widely used evaluation texts may be useful in framing the effort (e.g., Chelimsky & Shadish, 1997; Rossi, Freeman, & Lipsey, 1999; Wholey, Hatry, & Newcomer, 1994). Other resources that may be tapped are guides for implementing outcomes methods more broadly (Annie E. Casey Foundation, 1995; Centers for Disease Control and Prevention, 1999; Kellogg Foundation, 1998).
Conclusion
Outcomes measurement has begun to take hold in nonprofits across the broad spectrum of agencies in the human service sector. While there has been considerable progress in the use of these methods, there is cause for caution and attention to the areas where our current models may not fit well. This paper has described some of the context for this movement and has laid out several caveats for agencies attempting to enact outcomes measurement processes. The broad application of evaluative approaches in human services should and will lead to improved program services, better outcomes for clients, and better use of finite resources within the nonprofit sector.
References
Barthel, J. (1992). For children's sake: The promise of family preservation. New York: The Edna McConnell Clark Foundation.
Bickel, W. E., Eichelberger, R. T., & Hattrup, R. A. (1994). Evaluation use in private foundations: A case study. Evaluation Practice, 15(2), 169-178.
Bonnet, D. (1997, May 12). Outcomes of outcome mania. Message posted to EVALTALK Listserv. Indianapolis, IN: D. Bonnet Associates.
Boruch, R. F. (1994). The future of controlled randomized experiments: A briefing. Evaluation Practice, 15(3), 265-274.
Annie E. Casey Foundation. (1995, September). Getting smart, getting real: Using research and evaluation information to improve programs and policies. Baltimore, MD: Author.
Centers for Disease Control and Prevention (1999, September 17). Framework for program evaluation in public health. Morbidity and Mortality Weekly Report, 48. No. RR-11. Atlanta, GA: U.S. Department of Health and Human Services.
Chelimsky, E., & Shadish, W. (Eds.). (1997). Evaluation for the 21st century. Beverly Hills, CA: Sage.
Courtney, M. E., & Collins, R. C. (1994). New challenges and opportunities in child welfare outcomes and information technologies. Child Welfare, 76(5), 359-378.
Evaluation Forum. (1995). Outcomes for success! Seattle, WA: Organizational Research Services, Inc. and Clegg & Associates, Inc.
Fetterman, D. M. (1994). Keeping research on track. In K. J. Conrad (Ed.), Critically evaluating the role of experiments. New Directions for Program Evaluation, 63, 103-105.
Fischer, R. L. (1997a). Evaluating the delivery of a teen pregnancy and parenting program across two settings. Research on Social Work Practice, 7(3), 350-369.
Fischer, R. L. (1997b, June). Healthy pregnant and parenting teens: Serving at-risk families. Paper presented at the John and Kelly Hartman Conference on Children and Their Families, New London, CT.
Hatry, H., Van Houten, T., Plantz, M., & Greenway, M. (1996). Measuring program outcomes: A practical approach. Alexandria, VA: United Way of America.
Jacobs, F. H., & Weiss, H. B. (1988). Lessons in context. In H. B. Weiss & F. H. Jacobs (Eds.), Evaluating family programs (pp. 497-505). New York: Aldine de Gruyter.
Kellogg Foundation. (1998, January). W. K. Kellogg Foundation evaluation handbook. Battle Creek, MI: Author.
Lipsey, M. W. (1993, Spring). Theory as method: Small theories of treatments. In L. B. Sechrest & A. G. Scott (Eds.), Understanding causes and generalizing about them. New Directions for Program Evaluation, 57, 5-38.
Plantz, M. C., Greenway, M. T., & Hendricks, M. (1997, Fall). Outcome measurement: Showing results in the nonprofit sector. In Using performance measurement to improve programs. New Directions for Evaluation, 75, 15-30.
Rossi, P. H. (1997). Program outcomes: Conceptual and measurement issues. In E. J. Mullen & J. L. Magnabosco (Eds.), Outcomes measurement in the human services: Cross-cutting issues and methods (pp. 20-34). Washington, DC: National Association of Social Workers Press.
Rossi, P. H., Freeman, H. E., & Lipsey, M. W. (1999). Evaluation: A systematic approach (6th ed.). Beverly Hills, CA: Sage.
United Way of America (1999a). Community status reports and targeted community interventions: Drawing a distinction. Unpublished paper. Alexandria, VA: Author.
United Way of America. (1999b, April). Achieving and measuring community outcomes: Challenges, issues, some approaches. Alexandria, VA: Author.
United Way of America (1997, March). Newsletter, 3(1). Alexandria, VA: Author.
Weiss, H. B. (1997). Results-based accountability for child and family services. In E. J. Mullen & J. L. Magnabosco (Eds.), Outcomes measurement in the human services: Cross-cutting issues and methods (pp. 173-180). Washington, DC: NASW Press.
Weiss, H. B., & Jacobs, E H. (Eds.) (1988). Evaluating family programs. New York: Aldine De Gruyter.
Wholey, J. S., Hatry, H. P., & Newcomer, K. E. (Eds.). (1994). The handbook of practical program evaluation. San Francisco: Jossey-Bass.
Robert L. Fischer is senior research associate, Center on Urban Poverty and Social Change, Mandel School of Applied Social Sciences, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, OH 44106-7164; e-mail: rlf11@po.cwru.edu.
Author's note: This work was conducted while the author was serving as director of program evaluation at Families First, a nonprofit child- and family-serving agency in Atlanta, Georgia. The opinions expressed are those of the author alone.
Manuscript received: April 24, 2000
Revised: July 6, 2001
Accepted: August 21, 2001
Copyright Families in Society Nov/Dec 2001