Special issue on policy evaluation: introduction.
Pamela Meadows and Hilary Metcalf
Although the British Government has a long tradition of policy evaluation, until recently it concentrated on a limited range of policy types and used a limited range of methods. Thus, overseas aid
projects have long been subject to the evaluation both of their impact
and of their costs and benefits, as have regional development
initiatives. However, traditionally, social policies and programmes have
been less subject to such scrutiny. The 'Green Book' of
Treasury guidance for departments on how to conduct appraisal and
evaluation traditionally concentrated on the evaluation of economic
interventions. However, the latest edition, Appraisal and Evaluation in Central Government, published in January 2003, covers both a wider range of policy areas and a wider and more robust range of techniques.
Since 1997, when the present Government was first elected, it has
introduced a wide range of social policy changes and initiatives, and
has placed a high degree of emphasis on finding out 'what works' and then feeding that evidence into practice. Perhaps
the best known of these initiatives is the National Institute for
Clinical Excellence, which has been established to ensure that National
Health Service practice is based on both the clinical effectiveness and
the cost-effectiveness of different treatments, but similar approaches
are being adopted in other fields too.
Moreover, the development of policies, as much as their implementation on the ground, has relied to a large extent on evidence about the possible outcomes of the intervention. Where there is no
existing British evidence, policymakers have not been afraid to borrow
evidence from other countries. In practice this has tended to mean
evidence from the United States, where the tradition of policy and
programme evaluation is well embedded.
In 1999 the Government produced a White Paper, Modernising
Government, and, in the subsequent guidance to civil servants as to what
the principles set out in the White Paper meant for operations,
suggested that the evidence on which policy was expected to draw
included:
"Expert knowledge; published research; existing
statistics; stakeholder consultations; previous policy
evaluations; the Internet; outcomes from consultations;
costings of policy options; output from economic and
statistical modelling." (Strategic Policy Making Team,
1999)
There is now a Directorate of Policy Evaluation in the Centre for
Management and Policy Studies in the Cabinet Office, whose mission is to
ensure that the evaluation message is both heard and implemented
throughout government. In addition, the Treasury in particular has, as a condition of providing funding, imposed on departments evaluation requirements for new initiatives that are more thorough and more robust than those traditionally commissioned in the UK. Almost
all new programmes, whatever their economic or social objectives, are
now, wherever possible, subject to impact and 'value for
money' assessment. Many new government programmes now have 10 per
cent of their budget allocated for evaluation, enabling evaluations of
larger programmes to use the more robust methods.
Certainly, evaluation methods being used in the UK have been
becoming increasingly sophisticated. Comparison group methods (in which
the outcomes for programme participants and for a similar group outside
the programme are compared) have been increasingly used. These have been
replacing merely descriptive evaluations which, at worst, attribute all the successful outcomes of programme participants to the programme itself, or rely solely on participants' own assessments of the assistance provided. The
use of comparison group methods has been assisted by attempts to build
evaluation into programme implementation. These developments are to be
welcomed.
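By way of illustration, the sketch below (in Python) implements the comparison group logic in its simplest form: the estimated impact of a programme is the difference in mean outcomes between participants and a similar group outside the programme. The outcome measure and the data are hypothetical, and a real evaluation would also adjust for observed differences between the two groups.

```python
# Minimal sketch of a comparison-group impact estimate (illustrative only).
# The outcome and data are hypothetical; real evaluations adjust for
# observed differences between participants and the comparison group.
import statistics

# 1 = positive outcome (e.g. in work six months later), 0 = not
participants = [1, 1, 0, 1, 1, 0, 1, 1]   # programme participants
comparison   = [1, 0, 0, 1, 0, 0, 1, 0]   # similar group outside the programme

impact = statistics.mean(participants) - statistics.mean(comparison)
print(f"Estimated programme impact: {impact:.2f}")
# A merely descriptive evaluation would instead report mean(participants)
# alone, attributing every successful outcome to the programme.
```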
However, a distinct pecking order of evaluation techniques appears
to be developing, with demonstration projects (also known as random
assignment evaluations) seen as the gold standard of evaluation,
followed by comparison group methods, with descriptive and qualitative
techniques at the bottom. But, just as no exchange rate regime is most
appropriate to all circumstances, the efficacy of an evaluation
technique depends on the policy to be evaluated. For example,
demonstration projects rely on the control group being unaffected by the
programme ('uncontaminated'). This is difficult to achieve in
many evaluations, but impossible in programmes which aim to disseminate good practice across disparate practitioners who are in contact with
both project participants and non-participants. Contamination is also a
problem for demonstration projects in which programme participants gain
at the expense of the control group. (This might occur, for example, in
a demonstration project aimed at reducing unemployment if programme
participants' increased employment probability were to result in a
decline in the employment probability amongst the control group.) Robust
evaluations need to be able to consider such factors and ensure that
observed outcomes could be replicated within a wider population.
Moreover, Heckman and Smith (1995) have shown that comparison groups can
be as accurate as control groups (i.e. those achieved through random
assignment methods), if sufficient appropriate information is collected
from both groups.
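One way of making a comparison group mimic a randomly assigned control group is to match on the probability of participation estimated from the information collected about both groups. The sketch below illustrates this with propensity-score matching; the covariates, data and effect size are hypothetical, scikit-learn is assumed to be available, and it is offered as an illustration of the general idea rather than of the particular estimators assessed by Heckman and Smith (1995).

```python
# Sketch of propensity-score matching: a comparison group standing in for a
# randomised control group when sufficient background information is
# collected from both groups. All data below are simulated for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
# Hypothetical background information collected from both groups
age = rng.uniform(18, 60, n)
months_unemployed = rng.uniform(0, 36, n)
X = np.column_stack([age, months_unemployed])

# Participation is more likely for the long-term unemployed (selection on observables)
participates = (rng.random(n) < 1 / (1 + np.exp(-(months_unemployed - 18) / 6))).astype(int)
# Hypothetical outcome: participation raises the re-employment probability by 0.15
outcome = (rng.random(n) < 0.4 + 0.15 * participates - 0.005 * months_unemployed).astype(int)

# Estimate each person's probability of participating, given what we observe
pscore = LogisticRegression().fit(X, participates).predict_proba(X)[:, 1]

# Match each participant to the non-participant with the closest propensity score
treated = np.where(participates == 1)[0]
controls = np.where(participates == 0)[0]
matches = controls[np.abs(pscore[controls][None, :] - pscore[treated][:, None]).argmin(axis=1)]

# Effect on participants: mean outcome gap across the matched pairs
att = outcome[treated].mean() - outcome[matches].mean()
print(f"Matched estimate of the effect on participants: {att:.2f}")
```

The closer such a matched estimate comes to the effect built into the simulated data, the better the illustration of the point that the quality of a non-experimental evaluation depends on the richness of the information collected from both groups.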
Demonstration projects in their purest form, and often comparison group methods too, are 'black box' methods, identifying inputs and outcomes but not how the outcomes were achieved. Whilst such forms of
evaluation are important (indeed, often crucial), these approaches need
to be complemented by methods which identify how and why effects were
achieved. This can be particularly important for demonstration projects
and pilot programmes ('Pathfinders', 'Trailblazers'
and the like), which may differ in important ways from a full programme.
Hawthorne effects may occur, where the response of those taking
part is partly due to the fact that it is a demonstration or pilot and
someone is taking an interest in them. (The term comes from experiments
to measure the effect of work organisation changes, including lighting
levels and the timing and duration of both shifts and breaks, on
productivity at Western Electric's Hawthorne Works near Chicago during
the 1920s. Productivity rose both when changes were introduced and when
conditions returned to their original configuration.)
The restriction of pilots to particular locations may mean that
observed effects are affected by characteristics of the location. Pilot
areas may be selected because the public services there are known to be
good at delivery. Implementation practices may differ between a pilot
and a full programme. For example, the best staff may be selected to
deliver a pilot. None of these features is likely to be replicated in a
wider programme, so evaluation needs to consider how the implementation
was done, as well as what was actually delivered. This calls for
complementary descriptive techniques, which may be qualitative or
quantitative.
Complementary descriptive techniques are also important to ensure
that the programme being evaluated is indeed the programme intended and
that other factors have not affected outcomes. For example, a pilot
programme may fail, not because the basic approach was wrong, but
because implementation was poor. Descriptive analysis can be used to establish whether the programme evaluated was indeed the programme intended and, if not, why the two diverged.
Finally, the efficacy of evaluation needs to be considered within
its own cost-benefit framework. Where there is already extensive
evidence of the effectiveness of the elements of a programme, the costs of a robust evaluation may outweigh its benefits. Evaluation
should be seen as only one strand contributing to 'evidence-based
policy'. At the same time, the extension of evaluation from new to
long-standing programmes and public sector practices may be beneficial,
investigating whether received 'wisdom' as to their effects is
correct.
This special edition of the Review provides examples of the use of
some of these more robust techniques, both in recent British policy
evaluations, and in the evaluation of similar programmes from North
America. It also discusses some of the practical measurement challenges
that confront the evaluator in circumstances where programmes have both
multiple objectives and multiple delivery methods.
REFERENCES
Heckman, J. and Smith, J.A. (1995), 'Assessing the case for
social experiments', Journal of Economic Perspectives, 9(2).
Strategic Policy Making Team (1999), Professional Policy Making for
the Twenty First Century, London, The Cabinet Office.