Special issue on policy evaluation: introduction.
Pamela Meadows and Hilary Metcalf
Although the British Government has a long tradition of policy evaluation, until recently it concentrated on a limited range of policy types and used a limited range of methods. Thus, overseas aid
projects have long been subject to the evaluation both of their impact
and of their costs and benefits, as have regional development
initiatives. However, traditionally, social policies and programmes have
been less subject to such scrutiny. The 'Green Book' of
Treasury guidance for departments on how to conduct appraisal and
evaluation traditionally concentrated on the evaluation of economic
interventions. However, the latest edition, Appraisal and Evaluation in Central Government, published in January 2003, covers both a wider range of policy areas and a wider and more robust range of techniques.
Since 1997, when the present Government was first elected, it has
introduced a wide range of social policy changes and initiatives, and
has placed a high degree of emphasis on finding out 'what works' and then feeding that evidence into practice. Perhaps
the best known of these initiatives is the National Institute for
Clinical Excellence, which has been established to ensure that National
Health Service practice is based on both the clinical effectiveness and
the cost-effectiveness of different treatments, but similar approaches
are being adopted in other fields too.
Moreover, the development of policies, as much as their implementation on the ground, has relied to a large extent on evidence about the possible outcomes of the intervention. Where there is no
existing British evidence, policymakers have not been afraid to borrow
evidence from other countries. In practice this has tended to mean
evidence from the United States, where the tradition of policy and
programme evaluation is well embedded.
In 1999 the Government produced a White Paper, Modernising
Government, and, in the subsequent guidance to civil servants as to what
the principles set out in the White Paper meant for operations,
suggested that the evidence on which policy was expected to draw
included:
"Expert knowledge; published research; existing
statistics; stakeholder consultations; previous policy
evaluations; the Internet; outcomes from consultations;
costings of policy options; output from economic and
statistical modelling." (Strategic Policy Making Team,
1999)
There is now a Directorate of Policy Evaluation in the Centre for
Management and Policy Studies in the Cabinet Office, whose mission is to
ensure that the evaluation message is both heard and implemented
throughout government. In addition, the Treasury in particular has, as a condition of providing funding, imposed on departments evaluation requirements for new initiatives that are more thorough and more robust than those traditionally commissioned in the UK. Almost
all new programmes, whatever their economic or social objectives, are
now, wherever possible, subject to impact and 'value for
money' assessment. Many new government programmes now have 10 per
cent of their budget allocated for evaluation, enabling evaluations of
larger programmes to use the more robust methods.
Certainly, evaluation methods being used in the UK have been
becoming increasingly sophisticated. Comparison group methods (in which
the outcomes for programme participants and for a similar group outside
the programme are compared) have been increasingly used. These have been
replacing merely descriptive evaluations which, at worst, attribute all the successful outcomes of programme participants to the programme itself, or rely solely on participants' own assessments of the assistance provided. The
use of comparison group methods has been assisted by attempts to build
evaluation into programme implementation. These developments are to be
welcomed.
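By way of illustration, the sketch below (in Python) implements the comparison group logic in its simplest form: the estimated impact of a programme is the difference in mean outcomes between participants and a similar group outside the programme. The outcome measure and the data are hypothetical, and a real evaluation would also adjust for observed differences between the two groups.

```python
# Minimal sketch of a comparison-group impact estimate (illustrative only).
# The outcome and data are hypothetical; real evaluations adjust for
# observed differences between participants and the comparison group.
import statistics

# 1 = positive outcome (e.g. in work six months later), 0 = not
participants = [1, 1, 0, 1, 1, 0, 1, 1]   # programme participants
comparison   = [1, 0, 0, 1, 0, 0, 1, 0]   # similar group outside the programme

impact = statistics.mean(participants) - statistics.mean(comparison)
print(f"Estimated programme impact: {impact:.2f}")
# A merely descriptive evaluation would instead report mean(participants)
# alone, attributing every successful outcome to the programme.
```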
However, a distinct pecking order of evaluation techniques appears
to be developing, with demonstration projects (also known as random
assignment evaluations) seen as the gold standard of evaluation,
followed by comparison group methods, with descriptive and qualitative
techniques at the bottom. But, just as no exchange rate regime is most
appropriate to all circumstances, the efficacy of an evaluation
technique depends on the policy to be evaluated. For example,
demonstration projects rely on the control group being unaffected by the
programme ('uncontaminated'). This is difficult to achieve in
many evaluations, but impossible in programmes which aim to disseminate good practice across disparate practitioners who are in contact with
both project participants and non-participants. Contamination is also a
problem for demonstration projects in which programme participants gain
at the expense of the control group. (This might occur, for example, in
a demonstration project aimed at reducing unemployment if programme
participants' increased employment probability were to result in a
decline in the employment probability amongst the control group.) Robust
evaluations need to be able to consider such factors and ensure that
observed outcomes could be replicated within a wider population.
Moreover, Heckman and Smith (1995) have shown that comparison groups can
be as accurate as control groups (i.e. those achieved through random
assignment methods), if sufficient appropriate information is collected
from both groups.
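One way of making a comparison group mimic a randomly assigned control group is to match on the probability of participation estimated from the information collected about both groups. The sketch below illustrates this with propensity-score matching; the covariates, data and effect size are hypothetical, scikit-learn is assumed to be available, and it is offered as an illustration of the general idea rather than of the particular estimators assessed by Heckman and Smith (1995).

```python
# Sketch of propensity-score matching: a comparison group standing in for a
# randomised control group when sufficient background information is
# collected from both groups. All data below are simulated for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
# Hypothetical background information collected from both groups
age = rng.uniform(18, 60, n)
months_unemployed = rng.uniform(0, 36, n)
X = np.column_stack([age, months_unemployed])

# Participation is more likely for the long-term unemployed (selection on observables)
participates = (rng.random(n) < 1 / (1 + np.exp(-(months_unemployed - 18) / 6))).astype(int)
# Hypothetical outcome: participation raises the re-employment probability by 0.15
outcome = (rng.random(n) < 0.4 + 0.15 * participates - 0.005 * months_unemployed).astype(int)

# Estimate each person's probability of participating, given what we observe
pscore = LogisticRegression().fit(X, participates).predict_proba(X)[:, 1]

# Match each participant to the non-participant with the closest propensity score
treated = np.where(participates == 1)[0]
controls = np.where(participates == 0)[0]
matches = controls[np.abs(pscore[controls][None, :] - pscore[treated][:, None]).argmin(axis=1)]

# Effect on participants: mean outcome gap across the matched pairs
att = outcome[treated].mean() - outcome[matches].mean()
print(f"Matched estimate of the effect on participants: {att:.2f}")
```

The closer such a matched estimate comes to the effect built into the simulated data, the better the illustration of the point that the quality of a non-experimental evaluation depends on the richness of the information collected from both groups.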
Demonstration projects in their purest form, and often comparison group methods too, are 'black box' methods, identifying inputs and outcomes but not how the outcomes were achieved. Whilst such forms of
evaluation are important (indeed, often crucial), these approaches need
to be complemented by methods which identify how and why effects were
achieved. This can be particularly important for demonstration projects
and pilot programmes ('Pathfinders', 'Trailblazers'
and the like), which may differ in important ways from a full programme.
Hawthorne effects may occur, where the response of those taking
part is partly due to the fact that it is a demonstration or pilot and
someone is taking an interest in them. (The term comes from experiments
to measure the effect of work organisation changes, including lighting
levels and the timing and duration of both shifts and breaks, on
productivity at Western Electric's Hawthorne Works near Chicago during
the 1920s. Productivity rose both when changes were introduced and when
conditions returned to their original configuration.)
The restriction of pilots to particular locations may mean that
observed effects are affected by characteristics of the location. Pilot
areas may be selected because the public services there are known to be
good at delivery. Implementation practices may differ between a pilot
and a full programme. For example, the best staff may be selected to
deliver a pilot. None of these features is likely to be replicated in a
wider programme, so evaluation needs to consider how the implementation
was done, as well as what was actually delivered. This calls for
complementary descriptive techniques, which may be qualitative or
quantitative.
Complementary descriptive techniques are also important to ensure
that the programme being evaluated is indeed the programme intended and
that other factors have not affected outcomes. For example, a pilot
programme may fail, not because the basic approach was wrong, but
because implementation was poor. Descriptive analysis can be used to establish whether the programme evaluated was indeed the programme intended and, if not, why the two diverged.
Finally, the efficacy of evaluation needs to be considered within
its own cost-benefit framework. Where there is already extensive
evidence of the effectiveness of the elements of a programme, the costs of a robust evaluation may outweigh its benefits. Evaluation
should be seen as only one strand contributing to 'evidence-based
policy'. At the same time, the extension of evaluation from new to
long-standing programmes and public sector practices may be beneficial,
investigating whether received 'wisdom' as to their effects is
correct.
This special edition of the Review provides examples of the use of
some of these more robust techniques, both in recent British policy
evaluations, and in the evaluation of similar programmes from North
America. It also discusses some of the practical measurement challenges
that confront the evaluator in circumstances where programmes have both
multiple objectives and multiple delivery methods.
REFERENCES
Heckman, J. and Smith, J.A. (1995), 'Assessing the case for
social experiments', Journal of Economic Perspectives, 9(2).
Strategic Policy Making Team (1999), Professional Policy Making for
the Twenty First Century, London, The Cabinet Office.