The state of evidence-based policy evaluation and its role in policy formation.
Davies, Philip
This paper argues that evidence-based policy has clearly made a
worldwide impact, at least at the rhetorical and institutional levels,
and in terms of analytical activity. The paper then addresses whether or
not evidence-based policy evaluation has had an impact on policy
formation and public service delivery. The paper uses a model of
research-use that suggests that evidence can be used in instrumental,
conceptual and symbolic ways. Taking four examples of the use of
evidence in the UK over the past decade, this paper argues that evidence
can be used instrumentally, conceptually and symbolically in
complementary ways at different stages of the policy cycle and under
different policy and political circumstances. The fact that evidence is
not always used instrumentally, in the sense of "acting on research
results in specific, direct ways" (Lavis et al., 2003, p. 228),
does not mean that it has little or no influence. The paper ends by
considering some of the obstacles to getting research evidence into
policy and practice, and how these obstacles might be overcome.
Keywords: Evidence-based policy; policymaking; public service
delivery; delivery trajectories
JEL Classifications: H11; H43; I28; I38; J62
The rhetoric of evidence-based policy
Over the past decade or so, and in many countries, public
policymaking has claimed to be 'evidence-based' and doing
'what works'. In the United Kingdom evidence-based policy was
a key element of efforts to reform the machinery of government after
1997. The Modernising Government White Paper (Cabinet Office, 1999a),
for instance, stated that government policy must be evidence-based,
properly evaluated and based on best practice. Prime Minister Tony Blair confirmed his "Government's commitment to policy-making based
on hard evidence and, as in education, or NHS reforms, or fighting
crime, we must always be looking at the outcomes of policies--the
benefits in people's lives--not the process" (Cabinet Office,
2000, p.3).
A report from the Cabinet Office Strategic Policy Making Team on
Professional Policy Making for the Twenty-First Century also suggested
that "policy making must be soundly based on evidence of what
works" and that "government departments must improve their
capacity to make use of evidence" (Cabinet Office, 1999b, p.40).
This approach to policymaking called for a greater use of evaluation of
policies ex ante and post hoc and, consequently, a greater use of
monitoring the roll out of policies and the delivery of public services (Barber, 2007).
Yet another report from the Cabinet Office in 2000, titled Adding
It Up, clearly recognised the need for high quality analysis and
evaluation in government, whilst acknowledging a sometimes limited
demand:
"The government is fully committed to the principle that
policies should be based on evidence. This means policy should be
supported by good analysis and, where appropriate, modelling. This study
has found, however, that demand for good analysis is not fully
integrated in the culture of central Government."
Cabinet Office, 2000, p. 12
The rhetoric of evidence-based policymaking has continued with the
UK's Coalition Government. In a speech to the Annual Leadership
Conference of the National College for School Leadership, titled Seizing
Success 2010, the Secretary of State for Education, Michael Gove,
suggested that:
"Indeed I want to see more data generated by the profession to
show what works, clearer information about teaching techniques that get
results, more rigorous, scientifically-robust research about pedagogies
which succeed and proper independent evaluations of interventions which
have run their course. We need more evidence-based policy making, and
for that to work we need more evidence."
The evidence-based policy agenda can be found worldwide. In his
Inaugural Address as the 44th President of the United States of America,
Barack Obama told the American people that his Administration would be
based:
"not [on] whether our government is too big or too small, but
whether it works, whether it helps families find jobs at a decent wage,
care they can afford, a retirement that is dignified. Where the answer
is yes, we intend to move forward. Where the answer is no, programs will
end"
Obama, 2009
President Obama's mission for US policymaking is supported by
resources in America such as the Coalition for Evidence-Based
Policy, (1) Evidence-Based Practice Centers, (2) and Social Programs
That Work. (3) Evidence-based policy is similarly promoted in Australia
(Campbell, 2005; Leigh, 2009; Topp and McKetin, 2003), Canada (Zussman,
2003; Lomas, et al., 2005; CIHR, 2006; Townsend and Kunimoto, 2009), New
Zealand (Marsh, 2006), South Africa (Office of the Presidency, 2010),
and by international organisations such as the OECD (Martin, 2000),
UNESCO (Milani, 2009), and the World Bank (Fiszbein and Schady, 2009).
Organisations that undertake systematic reviews of evidence (e.g. the
Cochrane Collaboration, (4) the Campbell Collaboration, (5) the
Eppi-Centre, (6) DfID, (7) AusAID, (8) 3ie (9)), and those that provide
evidence-based guidance for public service professionals and users of
public services (e.g. the National Institute for Health and Clinical
Excellence, (10) the Social Care Institute for Excellence, (11) the Coalition
for Evidence-Based Education (12)) all add to the global availability of
high quality evidence for policymaking and the provision of public
services.
Evidence-based policy, then, has clearly made a worldwide impact,
at least at the rhetorical and organisational levels and in terms of
analytical activity. The question that this paper addresses is whether
or not evidence-based policy evaluation has had an impact on policy
formation and public service delivery.
How evidence influences policy and practice
The fundamental principle of evidence-based policy is beguilingly
simple. It helps policymakers make better decisions, and achieve better
outcomes, by using existing evidence more effectively, and undertaking
new research, evaluation and analysis where knowledge about effective
policy initiatives and policy implementation is lacking. This relatively
straightforward principle, however, is not without problems when applied
to the realities of policymaking.
First, there are many factors other than evidence that influence
policymaking (Davies, 2004). These include the role of values, beliefs
and ideology, which are the driving forces of most policymaking
processes, as well as the experience, expertise and judgement of
policymakers. The availability of resources, a bureaucratic culture, the
role of lobbyists and pressure groups, and the need to respond quickly
to everyday contingencies all contribute to policymaking in addition to
evaluation evidence and analysis. For evaluation evidence to be
effective in policymaking, one has to find ways of integrating such
evidence with these many other factors.
Second, evidence is seldom self-evident or definitive. By itself
evidence does not tell users what to do, or how to act. It merely
provides a basis upon which decision-makers can make informed judgements
about the likely effect or impact of an intervention, or about the
conditions under which a desired effect is likely to be achieved or not
achieved. Research evidence, like all scientific evidence, is
probabilistic and carries some degree of uncertainty. That uncertainty
can be better understood, and sometimes reduced, by formative evaluation that explores how, why, for whom, and under what conditions an
intervention is likely to achieve its desired effects. Hence,
evidence-based policy requires impact and formative evaluation using
qualitative and quantitative methods, under
experimental/quasi-experimental and naturalistic conditions (Davies,
2004; HM Treasury, 2011).
Third, researchers and policymakers often have different notions of
evidence and different absorptive capacity to seek and use evidence.
Lomas et al. (2005) found that whereas policymakers in Canada "view
evidence colloquially ("anything that establishes a fact or gives
reason for believing something") and define it by its relevance,
most researchers view evidence scientifically (the use of systematic,
replicable methods for production) and define it by its
methodology" (Lomas et al., 2005, p.1). A similar study of civil
servants in Whitehall (Campbell et al., 2007) found that these
policymakers wanted evidence that focused on the 'end
product', rather than on how the information was either collected
or analysed. These Whitehall civil servants also valued anecdotal
evidence, and evidence that draws upon "'real life
stories', 'fingers in the wind', 'local' and
'bottom up' evidence" (Campbell et al., 2007, p. 21).
Ouimet et al. (2009) found that the 'absorptive capacity' of
civil servants to seek and use research evidence depended on their
physical and cognitive access to research (their scientific literacy),
their educational backgrounds, and the direct access they have to
academic researchers. Given these different notions, expectations and
experiences of evidence, it is not surprising that the role of
evaluation evidence in policy formation and delivery is not
straightforward or assured.
Fourth, the impact of research, evaluation and analysis is seldom
direct or immediate. As Carol Weiss notes "cases of immediate and
direct influence of research findings on specific policy decisions are
not frequent" (Weiss, 1982, p. 620). Weiss also notes that:
"rarely does research supply an "answer" that policy
actors employ to solve a policy problem. Rather, research provides a
background of data, empirical generalisations, and ideas that affect the
way that policy makers think about a problem."
Weiss (1982), pp. 620-1
Weiss goes on to suggest that "to acknowledge this is not the
same as saying that research findings have little influence on
policy". For Weiss, research, evaluation and analysis influence
policymakers':
"conceptualizations of the issues with which they deal,
affects those facets of the issue they consider inevitable and
unchangeable and those they perceive as amenable to policy action;
widens the range of options that they consider, and challenges
taken-for-granted assumptions about appropriate goals and appropriate
activities ... ideas from research are picked up in diverse ways and
percolate through to officeholders"
op. cit., p. 622
Through this percolation process it can take a long time for research
and evaluation evidence to have an impact on policy and practice. Drawing on the
work of Balas and Boren (2000), Mold and Peterson (2005) have estimated
that in the case of medical knowledge "it takes an average of 17
years to turn 14 per cent of original research findings into changes in
care that benefits patients" (Mold and Peterson, 2005, S14). One
can only assume that in substantive policy areas that have a shorter
history and tradition of evidence-based policy and practice than
medicine, the time-lag for the percolation of evidence may be even
longer.
This time lag between gathering high quality evidence and getting
it into policy and practice is often seen as another factor working
against evidence-based policy. Policymaking usually takes place in time
periods of weeks and months, whereas high quality evidence gathering
usually requires many months and years. The challenge for researchers
and analysts is to identify and provide the best available evidence in
the time available to inform the contemporary policymaking process,
whilst also developing a more robust evidence base for future
policymaking in the medium to longer term. The development of strategic
policymaking teams within many governments, which seek to identify the
policy needs of their countries in five-year, ten-year, fifteen-year and
even longer future time periods, provides an opportunity for researchers
and policy teams to work together to build a medium- to longer-term
evidence base that is sound and robust. The timing of evidence gathering
and policymaking, whilst clearly a major challenge, need not preclude
research-based evidence contributing to policy and practice, providing
one distinguishes between the operational (day-to-day) and strategic
(medium- to long-term) use of evidence.
Ways of using research and evaluation in policymaking
Lavis et al. (2003), drawing on the work of Beyer (1997), have
noted that research knowledge may be used in instrumental, conceptual
and symbolic ways. Instrumental use involves "acting on research
results in specific, direct ways", whereas conceptual use involves
"using research results for general enlightenment; results
influence actions, but in less specific, more indirect ways than in
instrumental use" (Lavis et al., 2003, p. 228). Symbolic use is
more about "using research results to legitimate and sustain
pre-determined positions" (ibid). Amara et al. (2004) have
suggested that "the three types of research utilization must be
considered as complementary rather than as contradictory dimensions of
research utilization" (Amara, 2004, p. 79). These authors have
examined empirically the instrumental, conceptual and symbolic uses of
research evidence in Canadian federal and provincial governments, and
found that:
"conceptual use of research is more frequent than instrumental
use. More precisely, the conceptual use of research is more important in
the day-to-day professional activity of professionals and managers in
government agencies than symbolic utilization, which, in turn, is more
important than instrumental utilization".
Amara et al., 2004, p. 98
The remainder of this paper will present some examples of how
evaluation evidence has been used in policy formation and the delivery
of public services in the UK and other countries. It will be argued that
instrumental, conceptual and symbolic uses of evidence are not mutually
exclusive, but can operate in different ways at different stages in the
policy cycle and under different political contexts.
The Educational Maintenance Allowance
The evaluation of the Educational Maintenance Allowance (EMA) in
England (Dearden et al., 2001) is one example of an evaluation
undertaken to test the likely effectiveness and cost-effectiveness
of a major policy initiative before it was rolled out nationally. The
subsequent policy was closely based on the findings of this evaluation
and, therefore, can be seen as an example of the instrumental use of
evaluation evidence in policymaking.
The EMA has been described as "a conditional cash transfer,
the aim of which is to decrease dropout rates in the transition from
compulsory to post-compulsory education in the UK" (IFS, 1999). The
EMA evaluation tested four variants of a means-tested conditional cash
transfer paid to 16-18-year olds for staying in full-time education. The
variants consisted of two levels of payment (£30 and £40) to either
the young person or a primary carer (usually the mother), combined
with different levels of a retention bonus (£50 and £80) and an
achievement bonus (£50 and £140). The evaluation
was undertaken amongst male and female young people, and in both urban
and rural areas. A comparison group, against which outcomes of those
eligible for EMA could be assessed, was identified using propensity
score matching (Rosenbaum and Rubin, 1983; Dearden et al., 2008). (13)
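To make the matching step concrete, the sketch below illustrates propensity score matching in the spirit of Rosenbaum and Rubin (1983): a model of the probability of receiving an intervention, given observed covariates, is used to pair each recipient with a similar non-recipient. The data and variable names are hypothetical, not those of the EMA evaluation.

```python
# A minimal propensity score matching sketch; simulated data only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 2))  # hypothetical covariates (e.g. income, attainment)
# Treatment depends on covariates, so raw group comparisons would be biased
treated = (X[:, 0] + rng.normal(size=n) > 0.5).astype(int)
# Outcome: stays in full-time education (1) or not (0)
outcome = (0.3 * treated + X[:, 1] + rng.normal(size=n) > 0).astype(int)

# Step 1: estimate the propensity score P(treated | X)
ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

# Step 2: match each treated case to its nearest untreated neighbour
# on the propensity score (one-to-one, with replacement)
t_idx = np.where(treated == 1)[0]
c_idx = np.where(treated == 0)[0]
matches = c_idx[np.abs(ps[c_idx][None, :] - ps[t_idx][:, None]).argmin(axis=1)]

# Step 3: the matched difference in outcomes estimates the effect on the treated
att = outcome[t_idx].mean() - outcome[matches].mean()
print(f"Estimated effect on participation: {att:.3f}")
```

The matched comparison group plays the role that the comparison areas played in the EMA evaluation: it approximates what would have happened to eligible young people in the absence of the allowance.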
Dearden et al. (2008) reported a substantial impact of the cash
transfers, ranging from a 4.5 per cent increase (over the comparison
group) in full-time education participation in the first year to a 6.7
per cent increase in the second year. Chowdry and Emmerson (2010) have
argued that "based on these impacts, and on estimates of the
financial benefits of additional education taken from elsewhere in the
economics literature ... the costs of providing EMA were likely to be
exceeded in the long run by the higher wages that its recipients would
go on to enjoy in future" (Chowdry and Emmerson, 2010, p. 1).
Notwithstanding the clear impacts and benefits of the EMA, the
programme was withdrawn by the Coalition Government following the 2010
Spending Review. The rationale for ending the EMA, according to Chowdry
and Emmerson (2010), was based on the findings of a survey of 16-17-year
olds for the Department for Children, Schools and Families (Spielhofer et
al., 2010). This suggested that "only 12 per cent of young people
overall receiving an EMA believe that they would not have participated
in the courses they are doing if they had not received an EMA"
(Spielhofer et al., 2010, p. 7). The Coalition Government inferred from
this "that the EMA policy carries a 'deadweight' of 88
per cent, i.e. 88 out of every 100 students receiving EMA would still
have been in education if EMA did not exist and are therefore being paid
to do something they would have done anyway" (Chowdry and Emmerson,
2010, p. 1). In turn, Chowdry and Emmerson have argued that the
cost-benefit analysis undertaken by the Dearden et al. (2008) evaluation
"suggests that even taking into account the level of deadweight
that was found, the costs of EMA are completely offset by the beneficial
effect of the spending on those whose behaviour was affected"
(Chowdry and Emmerson, 2010, p. 1). Chowdry and Emmerson also point out
that the EMA may have had other benefits, such as better school
attendance, more study time, and "the transfer of resources to
low-income households with children, which may in its own right
represent a valuable policy objective" (ibid).
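The arithmetic behind this argument can be set out in a few lines. The figures below are purely illustrative, not those of the IFS analysis: even at 88 per cent deadweight, a transfer breaks even if the gain to the minority whose behaviour changes is sufficiently large.

```python
# Illustrative deadweight arithmetic; all figures are hypothetical.
cost_per_recipient = 1500.0    # assumed annual cost of the transfer (pounds)
deadweight = 0.88              # share who would have stayed on anyway
benefit_if_affected = 20000.0  # assumed long-run wage gain for those
                               # whose behaviour the transfer changed

# Only the 12 per cent whose behaviour changes generate the gain
net_benefit = (1 - deadweight) * benefit_if_affected - cost_per_recipient
print(f"Net benefit per recipient: {net_benefit:,.0f} pounds")

# Break-even: the gain per affected person needed to offset the cost
print(f"Break-even gain: {cost_per_recipient / (1 - deadweight):,.0f} pounds")
```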
This differential use and interpretation of evidence to support,
and later withdraw, the EMA illustrates the point made above about
research evidence being used instrumentally, conceptually and
symbolically at different stages of the policy cycle and under different
political circumstances. It also demonstrates that alternative sources
of evidence can be used to justify a policy decision, and that factors
other than evidence (values, beliefs, ideology, resources, judgement),
play a significant role in policymaking.
The employment retention and advancement demonstration
A major policy evaluation that started out with the prospect of being
used instrumentally in policymaking, but ended up contributing more
conceptually, is the Employment Retention and Advancement
(ERA) Demonstration project. This demonstration project was undertaken
across the UK between 2003 and 2011 to test the likely impact and
cost-effectiveness of a combination of inputs (a post-employment adviser
service, cash rewards for staying in work and for completing training,
and in-work training support) for low paid workers and the long-term
unemployed. Over 16,000 people from six regions of Britain were randomly
allocated either to the ERA programme or to a business-as-usual control
group (the counterfactual).
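Random allocation is what makes the control group a valid counterfactual: in expectation the two groups differ only in their access to ERA, so the impact can be read off as a difference in mean outcomes. The simulated sketch below illustrates this logic; the numbers are invented, not ERA results.

```python
# A minimal sketch of impact estimation under random allocation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 16000
# Random allocation: half to the programme, half to business-as-usual
assigned = rng.permutation(np.repeat([1, 0], n // 2))
# Hypothetical annual earnings with an assumed 400-pound true effect
earnings = rng.normal(10000, 3000, size=n) + 400 * assigned

impact = earnings[assigned == 1].mean() - earnings[assigned == 0].mean()
t_stat, p_value = stats.ttest_ind(earnings[assigned == 1],
                                  earnings[assigned == 0])
print(f"Estimated impact on earnings: {impact:.0f} pounds (p = {p_value:.4f})")
```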
Although the evaluation took seven years to complete, milestone data
were available in real time from one year after the project began.
This meant that the ERA demonstration project could have
informed the development of welfare-to-work policies in an instrumental
way from 2004 onwards. Such premature use of evidence was sensibly
avoided, and the first-year impacts of the ERA initiatives were
initially reported in 2007 (Dorsett et al., 2007). These reported
substantial and statistically significant increases in earnings and
employment retention for one of the lone parent target groups (the New
Deal for Lone Parents [NDLP] group), but a lesser impact on earnings for
the Working Tax Credit [WTC] group. The first year impacts on the
earnings of the New Deal 25 Plus [ND25+] group, however, "were
smaller, more mixed, and less certain" (Dorsett et al., 2007, p.
10) than for the lone parent groups.
These impacts, however, were reversed by the time of the final
report on the ERA evaluation in 2011 (Hendra et al., 2011). This
reported that for the NDLP and WTC groups the early effects gained from
the proportion of participants who worked full time (at least 30 hours
per week) "generally faded in the later years, after the programme
ended ... [and] from a cost-benefit perspective, ERA did not produce
encouraging results for the lone parent groups, with the exception of
the NDLP better-educated subgroup" (Hendra et al., 2011, p. 10).
For the long-term unemployed participants (mostly men) in the ND25+
group, however, the longer-term impacts were more positive in that:
ERA produced modest but sustained increases in employment and
substantial and sustained increases in earnings. These positive effects
emerged after the first year and were still evident at the end of the
follow-up period. The earnings gains were accompanied by lasting
reductions in benefits receipt over the five-year follow-up period. ERA
proved cost-effective.
Hendra et al., 2011, pp. 10-11
The ERA evaluation did not have an immediate or direct effect on
welfare-to-work policies in the sense of rolling out nationally a
discrete set of retention and advancement initiatives for low-income and
long-term unemployed people. Hence, it does not provide an example of
instrumental use of evidence-based policymaking. It has, however, had
other effects in terms of informing and enlightening policymaking on
welfare-to-work issues (i.e. a conceptual use of evidence).
First, as has been noted above, the ERA evaluation demonstrated
that a policy, or a set of policy initiatives, can have heterogeneous
effects across client groups. Whereas the combination of financial
incentives and post-employment support had generally positive outcomes
for the ND25+ group of clients, "over five years, ERA in the UK had
no lasting overall effects for lone parents in the New Deal for Lone
Parents (NDLP) and Working Tax Credit (WTC) target groups" (Hendra
et al., 2011, p. 232). Also, not all of the models of implementation
were successful (ibid). Such findings about the heterogeneous impacts of
interventions are invaluable for policymaking purposes. Part of the
value of evaluation in policymaking is that it allows negative
consequences and unsuccessful implementation and delivery approaches to
be avoided. The Final Report on the ERA Demonstration acknowledged this
by noting that "had the Government invested in ERA as a full-scale
national policy without having mounted this rigorous test of its
effectiveness in advance, that investment would not have achieved all
the hoped-for positive results" (Hendra et al., 2011, p. 248).
Second, although the ERA Demonstration did not result in a direct
national roll-out of all of the initiatives that it tested, it would be
wrong to conclude that it has not had some influence on welfare-to-work
policy in the UK. In developing the Coalition Government's Work
Programme, evidence from the ERA Demonstration was used by the
Department for Work and Pensions for developing sustainability outcome
measures. Also, lessons learned from the ERA Demonstration were shared with
a number of officials and contracted service providers involved with the
Work Programme. ERA evidence has also helped inform a number of policy
initiatives for lone parents, such as the in-work emergency discretion
fund and in-work credit for lone parents. (14)
Third, the ERA evaluation showed that immediate and early impacts
of a policy may not be sustained over time and, consequently, may
provide imprecise evidence. A similar finding was made by the evaluation
of the Self-Sufficiency Project (SSP) in Canada, in which initial
positive impacts of financial incentives and post-employment support
were not sustained beyond the fifth quarter follow-up (Quets et al.,
1999). It is also important to establish whether positive effects on
certain outcomes are also evident on other important outcomes. A major
review of experimental and quasi-experimental evaluations of conditional
cash transfers in a range of countries in Africa, Asia, Latin and South
America, and Eastern Europe undertaken by the World Bank (Fiszbein and
Schady, 2009) showed that whereas conditional cash transfers had
generally positive short-term effects in terms of getting children to
attend school and health centres for immunisation, evidence of the
longer-term outcomes in terms of improved educational achievement and
health status was less apparent. In this respect the lessons learned
from this, and the ERA evaluations, are that evidence-based policymaking
requires sustained monitoring and evaluation over time, using outcome
measures that have internal validity (freedom from known biases) and external validity (i.e. 'real world' relevance).
Fourth, such monitoring and evaluation must include
formative/process approaches as well as impact methods. Some of the most
interesting and valuable evidence from the ERA evaluation was about the
challenges of implementing, developing and sustaining the ERA's
initiatives in different regions and contexts, and how these challenges
were overcome. This required a multi-method evaluation "including
in-depth qualitative interviews with programme staff and participants;
three waves of survey interviews with programme and control group
respondents (at 12, 24, and 60 months after random assignment); and
administrative data on participants' employment, earnings, and
benefits receipt" (Hendra et al., 2011, p. 15).
Fifth, the ERA evaluation contributed to evidence-based
policymaking by testing in a UK context interventions that had generally
been proven to be effective in the USA and Canada. Evidence does not
always 'travel' well. The US and Canadian labour markets,
welfare systems, and their socio-demographic and cultural features are
generally very different from those in the UK. Furthermore, there are
differences on these variables within the UK. Policies that have been
shown to be effective in one or more countries, or in some parts of
countries, may not have the same outcomes elsewhere. Hence the need for
the ERA evaluation to have tested the effectiveness of post-employment
welfare policies in, and within, the UK. Introducing such policies
without impact and formative evaluation runs a high risk of policy
failure and misplaced resources.
Sixth, much of the evidence on the use of personal employment
advisers, cash transfers/incentives, and training support has been in
pre-employment contexts. The ERA evaluation has provided valuable
evidence on the implementation and effectiveness of the policy
initiatives post-employment. This point has been acknowledged in the
Final Report on the ERA Demonstration where the authors note that
"little of the [existing] evidence came from interventions that
included extensive job coaching and advancement support after people
began working. Consequently, ERA, like similar demonstration programmes
in the US, was charting new territory" (Hendra et al., 2011, p.
232).
Lastly, the ERA evaluation was the first major demonstration
project of its kind and magnitude in the UK. Unlike the many other
policy pilots that are undertaken in the UK, in which a policy
commitment has already been made (Jowell, 2003), the ERA Demonstration
"tested an idea that the Government had not yet committed to
incorporating into national policy" (Hendra et al., 2011, p. 248).
Furthermore, the Department for Work and Pensions and HM Treasury
committed the ERA to a five-year follow-up period, thereby moving away
from the short-termism of most policymaking and policy evaluation.
Another important aspect of the ERA as a demonstration is that it used a
random allocation design on a very large sample of the population in six
regions of Britain. To this extent it was also demonstrating that a
large-scale randomised controlled evaluation of a major policy
initiative could be undertaken in the UK, alongside other 'mixed
methods' of evaluation--something that was clearly achieved with
considerable success.
Impact assessments
Impact Assessments are an evidence-based tool of policymaking that
have become institutionalised in the UK, in the sense that they are a
required part of the policymaking process whenever a policy initiative
imposes or reduces costs, creates a new information obligation or
administrative burden, involves redistribution or regulatory change, or
implements a European Union directive (BIS, 2011a, p. 8). Impact
assessments are a structured way of
gathering evidence to establish the economic, social, environmental and
regulatory impacts on business, the third sector and the public sector.
The impacts that have to be assessed in UK policymaking are summarised
in figure 1. The Department for Business, Innovation and Skills (BIS) has
described impact assessments as a tool "to help policymakers to
fully think through the reasons for government intervention, to weigh up
various options for achieving an objective and to understand the
consequences of a proposed intervention" (BIS, 2011a, p. 4).
UK impact assessments require policymakers to identify and appraise
viable options that will achieve the policy objective, including the
'do minimum/do nothing' option (usually the
'business-as-usual' option). The appraisal process required by
impact assessments mirrors HM Treasury's ROAMEF (15) process, and
consists of six stages: Development, Options, Consultation, Final
Proposal, Enactment and Review (see figure 2). At each stage of the
impact assessment process evidence must be gathered and critically
appraised for quality, cost-effectiveness and cost benefit. Hence, the
impact assessment process can involve a great deal of evidence gathering
and analysis, though in the spirit of the Coalition Government's
'small government' agenda BIS now calls for
"proportionality of analysis", which it defines as using
"the appropriate level of resources to invest in gathering and
analysing data for appraisals and evaluations" (BIS, 2011b, p. 8).
The depth of analysis required by an impact assessment is seen by BIS as
increasing from 'minimal' during the early stages--"the
identification of winners and losers" and the "full description
of costs and benefits" (BIS, 2011a, p. 8)--to a much greater degree
of detail at the final stage of "fully monetizing the costs and
benefits".
Figure 1. Impact assessments (UK): a summary of the impacts
to be assessed

* Total costs and benefits of options
* Geographical coverage (within the UK)
* Enforcement arrangements
* More than minimum EU requirements
* The value of the proposed offsetting measure per year
* Hampton Principles (a)
* Economic impacts: the impact on competition and on small firms
* Environmental impacts: greenhouse gases/wider environment, sustainable development
* Social impacts: health and well-being; human rights; justice systems; rural proofing
* Statutory equality: impacts on race, gender, disability, sexual orientation

Note: (a) The Hampton Principles set out "how to reduce
unnecessary administration for businesses, without compromising
the UK's excellent regulatory regime" (see BIS, 2011b, p. 26).
Impact assessments, then, have the potential to use evidence
instrumentally--i.e. to determine the most cost-effective way of
achieving a policy objective or the most cost-beneficial way of using
available resources--and/or conceptually in the sense of generating
insight about the likely regulatory, economic, social and environmental
consequences of a policy.
Impact assessments can also involve the symbolic use of evidence.
The National Audit Office (NAO) reviews impact assessments periodically
and has repeatedly found that the level of analysis in impact
assessments is weak, particularly the quality of economic analysis (NAO,
2007). The range of policy options considered by many impact assessments
is also limited (NAO, 2009). In its 2009 Report the NAO found that
"only 20 per cent [of impact assessments] presented the results of
an evaluation of a range of options" and "that the
introduction of the summary sheet, which has improved clarity and
consistency, has encouraged a "tick box" approach rather than
making an assessment of the costs and benefits of different options
integral to policy formation" (NAO, 2009, p. 15). The most recent
NAO review of impact assessments found that "in nearly two thirds
of final Impact Assessments in our sample, however, different options
were not well explored or summarized", and that "overall, 42
per cent of the Impact Assessments we reviewed had at no time considered
more than one option in addition to the 'do nothing'
option" (NAO, 2010, p. 14). This suggests that by supporting the
preferred policy option, rather than genuinely seeking the most
effective and cost-beneficial options, impact assessments sometimes use
evidence symbolically to justify pre-determined positions.
[FIGURE 2 OMITTED]
Delivery trajectories
As has been noted above, researchers and policymakers often have
different notions of evidence. Whereas researchers see evidence as being
theoretically grounded, empirically proven, and meeting scientific
standards of internal validity and adequacy of reporting, policymakers
often have a more utilitarian and problem-solving approach to evidence
(Lomas et al., 2005; Campbell et al., 2007). The use of monitoring to
gather real-time evidence of goal attainment, target achievement, and
success or failure of public service delivery is a major feature of
policymaking in many countries. This may not meet many analysts'
notions of evidence, and it is undoubtedly a performance management
tool, but it is seen and used as an evidence-based approach by many
governments.
Delivery trajectories provide a visual representation of the actual
delivery of a service compared with the expected performance towards a
set goal or target. Figure 3 is a hypothetical representation of the
delivery of anti-retroviral (ARV) drugs in two health service areas for
people who have HIV/AIDS. The dotted line in the middle of figure 3
represents the 'ideal' trend line that would deliver the
target of 95 per cent of people with HIV/AIDS receiving ARVs by the
end of 2011. The recent historical performance in Areas A and B, in
terms of delivering ARVs, is represented by the two lines to the left of
the baseline. These indicate a fairly flat delivery trajectory followed
by some improvement in Area A, and a somewhat erratic and declining
performance in Area B. Actual delivery trajectories following the
baseline (Quarter 1 2007) have been plotted. The delivery of ARVs in
Area B is clearly below both the trend line and that of Area A. From a
performance monitoring and management perspective the flat lining of
delivery in Area B at consecutive quarterly data points 1, 2 and 3 might
cause concern and warrant a priority review (sometimes called a delivery
review).
[FIGURE 3 OMITTED]
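A delivery trajectory of the kind shown in figure 3 is straightforward to construct from quarterly monitoring data. The sketch below plots hypothetical quarterly figures for two areas against the straight-line trend needed to reach a 95 per cent target; all numbers are invented for illustration.

```python
# A sketch of a delivery trajectory chart; all figures are hypothetical.
import matplotlib.pyplot as plt

quarters = list(range(12))  # quarterly data points from the baseline
# Straight-line 'ideal' trend from a 60 per cent baseline to the target
target = [60 + (95 - 60) * q / 11 for q in quarters]
# Hypothetical actual delivery in two areas
area_a = [62, 64, 67, 69, 72, 75, 78, 81, 84, 87, 90, 93]
area_b = [58, 58, 58, 59, 63, 66, 70, 74, 78, 82, 86, 90]  # flat start,
                                                           # then recovery

plt.plot(quarters, target, "k--", label="Trend line to 95% target")
plt.plot(quarters, area_a, label="Area A (actual)")
plt.plot(quarters, area_b, label="Area B (actual)")
plt.xlabel("Quarters from baseline")
plt.ylabel("% of people with HIV/AIDS receiving ARVs")
plt.legend()
plt.show()
```

The flat opening quarters for Area B are the kind of sustained shortfall against the trend line that would trigger a priority review.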
A priority review is "a rapid analysis of the state of
delivery of a high priority strategy and identification of the action
needed to strengthen delivery" (O'Connor, 2008). It is
undertaken by a team of people with mixed expertise and skills, as well
as the key agencies responsible for frontline delivery of services. Each
priority review involves intensive evidence gathering in the area where
delivery is failing, and seeks to identify problems and weaknesses in
the delivery chain that require remedy. It might also include an
analysis of delivery in a successful area (such as Area A in figure 3)
to identify procedures, activities, people and agencies that could
assist the underperforming area. The analysis undertaken during a
priority review is "firmly rooted in evidence and triangulates
existing evaluations, data and evidence from reviews"
(O'Connor, 2008). A priority review also requires an analysis of
existing and new policy initiatives, including public expenditure
commitments and changes, in order to identify factors external to the
local delivery context that might account for poor performance and/or
improvement. The main outcome of a priority review is a prioritised
action plan for strengthening delivery, which is followed up in a
planned and timely manner. In the hypothetical example in figure 3 the
priority review seems to have been effective in terms of improving the
delivery of ARVs in Area B, and establishing a more sustained pattern of
delivery for the future.
In the UK the Prime Minister's Delivery Unit (PMDU) worked
with a series of prioritised delivery targets that had been established
in 2001 as part of the Performance Management and Comprehensive Spending
Review regimes of the Labour Government (Barber, 2007; O'Connor,
2008). PMDU developed a range of delivery trajectories, similar to that
presented in figure 3, and these allowed central government departments
and local agencies responsible for public service delivery to identify
when service delivery fell short of the expected performance. The
evidence provided by such data was then used to decide when
underperformance was more than just a temporary blip, and when it
required more detailed attention, analysis and action in the form of a
policy review.
This approach to using evidence to monitor and manage delivery had
some encouraging results (O'Connor, 2008). Waiting times for
surgery, waiting times to be seen by a doctor in accident and emergency
departments, school attendance and attainment, and train punctuality all
improved during this time. Some (though not all) of this improvement can
be attributed to the use of delivery trajectories, priority reviews and
monitoring and evaluation evidence. The Coalition Government in the UK
has abandoned these methods of using evidence to monitor and manage
public service delivery, and the machinery of government that supported
them has also been dismantled. The Department of Performance Monitoring
and Evaluation (DPME) in the government of South Africa, however, is
currently piloting similar methods to monitor and manage key public
service delivery as part of that government's Outcomes Approach
(Office of the Presidency, 2010).
Overcoming barriers to the use of evidence in policymaking
There is now a considerable body of knowledge on the barriers to
getting research and evaluation evidence into policy and practice, and
on how these barriers can be overcome. Some of the barriers have already
been referred to above--factors other than evidence, the uncertainty and
inconclusiveness of some research findings, different notions of
evidence, and the time-lag for research findings to percolate into
policy and practice. Other barriers include the presentational format of
research findings--which are often characterised as being too long, too
dense, too methodological, and too inaccessible--the lack of a clear
message, researchers' lack of familiarity with the policy process,
and policymakers' lack of familiarity with the research process
(Lavis et al., 2005).
The presentation of research findings can be enhanced by the use of
a 1:3:25 format (CHSRF, 2001). This consists of one page of main
messages, followed by a three-page executive summary, and a
presentation of the findings in no more than 25 pages, written in
language that a non-research specialist can readily understand. Lavis
et al. (2005) have referred to this as a 'graded entry' to the
available research evidence. The one page of main messages should not
summarise the main findings but should state the main messages that
decision-makers can take from the research. The
three-page executive summary consists of the main findings from the
research, presented succinctly to serve the needs of busy
decision-makers. There should be no details or discussion of the
methodology in an executive summary, other than a very brief statement
about which methods were used and which sections of the population were
included. The twenty-five page report should cover the background to the
research, the questions addressed, a brief outline of the methodology,
the findings, a discussion, and conclusions. Research and evaluation
reports should acknowledge explicitly the strength of the available
evidence, including the degree of uncertainty and contested knowledge.
Researchers' lack of familiarity with the policy process, and
policymakers' lack of familiarity with the research process have
been identified as a common problem in the use of research evidence in
policymaking (Amara et al., 2004; Lavis et al., 2005; Nutley et al.,
2007; Ouimet et al., 2009). The trust (and lack of trust) of
policymakers in researchers has also been identified as an important
factor in the use of research in policymaking (Lavis et al., 2005).
Lavis et al. have noted that interactions between researchers and
policymakers increased the prospects for research use by policymakers.
Similarly, Lomas has concluded that "the clearest message from
evaluations of successful research utilization is that early and ongoing
involvement of relevant decision makers in the conceptualization and
conduct of a study is the best predictor of its utilization"
(Lomas, 2000, p. 141). Other research (Gabbay and Le May, 2004;
Greenhalgh et al., 2005; Best and Holmes, 2010) has identified the
importance of interpersonal networks and direct interactions between
researchers and policymakers as important factors in the use of evidence
in policymaking.
Summary
This paper has argued that evidence-based policy has clearly made a
worldwide impact, at least at the rhetorical and institutional levels,
and in terms of analytical activity. There is also evidence that the
machinery of government in the UK has been developed to increase the
capacity for evidence-based policymaking. This includes the development
of the analytical professions within government (economists, social
researchers, statisticians, operational researchers, information
specialists), evaluation guidance documents (the Green Book and the
Magenta Book), the Impact Assessment process, monitoring and evaluation
mechanisms, and the Comprehensive Spending Review process.
The role that evaluation evidence plays in policymaking can be
instrumental (direct), conceptual (indirect) or symbolic (i.e. using
research results to legitimate and sustain pre-determined positions).
Observers such as Lavis et al. (2003) and Amara et al. (2004) have
suggested that "the three types of research utilization must be
considered as complementary rather than as contradictory dimensions of
research utilization" (Amara, 2004, p. 79). The four examples
presented in this paper of how evidence has been used in policymaking
and public service delivery in the UK confirm these complementary uses
of evidence, and suggest that evidence can be used instrumentally,
conceptually and symbolically at different stages of the policy cycle
and under different policy and political circumstances. The fact that
evidence is not always used instrumentally, in the sense of "acting
on research results in specific, direct ways" (Lavis et al., 2003,
p. 228), does not mean that it has little or no influence. Nor does the
symbolic use of evidence always imply sinister or Machiavellian
practice. It may be quite reasonable to seek evidence to confirm or
justify a policy position to which there is already a political
commitment. This is surely better than proceeding on the basis of blind
faith and without any involvement with evidence. Using evidence
symbolically at least leaves open the prospect that new insights into
the nature of the policy issue, refinements of detail, and different
approaches to implementation and delivery may be forthcoming from such
an approach.
The broader question remains: what is the state of evidence-based
policy evaluation and its role in policy formation? The conclusion from
what has been presented in this paper is that the notion of
evidence-based policy and doing 'what works' is well
established internationally. The concept seems to have percolated into
the language of policymaking and governments worldwide. It is hard to
see how evidence can play a dominant role in policymaking given the role
of values, beliefs and ideology, the many other factors that influence
the policy process, and the different notions that policymakers and
researchers/evaluators often have about evidence. The most likely ways
in which evaluation evidence can influence policymaking are
by integrating it with these other factors; by direct contact between
policymakers and researchers; by the use of interpersonal networks; and
by making research evidence more accessible by having a 'graded
entry' to its outputs.
As for the question of the role of policy evaluation in
recession--the overall theme of this Review--there can hardly be any
doubt that evidence-based policy and policy evaluation are more relevant
and more needed in recession times than ever before. Identifying which
policy interventions are effective, cost-effective, and cost-beneficial
in the most socially advantageous and fairly distributed ways must be a
central principle of policymaking in times when resources are limited
and issues of social equity are acute. Establishing what is already
known about effective and efficient interventions, using systematic
reviews of evidence, meta-analyses and rapid evidence assessments (HM
Treasury, 2011; GSR, 2009), would seem to be a priority. Where evidence
is not available from these sources, consensus conferences of academic
researchers, policymakers, substantive experts, and knowledge brokers
can be used to establish agreement on the best available evidence, and
on priorities for future research and evaluation. Further, at times of
economic recession and uncertainty, initiatives that are introduced
should be monitored and evaluated carefully in real time, with feedback
mechanisms being used to help policymakers make the most informed
decisions possible.
doi: 10.1177/002795011221900105
REFERENCES
Amara, N., Ouimet, M. and Landry, R. (2004), 'New evidence on
instrumental, conceptual, and symbolic utilization of university
research in government agencies', Science Communication, 26, pp.
76-106.
Balas, E.A. and Boren, S.A. (2000), 'Managing clinical
knowledge for health care improvement', in Yearbook of Medical
Informatics 2000: Patient-Centered Systems, Stuttgart, Germany,
Schattauer, pp. 65-70.
Barber, M. (2007), Instruction to Deliver. Tony Blair, the Public
Services and the Challenge of Achieving Targets, London, Methuen
Publishing Ltd.
Best, A. and Holmes, B. (2010), 'Systems thinking, knowledge
and action: towards better models and methods', Evidence and
Policy, 6, 2, pp. 145-59.
Beyer, J.M. (1997), 'Research utilization: bridging the gap
between communities', Journal of Management Inquiry, 6, 1, pp.
17-22.
BIS (2011a), Impact Assessment Guidance: When to do an Impact
Assessment, London, Department for Business, Innovation and Skills,
available at: http://www.bis.gov.uk/assets/biscore/better-regulation/docs/i/11-1111-impact-assessment-guidance.pdf.
--(2011b), IA Toolkit: How to do an Impact Assessment, London,
Department for Business, Innovation and Skills, available at:
http://www.bis.gov.uk/assets/biscore/better-regulation/docs/i/11-1112-impact-assessment-toolkit.pdf.
Bryson, A., Dorsett, R. and Purdon, S. (2002), The Use of
Propensity Score Matching in The Evaluation of Active Labour Market
Policies, Working Paper Number 4, London, Department for Work and
Pensions.
Cabinet Office (1999a), Modernising Government White Paper, London,
Cabinet Office.
--(1999b), Professional Policy Making for the Twenty-First Century,
London, Cabinet Office.
--(2000), Adding It Up: Improving Analysis and Modelling, London,
Cabinet Office.
--(2003), Adding It Up: Improving Analysis and Modelling in Central
Government, London, Cabinet Office.
Campbell, D. (2005), 'Getting a 'GRIPP' on the
research-policy interface in NSW', New South Wales Public Health
Bulletin, 16, 10, pp. 154-6.
Campbell, S., Benita, S., Coates, E., Davies, P. and Penn, G.
(2007), Analysis for Policy: Evidence-Based Policy in Practice, London,
Government Social Research Unit.
Chowdry, H. and Emmerson, C. (2010), An Efficient Maintenance
Allowance?, London, Institute for Fiscal Studies, available at:
http://www.ifs.org.uk/publications/5370.
CHSRF (2001), 'Communication notes: reader-friendly
writing--1:3:25', Ottawa, Canadian Health Services Research Foundation,
available at: http://www.chsrf.ca/knowledge_transfer/communication_notes/comm_reader_friendly_writing_e.php.
CIHR (2006), Evidence in Action, Acting on Evidence: A casebook of
health services and policy research knowledge translation stories,
Canadian Institute of Health Services and Policy Research, Ottawa,
Canada, available at: www.cihr-irsc.gc.ca/e/documents/ihspr_ktcasebook_e.pdf.
Davies, P.T. (2004), 'Is Evidence-Based Government
Possible?', Jerry Lee Lecture to the 4th Annual Campbell
Collaboration Colloquium, Washington D.C., 19 February.
Dearden, L., Emmerson, C., Frayne, C. and Meghir, C. (2001),
Education Maintenance Allowance: The First Year--A Quantitative
Evaluation, London, Department for Education and Skills (now archived at
The National Archives website).
--(2008), 'Conditional cash transfers and school dropout
rates', The Journal of Human Resources, 44, 4, pp. 827-57.
Dorsett, R., Campbell-Barr, V., Hamilton, G., Hoggart, L., Marsh,
A., Miller, C., Phillips, J., Ray, K., Riccio, J.A., Rich, S. and
Vegeris, S. (2007), Implementation and First-Year Impacts of The UK
Employment Retention and Advancement (ERA) Demonstration, Research
Report 412, London, Department for Work and Pensions.
Fiszbein, A. and Schady, N. (2009), Conditional Cash Transfers:
Reducing Present and Future Poverty, Washington D.C, The World Bank.
Gabbay, J. and Le May A. (2004), 'Evidence based guidelines or
collectively constructed "mindlines?" Ethnographic study of
knowledge management in primary care', British Medical Journal,
329, pp. 1013-8.
Greenhalgh, T., Robert, G., Bate, P., Macfarlane, F. and
Kyriakidou, O. (eds) (2005), Diffusion of Innovations in Health Service
Organisations: A Systematic Literature Review, Oxford, Blackwell
Publishing Ltd.
GSR (2009), Rapid Evidence Assessment Toolkit, London, Government
Social Research Service, available at: http://www.civilservice.gov.uk/networks/gsr/resources-and-guidance/rapid-evidence-assessment.
Hendra, R., Riccio, J.A., Dorsett, R., Greenberg, D.H., Knight, G.,
Phillips, J., Robins, P.K., Vegeris, S., Walter, J., Hill, A., Ray, K.
and Smith, J. (2011), Breaking the Low-Pay, No-Pay Cycle: Final Evidence
from the UK Employment, Retention and Advancement (ERA) Demonstration,
London, Department for Work and Pensions.
HM Treasury (2003), The Green Book: Appraisal and Evaluation in
Central Government, London, HM Treasury.
--(2011), The Magenta Book: Guidance for Evaluation, London, HM
Treasury.
IFS (1999), Education Maintenance Allowance (EMA) Evaluation,
Institute for Fiscal Studies, London (summary available at:
http://www.ifs.org.uk/projects/98).
Jowell, R. (2003), The Role of 'Pilots' in Policy-Making.
Report of a Review of Government, Government Social Research Unit,
London, HM Treasury.
Lavis, J., Davies, H., Oxman, A., Denis, J-L., Golden-Biddle, K.
and Ferlie, E. (2005), 'Towards systematic reviews that inform
health care management and policy-making', Journal of Health
Services Research & Policy, 10, Supplement 1, pp. 35-48.
Lavis, J.N., Robertson, D., Woodside, J.M., McLeod, C.B., Abelson, J. and
the Knowledge Transfer Study Group (2003), 'How can research
organizations more effectively transfer research knowledge to decision
makers?', The Milbank Quarterly, 81, 2, pp. 221-48.
Leigh, A. (2009), 'What evidence should social policymakers
use?', Australian Treasury Economic Roundup, 1, pp. 27-43.
Lomas, J. (2000), 'Connecting research and policy',
Canadian Journal of Policy Research, Spring, pp. 140-4.
Lomas, J., Culyer, T., McCutcheon, C., McAuley, L. and Law, S.
(2005), Conceptualizing and Combining Evidence for Health System
Guidance: Final Report, Canadian Health Services Research Foundation,
Ottawa.
Marsh, D. (2006), 'Evidence-based policy: framework, results
and analysis from the New Zealand biotechnology', International
Journal of Biotechnology, 8, 3-4, pp. 206-24.
Martin, J.P. (2000), 'What works among active labour market
policies: evidence from OECD countries' experiences', OECD
Economic Studies, 30, pp. 80-113.
Milani, C.R.S. (2009), Evidence-Based Policy Research: Critical
Review of Some International Programmes on Relationships Between Social
Science Research and Policy-Making, Paris, France, UNESCO.
Mold, J.W. and Peterson, K.A. (2005), 'Primary care
practice-based research networks: working at the interface between
research and quality improvement', Annals of Family Medicine, 3, 1,
May/June 2005, pp. S12-S20.
NAO (2007), Evaluation of Regulatory Impact Assessments 2006-07,
London, National Audit Office.
--(2009), Delivering High Quality Impact Assessments, London,
National Audit Office.
--(2010), Assessing the Impact of Proposed New Policies, Report by
the Comptroller and Auditor General, HC 185 Session 2010-2011, London,
National Audit Office.
Nutley, S.M., Walter, I. and Davies H.T.O. (2007), Using Evidence:
How Research Can Inform Public Services, Bristol, Policy Press.
Obama, B.H. (2009), Inaugural Address, Washington DC, 20 January,
available at: http://www.whitehouse.gov/the-press-office/president-barack-obamas-inaugural-address.
O'Connor, T. (2008), How The Prime Minister Monitors
Performance and Assesses Delivery, Presentation to GORS Induction, Tony
O'Connor CBE, Chief Operational Research Analyst, Prime
Minister's Delivery Unit, 8 May.
Office of the Presidency (2010), Guide To The Outcomes Approach,
Pretoria, Office of the Presidency of South Africa.
Ouimet, M., Landry, R., Ziam, S. and Bedard, P. (2009), 'The
absorption of research knowledge by public servants', Evidence and
Policy, 5, 4, pp. 331-50.
Purdon, S. (2002), Estimating The Impact of Labour Market
Programmes, Working Paper No. 3, London, Department for Work and
Pensions.
Quets, H., Robins, P.K., Pan, E.C., Michalopoulos, C. and Card, D.
(1999), Does SSP Plus Increase Employment? The Effect of Adding Services
to the Self Sufficiency Project's Financial Incentives, Ottawa,
Social Research and Demonstration Corporation.
Rosenbaum, P. and Rubin, D. (1983), 'The central role of the
propensity score in observational studies for causal effects',
Biometrika, 70, 1, pp. 41-55.
Spielhofer, T., Golden, S., Evans, K., Marshall, H., Mundy, E.,
Pomati, M. and Styles, B. (2010), Barriers to Participation in Education
and Training, Slough, NFER.
Topp, L. and McKetin, R. (2003), 'Supporting evidence-based
policy making: A case study of the illicit drug reporting system in
Australia', Bulletin on Narcotics, 55, 1-2, pp. 23-30, United
Nations Office on Drugs and Crime, Vienna, Austria.
Townsend, T. and Kunimoto, B. (2009), Collaboration and Culture. The
Future of the Policy Research Function in the Government of Canada,
Policy Research Initiative, Ottawa, Canada, March.
Weiss, C.H. (1982), 'Policy research in the context of diffuse
decision making', Journal of Higher Education, 53, 6, pp. 619-39.
Zussman, D. (2003), 'Evidence-based policy making: some
observations of recent Canadian experience', Social Policy Journal
of New Zealand, 20, pp. 64-71, June.
NOTES
(1) http://coexgov.securesites.net/index.php?keyword=a432fbc34d71c7
(2) www.ahrq.gov/clinic/epc
(3) http://www.evidencebasedprograms.org/
(4) http://www.cochrane.org/
(5) http://www.campbellcollaboration.org/
(6) http://eppi.ioe.ac.uk/cms/
(7) http://www.dfid.gov.uk/r4d/
(8) http://www.ausaid.gov.au/
(9) http://www.3ieimpact.org/
(10) http://www.nice.org.uk/
(11) http://www.scie.org.uk/
(12) http://www.cebenetwork.org/
(13) For further discussion on the use of propensity score matching
in policy evaluation see Bryson et al. (2002) and Purdon (2002).
(14) The observations in this paragraph have been provided by
officials at the Department for Work and Pensions via personal
communication.
(15) ROAMEF is an acronym for Rationale, Objectives, Appraisal,
Monitoring, Evaluation and Feedback (HM Treasury, 2003, p. 3).
Philip Davies, Oxford Evidentia Limited. E-mail:
pdavies@oxev.co.uk.