
Article information

  • Title: The state of evidence-based policy evaluation and its role in policy formation
  • Author: Davies, Philip
  • Journal: National Institute Economic Review
  • Print ISSN: 0027-9501
  • Year of publication: 2012
  • Issue: January
  • Language: English
  • Publisher: National Institute of Economic and Social Research
  • Subject headings: Economic reform; Policy sciences

The state of evidence-based policy evaluation and its role in policy formation.


Davies, Philip


This paper argues that evidence-based policy has clearly made a worldwide impact, at least at the rhetorical and institutional levels, and in terms of analytical activity. The paper then addresses whether or not evidence-based policy evaluation has had an impact on policy formation and public service delivery. The paper uses a model of research-use that suggests that evidence can be used in instrumental, conceptual and symbolic ways. Taking four examples of the use of evidence in the UK over the past decade, this paper argues that evidence can be used instrumentally, conceptually and symbolically in complementary ways at different stages of the policy cycle and under different policy and political circumstances. The fact that evidence is not always used instrumentally, in the sense of "acting on research results in specific, direct ways" (Lavis et al., 2003, p. 228), does not mean that it has little or no influence. The paper ends by considering some of the obstacles to getting research evidence into policy and practice, and how these obstacles might be overcome.

Keywords: Evidence-based policy; policymaking; public service delivery; delivery trajectories

JEL Classifications: H11; H43; I28; I38; J62

The rhetoric of evidence-based policy

Over the past decade or so, and in many countries, public policymaking has claimed to be 'evidence-based' and doing 'what works'. In the United Kingdom evidence-based policy was a key element of efforts to reform the machinery of government after 1997. The Modernising Government White Paper (Cabinet Office, 1999a), for instance, stated that government policy must be evidence-based, properly evaluated and based on best practice. Prime Minister Tony Blair confirmed his "Government's commitment to policy-making based on hard evidence and, as in education, or NHS reforms, or fighting crime, we must always be looking at the outcomes of policies--the benefits in people's lives--not the process" (Cabinet Office, 2000, p.3).

A report from the Cabinet Office Strategic Policy Making Team on Professional Policy Making for the Twenty-First Century also suggested that "policy making must be soundly based on evidence of what works" and that "government departments must improve their capacity to make use of evidence" (Cabinet Office, 1999b, p.40). This approach to policymaking called for a greater use of evaluation of policies ex ante and post hoc and, consequently, a greater use of monitoring the roll out of policies and the delivery of public services (Barber, 2007).

Yet another report from the Cabinet Office in 2000, titled Adding It Up, clearly recognised the need for high quality analysis and evaluation in government, whilst acknowledging a sometimes limited demand:

"The government is fully committed to the principle that policies should be based on evidence. This means policy should be supported by good analysis and, where appropriate, modelling. This study has found, however, that demand for good analysis is not fully integrated in the culture of central Government."

Cabinet Office, 2000, p. 12

The rhetoric of evidence-based policymaking has continued with the UK's Coalition Government. In a speech to the Annual Leadership Conference of the National College for School Leadership, titled Seizing Success 2010, the Secretary of State for Education, Michael Gove, suggested that:

"Indeed I want to see more data generated by the profession to show what works, clearer information about teaching techniques that get results, more rigorous, scientifically-robust research about pedagogies which succeed and proper independent evaluations of interventions which have run their course. We need more evidence-based policy making, and for that to work we need more evidence."

The evidence-based policy agenda can be found worldwide. In his Inaugural Address as the 44th President of the United States of America, Barack Obama told the American people that his Administration would be based:

"not [on] whether our government is too big or too small, but whether it works, whether it helps families find jobs at a decent wage, care they can afford, a retirement that is dignified. Where the answer is yes, we intend to move forward. Where the answer is no, programs will end"

Obama, 2009

President Obama's mission for US policymaking is supported by resources in America such as the Coalition for Evidence-Based Policy Making, (1) Evidence-Based Practice Centers, (2) and Social Programs That Work. (3) Evidence-based policy is similarly promoted in Australia (Campbell, 2005; Leigh, 2009; Topp and McKetin, 2003), Canada (Zussman, 2003; Lomas et al., 2005; CIHR, 2006; Townsend and Kunimoto, 2009), New Zealand (Marsh, 2006), South Africa (Office of the Presidency, 2010), and by international organisations such as the OECD (Martin, 2000), UNESCO (Milani, 2009), and the World Bank (Fiszbein and Shady, 2009). Organisations that undertake systematic reviews of evidence (e.g. the Cochrane Collaboration, (4) the Campbell Collaboration, (5) the EPPI-Centre, (6) DfID, (7) AusAID, (8) 3ie (9)), and those that provide evidence-based guidance for public service professionals and users of public services (e.g. the National Institute for Health and Clinical Excellence, (10) the Social Care Institute for Excellence, (11) Coalition for Evidence-Based Education (12)) all add to the global availability of high quality evidence for policymaking and the provision of public services.

Evidence-based policy, then, has clearly made a worldwide impact, at least at the rhetorical and organisational levels and in terms of analytical activity. The question that this paper addresses is whether or not evidence-based policy evaluation has had an impact on policy formation and public service delivery.

How evidence influences policy and practice

The fundamental principle of evidence-based policy is beguilingly simple. It helps policymakers make better decisions, and achieve better outcomes, by using existing evidence more effectively, and undertaking new research, evaluation and analysis where knowledge about effective policy initiatives and policy implementation is lacking. This relatively straightforward principle, however, is not without problems when applied to the realities of policymaking.

First, there are many factors other than evidence that influence policymaking (Davies, 2004). These include the role of values, beliefs and ideology, which are the driving forces of most policymaking processes, as well as the experience, expertise and judgement of policymakers. The availability of resources, a bureaucratic culture, the role of lobbyists and pressure groups, and the need to respond quickly to everyday contingencies all contribute to policymaking in addition to evaluation evidence and analysis. For evaluation evidence to be effective in policymaking, one has to find ways of integrating such evidence with these many other factors.

Second, evidence is seldom self-evident or definitive. By itself evidence does not tell users what to do, or how to act. It merely provides a basis upon which decisionmakers can make informed judgements about the likely effect or impact of an intervention, or about the conditions under which a desired effect is likely to be achieved or not achieved. Research evidence, like all scientific evidence, is probabilistic and carries some degree of uncertainty. That uncertainty can be better understood, and sometimes reduced, by formative evaluation that explores how, why, for whom, and under what conditions an intervention is likely to achieve its desired effects. Hence, evidence-based policy requires impact and formative evaluation using qualitative and quantitative methods, under experimental/quasi-experimental and naturalistic conditions (Davies, 2004; HM Treasury, 2011).

Third, researchers and policymakers often have different notions of evidence and different absorptive capacity to seek and use evidence. Lomas et al. (2005) found that whereas policymakers in Canada "view evidence colloquially ("anything that establishes a fact or gives reason for believing something") and define it by its relevance, most researchers view evidence scientifically (the use of systematic, replicable methods for production) and define it by its methodology" (Lomas et al., 2005, p.1). A similar study of civil servants in Whitehall (Campbell et al., 2007) found that these policymakers wanted evidence that focused on the 'end product', rather than on how the information was either collected or analysed. These Whitehall civil servants also valued anecdotal evidence, and evidence that draws upon "'real life stories', 'fingers in the wind', 'local' and 'bottom up' evidence" (Campbell et al., 2007, p.21). Ouimet et al. (2009) found that the 'absorptive capacity' of civil servants to seek and use research evidence depended on their physical and cognitive access to research (their scientific literacy), their educational backgrounds, and the direct access they have to academic researchers. Given these different notions, expectations and experiences of evidence, it is not surprising that the role of evaluation evidence in policy formation and delivery is not straightforward or assured.

Fourth, the impact of research, evaluation and analysis is seldom direct or immediate. As Carol Weiss notes "cases of immediate and direct influence of research findings on specific policy decisions are not frequent" (Weiss, 1982, p. 620). Weiss also notes that:

"rarely does research supply an "answer" that policy actors employ to solve a policy problem. Rather, research provides a background of data, empirical generalisations, and ideas that affect the way that policy makers think about a problem."

Weiss (1982), pp. 620-1

Weiss goes on to suggest that "to acknowledge this is not the same as saying that research findings have little influence on policy". For Weiss, research, evaluation and analysis influence policymakers':

"conceptualizations of the issues with which they deal, affects those facets of the issue they consider inevitable and unchangeable and those they perceive as amenable to policy action; widens the range of options that they consider, and challenges taken-for-granted assumptions about appropriate goals and appropriate activities ... ideas from research are picked up in diverse ways and percolate through to officeholders"

op cit., p. 622

This percolation process means that it can take a long time for research and evaluation evidence to have an impact on policy and practice. Drawing on the work of Balas and Boren (2000), Mold and Peterson (2005) have estimated that in the case of medical knowledge "it takes an average of 17 years to turn 14 per cent of original research findings into changes in care that benefits patients" (Mold and Peterson, 2005, S14). One can only assume that in substantive policy areas that have a shorter history and tradition of evidence-based policy and practice than medicine, the time-lag for the percolation of evidence may be even longer.

This time lag between gathering high quality evidence and getting it into policy and practice is often seen as another factor working against evidence-based policy. Policymaking usually takes place in time periods of weeks and months, whereas high quality evidence gathering usually requires many months and years. The challenge for researchers and analysts is to identify and provide the best available evidence in the time available to inform the contemporary policymaking process, whilst also developing a more robust evidence base for future policymaking in the medium to longer term. The development of strategic policymaking teams within many governments, which seek to identify the policy needs of their countries in five-year, ten-year, fifteen-year and even longer future time periods, provides an opportunity for researchers and policy teams to work together to build a medium- to longer-term evidence base that is sound and robust. The timing of evidence gathering and policymaking, whilst clearly a major challenge, need not preclude research-based evidence contributing to policy and practice, providing one distinguishes between the operational (day-to-day) and strategic (medium- to long-term) use of evidence.

Ways of using research and evaluation in policymaking

Lavis et al. (2003), drawing on the work of Beyer (1997), have noted that research knowledge may be used in instrumental, conceptual and symbolic ways. Instrumental use involves "acting on research results in specific, direct ways", whereas conceptual use involves "using research results for general enlightenment; results influence actions, but in less specific, more indirect ways than in instrumental use" (Lavis et al., 2003, p. 228). Symbolic use is more about "using research results to legitimate and sustain pre-determined positions" (ibid). Amara et al. (2004) have suggested that "the three types of research utilization must be considered as complementary rather than as contradictory dimensions of research utilization" (Amara, 2004, p. 79). These authors have examined empirically the instrumental, conceptual and symbolic uses of research evidence in Canadian federal and provincial governments, and found that:

"conceptual use of research is more frequent than instrumental use. More precisely, the conceptual use of research is more important in the day-to-day professional activity of professionals and managers in government agencies than symbolic utilization, which, in turn, is more important than instrumental utilization".

Amara, 2004, p. 98

The remainder of this paper will present some examples of how evaluation evidence has been used in policy formation and the delivery of public services in the UK and other countries. It will be argued that instrumental, conceptual and symbolic uses of evidence are not mutually exclusive, but can operate in different ways at different stages in the policy cycle and under different political contexts.

The Educational Maintenance Allowance

The evaluation of the Educational Maintenance Allowance (EMA) in England (Dearden et al., 2001) is one example of how an evaluation was undertaken to test the likely effectiveness and cost-effectiveness of a major policy initiative before it was rolled out nationally. The subsequent policy was closely based on the findings of this evaluation and, therefore, can be seen as an example of the instrumental use of evaluation evidence in policymaking.

The EMA has been described as "a conditional cash transfer, the aim of which is to decrease dropout rates in the transition from compulsory to post-compulsory education in the UK" (IFS, 1999). The EMA evaluation tested four variants of a means-tested conditional cash transfer paid to 16-18-year-olds for staying in full-time education. The variants consisted of two levels of payment (£30 and £40) to either the young person or a primary carer (usually the mother), combined with different levels of a retention bonus (£50 and £80) and an achievement bonus (£50 and £140). The evaluation was undertaken amongst male and female young people, and in both urban and rural areas. A comparison group, against which outcomes of those eligible for EMA could be assessed, was identified using propensity score matching (Rosenbaum and Rubin, 1983; Dearden et al., 2008). (13)
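As an illustration of the comparison-group construction just described, the sketch below shows a minimal propensity score matching estimate in Python. It is an illustrative sketch rather than the Dearden et al. implementation: the data frame, the column names (eligible, stayed_on) and the covariate list are hypothetical placeholders.

# Minimal propensity score matching sketch (illustrative only; not the Dearden
# et al. code). Assumes a pandas DataFrame `df` with a binary `eligible` column,
# an outcome `stayed_on` (1 = remained in full-time education) and covariates.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

COVARIATES = ["parental_income", "prior_attainment", "urban"]  # placeholder names

def matched_effect(df: pd.DataFrame) -> float:
    """Match each eligible young person to the comparison observation with the
    nearest propensity score (nearest neighbour, with replacement) and return
    the difference in participation rates (effect on the treated)."""
    model = LogisticRegression(max_iter=1000).fit(df[COVARIATES], df["eligible"])
    df = df.assign(pscore=model.predict_proba(df[COVARIATES])[:, 1])
    treated = df[df["eligible"] == 1]
    control = df[df["eligible"] == 0]
    # Index of the nearest control propensity score for every treated unit.
    gaps = np.abs(treated["pscore"].values[:, None] - control["pscore"].values[None, :])
    nearest = gaps.argmin(axis=1)
    return treated["stayed_on"].mean() - control["stayed_on"].values[nearest].mean()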

Dearden et al. (2008) reported a substantial impact of the cash transfers, ranging from a 4.5 per cent increase (over the comparison group) in full-time education participation in the first year to a 6.7 per cent increase in the second year. Chowdry and Emmerson (2010) have argued that "based on these impacts, and on estimates of the financial benefits of additional education taken from elsewhere in the economics literature, ... the costs of providing EMA were likely to be exceeded in the long run by the higher wages that its recipients would go on to enjoy in future" (Chowdry and Emmerson, 2010, p. 1).

Notwithstanding the clear impacts and benefits of the EMA, the programme was withdrawn by the Coalition Government following the 2010 Spending Review. The rationale for ending the EMA, according to Chowdry and Emmerson (2010), was based on the findings of a survey of 16-17-year-olds for the Department for Children, Schools and Families (Spielhofer et al., 2010). This suggested that "only 12 per cent of young people overall receiving an EMA believe that they would not have participated in the courses they are doing if they had not received an EMA" (Spielhofer et al., 2010, p. 7). The Coalition Government inferred from this "that the EMA policy carries a 'deadweight' of 88 per cent, i.e. 88 out of every 100 students receiving EMA would still have been in education if EMA did not exist and are therefore being paid to do something they would have done anyway" (Chowdry and Emmerson, 2010, p. 1). In turn, Chowdry and Emmerson have argued that the cost-benefit analysis undertaken by the Dearden et al. (2008) evaluation "suggests that even taking into account the level of deadweight that was found, the costs of EMA are completely offset by the beneficial effect of the spending on those whose behaviour was affected" (Chowdry and Emmerson, 2010, p. 1). Chowdry and Emmerson also point out that the EMA may have had other benefits, such as better school attendance, more study time, and "the transfer of resources to low-income households with children, which may in its own right represent a valuable policy objective" (ibid).
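The structure of this counter-argument can be sketched with some simple arithmetic. In the lines below the 88 per cent deadweight figure is taken from the text above, while the cost and benefit values are purely hypothetical placeholders rather than the IFS or departmental estimates; the point is only that a high deadweight share does not by itself imply that costs exceed benefits.

# Back-of-the-envelope sketch of the deadweight argument (hypothetical values).
deadweight = 0.88                    # share of recipients who would have stayed on anyway
cost_per_recipient = 1.0             # normalised EMA cost per recipient (placeholder)
benefit_if_behaviour_changed = 10.0  # normalised long-run wage gain for a recipient
                                     # whose participation the EMA actually caused

# Net benefit per recipient once deadweight is netted out: a positive value means
# the cost is offset by gains among the minority whose behaviour was affected.
net_benefit = (1 - deadweight) * benefit_if_behaviour_changed - cost_per_recipient
print(f"net benefit per recipient (normalised): {net_benefit:+.2f}")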

This differential use and interpretation of evidence to support, and later withdraw, the EMA illustrates the point made above about research evidence being used instrumentally, conceptually and symbolically at different stages of the policy cycle and under different political circumstances. It also demonstrates that alternative sources of evidence can be used to justify a policy decision, and that factors other than evidence (values, beliefs, ideology, resources, judgement), play a significant role in policymaking.

The employment retention and advancement demonstration

A major policy evaluation that began with the prospect of being used instrumentally in policymaking, but ended up contributing more through the conceptual use of evidence, is the Employment Retention and Advancement (ERA) Demonstration project. This demonstration project was undertaken across the UK between 2003 and 2011 to test the likely impact and cost-effectiveness of a combination of inputs (a post-employment adviser service, cash rewards for staying in work and for completing training, and in-work training support) for low paid workers and the long-term unemployed. Over 16,000 people from six regions of Britain were randomly allocated to the ERA programme or to a business-as-usual control group (the counterfactual).
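A minimal sketch of the logic of such a randomised design is given below: participants are allocated at random within each region, and the programme's impact is estimated as the difference in mean outcomes between the ERA and control groups. The data structure and column names are hypothetical assumptions; this is not the code used in the DWP evaluation.

# Minimal sketch of a randomised design like ERA: random allocation within each
# region, then an impact estimate as the difference in mean outcomes between
# programme and control groups. Column names are hypothetical placeholders.
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)

def allocate(participants: pd.DataFrame) -> pd.DataFrame:
    """Randomly assign roughly half of each region's participants to ERA (era=1)
    and the rest to the business-as-usual control group (era=0)."""
    participants = participants.copy()
    participants["era"] = (
        participants.groupby("region")["id"]
        .transform(lambda ids: rng.permutation(len(ids)) % 2)
    )
    return participants

def impact(participants: pd.DataFrame, outcome: str) -> tuple[float, float]:
    """Difference in mean outcomes (e.g. annual earnings) between the ERA group
    and the control group, with a simple standard error for the difference."""
    treat = participants.loc[participants["era"] == 1, outcome]
    ctrl = participants.loc[participants["era"] == 0, outcome]
    diff = treat.mean() - ctrl.mean()
    se = np.sqrt(treat.var(ddof=1) / len(treat) + ctrl.var(ddof=1) / len(ctrl))
    return diff, se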

Although the evaluation took seven years to complete, milestone data were available in real time from one year after the project began. This means that the ERA demonstration project could have informed the development of welfare-to-work policies in an instrumental way from 2004 onwards. Such premature use of evidence was sensibly avoided, and the first-year impacts of the ERA initiatives were initially reported in 2007 (Dorsett et al., 2007). These reported substantial and statistically significant increases in earnings and employment retention for one of the lone parent target groups (the New Deal for Lone Parents [NDLP] group), but a lesser impact on earnings for the Working Tax Credit [WTC] group. The first-year impacts on the earnings of the New Deal 25 Plus [ND25+] group, however, "were smaller, more mixed, and less certain" (Dorsett et al., 2007, p. 10) than for the lone parent groups.

These impacts, however, were reversed by the time of the final report on the ERA evaluation in 2011 (Hendra et al., 2011). This reported that for the NDLP and WTC groups the early effects gained from the proportion of participants who worked full time (at least 30 hours per week) "generally faded in the later years, after the programme ended ... [and] from a cost-benefit perspective, ERA did not produce encouraging results for the lone parent groups, with the exception of the NDLP better-educated subgroup" (Hendra et al., 2011, p. 10). For the long-term unemployed participants (mostly men) in the ND25+ group, however, the longer-term impacts were more positive in that:

ERA produced modest but sustained increases in employment and substantial and sustained increases in earnings. These positive effects emerged after the first year and were still evident at the end of the follow-up period. The earnings gains were accompanied by lasting reductions in benefits receipt over the five-year follow-up period. ERA proved cost-effective.

Hendra et al., 2011, pp. 10-11

The ERA evaluation did not have an immediate or direct effect on welfare-to-work policies in the sense of rolling out nationally a discrete set of retention and advancement initiatives for low-income and long-term unemployed people. Hence, it does not provide an example of instrumental use of evidence-based policymaking. It has, however, had other effects in terms of informing and enlightening policymaking on welfare-to-work issues (i.e. a conceptual use of evidence).

First, as has been noted above, the ERA evaluation demonstrated that a policy, or a set of policy initiatives, can have heterogeneous effects across client groups. Whereas the combination of financial incentives and post-employment support had generally positive outcomes for the ND25+ group of clients, "over five years, ERA in the UK had no lasting overall effects for lone parents in the New Deal for Lone Parents (NDLP) and Working Tax Credit (WTC) target groups" (Hendra et al., 2011, p. 232). Also, not all of the models of implementation were successful (ibid). Such findings about the heterogeneous impacts of interventions are invaluable for policymaking purposes. Part of the value of evaluation in policymaking is that it allows negative consequences and unsuccessful implementation and delivery approaches to be avoided. The Final Report on the ERA Demonstration acknowledged this by noting that "had the Government invested in ERA as a full-scale national policy without having mounted this rigorous test of its effectiveness in advance, that investment would not have achieved all the hoped-for positive results" (Hendra et al., 2011, p. 248).

Second, although the ERA Demonstration did not result in a direct national roll-out of all of the initiatives that it tested, it would be wrong to conclude that it has not had some influence on welfare-to-work policy in the UK. In developing the Coalition Government's Work Programme, evidence from the ERA Demonstration was used by the Department for Work and Pensions to develop sustainability outcome measures. Also, lessons learned from the ERA Demonstration were shared with a number of officials and contracted service providers involved with the Work Programme. ERA evidence has also helped inform a number of policy initiatives for lone parents, such as the in-work emergency discretion fund and in-work credit for lone parents. (14)

Third, the ERA evaluation showed that immediate and early impacts of a policy may not be sustained over time and, consequently, may be a poor guide to its longer-term effects. A similar finding was made by the evaluation of the Self-Sufficiency Project (SSP) in Canada, in which initial positive impacts of financial incentives and post-employment support were not sustained beyond the fifth quarter follow-up (Quets et al., 1999). It is also important to establish that positive effects on certain outcomes are also evident on other important outcomes. A major review of experimental and quasi-experimental evaluations of conditional cash transfers in a range of countries in Africa, Asia, Latin and South America, and Eastern Europe undertaken by the World Bank (Fiszbein and Shady, 2009) showed that whereas conditional cash transfers had generally positive short-term effects in terms of getting children to attend school and health centres for immunisation, evidence of the longer-term outcomes in terms of improved educational achievement and health status was less apparent. In this respect the lessons learned from this, and the ERA evaluations, are that evidence-based policymaking requires sustained monitoring and evaluation over time, using outcome measures that have internal validity (i.e. are free of known bias) and external validity (i.e. have 'real world' relevance).

Fourth, such monitoring and evaluation must include formative/process approaches as well as impact methods. Some of the most interesting and valuable evidence from the ERA evaluation was about the challenges of implementing, developing and sustaining the ERA's initiatives in different regions and contexts, and how these challenges were overcome. This required a multi-method evaluation "including in-depth qualitative interviews with programme staff and participants; three waves of survey interviews with programme and control group respondents (at 12, 24, and 60 months after random assignment); (14) and administrative data on participants' employment, earnings, and benefits receipt" (Hendra et al., 2011, p. 15).

Fifth, the ERA evaluation contributed to evidence-based policymaking by testing in a UK context interventions that had generally been proven to be effective in the USA and Canada. Evidence does not always 'travel' well. The US and Canadian labour markets, welfare systems, and their socio-demographic and cultural features are generally very different from those in the UK. Furthermore, there are differences on these variables within the UK. Policies that have been shown to be effective in one or more countries, or in some parts of countries, may not have the same outcomes elsewhere. Hence the need for the ERA evaluation to have tested the effectiveness of post-employment welfare policies in, and within, the UK. Introducing such policies without impact and formative evaluation runs a high risk of policy failure and misplaced resources.

Sixth, much of the evidence on the use of personal employment advisers, cash transfers/incentives, and training support has been in pre-employment contexts. The ERA evaluation has provided valuable evidence on the implementation and effectiveness of the policy initiatives post-employment. This point has been acknowledged in the Final Report on the ERA Demonstration where the authors note that "little of the [existing] evidence came from interventions that included extensive job coaching and advancement support after people began working. Consequently, ERA, like similar demonstration programmes in the US, was charting new territory" (Hendra et al., 2011, p. 232).

Lastly, the ERA evaluation was the first major demonstration project of its kind and magnitude in the UK. Unlike the many other policy pilots that are undertaken in the UK, in which a policy commitment has already been made (Jowell, 2003), the ERA Demonstration "tested an idea that the Government had not yet committed to incorporating into national policy" (Hendra et al., 2011, p. 248). Furthermore, the Department for Work and Pensions and HM Treasury committed the ERA to a five-year follow-up period, thereby moving away from the short-termism of most policymaking and policy evaluation. Another important aspect of the ERA as a demonstration is that it used a random allocation design on a very large sample of the population in six regions of Britain. To this extent it was also demonstrating that a large-scale randomised controlled evaluation of a major policy initiative could be undertaken in the UK, alongside other 'mixed methods' of evaluation--something that was clearly achieved with considerable success.

Impact assessments

Impact Assessments are an evidence-based tool of policymaking that have become institutionalised in the UK, in the sense that they are a required part of the policymaking process whenever a policy initiative imposes or reduces costs, creates a new information obligation or administrative burdens, involves redistribution or regulatory change, or implements a European Union directive (BIS, 2011a, p. 8). Impact assessments are a structured way of gathering evidence to establish the economic, social, environmental and regulatory impacts on business, the third sector and the public sector. The impacts that have to be assessed in UK policymaking are summarised in figure 1. The Department of Business, Innovation and Skills (BIS) has described impact assessments as a tool "to help policymakers to fully think through the reasons for government intervention, to weigh up various options for achieving an objective and to understand the consequences of a proposed intervention" (BIS, 2011a, p. 4).

UK impact assessments require policymakers to identify and appraise viable options that will achieve the policy objective, including the 'do minimum/do nothing' option (usually the 'business-as-usual' option). The appraisal process required by impact assessments mirrors HM Treasury's ROAMEF (15) process, and consists of six stages: Development, Options, Consultation, Final Proposal, Enactment and Review (see figure 2). At each stage of the impact assessment process evidence must be gathered and critically appraised for quality, cost-effectiveness and cost-benefit. Hence, the impact assessment process can involve a great deal of evidence gathering and analysis, though in the spirit of the Coalition Government's 'small government' agenda BIS now calls for "proportionality of analysis", which it defines as using "the appropriate level of resources to invest in gathering and analysing data for appraisals and evaluations" (BIS, 2011b, p. 8). The depth of analysis required by an impact assessment is seen by BIS as increasing from 'minimal' during the early stages--"the identification of winners and losers" and the "full description of costs and benefits" (BIS, 2011a, p. 8)--to a much greater degree of detail at the final stage of "fully monetizing the costs and benefits".
Figure 1. Impact assessments (UK): a summary of the impacts to be assessed

* Total costs and benefits of options
* Geographical coverage (within the UK)
* Enforcement arrangements
* More than minimum EU requirements
* The value of the proposed offsetting measure per year
* Hampton Principles (a)
* Economic impacts: the impact on competition and on small firms
* Environmental impacts: greenhouse gases/wider environment, sustainable development
* Social impacts: health and well-being; human rights; justice systems; rural proofing
* Statutory equality: impacts on race, gender, disability, sexual orientation

Note: (a) The Hampton Principles set out "how to reduce unnecessary administration for businesses, without compromising the UK's excellent regulatory regime" (see BIS, 2011b, p. 26).


Impact assessments, then, have the potential to use evidence instrumentally--i.e. to determine the most cost-effective way of achieving a policy objective or the most cost beneficial way of using available resources--and/or conceptually in the sense of generating insight about the likely regulatory, economic, social and environmental consequences of a policy.

Impact assessments can also involve the symbolic use of evidence. The National Audit Office (NAO) reviews impact assessments periodically and has repeatedly found that the level of analysis in impact assessments is weak, particularly the quality of economic analysis (NAO, 2007). The range of policy options considered by many impact assessments is also limited (NAO, 2009). In its 2009 Report the NAO found that "only 20 per cent [of impact assessments] presented the results of an evaluation of a range of options" and "that the introduction of the summary sheet, which has improved clarity and consistency, has encouraged a "tick box" approach rather than making an assessment of the costs and benefits of different options integral to policy formation" (NAO, 2009, p. 15). The most recent NAO review of impact assessments found that "in nearly two thirds of final Impact Assessments in our sample, however, different options were not well explored or summarized", and that "overall, 42 per cent of the Impact Assessments we reviewed had at no time considered more than one option in addition to the 'do nothing' option" (NAO, 2010, p. 14). This suggests that by supporting the preferred policy option, rather than genuinely seeking the most effective and cost-beneficial options, impact assessments sometimes use evidence symbolically to justify pre-determined positions.

[FIGURE 2 OMITTED]

Delivery trajectories

As has been noted above, researchers and policymakers often have different notions of evidence. Whereas researchers see evidence as being theoretically grounded, empirically proven, and meeting scientific standards of internal validity and adequacy of reporting, policymakers often have a more utilitarian and problem-solving approach to evidence (Lomas et al., 2005; Campbell et al., 2007). The use of monitoring to gather real-time evidence of goal attainment, target achievement, and success or failure of public service delivery is a major feature of policymaking in many countries. This may not meet many analysts' notions of evidence, and it is undoubtedly a performance management tool, but it is seen and used as an evidence-based approach by many governments.

Delivery trajectories provide a visual representation of the actual delivery of a service compared with the expected performance towards a set goal or target. Figure 3 is a hypothetical representation of the delivery of anti-retroviral (ARV) drugs in two health service areas for people who have HIV/AIDS. The dotted line in the middle of figure 3 represents the 'ideal' trend line that would deliver the target of 95 per cent of people with HIV/AIDS receiving ARVs by the end of 2011. The recent historical performance in Areas A and B, in terms of delivering ARVs, is represented by the two lines to the left of the baseline. These indicate a fairly flat delivery trajectory followed by some improvement in Area A, and a somewhat erratic and declining performance in Area B. Actual delivery trajectories following the baseline (Quarter 1 2007) have been plotted. The delivery of ARVs in Area B is clearly below both the trend line and that of Area A. From a performance monitoring and management perspective the flat lining of delivery in Area B at consecutive quarterly data points 1, 2 and 3 might cause concern and warrant a policy review (sometimes called a delivery review).

[FIGURE 3 OMITTED]
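The sketch below illustrates, with hypothetical data, how a delivery trajectory comparison of the kind shown in figure 3 might be computed: a straight-line path from the baseline to the 95 per cent target, with an area flagged for review after a sustained run of quarters below trajectory. The tolerance and run-length thresholds are illustrative assumptions, not PMDU rules.

# Illustrative delivery trajectory check (hypothetical figures; not PMDU tooling).
import numpy as np

def expected_trajectory(baseline: float, target: float, quarters: int) -> np.ndarray:
    """Straight-line ('ideal') path from the baseline coverage to the target."""
    return np.linspace(baseline, target, quarters + 1)

def flag_for_review(actual: np.ndarray, expected: np.ndarray,
                    tolerance: float = 2.0, run: int = 3) -> bool:
    """Flag an area for a delivery review if it sits more than `tolerance`
    percentage points below trajectory for `run` consecutive quarters."""
    below = (expected[:len(actual)] - actual) > tolerance
    streak = 0
    for b in below:
        streak = streak + 1 if b else 0
        if streak >= run:
            return True
    return False

# Hypothetical Area B figures: flat-lining well below the trajectory.
expected = expected_trajectory(baseline=60.0, target=95.0, quarters=8)
area_b = np.array([60.0, 61.0, 61.5, 61.5, 61.5])
print(flag_for_review(area_b, expected))  # True -> warrants a delivery review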

A priority review is "a rapid analysis of the state of delivery of a high priority strategy and identification of the action needed to strengthen delivery" (O'Connor, 2008). It is undertaken by a team of people with mixed expertise and skills, as well as the key agencies responsible for frontline delivery of services. Each priority review involves intensive evidence gathering in the area where delivery is failing, and seeks to identify problems and weaknesses in the delivery chain that require remedy. It might also include an analysis of delivery in a successful area (such as Area A in figure 3) to identify procedures, activities, people and agencies that could assist the underperforming area. The analysis undertaken during a priority review is "firmly rooted in evidence and triangulates existing evaluations, data and evidence from reviews" (O'Connor, 2008). A priority review also requires an analysis of existing and new policy initiatives, including public expenditure commitments and changes, in order to establish factors external to the local delivery context that might account for poor performance and/or improvement. The main outcome of a priority review is a prioritised action plan for strengthening delivery, which is followed up in a planned and timely manner. In the hypothetical example in figure 3 the policy review seems to have been effective in terms of improving the delivery of ARVs in Area B, and establishing a more sustained pattern of delivery for the future.

In the UK the Prime Minister's Delivery Unit (PMDU) worked with a series of prioritised delivery targets that had been established in 2001 as part of the Performance Management and Comprehensive Spending Review regimes of the Labour Government (Barber, 2007; O'Connor, 2008). PMDU developed a range of delivery trajectories, similar to that presented in figure 3, and these allowed central government departments and local agencies responsible for public service delivery to identify when service delivery fell short of the expected performance. The evidence provided by such data was then used to decide when underperformance was more than just a temporary blip, and when it required more detailed attention, analysis and action in the form of a policy review.

This approach to using evidence to monitor and manage delivery had some encouraging results (O'Connor, 2008). Waiting times for surgery, waiting times to be seen by a doctor in accident and emergency departments, school attendance and attainment, and train punctuality all improved during this time. Some (though not all) of this improvement can be attributed to the use of delivery trajectories, priority reviews and monitoring and evaluation evidence. The Coalition Government in the UK has abandoned these methods of using evidence to monitor and manage public service delivery, and the machinery of government that supported them has also been dismantled. The Department of Performance Monitoring and Evaluation (DPME) in the government of South Africa, however, is currently piloting similar methods to monitor and manage key public service delivery as part of that government's Outcomes Approach (Office of the Presidency, 2010).

Overcoming barriers to the use of evidence in policymaking

There is now a considerable body of knowledge on the barriers to getting research and evaluation evidence into policy and practice, and on how these barriers can be overcome. Some of the barriers have already been referred to above--factors other than evidence, the uncertainty and inconclusiveness of some research findings, different notions of evidence, and the time-lag for research findings to percolate into policy and practice. Other barriers include the presentational format of research findings--which are often characterised as being too long, too dense, too methodological, and too inaccessible--the lack of a clear message, researchers' lack of familiarity with the policy process, and policymakers' lack of familiarity with the research process (Lavis et al., 2005).

The presentation of research findings can be enhanced by the use of a 1:3:25 format (CHSRF, 2001). This consists of one page of main messages, followed by a three-page executive summary, and the presentation of findings in no more than 25 pages, written in language that a non-research specialist can readily understand. Lavis et al. (2005) have referred to this as a 'graded entry' to the available research evidence. The one page of main messages should not be a summary of the main findings but should indicate what the main messages are that decision-makers can take from the research. The three-page executive summary consists of the main findings from the research, presented succinctly to serve the needs of busy decision-makers. There should be no details or discussion of the methodology in an executive summary, other than a very brief statement about which methods were used and which sections of the population were included. The 25-page report should cover the background to the research, the questions addressed, a brief outline of the methodology, the findings, a discussion, and conclusions. Research and evaluation reports should acknowledge explicitly the strength of the available evidence, including the degree of uncertainty and contested knowledge.

Researchers' lack of familiarity with the policy process, and policymakers' lack of familiarity with the research process, have been identified as a common problem in the use of research evidence in policymaking (Amara et al., 2004; Lavis et al., 2005; Nutley et al., 2007; Ouimet et al., 2009). The trust (and lack of trust) of policymakers in researchers has also been identified as an important factor in the use of research in policymaking (Lavis et al., 2005). Lavis et al. have noted that interactions between researchers and policymakers increased the prospects for research use by policymakers. Similarly, Lomas has concluded that "the clearest message from evaluations of successful research utilization is that early and ongoing involvement of relevant decision makers in the conceptualization and conduct of a study is the best predictor of its utilization" (Lomas, 2000, p. 141). Other research (Gabbay and Le May, 2004; Greenhalgh et al., 2005; Best and Holmes, 2010) has identified the importance of interpersonal networks and direct interactions between researchers and policymakers as important factors in the use of evidence in policymaking.

Summary

This paper has argued that evidence-based policy has clearly made a worldwide impact, at least at the rhetorical and institutional levels, and in terms of analytical activity. There is also evidence that the machinery of government in the UK has been developed to increase the capacity for evidence-based policymaking. This includes the development of the analytical professions within government (economists, social researchers, statisticians, operational researchers, information specialists), evaluation guidance documents (the Green Book and the Magenta Book), the Impact Assessment process, monitoring and evaluation mechanisms, and the Comprehensive Spending Review process.

The role that evaluation evidence plays in policymaking can be instrumental (direct), conceptual (indirect) or symbolic (i.e. using research results to legitimate and sustain pre-determined positions). Observers such as Lavis et al. (2003) and Amara et al. (2004) have suggested that "the three types of research utilization must be considered as complementary rather than as contradictory dimensions of research utilization" (Amara, 2004, p. 79). The four examples presented in this paper of how evidence has been used in policymaking and public service delivery in the UK confirm these complementary uses of evidence, and suggest that evidence can be used instrumentally, conceptually and symbolically at different stages of the policy cycle and under different policy and political circumstances. The fact that evidence is not always used instrumentally, in the sense of "acting on research results in specific, direct ways" (Lavis et al., 2003, p. 228), does not mean that it has little or no influence. Nor does the symbolic use of evidence always imply sinister or Machiavellian practice. It may be quite reasonable to seek evidence to confirm or justify a policy position to which there is already a political commitment. This is surely better than proceeding on the basis of blind faith and without any involvement with evidence. Using evidence symbolically at least leaves open the prospect that new insights into the nature of the policy issue, refinements of detail, and different approaches to implementation and delivery may be forthcoming from such an approach.

The broader question remains: what is the state of evidence-based policy evaluation and its role in policy formation? The conclusion from what has been presented in this paper is that the notion of evidence-based policy and doing 'what works' is well established internationally. The concept seems to have percolated into the language of policymaking and governments worldwide. It is hard to see how evidence can play a dominant role in policymaking given the role of values, beliefs and ideology, the many other factors that influence the policy process, and the different notions that policymakers and researchers/evaluators often have about evidence. The most likely ways in which evaluation evidence can influence policymaking are by integrating it with these other factors; by direct contact between policymakers and researchers; by the use of interpersonal networks; and by making research evidence more accessible through a 'graded entry' to its outputs.

As for the question of the role of policy evaluation in recession--the overall theme of this Review--there can hardly be any doubt that evidence-based policy and policy evaluation are more relevant and more needed in recession times than ever before. Identifying which policy interventions are effective, cost-effective, and cost-beneficial in the most socially advantageous and fairly distributed ways, must be a central principle of policymaking in times when resources are limited and issues of social equity are acute. Establishing what is already known about effective and efficient interventions, using systematic reviews of evidence, meta-analyses and rapid evidence assessments (HM Treasury 2011; GSR, 2009), would seem to be a priority. Where evidence is not available from these sources, consensus conferences of academic researchers, policymakers, substantive experts, and knowledge brokers can be used to establish agreement on the best available evidence, and on priorities for future research and evaluation. Further, at times of economic recession and uncertainty, initiatives that are introduced should be monitored and evaluated carefully in real time, with feedback mechanisms being used to help policymakers make the most informed decisions possible.

doi: 10.1177/002795011221900105

REFERENCES

Amara, N., Ouimet, M. and Landry, R. (2004), 'New evidence on instrumental, conceptual, and symbolic utilization of university research in government agencies', Science Communication, 26, pp. 76-106.

Balas, E.A. and Boren, S.A. (2000), 'Managing clinical knowledge for health care improvement', in Yearbook of Medical Informatics 2000: Patient-Centered Systems, Stuttgart, Germany, Schattauer, pp. 65-70.

Barber, M. (2007), Instruction to Deliver. Tony Blair, the Public Services and the Challenge of Achieving Targets, London, Methuen Publishing Ltd.

Best, A. and Holmes, B. (2010), 'Systems thinking, knowledge and action: towards better models and methods', Evidence and Policy, 6, 2, pp. 145-59.

Beyer, J.M. (1997), 'Research utilization: bridging the gap between communities', Journal of Management Inquiry, 6, 1, pp. 17-22.

BIS (2011a), Impact Assessment Guidance: When to do an Impact Assessment, London, Department of Business, Innovation and Skills, available at: http://www.bis.gov.uk/assets/biscore/better-regulation/docs/i/11-1111-impact-assessment-guidance.pdf.

--(2011b), IA Toolkit: How to do an Impact Assessment, Department of Business, Innovation and Skills, available at: http://www.bis.gov.uk/assets/biscore/better-regulation/docs/i/11-1112-impact-assessment-toolkit.pdf.

Bryson, A., Dorsett, R. and Purdon, S. (2002), The Use of Propensity Score Matching in The Evaluation of Active Labour Market Policies, Working Paper Number 4, London, Department for Work and Pensions.

Cabinet Office (1999a), Modernising Government White Paper, London, Cabinet Office.

--(1999b), Professional Policy Making for the Twenty-First Century, London, Cabinet Office.

--(2000), Adding It Up: Improving Analysis and Modelling, London, Cabinet Office.

--(2003), Adding It Up: Improving Analysis and Modelling in Central Government, London, Cabinet Office.

Campbell, D. (2005), 'Getting a 'GRIPP' on the research-policy interface in NSW', New South Wales Public Health Bulletin, 16, 10, pp. 154-6.

Campbell, S., Benita, S., Coates, E., Davies, P. and Penn, G. (2007), Analysis for Policy: Evidence-Based Policy in Practice, London, Government Social Research Unit.

Chowdry, H. and Emmerson, C. (2010), An Efficient Maintenance Allowance?, London, Institute for Fiscal Studies, available at: http://www.ifs.org.uk/publications/5370.

CHSRF (2001), 'Communication notes: reader-friendly writing -- 1:3:25', Ottawa, Canadian Health Services Research Foundation, available at: http://www.chsrf.ca/knowledge_transfer/communication_notes/comm_reader_friendly_writing_e.php.

CIHR (2006), Evidence in Action, Acting on Evidence: A casebook of health services and policy research knowledge translation stories, Canadian Institute of Health Services and Policy Research, Ottawa, Canada, available at: www.cihr-irsc.gc.ca/e/documents/ihspr_ktcasebook_e.pdf.

Davies, P.T. (2004), 'Is Evidence-Based Government Possible?', Jerry Lee Lecture to the 4th Annual Campbell Collaboration Colloquium, Washington D.C., 19 February.

Dearden, L., Emmerson, C., Frayne, C. and Meghir, C. (2001), Education Maintenance Allowance: The First Year--A Quantitative Evaluation, London, Department for Education and Skills (now archived at The National Archives website).

--(2008), 'Conditional cash transfers and school dropout rates', The Journal of Human Resources, 44,4, pp. 827-57.

Dorsett, R., Campbell-Barr, V., Hamilton, G., Hoggart, L., Marsh, A., Miller, C., Phillips, J., Ray, K., Riccio, J.A., Rich, S. and Vegeris, S. (2007), Implementation and First-Year Impacts of The UK Employment Retention and Advancement (ERA) Demonstration, Research Report 412, London, Department for Work and Pensions.

Fiszbein, A. and Shady, N. (2009), Conditional Cash Transfers: Reducing Present and Future Poverty, Washington D.C, The World Bank.

Gabbay, J. and Le May A. (2004), 'Evidence based guidelines or collectively constructed "mindlines?" Ethnographic study of knowledge management in primary care', British Medical Journal, 329, pp. 1013-8.

Greenhalgh, T., Robert, G., Bate, P., Macfarlane, F. and Kyriakidou, O. (eds) (2005), Diffusion of Innovations in Health Service Organisations: A Systematic Literature Review, Oxford, Blackwell Publishing Ltd.

GSR (2009), Rapid Evidence Assessment Toolkit, London, Government Social Research Service, available at: http://www.civilservice.gov.uk/networks/gsr/resources-and-guidance/rapid-evidence-assessment.

Hendra, R., Riccio, J.A., Dorsett, R., Greenberg, D.H., Knight, G., Phillips, J., Robins, P.K., Vegeris, S., Walter, J., Hill, A., Ray, K. and Smith, J. (2011), Breaking the Low-Pay, No-Pay Cycle: Final Evidence from the UK Employment, Retention and Advancement (ERA) Demonstration, London, Department for Work and Pensions.

HM Treasury (2003), The Green Book: Appraisal and Evaluation in Central Government, London, HM Treasury.

--(2011), The Magenta Book: Guidance for Evaluation, London, HM Treasury.

IFS (1999), Education Maintenance Allowance (EMA) Evaluation, Institute for Fiscal Studies, London (summary available at: http://www.ifs.org.uk/projects/98).

Jowell, R. (2003), The Role of 'Pilots' in Policy-Making. Report of a Review of Government, Government Social Research Unit, London, HM Treasury.

Lavis, J., Davies, H., Oxman, A., Denis, J-L., Golden-Biddle, K. and Ferlie, E. (2005), 'Towards systematic reviews that inform health care management and policy-making', Journal of Health Services Research & Policy, 10, Supplement 1, pp. 35-48.

Lavis, J.N., Robertson, D., Woodside, J.M., McLeod, C.B., Abelson, J. and the Knowledge Transfer Study Group (2003), 'How can research organizations more effectively transfer research knowledge to decision makers?', The Milbank Quarterly, 81, 2, pp. 221-48.

Leigh, A. (2009), 'What evidence should social policymakers use?', Australian Treasury Economic Roundup, 1, pp. 27-43.

Lomas, J. (2000), 'Connecting research and policy', Canadian Journal of Policy Research, Spring, pp. 140-4.

Lomas, J, Culyer, T., McCutcheon, C., McAuley, L. and Law, S. (2005), Conceptualizing and Combining Evidence for Health System Guidance: Final Report, Canadian Health Services Research Foundation, Ottawa.

Marsh, D. (2006), 'Evidence-based policy: framework, results and analysis from the New Zealand biotechnology', International Journal of Biotechnology, 8, 3-4, pp. 206-24.

Martin, J.P. (2000), 'What works among active labour market policies: evidence from OECD countries' experiences', OECD Economic Studies, 30, pp. 80-113.

Milani, C.R.S. (2009), Evidence-Based Policy Research: Critical Review of Some International Programmes on Relationships Between Social Science Research and Policy-Making, Paris, France, UNESCO.

Mold, J.W. and Peterson, K.A. (2005), 'Primary care practice-based research networks: working at the interface between research and quality improvement', Annals of Family Medicine, 3, 1, May/June 2005, S12-S20.

NAO (2007), Evaluation of Regulatory Impact Assessments 2006-07, London, National Audit Office.

--(2009), Delivering High Quality Impact Assessments, London, National Audit Office.

--(2010), Assessing the Impact of Proposed New Policies, Report by the Comptroller and Auditor General, HC 185 Session 2010-2011, London, National Audit Office.

Nutley, S.M., Walter, I. and Davies H.T.O. (2007), Using Evidence: How Research Can Inform Public Services, Bristol, Policy Press.

Obama, B.H. (2009), Inaugural Address, Washington DC, 20 January, available at: http://www.whitehouse.gov/the-press-office/ president-barack-obamas-inaugural-address.

O'Connor, T. (2008), How The Prime Minister Monitors Performance and Assesses Delivery, Presentation to GORS Induction, Tony O'Connor CBE, Chief Operational Research Analyst, Prime Minister's Delivery Unit, 8 May.

Office of the Presidency (2010), Guide To The Outcomes Approach, Pretoria, Office of the Presidency of South Africa.

Ouimet, M., Landry, R., Ziam, S. and Bedard, P. (2009), 'The absorption of research knowledge by public servants', Evidence and Policy, 5, 4, pp. 331-50.

Purdon, S. (2002), Estimating The Impact of Labour Market Programmes, Working Paper No. 3, London, Department for Work and Pensions.

Quets, H., Robins, P.K., Pan, E.C., Michalopoulos, C. and Card, D. (1999), Does SSP Plus Increase Employment? The Effect of Adding Services to the Self Sufficiency Project's Financial Incentives, Ottawa, Social Research Development Corporation.

Rosenbaum, P. and Rubin, D. (1983), 'The central role of the propensity score in observational studies for causal effects', Biometrika, 70, 1, pp. 41-55.

Spielhofer, T., Golden, S., Evans, K., Marshall, H., Mundy, E., Pomati, M. and Styles, B. (2010), Barriers to Participation in Education and Training, Slough, NFER.

Topp L. and McKetin, R. (2003), 'Supporting evidence-based policy making: A case study of the illicit drug reporting system in Australia', Bulletin on Narcotics, 55, 1-2, pp. 23-30, United Nations Office on Drugs and Crime, Vienna, Austria.

Townsend, T. and Kunimoto B. (2009), Collaboration and Culture. The Future of the Policy Research Function in the Government of Canada, Policy Research Initiative, Ottawa, Canada, March.

Weiss, C.H. (1982), 'Policy research in the context of diffuse decision making', Journal of Higher Education, 53, 6, pp. 619-39.

Zussman, D. (2003), 'Evidence-based policy making: some observations of recent Canadian experience', Social Policy Journal of New Zealand, 20, pp. 64-71, June.

NOTES

(1) http://coexgov.securesites.net/index.php?keyword=a432fbc34d71c7

(2) www.ahrq.gov/clinic/epc

(3) http://www.evidencebasedprograms.org/

(4) http://www.cochrane.org/

(5) http://www.campbellcollaboration.org/

(6) http://eppi.ioe.ac.uk/cms/

(7) http://www.dfid.gov.uk/r4d/

(8) http://www.ausaid.gov.au/

(9) http://www.3ieimpact.org/

(10) http://www.nice.org.uk/

(11) http://www.scie.org.uk/

(12) http://www.cebenetwork.org/

(13) For further discussion on the use of propensity score matching in policy evaluation see Bryson et al. (2002) and Purdon (2002).

(14) The observations in this paragraph have been provided by officials at the Department for Work and Pensions via personal communication.

(15) ROAMEF is an acronym for Rationale, Objectives, Appraisal, Monitoring, Evaluation and Feedback (HM Treasury, 2003, p. 3).

Philip Davies, Oxford Evidentia Limited. E-mail: pdavies@oxev.co.uk.