Abstract: Randomized controlled trials (RCTs) are essential for evaluating the efficacy of clinical interventions, where the causal chain between the agent and the outcome is relatively short and simple and where results may be safely extrapolated to other settings. However, causal chains in public health interventions are complex, making RCT results subject to effect modification in different populations. Both the internal and external validity of RCT findings can be greatly enhanced by observational studies using adequacy or plausibility designs. For evaluating large-scale interventions, studies with plausibility designs are often the only feasible option and may provide valid evidence of impact. There is an urgent need to develop evaluation standards and protocols for use in circumstances where RCTs are not appropriate.

Public health has moved forward in recent years to improve the scientific standards for evidence underlying interventions and actions. “Evidence-based public health” 1 calls for a solid knowledge base on disease frequency and distribution, on the determinants and consequences of disease, and on the safety, efficacy, and effectiveness of interventions and their costs. The efficacy of an intervention is defined as its effect under “ideal conditions.” 2 The effectiveness of an intervention is defined as its effect under normal conditions in field settings. In this report, we question common assumptions about the types of evidence needed to demonstrate the efficacy and effectiveness of public health interventions and suggest that the guidelines for such evidence be updated.

Designs for large-scale impact evaluations of health and nutrition interventions are often based on the principles that have guided “gold standard” trials of new medicines and preventive agents in the past. 3, 4 Over time, more and more medical scientists turned to randomized controlled trials (RCTs) in an effort to increase the internal validity of their designs.
More recently, this increased attention to quality standards in clinical research has led to the Movement for Evidence-Based Medicine 5 and the establishment of the Cochrane Collaboration, 6 resulting in important improvements in methods and the quality of available evidence. The success of these efforts encouraged the extension of RCT designs to the fields of public health and health policy. 7, 8 The Cochrane Collaboration now includes meta-analyses of many public health topics, 6 and the online Journal of Evidence-based Healthcare has recently been established to provide an outlet for work in this area. 9 RCTs have increasingly been promoted for the evaluation of public health interventions.

In an earlier report, 10 two of the authors (C. G. V. and J.-P. H.) described three types of scientific inference that are often used for making policy decisions in the fields of health and nutrition. Probability statements are based strictly on RCT results. Plausibility statements are derived from evaluations that, despite not being randomized, are aimed at making causal statements using observational designs with a comparison group. Adequacy statements result from demonstrations that trends in process indicators, impact indicators, or both show substantial progress, suggesting that the intervention is having an important effect. Although the evaluation literature has dealt with nonexperimental or quasi-experimental designs for several decades, 11 most examples of these methods arise from fields such as education, law enforcement, and economics. We are unaware of a systematic discussion of their application to public health.

In this article, we argue that the probability approach, and specifically RCTs, are often inappropriate for the scientific assessment of the performance and impact of large-scale interventions. Although evidence-based public health is both possible and desirable, it must go well beyond RCTs.
We describe the limitations of using RCTs alone as a source of data on the performance of public health interventions and suggest complementary and alternative approaches that will yield valid and generalizable evidence.