Title: Use of Design Effects and Sample Weights in Complex Health Survey Data: A Review of Published Articles Using Data From 3 Commonly Used Adolescent Health Surveys
Abstract: Objectives. We assessed how frequently researchers reported the use of statistical techniques that account for the complex sampling structure of survey data and for sample weights in published peer-reviewed articles using data from 3 commonly used adolescent health surveys. Methods. We performed a systematic review of 1003 published empirical research articles from 1995 to 2010, indexed in ERIC, PsycINFO, PubMed, and Web of Science, that used data from the National Longitudinal Study of Adolescent Health (n = 765), Monitoring the Future (n = 146), or the Youth Risk Behavior Surveillance System (n = 92). Results. Across the data sources, 60% of articles reported accounting for design effects and 61% reported using sample weights. However, the frequency and clarity of reporting varied across databases, publication year, author affiliation with the data, and journal. Conclusions. Given the statistical bias that occurs when the design effects of complex sample data are not incorporated or sample weights are omitted, this study calls for improvement in the dissemination of research findings based on complex sample data. Authors, editors, and reviewers need to work together to improve the transparency of published findings that use complex sample data.

Secondary data analysis of nationally representative health surveys is commonly conducted by health science researchers and can be extremely useful for investigating risk and protective factors associated with health-related outcomes. Because they provide access to a vast array of variables on large numbers of individuals, large-scale health survey data are enticing to many researchers. Many researchers, however, lack the methodological skills needed to access and use such data effectively. Traditional statistical methods, and the default procedures of most statistical software programs, assume that data were generated through simple random sampling, in which each individual has an equal probability of being selected.
With large, nationally representative health surveys, however, this is often not the case. From the perspective of statistical analysis, data from these complex sample surveys differ from those obtained via simple random sampling in 4 respects. First, observations do not have equal probabilities of selection; survey designs often oversample certain subgroups of the population to allow reasonably precise estimation of parameters for those subgroups. Second, multistage sampling produces clustered observations, in which the variance among units within each cluster is smaller than the variance among units in general. Third, stratification in sampling ensures appropriate sample representation on the stratification variable(s), but yields negatively biased estimates of the population variance. Fourth, adjustments for unit nonresponse and other poststratification adjustments are usually applied to the sample weights to allow unbiased estimates of population characteristics.¹ If these aspects of complex survey data are ignored, standard errors and point estimates can be biased, potentially leading the researcher to incorrect inferences.
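To make the first two of these points concrete, the following is a minimal Python sketch on a hypothetical, hand-constructed sample. The population sizes, group prevalences, cluster size of 30, and intraclass correlation of 0.05 are illustrative assumptions, not figures from Add Health, Monitoring the Future, or the Youth Risk Behavior Survey. It shows that when a subgroup is oversampled, the unweighted prevalence is biased while an inverse-probability-weighted estimate recovers the population value. It also uses Kish's approximate design effects, 1 + (b − 1)ρ for clustering and n·Σw² / (Σw)² for unequal weights, to show how far simple-random-sampling standard errors can understate uncertainty. These are rough approximations; in practice, design-based variances are usually obtained from survey software using Taylor series linearization or replicate weights.

```python
import math


def weighted_mean(values, weights):
    """Weighted (ratio) estimate of a population mean: sum(w * y) / sum(w)."""
    return sum(w * y for w, y in zip(weights, values)) / sum(weights)


def kish_weighting_deff(weights):
    """Kish's approximate design effect due to unequal weights:
    n * sum(w^2) / (sum(w))^2, equivalently 1 + CV^2 of the weights."""
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


def kish_clustering_deff(mean_cluster_size, icc):
    """Kish's approximate design effect due to clustering: 1 + (b - 1) * rho,
    where b is the average cluster size and rho the intraclass correlation."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def srs_se_proportion(p, n):
    """Standard error of a proportion under simple random sampling."""
    return math.sqrt(p * (1.0 - p) / n)


# Hypothetical population: 90,000 majority-group and 10,000 minority-group youth.
# Outcome prevalence is 0.20 in the majority and 0.40 in the minority group,
# so the true population prevalence is 0.9 * 0.20 + 0.1 * 0.40 = 0.22.
POP = {"majority": 90_000, "minority": 10_000}
# Equal sample sizes per group, so the minority group is heavily oversampled.
SAMPLE = {"majority": 1_000, "minority": 1_000}
CASES = {"majority": 200, "minority": 400}  # sampled youth with the outcome

values, weights = [], []
for group, n_g in SAMPLE.items():
    w = POP[group] / n_g  # inverse of the selection probability (90 or 10)
    values += [1] * CASES[group] + [0] * (n_g - CASES[group])
    weights += [w] * n_g

n = len(values)
unweighted = sum(values) / n                # ignores unequal selection: 0.30
weighted = weighted_mean(values, weights)   # recovers the population value: 0.22

deff_w = kish_weighting_deff(weights)       # 1.64 for these weights
# Hypothetical clustering: about 30 students per sampled school, rho = 0.05.
deff_c = kish_clustering_deff(30, 0.05)
# Rough combined design effect, treating the two sources as multiplicative.
deff_total = deff_w * deff_c

naive_se = srs_se_proportion(weighted, n)
design_se = naive_se * math.sqrt(deff_total)  # SE inflated by sqrt(deff)
n_effective = n / deff_total

print(f"unweighted prevalence: {unweighted:.3f}")
print(f"weighted prevalence:   {weighted:.3f}")
print(f"deff (weights) = {deff_w:.2f}, deff (clustering) = {deff_c:.2f}")
print(f"naive SRS SE = {naive_se:.4f}, design-adjusted SE = {design_se:.4f}")
print(f"effective sample size = {n_effective:.0f} of {n}")
```

In this toy example, ignoring the weights overstates the prevalence by 8 percentage points (0.30 versus 0.22), and ignoring both the weights and the clustering yields a standard error roughly half the size of the design-adjusted approximation. The effective sample size is about a quarter of the 2000 observations.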