Article information

Title: Pitfalls of significance testing and $p$-value variability: An econometrics perspective
Authors: Norbert Hirschauer; Sven Grüner; Oliver Mußhoff; et al.
Journal: Statistics Surveys
Print ISSN: 1935-7516
Publication year: 2018
Volume: 12
Pages: 136-172
DOI: 10.1214/18-SS122
Language: English
Publisher: Statistics Surveys
Abstract: Data on how many scientific findings are reproducible are generally bleak and a wealth of papers have warned against misuses of the $p$-value and resulting false findings in recent years. This paper discusses the question of what we can(not) learn from the $p$-value, which is still widely considered as the gold standard of statistical validity. We aim to provide a non-technical and easily accessible resource for statistical practitioners who wish to spot and avoid misinterpretations and misuses of statistical significance tests. For this purpose, we first classify and describe the most widely discussed (“classical”) pitfalls of significance testing, and review published work on these misuses with a focus on regression-based “confirmatory” study. This includes a description of the single-study bias and a simulation-based illustration of how proper meta-analysis compares to misleading significance counts (“vote counting”). Going beyond the classical pitfalls, we also use simulation to provide intuition that relying on the statistical estimate “$p$-value” as a measure of evidence without considering its sample-to-sample variability falls short of the mark even within an otherwise appropriate interpretation. We conclude with a discussion of the exigencies of informed approaches to statistical inference and corresponding institutional reforms.
Keywords: Meta-analysis; multiple testing; $p$-hacking; publication bias; $p$-value misinterpretations; $p$-value sample-to-sample variability; statistical inference; statistical significance