Few, if any, of the macro stress tests undertaken before the current crisis uncovered significant vulnerabilities. This article examines the reasons for this poor performance by comparing the outcomes of simple stress tests with actual events for a large sample of historical banking crises. The results highlight that the structural assumptions underlying stress testing models do not match output growth around many crises. Furthermore, unless macro conditions are already weak before the crisis erupts, the vast majority of stress scenarios based on historical data are not severe enough. Finally, stress testing models are not robust, as statistical relationships tend to break down during crises. These insights have important implications for the design and conduct of future stress tests.