Abstract: Accurate rainfall forecasts on timescales ranging from a few hours to several weeks are needed for many hydrological applications. This study examines the bias, skill, and reliability of four ensemble forecast systems (from Canada, the UK, Europe, and the United States) and of a multi-model ensemble, as applied to Ethiopian catchments. By verifying these forecasts over hydrological catchments, we focus on spatial scales that are relevant to many real-world water forecasting applications, such as flood forecasting and reservoir optimization. By most of the verification metrics tested, the bias-corrected European model is the best individual model at predicting daily rainfall variations, while the Canadian model shows the most realistic ensemble spread and thus the most reliable forecast probabilities, including those of extreme events. The multi-model ensemble outperforms the individual models by most metrics and is skillful up to 9 days ahead. Skill is higher for 0–5 day accumulations than for the first 24 h, suggesting that timing errors strongly penalize the skill of forecasts with shorter accumulation periods. Because the model biases vary seasonally, bias correction is best applied separately for each calendar month. Forecasting extreme rainfall is a challenge in Ethiopia, especially over mountainous regions, where positive skill is only reached after bias correction. Compared with the individual models, the multi-model ensemble has a higher probability of detecting extreme rainfall and a lower false alarm rate, with usable skill at 24 h lead times.
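Two ideas in the abstract, month-by-month bias correction and event-based scores (probability of detection and false alarm rate), can be illustrated with a short sketch. The abstract does not state which bias-correction scheme the study uses, so the code below assumes a simple multiplicative scaling (observed mean divided by forecast mean for each calendar month); this is an illustrative stand-in, not the authors' method. The function names, the example values, and the 15 mm/day "extreme" threshold are all hypothetical. The false alarm rate is computed with its standard contingency-table definition, false alarms / (false alarms + correct negatives), which differs from the false alarm *ratio*.

```python
from collections import defaultdict
from datetime import date


def monthly_scaling_factors(dates, forecasts, observations):
    """Hypothetical multiplicative bias correction, fitted per calendar month.

    For each month the factor is sum(obs) / sum(forecast); because forecasts and
    observations are paired, this equals mean(obs) / mean(forecast).
    """
    fc_sum = defaultdict(float)
    ob_sum = defaultdict(float)
    for d, f, o in zip(dates, forecasts, observations):
        fc_sum[d.month] += f
        ob_sum[d.month] += o
    # Months where the model forecast no rain at all are left uncorrected (1.0).
    return {m: (ob_sum[m] / fc_sum[m] if fc_sum[m] > 0 else 1.0) for m in fc_sum}


def apply_correction(dates, forecasts, factors):
    """Scale each forecast by the factor for its month (1.0 if month unseen)."""
    return [f * factors.get(d.month, 1.0) for d, f in zip(dates, forecasts)]


def pod_and_false_alarm_rate(forecast_event, observed_event):
    """Contingency-table scores for a yes/no event (e.g. rain above a threshold)."""
    hits = misses = false_alarms = correct_negatives = 0
    for f, o in zip(forecast_event, observed_event):
        if f and o:
            hits += 1
        elif o:
            misses += 1
        elif f:
            false_alarms += 1
        else:
            correct_negatives += 1
    pod = hits / (hits + misses) if hits + misses else float("nan")
    denom = false_alarms + correct_negatives
    far = false_alarms / denom if denom else float("nan")
    return pod, far


# Usage with made-up daily rainfall totals (mm/day).
dates = [date(2020, 1, 1), date(2020, 1, 2), date(2020, 7, 1), date(2020, 7, 2)]
forecast = [2.0, 4.0, 10.0, 30.0]
observed = [1.0, 2.0, 20.0, 20.0]

factors = monthly_scaling_factors(dates, forecast, observed)
corrected = apply_correction(dates, forecast, factors)

threshold = 15.0  # hypothetical "extreme rainfall" threshold
pod, far = pod_and_false_alarm_rate(
    [x >= threshold for x in corrected],
    [x >= threshold for x in observed],
)
print(factors, corrected, pod, far)
```

In practice the factors would be fitted on a training period and applied to independent forecasts; fitting and applying them to the same four days, as above, only shows the mechanics. Fitting one factor per month rather than one for the whole year is what allows the correction to follow a seasonally varying bias.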