Abstract: Global flood models (GFMs) are becoming increasingly important for disaster risk management internationally. However, these models have undergone little validation against observed flood events, making it difficult to compare model performance. In this paper, we introduce the first collective validation of multiple GFMs against the same events, and we analyse how different model structures influence performance. We identify three hydraulically diverse regions in Africa with recent large-scale flood events: Lokoja, Nigeria; Idah, Nigeria; and Chemba, Mozambique. We then evaluate the flood extent output provided by six GFMs against satellite observations of historical flood extents in these regions. The critical success index of individual models across the three regions ranges from 0.45 to 0.70, and the percentage of flood captured ranges from 52% to 97%. Site-specific conditions influence performance: the models score better in the confined floodplain of Lokoja but poorly in Idah's flat, extensive floodplain. 2D hydrodynamic models are shown to perform favourably. The models forced by gauged flow data show a greater level of return period accuracy than those forced by climate reanalysis data. Using the results of our analysis, we create and validate a three-model ensemble to investigate the usefulness of ensemble modelling in a flood hazard context. We find that the ensemble model performs similarly to the best individual and aggregated models. In the three study regions, we find no correlation between performance and the spatial resolution of the models. The best individual models show an acceptable level of performance for these large rivers.
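The two headline metrics can be illustrated with a minimal sketch. It assumes the standard contingency-table definitions, which the abstract itself does not spell out: the critical success index is taken as hits / (hits + misses + false alarms), and the percentage of flood captured is taken as the hit rate, hits / (hits + misses). The function name `flood_extent_scores` and the toy grids are hypothetical and are not data from the study.

```python
def flood_extent_scores(modelled, observed):
    """Return (csi, hit_rate) for two equally shaped binary grids (1 = wet).

    Assumed definitions (not confirmed by the abstract):
      csi      = hits / (hits + misses + false_alarms)
      hit_rate = hits / (hits + misses)
    """
    hits = misses = false_alarms = 0
    for model_row, obs_row in zip(modelled, observed):
        for m, o in zip(model_row, obs_row):
            if m and o:
                hits += 1          # wet in both model and observation
            elif o:
                misses += 1        # observed wet, modelled dry
            elif m:
                false_alarms += 1  # modelled wet, observed dry
    if hits + misses == 0:
        raise ValueError("observed grid contains no flooded cells")
    csi = hits / (hits + misses + false_alarms)
    hit_rate = hits / (hits + misses)
    return csi, hit_rate


# Hypothetical 3x4 flood extent grids, for illustration only.
observed = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 0],
]
modelled = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]

csi, hit_rate = flood_extent_scores(modelled, observed)
print(f"CSI = {csi:.2f}, flood captured = {hit_rate:.0%}")
```

Under these definitions the critical success index penalises overprediction through the false-alarm term, whereas the hit rate does not. This is one reason a model can capture a high percentage of the observed flood while still scoring a moderate critical success index.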