This paper describes a hardware architecture to implement the watershed algorithm using rainfall simulation. The speed of the architecture is increased by utilizing a multiple memory bank approach to allow parallel access to the neighbourhood pixel values. In a single read cycle, the architecture is able to obtain all five values of the centre and four neighbours for a 4-connectivity watershed transform. The storage requirement of the multiple bank implementation is the same as a single bank implementation by using a graph-based memory bank addressing scheme. The proposed rainfall watershed architecture consists of two parts. The first part performs the arrowing operation and the second part assigns each pixel to its associated catchment basin. The paper describes the architecture datapath and control logic in detail and concludes with an implementation on a Xilinx Spartan-3 FPGA.