Wavelet-based automated global image registration (WAGIR) is fundamental for most remote sensing image processing algorithms and extremely computation-intensive. With more and more algorithms migrating from ground computing to onboard computing, an efficient dedicated architecture of WAGIR is desired. In this paper, a BWAGIR architecture is proposed based on a block resampling scheme. BWAGIR achieves a significant performance by pipelining computational logics, parallelizing the resampling process and the calculation of correlation coefficient and parallel memory access. A proof-of-concept implementation with 1 BWAGIR processing unit of the architecture performs at least 7.4X faster than the CL cluster system with 1 node, and at least 3.4X than the MPM massively parallel machine with 1 node. Further speedup can be achieved by parallelizing multiple BWAGIR units. The architecture with 5 units achieves a speedup of about 3X against the CL with 16 nodes and a comparative speed with the MPM with 30 nodes. More importantly, the BWAGIR architecture can be deployed onboard economically.