Designing and implementing a highly optimized MIMO (multiple-input multiple-output) detector requires co-optimization of the algorithm with the underlying hardware architecture. Special attention must be paid to application requirements such as throughput, latency, and resource constraints. In this work, we focus on a highly optimized, matrix-inversion-free 4 × 4 MMSE (minimum mean square error) MIMO detector implementation. The work has resulted in a real-time field-programmable gate array (FPGA)-based implementation on a Xilinx Virtex-2 6000 using only 9003 logic slices, 66 multipliers, and 24 Block RAMs (less than 33% of the overall resources of this part). The design delivers over 420 Mbps sustained throughput with a latency of only 2.77 microseconds. The designed 4 × 4 linear MMSE MIMO detector is capable of complying with the proposed IEEE 802.11n standard.
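For orientation, the following is a minimal software sketch of linear MMSE detection that never forms an explicit matrix inverse. It is not the FPGA architecture described in this work: the abstract does not state how the inversion is avoided, so the sketch simply assumes the textbook formulation, solving (H^H H + σ²I) x = H^H y by Gaussian elimination with partial pivoting. The function name `mmse_detect` and its parameters are hypothetical and chosen for illustration only.

```python
def mmse_detect(H, y, noise_var):
    """Linear MMSE estimate of the transmitted vector (illustrative sketch).

    Solves (H^H H + noise_var * I) x = H^H y by elimination instead of
    computing a matrix inverse. H is a list of rows (n_rx x n_tx) of
    complex channel gains, y is the received vector of length n_rx.
    This is an assumed textbook formulation, not the paper's hardware design.
    """
    n_rx, n_tx = len(H), len(H[0])

    # Regularized Gram matrix A = H^H H + noise_var * I (n_tx x n_tx).
    A = [[sum(H[k][i].conjugate() * H[k][j] for k in range(n_rx))
          + (noise_var if i == j else 0.0)
          for j in range(n_tx)]
         for i in range(n_tx)]

    # Matched-filter output b = H^H y.
    b = [sum(H[k][i].conjugate() * y[k] for k in range(n_rx))
         for i in range(n_tx)]

    # Forward elimination with partial pivoting.
    for col in range(n_tx):
        pivot = max(range(col, n_tx), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n_tx):
            factor = A[r][col] / A[col][col]
            for c in range(col, n_tx):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]

    # Back substitution on the upper-triangular system.
    x = [0j] * n_tx
    for i in reversed(range(n_tx)):
        acc = b[i] - sum(A[i][j] * x[j] for j in range(i + 1, n_tx))
        x[i] = acc / A[i][i]
    return x
```

For a 4 × 4 system, `H` would hold four rows of four complex gains and `y` four received samples; the returned soft estimates would then be passed to a symbol slicer or demapper. A hardware implementation would typically replace the floating-point elimination with a fixed-point scheme suited to the target device.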