The higher resolutions and new functionality of video applications increase their throughput and processing requirements. In contrast, the energy and heat limitations of mobile devices demand low-power video cores. We propose a memory and communication centric design methodology to reach an energy-efficient dedicated implementation. First, memory optimizations are combined with algorithmic tuning. Then, a partitioning exploration introduces parallelism using a cyclo-static dataflow model that also expresses implementation-specific aspects of communication channels. Towards hardware, these channels are implemented as a restricted set of communication primitives. They enable an automated RTL development strategy for rigorous functional verification. The FPGA/ASIC design of an MPEG-4 Simple Profile video codec demonstrates the methodology. The video pipeline exploits the inherent functional parallelism of the codec and contains a tailored memory hierarchy with burst accesses to external memory. 4CIF encoding at 30 fps, consumes 71 mW in a 180 nm, 1.62 V UMC technology.