An efficient method that estimates the depth map of a 3D-scene using the motion information of the H.264-encoded 2D-video is presented. The motion information of the video-frames captured via a single camera is either directly used or modified to approximate the displacement (disparity) that exists between the right and left images when the scene is captured by stereoscopic cameras. Then, depth is estimated based on its inverse relation with disparity. The low-complexity of this method and its compatibility with future broadcasting networks allow its real-time implementation at the receiver; thus 3D-signal is constructed at no additional burden to the network. Performance evaluations show that this method outperforms the other existing H.264-based technique by up to 1.98 dB PSNR, providing more realistic depth information of the scene. Moreover subjective comparisons of the results, obtained by viewers watching the generated stereo video sequences on a 3D-display system, confirm the superiority of our method.