We extend a stochastic model of hierarchical dependencies between the wavelet coefficients of still images to the spatiotemporal decomposition of video sequences obtained by a motion-compensated 2D + t wavelet transform. We propose new estimators for the parameters of this model that provide better statistical performance. From this model, we derive an optimal predictor of missing samples in the spatiotemporal wavelet domain and use it in two applications: quality enhancement and error concealment of scalable video transmitted over packet networks. Simulation results show that this technique achieves a significant quality improvement under different packetization strategies for a scalable video bit stream.