Abstract: Object-based audio techniques have become common because they offer the flexibility needed for personalized rendering. This paper proposes a multi-stage encoding scheme for multiple audio objects that exploits intra-object sparsity. In the encoding phase, the dominant time-frequency (TF) instants of all active object signals are extracted and divided into several stages, forming multi-stage observation signals for transmission. In the decoding phase, the preserved TF instants are recovered using the compressed sensing (CS) technique and then used to reconstruct the audio objects. Evaluations confirm that the proposed scheme achieves scalable transmission while maintaining the perceptual quality of each audio object.
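The abstract does not say how the dominant TF instants are selected or which CS solver recovers them. As a generic illustration only, the Python sketch below keeps the k largest-magnitude coefficients of a synthetic frame (a stand-in for one object's dominant TF instants), encodes the result with a random Gaussian sensing matrix into m < n values, and recovers it with Orthogonal Matching Pursuit (OMP), a standard greedy CS solver swapped in here for illustration. The helper names (`keep_dominant`, `omp`, `recovery_rate`), the random frame, the sensing matrix, and the sizes n = 64, m = 32, k = 4 are hypothetical assumptions rather than the paper's method, and the multi-stage split of the instants is not modeled.

```python
import numpy as np


def keep_dominant(frame: np.ndarray, k: int) -> np.ndarray:
    """Zero all but the k largest-magnitude coefficients.

    Hypothetical stand-in for extracting dominant TF instants of one object.
    """
    idx = np.argsort(np.abs(frame))[-k:]
    sparse = np.zeros_like(frame)
    sparse[idx] = frame[idx]
    return sparse


def omp(A: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Orthogonal Matching Pursuit: greedily recover a k-sparse x from y = A @ x."""
    residual = y.copy()
    support: list[int] = []
    coef = np.zeros(0)
    for _ in range(k):
        # Add the column most correlated with the part of y not yet explained.
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        # Re-fit all selected coefficients jointly by least squares.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(A.shape[1])
    x_hat[support] = coef
    return x_hat


def recovery_rate(n: int = 64, m: int = 32, k: int = 4,
                  trials: int = 50, seed: int = 0) -> float:
    """Fraction of random trials in which OMP exactly recovers the sparsified frame."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        x = keep_dominant(rng.standard_normal(n), k)    # hypothetical TF frame
        A = rng.standard_normal((m, n)) / np.sqrt(m)    # hypothetical sensing matrix
        x_hat = omp(A, A @ x, k)                        # transmit m < n values, then decode
        hits += int(np.allclose(x, x_hat, atol=1e-8))
    return hits / trials


if __name__ == "__main__":
    print(f"OMP recovery rate: {recovery_rate():.2f}")
```

Running the script prints the fraction of 50 random trials in which the sparsified frame is recovered from 32 of its 64 values; raising m or lowering k generally raises that fraction, which mirrors the trade-off between transmitted data and reconstruction quality that the abstract describes.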