Journal: EURASIP Journal on Audio, Speech, and Music Processing
Print ISSN: 1687-4714
Electronic ISSN: 1687-4722
Year: 2019
Volume: 2019
Issue: 1
Pages: 1-16
DOI: 10.1186/s13636-018-0144-6
Publisher: Hindawi Publishing Corporation
Abstract: Filter banks applied to spectra play an important role in many audio applications. Traditionally, the filters are distributed linearly on a perceptual frequency scale such as the Mel scale, and neighboring filters are usually placed so that they overlap, which makes the output smoother. However, such fixed-parameter filters typically originate from psychoacoustic experiments, and their parameters are chosen empirically. To make filter banks discriminative, the authors use a neural network structure to adaptively learn the center frequency, bandwidth, gain, and shape of each filter when the filter bank is used as a feature extractor. The paper investigates several different constraints on discriminative frequency filter banks, as well as the dual problem of spectrum reconstruction. Experiments on audio source separation and audio scene classification show that the proposed filter banks outperform traditional fixed-parameter triangular or Gaussian filters on the Mel scale. Classification errors on the LITIS ROUEN and DCASE2016 datasets are reduced by 13.9% and 4.6% (relative), respectively.
Keywords: Discriminative frequency filter banks; Networks; Audio scene classification; Audio source separation
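The abstract contrasts fixed filters placed linearly on the Mel scale (triangular or Gaussian, overlapping) with filters whose center frequency, bandwidth, and gain are learned. Below is a minimal NumPy sketch of the fixed baseline, assuming the common HTK-style Mel formula mel = 2595 * log10(1 + f / 700); the paper's exact Mel formula is not stated here. It is not the authors' implementation. The function names and the Gaussian parameterization (center, bandwidth as a standard deviation in Hz, gain) are hypothetical, chosen only to show which quantities a learned filter bank would expose as parameters.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style Mel formula (an assumption; other Mel variants exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_triangular_filterbank(n_filters, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Fixed triangular filters spaced linearly on the Mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    if f_max is None:
        f_max = sample_rate / 2.0
    n_bins = n_fft // 2 + 1
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    # n_filters + 2 equally spaced Mel points give each filter a left edge,
    # a center, and a right edge; filter i's right edge is filter i+1's
    # center, so adjacent filters overlap.
    hz_points = mel_to_hz(np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_filters + 2))
    fb = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def gaussian_filterbank(centers_hz, bandwidths_hz, gains, n_fft, sample_rate):
    """Hypothetical Gaussian filters, one per (center, bandwidth, gain) triple.

    In a learned filter bank these three arrays would be trainable; here they
    are plain constants.
    """
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    c = np.asarray(centers_hz, dtype=float)[:, None]
    b = np.asarray(bandwidths_hz, dtype=float)[:, None]
    g = np.asarray(gains, dtype=float)[:, None]
    return g * np.exp(-0.5 * ((bin_freqs - c) / b) ** 2)


def apply_filterbank(power_spec, fb):
    # power_spec: (n_frames, n_bins) -> features: (n_frames, n_filters)
    return power_spec @ fb.T


fb = mel_triangular_filterbank(n_filters=40, n_fft=512, sample_rate=16000)
features = apply_filterbank(np.ones((3, 257)), fb)
```

In a discriminative filter bank, the centers, bandwidths, and gains passed to a function like `gaussian_filterbank` would be updated by backpropagation together with the downstream network, whereas `mel_triangular_filterbank` fixes them from the Mel spacing alone.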