摘要:Solfeggio is an important basic course for music majors, and audio recognition training is one of the important links. With the improvement of computer performance, audio recognition has been widely used in smart wearable devices. In recent years, the development of deep learning has accelerated the research process of audio recognition. However, there is a lot of sound interference in music teaching environment, which leads to the performance of the audio classifier that cannot meet the actual demand. In order to solve this problem, an improved audio recognition system based on YOLO-v4 is proposed, which mainly improves the network structure. First, Mel frequency cepstrum number is used to process the original audio and extract the corresponding features. Then, try to apply the YOLO-v4 model in the field of deep learning to the field of audio recognition and improve it by combining with the spatial pyramid pool module to strengthen the generalization ability of data in different audio formats. Second, the stacking method in ensemble learning is used to fuse the independent submodels of two different channels. Experimental results show that compared with other deep learning technologies, the improved YOLO-v4 model can improve the performance of audio recognition, and it has better performance in processing data of different audio formats, which shows better generalization ability.