Abstract: Within the framework of a conventional pronunciation evaluation system, we extract discrete and continuous fundamental frequency features using different methods. We use these features at the speech-frame level and the syllable level to build embedded tone models and explicit tone models, respectively, as well as mixed models that combine the two, for evaluating the pronunciation of Mandarin tones. We then compare how the different extraction methods and models affect evaluation performance. The results show that the mixed models perform best: their average score error rate (ASER) is 0.249, a relative reduction of 29.66% compared with the baseline system.
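
To make the reported figures concrete, the short sketch below back-computes the baseline ASER implied by the two numbers in the abstract. It assumes the relative reduction is defined as (baseline - mixed) / baseline on the ASER itself. The abstract does not state the baseline value, so the result is only an implied estimate, and the function name `implied_baseline` is hypothetical.

```python
def implied_baseline(new_value: float, relative_reduction: float) -> float:
    """Invert relative_reduction = (baseline - new_value) / baseline.

    Assumption: the reduction is measured relative to the baseline value.
    """
    if not 0.0 <= relative_reduction < 1.0:
        raise ValueError("relative_reduction must be in [0, 1)")
    return new_value / (1.0 - relative_reduction)


# Figures taken from the abstract.
mixed_aser = 0.249
relative_reduction = 0.2966  # 29.66%

# Implied estimate only; the abstract does not report this value.
baseline_aser = implied_baseline(mixed_aser, relative_reduction)
print(f"Implied baseline ASER: {baseline_aser:.3f}")
```

Under this assumption the implied baseline ASER is roughly 0.354, so the absolute reduction achieved by the mixed models is about 0.105.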