Abstract: This study proposes a methodology for recognizing repetitions in stuttered speech. First, the recorded speech is parameterized by extracting six acoustic features: volume, zero crossing rate, spectral entropy, high-order derivatives, the VH curve, and the VE curve. Second, the speech is segmented by end-point detection (EPD) according to a threshold on the VH curve. Third, the features of the segmented speech are processed by dynamic time warping (DTW) to identify similar patterns in neighbouring segments. The proposed method was verified using artificial stuttering samples of Mandarin Chinese. Ten male subjects were asked to imitate stuttering by speaking 39 predefined repetition settings. These settings were designed around three Mandarin Phonetic Symbols ([t], [k], [t‘]) and three kinds of repetition (part-word repetition, whole-word repetition, and multi-syllable word repetition). The experimental results indicate that EPD using the VH curve is capable of segmenting the repetitions in artificial stuttered speech. A comparison of the results for phonemes and for single-syllable words shows no significant difference in the DTW threshold. DTW recognized repetitions with an accuracy of 83%. The proposed combination of EPD and DTW is therefore feasible for the automatic recognition of repetitions in stuttered speech. However, more samples of real stuttered speech are still needed to verify and improve the proposed method.
Keywords: Repetitions; Mandarin Chinese; End-point detection; Dynamic time warping
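
The DTW step described in the abstract, comparing neighbouring segments and flagging pairs that are similar, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `dtw_distance` and `find_repetitions`, the Euclidean local distance, the normalization by the combined segment length, and the threshold value are all assumptions made for illustration; the abstract does not specify them. Each segment is assumed to be a list of per-frame feature vectors already produced by EPD.

```python
from math import inf


def dtw_distance(a, b):
    """Length-normalized DTW distance between two sequences of feature vectors.

    Assumption: Euclidean distance between frames and normalization by
    len(a) + len(b); the abstract does not state which variants were used.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("segments must contain at least one frame")

    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays on the same frame
                cost[i][j - 1],      # b advances, a stays on the same frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m] / (n + m)


def find_repetitions(segments, threshold):
    """Return indices i where segment i and segment i + 1 look like a repetition.

    Assumption: a pair counts as a repetition when its normalized DTW
    distance is at or below a hypothetical, hand-chosen threshold.
    """
    return [
        i
        for i in range(len(segments) - 1)
        if dtw_distance(segments[i], segments[i + 1]) <= threshold
    ]


if __name__ == "__main__":
    # Toy one-dimensional features: the second segment is a time-stretched
    # copy of the first, and the third is unrelated.
    segments = [
        [[0.0], [1.0], [2.0]],
        [[0.0], [1.0], [1.0], [2.0]],
        [[5.0], [5.0]],
    ]
    print(find_repetitions(segments, threshold=0.5))
```

In this toy input the first two segments differ only in duration, which is the kind of variation DTW is designed to absorb, so only that neighbouring pair falls under the threshold.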