Publisher: The Institute of Image Information and Television Engineers
Abstract: We have developed an intelligible high-speed speech rate conversion technology that uses the acoustic features contributing to prosody. The conventional method plays back accelerated speech at a uniform rate from beginning to end. In contrast, our proposed approach varies the playback rate adaptively, based on acoustic detection of the position of an utterance and of fluctuations in the speaker's fundamental frequency (F0) and power. In doing so, we aim to make high-speed playback easier to listen to by giving the listener the impression of "slowed-down" playback. Because this approach converts speech rate using only the acoustic features of the audio data, it can be applied not only to Japanese but also to other languages. The algorithm developed in this study is optimized for Japanese; we aim to implement the proposed approach in a wider range of commercial devices and to adapt the technology to various languages.
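The idea of varying the playback rate while keeping the overall speed-up fixed can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function name `adaptive_rates`, the parameters `target_rate` and `emphasis`, and the rule of averaging min-max normalized F0 and power into a single salience score are all assumptions made for illustration. The sketch assumes per-frame F0 and power values are already available, with F0 set to 0 in unvoiced or silent frames.

```python
def adaptive_rates(f0, power, target_rate=2.0, emphasis=0.5):
    """Return one playback rate per frame (hypothetical helper).

    Frames with high F0 and power are played more slowly, quiet frames
    faster, and the total output duration equals len(f0) / target_rate.
    """
    if not f0 or len(f0) != len(power):
        raise ValueError("f0 and power must be non-empty and equally long")

    def normalise(values):
        # Min-max scale to [0, 1]; a constant track maps to all zeros.
        lo, hi = min(values), max(values)
        return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

    # Assumed salience score: mean of normalised F0 and normalised power.
    salience = [(a + b) / 2 for a, b in zip(normalise(f0), normalise(power))]

    # Relative output duration per frame: salient frames get more time.
    weights = [1.0 + emphasis * s for s in salience]

    # Rescale so the durations sum to the duration that uniform
    # playback at target_rate would produce.
    scale = (len(weights) / target_rate) / sum(weights)
    return [1.0 / (w * scale) for w in weights]


# Example input: six frames, with an utterance in the middle.
f0 = [0.0, 0.0, 120.0, 180.0, 150.0, 0.0]
power = [0.0, 0.1, 0.8, 1.0, 0.6, 0.0]
rates = adaptive_rates(f0, power, target_rate=2.0)
```

In this sketch the silent first frame receives a higher rate than the frame with peak F0 and power, while the six frames together still take the time of three, so the average speed-up matches `target_rate`. A time-scale modification stage, which is not shown, would then apply these per-frame rates to the waveform.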