Publisher: The Institute of Image Information and Television Engineers
Abstract: We have been studying a real-time speech-to-caption system that uses speech recognition technology with a repeat-speaking method. In this system, a repeat speaker listens to a lecturer's voice and speaks the lecturer's utterances back into a speech recognition computer. Our system achieved a caption accuracy of about 97% for Japanese-to-Japanese conversion and a voice-to-caption conversion time of about 4 seconds for English-to-English conversion at several international conferences, although achieving this performance was costly. In human communication, speech understanding depends not only on verbal information but also on non-verbal information such as the speaker's gestures and face and mouth movements. We therefore sought a suitable way to display captions together with images of the speaker's face movements, after briefly buffering both in a computer, to achieve higher comprehension. In this paper, we investigated how the display sequence and display timing of captions containing speech recognition errors, relative to images of the speaker's face movements, affect comprehension. The results showed that displaying the caption before the speaker's face image improved comprehension of the captions. Displaying both simultaneously improved comprehension by only a few percent over the question sentence alone, and displaying the speaker's face image before the caption produced almost no change. In addition, displaying the caption 1 second before the speaker's face image yielded the largest improvement of all conditions for hearing-impaired participants.