Publisher: The Institute of Image Information and Television Engineers
Abstract: We investigated how asynchrony between the visual and auditory streams of speech movies affects correct acquisition of the speech content under various auditory noise conditions. We presented movies of a female announcer's face uttering short sentences to subjects, who were asked to repeat each sentence orally. All sentences consisted of five words in the same word order (subject, time, location, object, verb). The utterance speed was 120 or 150 ms/mora. Audio-visual asynchronies were 0, ±1, ±2, ±4, ±8, ±16, and ±32 frames (1 frame = 1/30 s; +: visual-preceding condition, -: audio-preceding condition). Pink noise of -10 dB or -15 dB was imposed on the speech sound. The same procedure was carried out with stimuli that had no visual image (audio-only) as a control. The word recognition rate increased relative to the audio-only level for asynchronies within ±4 frames, indicating that facilitation by visual information occurs within that range. On the other hand, the recognition rate decreased when the asynchrony was over ±8 frames, which indicates that visual information does not improve, and in some conditions even disturbs, speech recognition. To examine the effect of word order, the same experiment was conducted using sentences with a different word order. Subjects performed best when presented with the order used in news reading, i.e., time, location, subject, object, verb, and recognized the subject word more frequently than any other word in all conditions. Interestingly, the time range of visual facilitation is comparable to the time range of audiovisual simultaneity reported in psychophysical studies.
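The asynchronies in the abstract are given in video frames, while audiovisual simultaneity windows are usually quoted in milliseconds. As a reading aid, the following minimal sketch converts the tested frame offsets to milliseconds using only the stated frame duration (1 frame = 1/30 s). The names `FRAME_RATE_HZ`, `ASYNCHRONIES_FRAMES`, and `frames_to_ms` are illustrative assumptions and do not come from the paper.

```python
# Frame duration stated in the abstract: 1 frame = 1/30 s.
FRAME_RATE_HZ = 30

# Magnitudes of the tested asynchronies, in frames.
# The sign (+ visual leading, - audio leading) does not affect the conversion.
ASYNCHRONIES_FRAMES = [0, 1, 2, 4, 8, 16, 32]


def frames_to_ms(frames: int) -> float:
    """Convert a number of video frames to milliseconds."""
    return frames * 1000.0 / FRAME_RATE_HZ


for n in ASYNCHRONIES_FRAMES:
    print(f"±{n:>2} frames = ±{frames_to_ms(n):.1f} ms")
```

Under this conversion, the ±4 frame range in which facilitation was observed corresponds to roughly ±133 ms, and ±8 frames corresponds to roughly ±267 ms.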