One well-known aspect of multisensory communication is the integration of auditory and visual information in face-to-face speech perception, as demonstrated by the McGurk effect, in which heard speech is altered by mismatching visual mouth movements. Susceptibility to the McGurk effect varies with several factors, including the intelligibility of the auditory speech. Here I focus on the language background of perceivers as a factor influencing the degree to which visual speech is used. When auditory speech is highly intelligible, native Japanese speakers tend to rely on the auditory signal, showing less visual influence than native English speakers. This interlanguage difference is not apparent at 6 years of age but emerges by 8 years, as visual influence increases in native English speakers. Native English speakers appear to acquire robust lipreading ability over the course of development, such that adult English speakers can lipread monosyllables faster than they can hear them, whereas such visual precedence is not observed in native Japanese speakers. This interlanguage difference is being corroborated by event-related potential and functional magnetic resonance imaging studies.