This study further examines cross-cultural differences in multisensory emotion perception between Western and East Asian people. We recorded audiovisual stimulus videos of Japanese actors saying a neutral phrase while expressing one of the basic emotions, and then conducted a validation experiment on these stimuli. In the first part (facial expression), participants watched silent versions of the videos and judged which emotion each actor was expressing by choosing among six options (happiness, anger, disgust, sadness, surprise, and fear). In the second part (vocal expression), they performed the same task while listening to the audio tracks of the same videos without the images. We analyzed the categorization responses in terms of accuracy and confusion matrices, and discussed the tendencies in emotion perception among Japanese people.
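To show what the accuracy and confusion-matrix analysis might involve, here is a minimal sketch. It assumes each trial is recorded as a pair of (intended emotion, chosen option) drawn from the six labels above. The function names and the toy trials are hypothetical and are not data from the study. Rows are the intended emotion, columns are the chosen option, and per-emotion accuracy is the diagonal cell divided by its row total.

```python
from collections import Counter

# The six response options offered in both the facial and vocal parts.
EMOTIONS = ["happiness", "anger", "disgust", "sadness", "surprise", "fear"]

def confusion_matrix(trials):
    """Count (intended, chosen) pairs; rows = intended emotion, columns = chosen option."""
    counts = Counter(trials)
    return [[counts[(intended, chosen)] for chosen in EMOTIONS] for intended in EMOTIONS]

def accuracy_per_emotion(matrix):
    """Share of trials per intended emotion where the chosen option matched (NaN if no trials)."""
    result = {}
    for i, emotion in enumerate(EMOTIONS):
        total = sum(matrix[i])
        result[emotion] = matrix[i][i] / total if total else float("nan")
    return result

# Hypothetical toy responses for illustration only (not results from the study).
trials = [
    ("happiness", "happiness"),
    ("happiness", "surprise"),
    ("fear", "surprise"),
    ("fear", "fear"),
    ("anger", "anger"),
]
matrix = confusion_matrix(trials)
acc = accuracy_per_emotion(matrix)
print(acc["happiness"])  # 0.5
```

Off-diagonal cells show systematic confusions, such as fear being labeled as surprise. The same counts could also be obtained with `sklearn.metrics.confusion_matrix` by passing `labels=EMOTIONS`.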