Information from the face and voice plays an important role in social communication. As has been shown for speech perception, facial and vocal signals are also integrated in the perception of emotion. This paper reviews studies on the multisensory perception of emotion from faces and voices, and then introduces recent studies on cultural differences in this multisensory perception. The review emphasizes that the combination of faces and voices can yield richness in the expression of emotion.