The ability to recognize the emotional states of others is a fundamental social skill. In this study, we investigated the extent to which complex emotions can be inferred from facial or vocal cues in speech. Several sentences were prepared that were intended to express appreciation, blame, apology, or congratulation. Japanese university students uttered these sentences in congruent or incongruent emotional states, and their utterances were recorded on video. The speakers' friends and strangers were shown these recordings in a single modality (face or voice only) and were asked to rate the perceived emotional states of the speakers. The results showed that the raters discriminated congruent from incongruent message conditions, and that this discrimination depended largely on vocal rather than facial cues. The results also showed that familiarity with the target person modulated how raters inferred emotional states. These findings suggest that we can detect subtle emotional nuances of others in spoken interaction, and that we use facial and vocal information in somewhat different ways.