The cross-modality adjective metaphor (e.g., “red taste”, “silent color”) is a metaphor in which the vehicle (the adjective) and the topic, or tenor (the noun), express different perceptual qualities. Most existing studies examine how the acceptability of cross-modality adjective metaphors can be explained by the pairing of the vehicle’s and the topic’s perceptual qualities. Unlike these studies, this paper explores how people comprehend cross-modality adjective metaphors. We conducted a large-scale psychological experiment and collected 10,388 words associated with 62 cross-modality adjective metaphors. We regarded those words as features of the metaphors and classified them into four kinds: common (features listed for the metaphor, the vehicle, and the topic), vehicle-shared (features listed for both the metaphor and the vehicle, but not for the topic), topic-shared (features listed for both the metaphor and the topic, but not for the vehicle), and emergent (features listed for the metaphor, but for neither the vehicle nor the topic). The results showed that there were significantly more emergent features than features of the other kinds in the comprehension of cross-modality adjective metaphors. We assumed that the emergent meanings of cross-modality adjective metaphors are based on scene association, and analyzed how many of the words associated with these metaphors could be classified as based on scene association. The results showed that significantly more words were based on scene association than were not. This finding suggests that the meanings of cross-modality adjective metaphors are based primarily on scene association.
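The four-way feature classification described above can be illustrated with a minimal sketch. It assumes that the features listed for a metaphor, its vehicle, and its topic are available as sets of words; the function names and the example words are hypothetical and are not taken from the experiment's data or procedure.

```python
from collections import Counter


def classify_feature(feature, vehicle_features, topic_features):
    """Classify one feature listed for a metaphor into one of four kinds.

    Assumption: vehicle_features and topic_features are sets of the words
    listed for the vehicle (adjective) and the topic (noun) on their own.
    """
    in_vehicle = feature in vehicle_features
    in_topic = feature in topic_features
    if in_vehicle and in_topic:
        return "common"          # listed for metaphor, vehicle, and topic
    if in_vehicle:
        return "vehicle-shared"  # listed for metaphor and vehicle only
    if in_topic:
        return "topic-shared"    # listed for metaphor and topic only
    return "emergent"            # listed for the metaphor alone


def count_feature_kinds(metaphor_features, vehicle_features, topic_features):
    """Count how many of a metaphor's features fall into each kind."""
    return Counter(
        classify_feature(f, vehicle_features, topic_features)
        for f in metaphor_features
    )


# Invented illustration for "red taste"; these words are NOT experimental data.
metaphor = {"hot", "blood", "sweet", "spicy"}
vehicle = {"hot", "blood", "fire"}    # hypothetical features of "red"
topic = {"hot", "sweet", "tongue"}    # hypothetical features of "taste"

counts = count_feature_kinds(metaphor, vehicle, topic)
print(dict(counts))
```

Because each feature of the metaphor falls into exactly one of the four kinds, the per-kind counts sum to the number of features listed for the metaphor, which is what makes a comparison of counts across kinds meaningful.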