The aim of this study was to examine the effect of congruence between the sensory modality through which a concept can be experienced and the modality through which the word denoting that concept is perceived during word recognition. Words denoting concepts that can be experienced visually (e.g., “color”) and words denoting concepts that can be experienced auditorily (e.g., “noise”) were presented both visually and auditorily. We observed shorter processing latencies when the modality through which a concept could be experienced matched the modality in which the word denoting that concept was presented. In the visual lexical decision task, “color” was recognized faster than “noise”, whereas in the auditory lexical decision task, “noise” was recognized faster than “color”. The obtained pattern of results cannot be accounted for by exclusively amodal theories, whereas it can be easily integrated into theories based on perceptual representations.