Little is known on cross-modal interaction in complex object recognition. The factors influencing this interaction were investigated using simultaneous presentation of pictures and vocalizations of animals. In separate blocks, the task was to identify either the visual or the auditory stimulus, ignoring the other modality. The pictures and the sounds were congruent (same animal), incongruent (different animals) or neutral (animal with meaningless stimulus). Performance in congruent trials was better than in incongruent trials, regardless of whether subjects attended the visual or the auditory stimuli, but the effect was larger in the latter case. This asymmetry persisted with addition of a long delay after the stimulus and before the response. Thus, the asymmetry cannot be explained by a lack of processing time for the auditory stimulus. However, the asymmetry was eliminated when low-contrast visual stimuli were used. These findings suggest that when visual stimulation is highly informative, it affects auditory recognition more than auditory stimulation affects visual recognition. Nevertheless, this modality dominance is not rigid; it is highly influenced by the quality of the presented information.