We report a series of experiments designed to demonstrate that the presentation of a sound can facilitate the identification of a concomitantly presented visual target letter in the backward masking paradigm. Two visual letters, serving as the target and its mask, were presented successively at various interstimulus intervals (ISIs). The results demonstrate that the crossmodal facilitation of participants' visual identification performance elicited by the presentation of a simultaneous sound occurs over a very narrow range of ISIs. This critical time window lies just beyond the interval needed for participants to differentiate the target and mask as two distinct perceptual events (Experiment 1), and it can be dissociated from any facilitation elicited by making the visual target physically brighter (Experiment 2). When the sound is presented at the same time as the mask, a facilitatory, rather than an inhibitory, effect on visual target identification performance is still observed (Experiment 3). We further demonstrate that the crossmodal facilitation of the visual target by the sound depends on the establishment of a reliable temporally coincident relationship between the two stimuli (Experiment 4); by contrast, spatial coincidence is not necessary (Experiment 5). We suggest that when visual and auditory stimuli are always presented synchronously, a better-consolidated object representation is likely to be constructed than the one resulting from unimodal visual stimulation.