The second of two targets is often missed when presented shortly after the first target--a phenomenon referred to as the attentional blink (AB). Whereas the AB is a robust phenomenon within sensory modalities, the evidence for cross-modal ABs is rather mixed. Here, we test the possibility that the absence of an auditory-visual AB for visual letter recognition when streams of tones are used is due to the efficient use of echoic memory, allowing for the postponement of auditory processing. However, forcing participants to immediately process the auditory target, either by presenting interfering sounds during retrieval or by making the first target directly relevant for a speeded response to the second target, did not result in a return of a cross-modal AB. Thefindings argue against echoic memory as an explanation for efficient cross-modal processing. Instead, we hypothesized that a cross-modal AB may be observed when the different modalities use common representations, such as semantic representations. In support of this, a deficit for visual letter recognition returned when the auditory task required a distinction between spoken digits and letters.