Audio-visual [McGurk and MacDonald (1976). Nature 264, 746-748] and audio-tactile [Gick and Derrick (2009). Nature 462(7272), 502-504] speech stimuli enhance speech perception over audio stimuli alone. In addition, multimodal speech stimuli form an asymmetric window of integration that is consistent with the relative speeds of the various signals [Munhall, Gribble, Sacco, and Ward (1996). Percept. Psychophys. 58(3), 351-362; Gick, Ikegami, and Derrick (2010). J. Acoust. Soc. Am. 128(5), EL342-EL346]. In this experiment, participants were presented video of faces producing /pa/ and /ba/ syllables, both alone and with air puffs occurring synchronously and at different timings up to 300 ms before and after the stop release. Perceivers were asked to identify the syllable they perceived, and were more likely to respond that they perceived /pa/ when air puffs were present, with asymmetrical preference for puffs following the video signal-consistent with the relative speeds of visual and air puff signals. The results demonstrate that visual-tactile integration of speech perception occurs much as it does with audio-visual and audio-tactile stimuli. This finding contributes to the understanding of multimodal speech perception, lending support to the idea that speech is not perceived as an audio signal that is supplemented by information from other modes, but rather that primitives of speech perception are, in principle, modality neutral.