In an everyday social interaction we automatically integrate another's facial movements and vocalizations, be they linguistic or otherwise. This requires audiovisual integration of a continual barrage of sensory input-a phenomenon previously well-studied with human audiovisual speech, but not with non-verbal vocalizations. Using both fMRI and ERPs, we assessed neural activity to viewing and listening to an animated female face producing non-verbal, human vocalizations (i.e. coughing, sneezing) under audio-only (AUD), visual-only (VIS) and audiovisual (AV) stimulus conditions, alternating with Rest (R). Underadditive effects occurred in regions dominant for sensory processing, which showed AV activation greater than the dominant modality alone. Right posterior temporal and parietal regions showed an AV maximum in which AV activation was greater than either modality alone, but not greater than the sum of the unisensory conditions. Other frontal and parietal regions showed Common-activation in which AV activation was the same as one or both unisensory conditions. ERP data showed an early superadditive effect (AV > AUD + VIS, no rest), mid-range underadditive effects for auditory N140 and face-sensitive N170, and late AV maximum and common-activation effects. Based on convergence between fMRI and ERP data, we propose a mechanism where a multisensory stimulus may be signaled or facilitated as early as 60 ms and facilitated in sensory-specific regions by increasing processing speed (at N170) and efficiency (decreasing amplitude in auditory and face-sensitive cortical activation and ERPs). Finally, higher-order processes are also altered, but in a more complex fashion.