In an earlier study, we found that human voices evoked a positive event-related potential (ERP) peaking at approximately 320 ms after stimulus onset, distinct from the ERPs elicited by instrumental tones. Here we show that, although similar in latency to the Novelty P3, this Voice-Sensitive Response (VSR) differs from it in antecedent conditions and scalp distribution. Furthermore, when participants were not attending to the stimuli, the response to voices did not differ from the response to other harmonic stimuli (strings, winds, and brass). During a task requiring attention to a feature other than timbre, the response to voices did not differ from the response to voicelike stimuli (strings), but did differ from the response to other harmonic stimuli. We suggest that the component elicited by voices and similar sounds reflects the allocation of attention on the basis of stimulus significance (as opposed to novelty), and we propose an explanation of the task and attentional factors that contribute to the effect.