This paper focuses on the cognitive and neural mechanisms of speech perception: the rapid and highly automatic processes by which complex, time-varying speech signals are perceived as sequences of meaningful linguistic units. We review four processes that contribute to the perception of speech (perceptual grouping, lexical segmentation, perceptual learning, and categorical perception), in each case presenting perceptual evidence for highly interactive processes in which top-down information flow drives and constrains the interpretation of spoken input. The cognitive and neural underpinnings of these interactive processes appear to depend on two distinct representations of heard speech: an auditory, echoic representation of the incoming signal, and a motoric/somatotopic representation of speech as it would be produced. We review the neuroanatomical system that supports these two key properties of speech perception and discuss how it incorporates both interactive processing and parallel echoic and somato-motoric representations, drawing on evidence from functional neuroimaging studies in humans and from comparative anatomical studies. We propose that top-down interactive mechanisms within auditory networks play an important role in explaining the perception of spoken language.