, 4, 67

From Where to What: A Neuroanatomically Based Evolutionary Model of the Emergence of Speech in Humans


Oren Poliva. F1000Res.


In the brain of primates, the auditory cortex connects with the frontal lobe via the temporal pole (auditory ventral stream; AVS) and via the inferior parietal lobe (auditory dorsal stream; ADS). The AVS is responsible for sound recognition, and the ADS for sound localization, voice detection and integration of calls with faces. I propose that the primary role of the ADS in non-human primates is the detection of and response to contact calls. These calls are exchanged between tribe members (e.g., mother-offspring) and are used for monitoring location. Detection of contact calls occurs by the ADS identifying a voice, localizing it, and verifying that the corresponding face is out of sight. Once a contact call is detected, the primate produces a contact call in return via descending connections from the frontal lobe to a network of limbic and brainstem regions. Because the ADS of present-day humans also performs speech production, I further propose an evolutionary course for the transition from contact call exchange to an early form of speech. In accordance with this model, structural changes to the ADS endowed early members of the genus Homo with partial vocal control. This development was beneficial as it enabled offspring to modify their contact calls with intonations for signaling high or low levels of distress to their mother. Eventually, individuals were capable of participating in yes-no question-answer conversations. In these conversations the offspring emitted a low-level distress call to inquire about the safety of objects (e.g., food), and his/her mother responded with a high- or low-level distress call to signal disapproval or approval of the interaction, respectively. Gradually, the ADS and its connections with brainstem motor regions became more robust and vocal control became more volitional. Speech emerged once vocal control was sufficient for inventing novel calls.

Keywords: Auditory cortex; Auditory dorsal stream; Contact calls; Evolution; Speech; Vocal production.

Conflict of interest statement

Competing interests: No competing interests were disclosed.


Figure 1. Dual stream connectivity between the auditory cortex and frontal lobe of monkeys and humans.
Top: The auditory cortex of the monkey (left) and human (right) is schematically depicted on the supratemporal plane and observed from above (with the parieto-frontal operculi removed). Bottom: The brain of the monkey (left) and human (right) is schematically depicted and displayed from the side. Orange frames mark the region of the auditory cortex, which is displayed in the top sub-figures. Top and Bottom: Blue colors mark regions affiliated with the ADS, and red colors mark regions affiliated with the AVS (dark red and blue regions mark the primary auditory fields). Abbreviations: AMYG-amygdala, HG-Heschl’s gyrus, FEF-frontal eye field, IFG-inferior frontal gyrus, INS-insula, IPS-intraparietal sulcus, MTG-middle temporal gyrus, PC-pitch center, PMd-dorsal premotor cortex, PP-planum polare, PT-planum temporale, TP-temporal pole, Spt-sylvian parietal-temporal, pSTG/mSTG/aSTG-posterior/middle/anterior superior temporal gyrus, CL/ML/AL/RTL-caudo-/middle-/antero-/rostrotemporal-lateral belt area, CPB/RPB-caudal/rostral parabelt fields.
Figure 2. Discrete stages in contact call exchange.
In accordance with the model, the original function of the ADS is the localization of, and response to, contact calls that are exchanged between mothers and their infants. When an infant emits a contact call (A), the mother identifies her offspring’s voice (B1), localizes the call (B2), and maintains this information in visual working memory. Then, if the corresponding face is absent in that location (B3), the mother emits a contact call in return (C).
Figure 3. The use of prosody to signal levels of distress.
In accordance with the model, early Hominans became capable of modifying their contact calls with intonations (prosody). This modification could have originated for the purpose of expressing different levels of distress. In this figure, a Homo habilis child uses prosody to modify the contact call to express a high level of distress (A) or a low level of distress (C). The child’s mother then registers the call (by integrating his prosodic intonation and voice, location, and the absence of his face) to recognize whether her child requires immediate (B) or non-immediate (D) attention.
Figure 4. Prosody and the emergence of question-answer conversations.
In accordance with the model, the modification of contact calls with intonations for reporting distress levels eventually transitioned into question-answer conversations about items in the environment. In this figure, a child uses a low-level distress call (A, C) to ask permission to eat an unfamiliar food (berries). The mother can then respond with a high-level distress call (D) that signals danger or a low-level distress call (B) that signals safety.



Grant support

The author(s) declared that no grants were involved in supporting this work.
