The recognition of spoken language has typically been studied by focusing on either words or their constituent elements (for example, low-level features or phonemes). More recently, the 'temporal mesoscale' of speech has been explored: regularities in the envelope of the acoustic signal that correlate with syllabic information and that play a central role in both production and perception. The temporal structure of speech at this scale is remarkably stable across languages, with a preferred range of rhythmicity of 2–8 Hz. Importantly, this rhythmicity is required for the construction of intelligible speech. Much current work focuses on audio-motor interactions in speech, highlighting behavioural and neural evidence that demonstrates how properties of the perceptual and motor systems, and the relation between them, can underlie the mesoscale speech rhythms. The data invite the hypothesis that the speech motor cortex is best modelled as a neural oscillator, a conjecture that aligns well with current proposals highlighting the fundamental role of neural oscillations in perception and cognition. The findings also cast motor theories of speech in a new light, placing new mechanistic constraints on accounts of the action-perception interface.
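To make the temporal mesoscale concrete, the sketch below (not a pipeline from the reviewed studies) shows the standard way the syllabic rhythm is quantified: extract the amplitude envelope of a signal and inspect its modulation spectrum for a peak in the 2–8 Hz band. A synthetic noise carrier modulated at 5 Hz stands in for a real speech recording, and all parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

fs = 16000
t = np.arange(0, 4.0, 1 / fs)
# Toy "speech": a broadband noise carrier whose amplitude is modulated
# at 5 Hz, standing in for the syllabic rhythm of a real recording.
rng = np.random.default_rng(0)
x = (1 + np.cos(2 * np.pi * 5.0 * t)) * rng.standard_normal(t.size)

env = np.abs(hilbert(x))                      # amplitude envelope
b, a = butter(4, 30 / (fs / 2), btype="low")  # keep slow modulations (<30 Hz)
env = filtfilt(b, a, env)
step = fs // 100                              # downsample envelope to 100 Hz
env = env[::step] - env[::step].mean()

# Modulation spectrum of the envelope; look for the dominant rate.
spec = np.abs(np.fft.rfft(env)) ** 2
freqs = np.fft.rfftfreq(env.size, d=step / fs)
mask = (freqs >= 0.5) & (freqs <= 20)
peak = freqs[mask][np.argmax(spec[mask])]
print(f"Envelope modulation peak: {peak:.1f} Hz")  # ~5 Hz for this toy signal
```

On real recordings, the same analysis yields the cross-linguistically stable result summarized above: envelope modulation energy concentrated in the 2–8 Hz band.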
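The oscillator conjecture can likewise be made computationally explicit. The sketch below uses a Stuart-Landau oscillator, one common formalization of a self-sustained neural oscillator, weakly driven by a rhythmic "speech envelope" input; the equation form, the ~4.5 Hz natural frequency and all parameter values are illustrative assumptions, not the authors' model. The signature behaviour is selective entrainment: phase locking is strong only when the drive rate lies near the oscillator's natural frequency.

```python
import numpy as np

def entrainment(drive_hz, f0=4.5, k=0.5, a=1.0, T=20.0, dt=1e-3):
    """Phase-locking value between a driven Stuart-Landau oscillator
    and a sinusoidal drive at drive_hz (all parameters illustrative)."""
    n = int(T / dt)
    t = np.arange(n) * dt
    omega = 2 * np.pi * f0
    z = 0.1 + 0j                      # small initial state; limit-cycle radius is sqrt(a)
    phases = np.empty(n)
    for i in range(n):
        drive = np.exp(1j * 2 * np.pi * drive_hz * t[i])
        dz = z * (a + 1j * omega - abs(z) ** 2) + k * drive
        z += dz * dt                  # forward Euler step
        phases[i] = np.angle(z) - 2 * np.pi * drive_hz * t[i]
    # Phase-locking value over the second half (after transients die out).
    return abs(np.mean(np.exp(1j * phases[n // 2:])))

for rate in [2.0, 4.5, 8.0, 12.0]:
    print(f"{rate:4.1f} Hz drive -> PLV {entrainment(rate):.2f}")
```

Running the loop shows phase locking near 1 when the drive matches the 4.5 Hz natural frequency and falling off as the drive moves away from it, which is the qualitative behaviour the oscillator hypothesis predicts for motor cortex.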