Speech contains temporal structure that the brain must analyze to enable linguistic processing. To investigate the neural basis of this analysis, we used sound quilts, stimuli constructed by shuffling segments of a natural sound, approximately preserving its properties on short timescales while disrupting them on longer scales. We generated quilts from foreign speech to eliminate language cues and manipulated the extent of natural acoustic structure by varying the segment length. Using functional magnetic resonance imaging, we identified bilateral regions of the superior temporal sulcus (STS) whose responses varied with segment length. This effect was absent in primary auditory cortex and did not occur for quilts made from other natural sounds or acoustically matched synthetic sounds, suggesting tuning to speech-specific spectrotemporal structure. When examined parametrically, the STS response increased with segment length up to ∼500 ms. Our results identify a locus of speech analysis in human auditory cortex that is distinct from lexical, semantic or syntactic processes.