Monkeys can easily form lasting central representations of visual and tactile stimuli, yet they seem unable to do the same with sounds. Humans, by contrast, are highly proficient in auditory long-term memory (LTM). These mnemonic differences within and between species raise the question of whether the human ability is supported in some way by speech and language, e.g., through subvocal reproduction of speech sounds and by covert verbal labeling of environmental stimuli. If so, the explanation could be that storing rapidly fluctuating acoustic signals requires assistance from the motor system, which is uniquely organized to chain-link rapid sequences. To test this hypothesis, we compared the ability of normal participants to recognize lists of stimuli that can be easily reproduced, labeled, or both (pseudowords, nonverbal sounds, and words, respectively) versus their ability to recognize a list of stimuli that can be reproduced or labeled only with great difficulty (reversed words, i.e., words played backward). Recognition scores after 5-min delays filled with articulatory-suppression tasks were relatively high (75-80% correct) for all sound types except reversed words; the latter yielded scores that were not far above chance (58% correct), even though these stimuli were discriminated nearly perfectly when presented as reversed-word pairs at short intrapair intervals. The combined results provide preliminary support for the hypothesis that participation of the oromotor system may be essential for laying down the memory of speech sounds and, indeed, that speech and auditory memory may be so critically dependent on each other that they had to coevolve.