Shared acoustic cues in speech, music, and nonverbal emotional expressions have been postulated to code for emotion quality and intensity, favoring the hypothesis of a prehuman origin of affective prosody in human emotional communication. To explore this hypothesis, we used playback experiments with a habituation-dishabituation paradigm to examine whether a solitary-foraging, highly vocal mammal, the tree shrew, is able to discriminate two behaviorally defined states of affect intensity (low vs. high) from the voice of conspecifics. Tree shrews habituated to communication calls of two different types (chatter call and scream call) given in the state of low affect intensity dishabituated to one call type (the chatter call) and showed a tendency to do so for the other (the scream call) when these calls were given in the state of high affect intensity. These findings suggest that listeners perceive the acoustic variation linked to defined states of affect intensity as different within the same call type. Our findings in tree shrews provide the first evidence that acoustically conveyed affect intensity is biologically relevant without any other sensory cue, even for solitary foragers. Thus, the perception of affect intensity in voice conveyed in stressful contexts represents a shared trait of mammals, independent of the complexity of social systems. The findings support the hypothesis that affective prosody in human emotional communication has deep-reaching phylogenetic roots, deriving from precursors already present and relevant in the vocal communication system of early mammals.