We tested whether squirrel monkeys have cross-modal representations of their human caretakers using a 0-delay symbolic matching-to-sample procedure. We first trained the monkeys to match photographs of two of their caretakers. After reaching criterion, the monkeys were given two test sessions, in which 32 all-reinforced test trials were interspersed among the training trials. In the test trials, a voice that either matched (congruent condition) or mismatched (incongruent condition) the sample photograph was played back after the sample stimulus disappeared. The monkeys' matching accuracy was lower in the incongruent condition than in the congruent condition. Post hoc analyses revealed that presentation of the primary caretaker's voice interfered with performance in test trials where the secondary caretaker's face was presented (incongruent condition). This suggests that our subjects recalled a representation of their primary caretaker upon hearing that caretaker's voice.