Face and voice processing contribute to person recognition, but it remains unclear how the segregated specialized cortical modules interact. Using functional neuroimaging, we observed cross-modal responses to voices of familiar persons in the fusiform face area, as localized separately using visual stimuli. Voices of familiar persons only activated the face area during a task that emphasized speaker recognition over recognition of verbal content. Analyses of functional connectivity between cortical territories show that the fusiform face region is coupled with the superior temporal sulcus voice region during familiar speaker recognition, but not with any of the other cortical regions normally active in person recognition or in other tasks involving voices. These findings are relevant for models of the cognitive processes and neural circuitry involved in speaker recognition. They reveal that in the context of speaker recognition, the assessment of person familiarity does not necessarily engage supramodal cortical substrates but can result from the direct sharing of information between auditory voice and visual face regions.