We have proposed that the left inferotemporal (IT) region contains structures that mediate between conceptual knowledge retrieval and word-form retrieval, and we have hypothesized that these structures are utilized for word retrieval irrespective of the sensory modality through which an entity is apprehended, thus being "modality neutral." We tested this idea in two sensory modalities, visual and auditory, and for two categories of concrete entities, tools and animals. In a PET experiment, 10 normal participants named tools and animals either from pictures or from characteristic sounds (e.g., "scissors" from a picture of scissors or from the sound of scissors cutting; "rooster" from a picture of a rooster or from the sound of a rooster crowing). Visual and auditory naming of tools activated the left posterior/lateral IT; visual and auditory naming of animals activated the left anterior/ventral IT. For both tools and animals, the left IT activations were similar in location and magnitude regardless of whether participants were naming entities from pictures or from sounds. The results provide novel evidence to support the notion that left IT structures contain "modality-neutral" systems for mediating between conceptual knowledge and word retrieval.