This paper describes the effect of spectro-temporal factors on human sound localization performance in two dimensions (2D). Subjects responded with saccadic eye movements to acoustic stimuli presented in the frontal hemisphere. Both the horizontal (azimuth) and vertical (elevation) stimulus locations were varied randomly. Three types of stimuli were used, having different spectro-temporal patterns but identically shaped broadband averaged power spectra: noise bursts, frequency-modulated tones, and trains of short noise bursts. In all subjects, the elevation components of the saccadic responses varied systematically with the different temporal parameters, whereas the azimuth response components remained equally accurate for all stimulus conditions. The data show that the auditory system does not calculate a final elevation estimate from a long-term (on the order of 100 ms) integration of sensory input. Instead, the results suggest that the auditory system may apply a "multiple-look" strategy in which the final estimate is calculated from consecutive short-term (on the order of a few ms) estimates. These findings are incorporated into a conceptual model that accounts for the data and proposes a scheme for the temporal processing of spectral sensory information into a dynamic estimate of sound elevation.
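The "multiple-look" idea can be illustrated with a minimal sketch: combining many noisy short-term elevation estimates (here by simple averaging, one hypothetical reading of the strategy; the noise level, look count, and averaging rule are illustrative assumptions, not parameters from the paper):

```python
import random


def multiple_look_elevation(short_term_estimates):
    """Combine consecutive short-term elevation estimates (degrees)
    into a single final estimate by averaging -- a hypothetical
    instantiation of the 'multiple-look' strategy."""
    return sum(short_term_estimates) / len(short_term_estimates)


# Simulate noisy few-ms "looks" at a true elevation of 20 degrees.
random.seed(0)
true_elevation = 20.0
looks = [true_elevation + random.gauss(0.0, 5.0) for _ in range(32)]

# The combined estimate is far less variable than any single look:
# its standard error shrinks roughly with the square root of the
# number of looks.
final_estimate = multiple_look_elevation(looks)
```

Under these assumptions, accuracy of the final elevation estimate grows with the number of short-term looks, consistent with the observed dependence on stimulus duration and temporal structure.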