Novel dynamic random-dot displays representing either a rotating cylinder or a noise field were used to investigate the perception of structure from motion (SFM) in humans. The finite lifetimes of the points allowed spatiotemporal characteristics to be studied with smoothly moving stimuli. In one set of experiments, subjects had to detect the change from unstructured motion to the appearance of the cylinder in a reaction-time task. In another set of experiments, subjects had to distinguish these two stimuli in a two-alternative forced-choice task. There were two major findings: (1) The point lifetime threshold for perceiving structure from motion was relatively constant (50-85 msec). This threshold is similar to the threshold for estimating velocity and suggests that velocity measurements are used to process SFM. (2) Reaction times for detecting structure were long (approximately 1 sec). The build-up of performance with time and with increasing numbers of points reflects a process of temporal and spatial integration. We propose that this integration is achieved through the generation of a surface representation of the object. Information from single features on the object appears to be used to interpolate a surface between these local measurements, allowing the system to improve perception over extended periods of time even though each feature is present only briefly. Selective masking of the stimulus produced characteristic impairments that suggest that both velocity measurements and surface interpolation are global processes.
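To make the stimulus concrete, the following is a minimal Python sketch of a limited-lifetime random-dot cylinder viewed under orthographic projection. It is not the stimulus code used in the experiments. The function names, the frame rate, the rotation step, the cylinder dimensions, and the randomized initial ages are all assumptions introduced for illustration, and the noise-field condition is not modeled.

```python
import math
import random


def lifetime_ms_to_frames(lifetime_ms, frame_rate_hz):
    """Convert a point lifetime in msec to a whole number of frames (assumed mapping)."""
    return max(1, round(lifetime_ms * frame_rate_hz / 1000.0))


def make_dots(n_dots, lifetime_frames, rng):
    """Scatter dots on the cylinder surface as (angle, height, age) tuples.

    Ages are randomized (an assumption) so that dots do not all expire
    on the same frame.
    """
    return [
        (rng.uniform(0.0, 2.0 * math.pi),   # angular position on the surface
         rng.uniform(-1.0, 1.0),            # height along the cylinder axis
         rng.randrange(lifetime_frames))    # frames already lived
        for _ in range(n_dots)
    ]


def step_dots(dots, d_angle, lifetime_frames, rng):
    """Advance one frame: rotate surviving dots, replot expired ones."""
    updated = []
    for angle, height, age in dots:
        if age + 1 >= lifetime_frames:
            # Lifetime exhausted: the dot reappears at a new random location.
            updated.append((rng.uniform(0.0, 2.0 * math.pi),
                            rng.uniform(-1.0, 1.0),
                            0))
        else:
            # Still alive: the dot moves smoothly with the rotating surface.
            updated.append((angle + d_angle, height, age + 1))
    return updated


def project(dots, radius=1.0):
    """Orthographic projection onto the screen: x = r * sin(angle), y = height."""
    return [(radius * math.sin(angle), height) for angle, height, _ in dots]


if __name__ == "__main__":
    rng = random.Random(0)
    frames = lifetime_ms_to_frames(100, 60)   # hypothetical lifetime and frame rate
    dots = make_dots(64, frames, rng)
    for _ in range(10):
        dots = step_dots(dots, math.radians(1.0), frames, rng)
    screen_points = project(dots)
```

In this sketch each dot is displayed for exactly `lifetime_frames` frames before being replotted, so no single dot carries information for longer than the chosen lifetime, while the dots collectively continue to sample the same rotating surface.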