The Fourier phase spectrum plays a central role in determining where contours occur in an image, thereby defining the spatial relationships among those structures in the overall scene. Only a handful of studies have demonstrated psychophysically the relevance of the Fourier phase spectrum to human visual processing, and none has determined the relative amount of local cross-scale spatial phase alignment needed to perceptually extract meaningful structure from an image. We investigated the relative amount of spatial phase alignment that humans need to perceptually match natural scene image structures at three spatial frequencies [3, 6, and 12 cycles per degree (cpd)] as a function of the number of structures within the image (i.e., "structural sparseness"). The results showed that (1) the amount of spatial phase alignment needed to match structures depends on structural sparseness, with a bias for matching structures at 6 cpd, and (2) the ability to match partially phase-randomized images at a given spatial frequency is independent of structural sparseness at other spatial frequencies. The findings are discussed in terms of a network of feature integrators in the human visual system.
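
To make the notion of a "partially phase-randomized" image concrete, the following is a minimal sketch of one common way to degrade phase alignment while leaving the amplitude spectrum untouched. It is not taken from the study and should not be read as its stimulus-generation procedure: the function name `partially_randomize_phase`, the blending weight `alpha`, and the choice to blend phases on the unit circle are all assumptions made for illustration, and the sketch operates on the whole spectrum rather than on the separate 3, 6, and 12 cpd bands.

```python
import numpy as np


def partially_randomize_phase(image, alpha, seed=0):
    """Blend an image's Fourier phase with random phase (hypothetical helper).

    alpha = 0.0 keeps the original phase; alpha = 1.0 replaces it entirely.
    The amplitude spectrum is left unchanged in both cases.
    """
    rng = np.random.default_rng(seed)

    spectrum = np.fft.fft2(image)
    amplitude = np.abs(spectrum)
    phase = np.angle(spectrum)

    # Take the random phase from a real white-noise image so that it has the
    # conjugate symmetry required for the output to be real-valued.
    noise_phase = np.angle(np.fft.fft2(rng.standard_normal(image.shape)))
    # Assumption: keep the DC phase so mean luminance does not flip sign.
    noise_phase[0, 0] = phase[0, 0]

    # Blend the two phases as unit vectors to avoid wrap-around at +/- pi.
    mixed = (1.0 - alpha) * np.exp(1j * phase) + alpha * np.exp(1j * noise_phase)
    new_phase = np.angle(mixed)

    # Recombine the original amplitude with the blended phase.
    return np.real(np.fft.ifft2(amplitude * np.exp(1j * new_phase)))


# Usage on a synthetic stand-in for a grayscale image.
demo_image = np.random.default_rng(1).random((16, 16))
unchanged = partially_randomize_phase(demo_image, alpha=0.0)
scrambled = partially_randomize_phase(demo_image, alpha=1.0)
```

With `alpha=0.0` the input is recovered up to floating-point error, and with `alpha=1.0` the contours are fully scrambled while the amplitude spectrum is preserved. Intermediate values give graded loss of phase alignment, which is the kind of manipulation the matching task relies on.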