The integration of information across different sensory modalities is known to depend on the statistical characteristics of the stimuli to be combined. For example, the spatial and temporal proximity of stimuli are important determinants, with stimuli that are close in space and time being more likely to be bound. These multisensory interactions occur not only at singular points in space and time, but over "windows" of space and time that likely relate to the ecological statistics of real-world stimuli. Relatedly, human psychophysical work has demonstrated that individuals are highly prone to judge multisensory stimuli as co-occurring over a wide range of temporal offsets, a so-called simultaneity window (SW). Similarly, there exists a spatial representation of peripersonal space (PPS) surrounding the body in which stimuli related to the body and to external events occurring near the body are highly likely to be jointly processed. In the current study, we sought to examine the interaction between these temporal and spatial dimensions of multisensory representation by measuring the SW for audiovisual stimuli across proximal-distal space (i.e., PPS and extrapersonal space). Results demonstrate that audiovisual SWs within PPS are larger than those outside PPS. In addition, we suggest that this effect likely reflects an automatic, additional computation of these multisensory events in a body-centered reference frame. We discuss the current findings in terms of the spatiotemporal constraints of multisensory interactions and the implications of distinct reference frames for this process.