Nine experiments examined the means by which visual memory for individual objects is structured into a larger representation of a scene. Participants viewed images of natural scenes or object arrays in a change detection task requiring memory for the visual form of a single target object. In the test image, 2 properties of the stimulus were independently manipulated: the position of the target object and the spatial properties of the larger scene or array context. Memory performance was higher when the target object position remained the same from study to test. This same-position advantage was reduced or eliminated following contextual changes that disrupted the relative spatial relationships among contextual objects (context deletion, scrambling, and binding change) but was preserved following contextual change that did not disrupt relative spatial relationships (translation). Thus, episodic scene representations are formed through the binding of objects to scene locations, and object position is defined relative to a larger spatial representation coding the relative locations of contextual objects.