We previously showed that cross-modal recognition of unfamiliar objects is view-independent, whereas within-modal recognition is view-dependent in both vision and haptics. Does the view-independent, bisensory representation underlying cross-modal recognition arise from integration of unisensory, view-dependent representations or of intermediate, unisensory but view-independent representations? Two psychophysical experiments sought to distinguish between these alternative models. In both experiments, participants began from a baseline of within-modal view-dependence for object recognition in both vision and haptics. The first experiment induced within-modal view-independence by perceptual learning, and this view-independence transferred completely and symmetrically across modalities: visual view-independence acquired through visual learning also resulted in haptic view-independence, and vice versa. In the second experiment, both visual and haptic view-dependence were transformed to view-independence by either haptic-visual or visual-haptic cross-modal learning. We conclude that cross-modal view-independence fits with a model in which unisensory, view-dependent representations are directly integrated into a bisensory, view-independent representation, rather than being integrated via intermediate, unisensory, view-independent representations.