Natural events present multiple types of sensory cues, each detected by a specialized sensory modality. Combining information from several modalities is essential for the selection of appropriate actions. Key to understanding multimodal computations is determining the structural patterns of multimodal convergence and how these patterns contribute to behaviour. Modalities could converge early, late or at multiple levels in the sensory processing hierarchy. Here we show that combining mechanosensory and nociceptive cues synergistically enhances the selection of the fastest mode of escape locomotion in Drosophila larvae. In an electron microscopy volume that spans the entire insect nervous system, we reconstructed the multisensory circuit supporting the synergy, spanning multiple levels of the sensory processing hierarchy. The wiring diagram revealed a complex multilevel multimodal convergence architecture. Using behavioural and physiological studies, we identified functionally connected circuit nodes that trigger the fastest locomotor mode, and others that facilitate it, and we provide evidence that multiple levels of multimodal integration contribute to escape mode selection. We propose that the multilevel multimodal convergence architecture may be a general feature of multisensory circuits enabling complex input-output functions and selective tuning to ecologically relevant combinations of cues.