Percussionists inadvertently use visual information to strategically manipulate audience perception of note duration. Videos of long (L) and short (S) notes performed by a world-renowned percussionist were separated into visual (Lv, Sv) and auditory (La, Sa) components. Visual components contained only the gesture used to perform the note, auditory components the acoustic note itself. Audio and visual components were then crossed to create realistic musical stimuli. Participants were informed of the mismatch, and asked to rate note duration of these audio-visual pairs based on sound alone. Ratings varied based on visual (Lv versus Sv), but not auditory (La versus Sa) components. Therefore while longer gestures do not make longer notes, longer gestures make longer sounding notes through the integration of sensory information. This finding contradicts previous research showing that audition dominates temporal tasks such as duration judgment.