Repetition can boost memory and perception. However, repeating the same stimulus several times in immediate succession also induces intriguing perceptual transformations and illusions. Here, we investigate the Speech to Song Transformation (S2ST), a massed repetition effect in the auditory modality, which crosses the boundaries between language and music. In the S2ST, a phrase repeated several times shifts to being heard as sung. To better understand this unique cross-domain transformation, we examined the perceptual determinants of the S2ST, in particular the role of acoustics. In 2 experiments, we examined the effects of 2 pitch properties and 3 rhythmic properties on the probability and speed of occurrence of the transformation. Results showed that both pitch and rhythmic properties are key features fostering the transformation. However, some properties proved to be more conducive to the S2ST than others. Stable tonal targets that allowed for the perception of a musical melody led to the S2ST more often and more quickly than scalar intervals. Recurring durational contrasts arising from segmental grouping favoring a metrical interpretation of the stimulus also facilitated the S2ST. This was, however, not the case for a regular beat structure within and across repetitions. In addition, individual perceptual abilities predicted the likelihood of the S2ST. Overall, the study demonstrated that repetition enables listeners to reinterpret specific prosodic features of spoken utterances in terms of musical structures. The findings underline a tight link between language and music, but they also reveal important differences in the communicative functions of prosodic structure in the 2 domains.