Does learning differ when students speak rather than type their responses while interacting with intelligent tutoring systems that use natural language dialogue? Theoretical bases exist for three contrasting hypotheses. The speech facilitation hypothesis predicts that spoken input will increase learning, whereas the text facilitation hypothesis predicts that typed input will be superior. The modality equivalence hypothesis claims that learning gains will be equivalent. Previous experiments that tested these hypotheses were confounded by automated speech recognition systems whose substantial error rates were detected by learners. We addressed this concern in two experiments via a Wizard of Oz procedure, in which a human intercepted the learner's speech and transcribed the utterances before submitting them to the tutor. The overall pattern of results supported the following conclusions: (1) learning gains associated with spoken and typed input were on par with each other and quantitatively higher than those of a no-intervention control, (2) participants' evaluations of the session were not influenced by modality, and (3) there were no modality effects associated with differences in prior knowledge or typing proficiency. Although the results generally support the modality equivalence hypothesis, highly motivated learners reported lower cognitive load and demonstrated increased learning when typing compared with speaking. We discuss the implications of our findings for intelligent tutoring systems that can support both typed and spoken input.