Despite decades of cognitive, neuropsychological and neuroimaging studies, it remains unclear whether letters are identified before word-form encoding during reading, or whether letters and their combinations are encoded simultaneously and interactively. Here, using functional magnetic resonance imaging, we show that a 'letter-form' area (responding more to consonant strings than to false fonts) can be distinguished from an immediately anterior 'visual word-form area' (responding more to words than to consonant strings) in ventral occipito-temporal cortex. Letter-selective magnetoencephalographic responses begin in the letter-form area ∼60 ms earlier than word-selective responses in the word-form area. Local field potentials confirm the latency and location of letter-selective responses. The letter-form area shows increased high-gamma power for ∼400 ms, and strong phase-locking with more anterior areas supporting lexico-semantic processing. These findings suggest that during reading, visual stimuli are first encoded as letters before their combinations are encoded as words. Activity then rapidly spreads anteriorly, and the entire network is engaged in sustained integrative processing.