The gold standard for measuring treatment effects is the randomized controlled trial. In patients with multiple sclerosis (MS), trial durations are typically 2–3 years, so the long-term effects of drugs for MS can only be assessed through trial extensions or through observational studies that draw on data from registries or large single-centre databases. The main limitation of observational studies is the unavoidable selection bias introduced by nonrandom assignment of the intervention. Propensity score methods can mitigate this bias by balancing the groups with respect to measured baseline covariates, but they cannot correct for unmeasured confounding factors. Extensions of clinical trials are free from selection bias because of the initial randomization, but they can only compare early with delayed treatment. Here, we discuss these methodological issues and analyse how they have been managed in studies of the long-term effects of IFN-β in patients with MS.
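As an illustration of how propensity score weighting balances groups on a measured baseline covariate, the following minimal sketch may help. It is not taken from any of the studies discussed: the data are synthetic, the single covariate `x` (which could stand for a standardized baseline disability score) is hypothetical, and the helper names `fit_logistic` and `smd` are assumptions introduced only for this example. It fits a logistic model for the probability of treatment, applies inverse probability of treatment weights, and compares the standardized mean difference of the covariate before and after weighting.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(x, t, steps=500, lr=0.5):
    """Fit P(treated | x) = sigmoid(a + b*x) by gradient ascent on the mean log-likelihood."""
    a, b = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for xi, ti in zip(x, t):
            err = ti - sigmoid(a + b * xi)
            grad_a += err
            grad_b += err * xi
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def smd(x, t, w):
    """Weighted standardized mean difference of x between treated (t=1) and untreated (t=0)."""
    xt = [xi for xi, ti in zip(x, t) if ti == 1]
    wt = [wi for wi, ti in zip(w, t) if ti == 1]
    xc = [xi for xi, ti in zip(x, t) if ti == 0]
    wc = [wi for wi, ti in zip(w, t) if ti == 0]
    mt, mc = weighted_mean(xt, wt), weighted_mean(xc, wc)
    vt = weighted_mean([(v - mt) ** 2 for v in xt], wt)
    vc = weighted_mean([(v - mc) ** 2 for v in xc], wc)
    return (mt - mc) / math.sqrt((vt + vc) / 2.0)


# Synthetic cohort (assumption): treatment is more likely at higher x,
# which mimics nonrandom assignment of the intervention.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(1000)]
t = [1 if rng.random() < sigmoid(0.8 * xi) else 0 for xi in x]

# Estimated propensity score for each patient.
a, b = fit_logistic(x, t)
e = [sigmoid(a + b * xi) for xi in x]

# Inverse probability of treatment weights: 1/e for treated, 1/(1-e) for untreated.
ipw = [1.0 / ei if ti == 1 else 1.0 / (1.0 - ei) for ei, ti in zip(e, t)]

smd_before = smd(x, t, [1.0] * len(x))
smd_after = smd(x, t, ipw)
print(f"SMD before weighting: {smd_before:.3f}")
print(f"SMD after weighting:  {smd_after:.3f}")
```

Weighting can only balance covariates that enter the propensity model. A confounder that was never recorded would remain imbalanced however well this diagnostic behaves, which is the limitation noted above.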