To understand the evolution, attenuation, and variable protective efficacy of bacillus Calmette-Guérin (BCG) vaccines, Mycobacterium bovis BCG Pasteur 1173P2 has been subjected to comparative genome and transcriptome analysis. The 4,374,522-bp genome contains 3,954 protein-coding genes, 58 of which are present in two copies as a result of two independent tandem duplications, DU1 and DU2. DU1 is restricted to BCG Pasteur, whereas DU2 exists in four forms; DU2-I is confined to early BCG vaccines, such as BCG Japan, while DU2-III and DU2-IV occur in the late vaccines. The glycerol-3-phosphate dehydrogenase gene, glpD2, is one of only three genes common to all four DU2 variants, implying that BCG requires higher levels of this enzyme to grow on glycerol. Further amplification of the DU2 region is ongoing, even within vaccine preparations used to immunize humans. An evolutionary scheme for BCG vaccines was established by analyzing DU2 and other markers. Lesions in genes encoding sigma factors and pleiotropic transcriptional regulators, such as PhoR and Crp, were also uncovered in various BCG strains; together with gene amplification, these affect gene expression levels, immunogenicity, and, possibly, protection against tuberculosis. Furthermore, the combined findings suggest that early BCG vaccines may even be superior to the later, more widely used ones.