The whole genome sequence (1.83 Mbp) of Haemophilus influenzae strain Rd was searched to identify tandem oligonucleotide repeat sequences. Loss or gain of one or more nucleotide repeats through a recombination-independent slippage mechanism is known to mediate phase variation of surface molecules of pathogenic bacteria, including H. influenzae. This facilitates evasion of host defenses and adaptation to the varying microenvironments of the host. We reasoned that iterative nucleotide sequences could identify novel genes relevant to microbe-host interactions. Our search of the Rd genome sequence identified 9 novel loci with multiple (range 6-36, mean 22) tandem tetranucleotide repeats. All were located within putative open reading frames and included homologues of hemoglobin-binding proteins of Neisseria, a glycosyltransferase (lgtC gene product) of Neisseria, and an adhesin of Yersinia. These tetranucleotide repeat sequences were also shown to be present in two other, epidemiologically distinct H. influenzae type b strains, although the number and distribution of repeats differed. Further characterization of the lgtC gene showed that it was involved in phenotypic switching of a lipopolysaccharide epitope and that this variable expression was associated with changes in the number of tetranucleotide repeats. Mutation of lgtC resulted in attenuated virulence of H. influenzae in an infant rat model of invasive infection. These data indicate the rapidity, economy, and completeness with which whole genome sequences can be used to investigate the biology of pathogenic bacteria.
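The kind of search described above can be illustrated with a minimal sketch. This is not the procedure used in the study, whose details are not given here. It assumes the genome is available as a plain string of A, C, G and T, scans only the strand supplied, and uses a hypothetical function name (`find_tandem_tetranucleotide_repeats`) and a hypothetical threshold (`min_copies=6`, chosen only because 6 is the lower end of the repeat range reported above).

```python
import re


def find_tandem_tetranucleotide_repeats(sequence, min_copies=6):
    """Return (start, unit, copies) for each run of >= min_copies tandem 4-mers.

    Illustrative only: the function name and the default threshold are
    assumptions, and only the strand passed in is scanned.
    """
    sequence = sequence.upper()
    # ([ACGT]{4}) captures a 4-base unit; the backreference \1 requires
    # that same unit to follow itself at least (min_copies - 1) more times.
    pattern = re.compile(r"([ACGT]{4})\1{%d,}" % (min_copies - 1))
    hits = []
    for match in pattern.finditer(sequence):
        unit = match.group(1)
        # Skip units such as AAAA or ATAT, which are really mono- or
        # dinucleotide runs rather than tetranucleotide repeats.
        if unit[:2] == unit[2:]:
            continue
        copies = len(match.group(0)) // 4
        hits.append((match.start(), unit, copies))
    return hits


# Toy input: 8 copies of CAAT flanked by unrelated bases.
toy = "GG" + "CAAT" * 8 + "CC"
print(find_tandem_tetranucleotide_repeats(toy))
```

Because the scan reports the leftmost match, the repeat unit may come out as a rotation of the biologically conventional one (for example TCAA rather than CAAT) when the flanking base happens to extend the pattern. A fuller search would also scan the reverse complement.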