A postgenomic method for predicting essential genes at subsaturation levels of mutagenesis: application to Mycobacterium tuberculosis

Proc Natl Acad Sci U S A. 2003 Jun 10;100(12):7213-8. doi: 10.1073/pnas.1231432100. Epub 2003 May 29.


We describe a postgenomic in silico approach for identifying genes that are likely to be essential and for estimating their proportion in haploid genomes. Given knowledge of all sites eligible for mutagenesis and an experimentally determined partial list of nonessential genes from genome mutagenesis, a Bayesian statistical method provides reasonable predictions of essential genes at a subsaturation level of random mutagenesis. For mutagenesis, a transposon such as Himar1 is suitable because it inserts randomly into TA sites. All of the possible insertion sites may be determined a priori from the genome sequence; with this information, data on experimentally hit TA sites may be used to predict the proportion of genes that cannot be mutated. As a model, we used the Mycobacterium tuberculosis genome. Using the Himar1 transposon, we created a genetically defined collection of 1,425 insertion mutants. Based on our Bayesian statistical analysis using Markov chain Monte Carlo and the observed frequencies of transposon insertions in all of the genes, we estimated that 35% (95% confidence interval, 28%-41%) of the genes in the M. tuberculosis genome are essential. This analysis further revealed seven functional groups with high probabilities of being enriched in essential genes. The PE-PGRS (Pro-Glu polymorphic GC-rich repetitive sequence) family of genes, which is unique to mycobacteria, the polyketide/nonribosomal peptide synthase family, and the mycolic and fatty acid biosynthesis gene families were disproportionately enriched in essential genes. At subsaturation levels of mutagenesis with a random transposon such as Himar1, this approach permits a statistical prediction of both the proportion and the identities of essential genes in sequenced genomes.
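The reasoning in the abstract can be illustrated with a minimal Python sketch. It is a toy calculation and not the authors' Markov chain Monte Carlo analysis. It assumes that insertions land uniformly at random on TA sites and that the prior probability of essentiality is supplied by hand. The function names `count_ta_sites` and `posterior_essential`, and every number in the usage note except the 1,425 insertions, are hypothetical. Because TA is its own reverse complement, counting on one strand is enough.

```python
def count_ta_sites(seq: str) -> int:
    """Count TA dinucleotides, the candidate Himar1 insertion sites, in a sequence."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == "TA")


def posterior_essential(ta_sites: int, total_sites: int,
                        n_insertions: int, prior: float) -> float:
    """Posterior probability that a gene with zero observed hits is essential.

    Toy model (an assumption, not the published method): each of the
    n_insertions independently lands on one of total_sites TA sites,
    chosen uniformly at random.
    """
    if ta_sites == 0:
        # A gene without TA sites can never be hit, so the data are uninformative.
        return prior
    # Chance that a nonessential gene is missed by every insertion.
    p_missed = (1.0 - ta_sites / total_sites) ** n_insertions
    # Bayes' rule: an essential gene is missed with probability 1.
    return prior / (prior + (1.0 - prior) * p_missed)
```

For example, `posterior_essential(ta_sites=50, total_sites=1000, n_insertions=1425, prior=0.35)` is close to 1, while the same call with `ta_sites=1` stays much nearer the prior. An unhit gene with many TA sites is therefore stronger evidence of essentiality than an unhit gene with few. The site counts and the prior here are made up for illustration; only the number of insertions comes from the abstract.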

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Anti-Bacterial Agents / pharmacology
  • Base Sequence
  • DNA, Bacterial / genetics
  • Genes, Bacterial* / drug effects
  • Genetic Techniques
  • Genome, Bacterial
  • Genomics
  • Multigene Family
  • Mutagenesis, Insertional*
  • Mycobacterium tuberculosis / drug effects
  • Mycobacterium tuberculosis / genetics*
  • Open Reading Frames


Substances

  • Anti-Bacterial Agents
  • DNA, Bacterial