Proc Natl Acad Sci U S A. 2003 Jun 10;100(12):7213-8.
doi: 10.1073/pnas.1231432100. Epub 2003 May 29.

A Postgenomic Method for Predicting Essential Genes at Subsaturation Levels of Mutagenesis: Application to Mycobacterium tuberculosis

Free PMC article

Gyanu Lamichhane et al. Proc Natl Acad Sci U S A.

Abstract

We describe a postgenomic in silico approach for identifying genes that are likely to be essential and estimating their proportion in haploid genomes. Given knowledge of all sites eligible for mutagenesis and an experimentally determined partial list of nonessential genes from genome mutagenesis, a Bayesian statistical method provides reasonable predictions of essential genes at a subsaturation level of random mutagenesis. For mutagenesis, a transposon such as Himar1 is suitable because it inserts randomly into TA sites. All possible insertion sites may be determined a priori from the genome sequence, and with this information, data on experimentally hit TA sites may be used to predict the proportion of genes that cannot be mutated. As a model, we used the Mycobacterium tuberculosis genome. Using the Himar1 transposon, we created a genetically defined collection of 1,425 insertion mutants. Based on our Bayesian statistical analysis using Markov chain Monte Carlo and the observed frequencies of transposon insertions in all of the genes, we estimated that the M. tuberculosis genome contains 35% (95% confidence interval, 28%-41%) essential genes. This analysis further revealed seven functional groups with high probabilities of being enriched in essential genes. The PE-PGRS (Pro-Glu polymorphic GC-rich repetitive sequence) family of genes, which is unique to mycobacteria, the polyketide/nonribosomal peptide synthase family, and the mycolic and fatty acid biosynthesis gene families were disproportionately enriched in essential genes. At subsaturation levels of mutagenesis with a random transposon such as Himar1, this approach permits a statistical prediction of both the proportion and identities of essential genes of sequenced genomes.
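The a priori enumeration of insertion sites mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes only that Himar1 targets TA dinucleotides, as stated above; the function name `ta_sites` and the toy sequence are hypothetical and are not taken from the paper or from the M. tuberculosis genome. Because TA is its own reverse complement, scanning one strand is enough.

```python
def ta_sites(seq: str) -> list[int]:
    """Return the 0-based start position of every TA dinucleotide in seq."""
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i:i + 2] == "TA"]


# Hypothetical toy sequence for illustration only (not a real gene).
toy_gene = "ATGTACGGTATAGCTAA"
print(ta_sites(toy_gene))  # prints [3, 8, 10, 14]
```

Applied to each annotated ORF, a count of this kind gives the number of eligible sites per gene that a statistical analysis could then use.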

Figures

Fig. 1.
(A) Distribution of transposon insertions within the ORFs of M. tuberculosis by percentage of each ORF's total length. The MT number of each of the 1,183 genes containing a transposon insertion is plotted against the percent distance from the 5′ end of the ORF. Insertions in the TA sites comprising stop codons have been excluded. The insertions are uniformly distributed, suggesting that intragenic Himar1 insertion is random. (B) Distribution of transposon insertions in the 4.4-Mb circular chromosome of M. tuberculosis. The origin is marked as 0. The line segments around the circle indicate the locations of the 1,183 intragenic insertion mutants: 1,161 distinct locations and 22 double hits (indicated by longer line segments).
Fig. 2.
Posterior probability that an M. tuberculosis gene is essential as a function of the number of TA sites determined by the 5′80%–3′100-bp rule. The 770 disrupted genes have zero probability of being essential and are jittered vertically so that the points may be distinguished. The vertical scatter in the remaining points is largely caused by Markov chain Monte Carlo sampling error. The identities of 15 genes whose probabilities of being essential were ≥75% are shown in Table 3.
Fig. 3.
The minimum number of TA sites an M. tuberculosis gene must contain to have >75% (□) or 90% (○) probability of being essential, when no mutant with an insertion in that gene is observed, as a function of the total number of intragenic mutants observed.
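The relationship plotted in Fig. 3 can be approximated with a deliberately simplified Bayes calculation. This is a sketch under stated assumptions, not the authors' Markov chain Monte Carlo analysis: each of `n` intragenic insertions is assumed to land independently and uniformly on a fixed pool of `total_sites` TA sites, and the prior proportion of essential genes is taken as known. The function names and the value 40,000 for `total_sites` are hypothetical. The prior of 0.35 borrows the abstract's estimate purely for illustration, and 1,183 is the number of intragenic insertions given in the Fig. 1 caption.

```python
def posterior_essential(t: int, n: int, total_sites: int, prior: float) -> float:
    """Posterior probability that a gene with t TA sites is essential,
    given that none of n insertions hit it (simplified model, see lead-in)."""
    # Probability that a nonessential gene escapes all n insertions by chance.
    p_miss = (1.0 - t / total_sites) ** n
    # Bayes' rule: an essential gene is never hit, so its likelihood is 1.
    return prior / (prior + (1.0 - prior) * p_miss)


def min_ta_sites(threshold: float, n: int, total_sites: int, prior: float):
    """Smallest t whose posterior exceeds threshold, or None if none does."""
    for t in range(1, total_sites + 1):
        if posterior_essential(t, n, total_sites, prior) > threshold:
            return t
    return None


# Assumed, illustrative inputs (total_sites is NOT a figure from the paper).
n_mutants, total_sites, prior = 1183, 40_000, 0.35
for threshold in (0.75, 0.90):
    print(threshold, min_ta_sites(threshold, n_mutants, total_sites, prior))
```

Under these assumptions the required number of TA sites falls as more mutants are observed, which matches the qualitative trend the caption describes. The actual values depend on quantities the full analysis estimates jointly.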
