Background: Genes that are indispensable for an organism to grow and sustain life are termed essential genes. There is strong interest in identifying the set of essential genes, particularly in pathogens, not only to better understand pathogen biology but also to identify drug targets and the organism's minimal gene set. Essentiality is inherently a systems property, so identifying essential genes requires considering the system as a whole. The available experimental approaches each capture some aspects of essentiality, but each comes with its own limitations. Moreover, in most cases they do not explain the basis for essentiality. A powerful prediction method that recognizes this gene pool, and that also rationalizes the known essential genes in a given organism, would therefore be very useful. Here we describe a multi-level, multi-scale approach to identify the essential gene pool in a deadly pathogen, Mycobacterium tuberculosis.
Results: The multi-level workflow analyses the bacterial cell by studying (a) genome-wide gene expression profiles, to identify the genes that show consistent and significant levels of expression across multiple samples of the same condition; (b) indispensability for growth, using gene-expression-integrated flux balance analysis of a genome-scale metabolic model; (c) importance for maintaining the integrity and flow of a protein-protein interaction network; and (d) evolutionary conservation across a set of genomes from the same ecological niche. For the identified gene pool, the functional basis of essentiality was addressed by studying residue-level conservation and the substructure of ligand-binding pockets, which also revealed the essential amino acid residues in those pockets. In total, 283 genes were identified as essential with high confidence. These show about 73.5% agreement with the essential genes obtained from the experimental transposon mutagenesis technique. A large proportion of the identified genes belong to the functional class of intermediary metabolism and respiration.
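Criterion (c) can be illustrated with a minimal sketch, assuming that "importance for maintaining the integrity and flow" of the interaction network is approximated by how many connected protein pairs are lost when a single protein is removed (a pairwise-connectivity fragmentation score). This is a stand-in measure, not necessarily the one used in the study; the toy network, the function names and the 0.5 cut-off are all hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (protein, protein) interaction pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def reachable_pairs(adj, removed=frozenset()):
    """Count unordered protein pairs still joined by some path after deleting `removed`."""
    seen = set(removed)
    total = 0
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search over one connected component.
        size = 0
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        # A component of n proteins contributes n*(n-1)/2 connected pairs.
        total += size * (size - 1) // 2
    return total


def fragmentation_scores(adj):
    """Fraction of connected pairs lost when each protein is knocked out (0 to 1)."""
    baseline = reachable_pairs(adj)
    return {
        node: (baseline - reachable_pairs(adj, frozenset({node}))) / baseline
        if baseline else 0.0
        for node in adj
    }


def network_essential(scores, threshold=0.5):
    """Hypothetical cut-off: flag proteins whose removal disconnects >= threshold of pairs."""
    return sorted(n for n, s in scores.items() if s >= threshold)


if __name__ == "__main__":
    # Hypothetical toy network: two triangles joined by the C-D bridge.
    edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
             ("D", "E"), ("D", "F"), ("E", "F")]
    scores = fragmentation_scores(build_adjacency(edges))
    print(network_essential(scores))  # ['C', 'D']
```

Removing the bridge proteins C or D splits the 15 connected pairs down to 4, so each scores 11/15, whereas any peripheral protein scores only 1/3. In a multi-level workflow like the one described, such a network score would be only one line of evidence, to be combined with expression consistency, flux balance analysis and evolutionary conservation.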
Conclusions: The multi-scale, multi-level approach described here can be applied generally to other pathogens as well. The identified essential gene pool forms a basis for designing experiments to probe the finer functional roles of these genes, and also serves as a ready shortlist for identifying drug targets.