Front Genet. 10, 1342. eCollection.

Quantifying Gene Essentiality Based on the Context of Cellular Components

Kaiwen Jia et al. Front Genet.

Abstract

Different genes have their protein products localized in various subcellular compartments. The diversity in protein localization may serve as a gene characteristic, revealing gene essentiality from a subcellular perspective. To measure this diversity, we introduced a Subcellular Diversity Index (SDI) based on the Gene Ontology-Cellular Component Ontology (GO-CCO) and a semantic similarity measure of GO terms. Analyses revealed that the SDI of human genes was well correlated with several known measures of gene essentiality, including protein-protein interaction (PPI) network topology measurements, dN/dS ratio, homologous gene number, expression level and tissue specificity. In addition, SDI performed well in predicting human essential genes (AUC = 0.702) and drug target genes (AUC = 0.704), and drug targets with higher SDI scores tended to cause more side effects. The results suggest that SDI could be used to identify novel drug targets and to guide the filtering of drug targets toward those with fewer potential side effects. Finally, we developed a user-friendly online database for querying the SDI scores of genes across eight species, together with the SDI-based predicted probability that a human gene is a drug target. The online database of SDI is available at: http://www.cuilab.cn/sdi.

Keywords: cellular components; drug target; gene characteristic; gene essentiality; localization diversity.
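The AUC values quoted in the abstract can be read as the probability that a randomly chosen positive gene (for example, an essential gene) receives a higher score than a randomly chosen negative gene. The following is a minimal sketch of that rank-based calculation, not the authors' code; the function name `auc_from_scores` and the toy scores and labels are hypothetical and serve only as illustration.

```python
from typing import Sequence


def auc_from_scores(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC (Mann-Whitney U statistic divided by n_pos * n_neg).

    `labels` holds 1 for positives (e.g. essential genes) and 0 for negatives.
    Tied scores share their average rank, so a tie counts as half a win.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    pairs = sorted(zip(scores, labels))  # ascending by score
    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        # Find the run of tied scores pairs[i:j].
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(label for _, label in pairs[i:j])
        i = j

    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


# Hypothetical toy data: four genes with made-up scores and labels.
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_labels = [0, 0, 1, 1]
toy_auc = auc_from_scores(toy_scores, toy_labels)
```

An AUC of 0.5 corresponds to a score that ranks positives no better than chance, and 1.0 to perfect separation, which places the reported values of about 0.70 between the two.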

Figures

Figure 1
The distribution of SDI in eight species. The density plots show the distribution of SDI in three mammals (Bandwidth = 0.9134) (A) and five other species (Bandwidth = 0.4149) (B). The number of genes for each species is given in brackets. C. elegans has a larger proportion of genes with two GO-CC terms than the other species, which may explain its bimodal distribution (Table S2). SDI, Subcellular Diversity Index; GO-CC, Gene Ontology-Cellular Component.
Figure 2
The correlations between SDI and other known measures of essentiality in human genes. The scatter plots show the correlation between SDI and each measure. Genes with higher SDI tend to have higher PPI degree (A), higher PPI betweenness (B), a lower evolutionary rate as measured by dN/dS ratio (C), more homologous genes (D), higher expression level (E), and higher tissue specificity (F). SDI, Subcellular Diversity Index; PPI, protein–protein interaction.
Figure 3
Validation of SDI in human essential genes. The human genes were ranked by SDI and divided into ten equal-sized groups. The bar graphs show the number of human essential genes (A) and human drug targets (C) in each group. The ROC curves show the results of sensitivity tests for validating the human essential genes (B) and the human drug targets (D). The sensitivity tests were performed on 11,355 genes. The AUC scores are given in brackets. SDI, Subcellular Diversity Index; ROC, receiver operating characteristic; AUC, Area Under Curve.
Figure 4
Further validation of the performance of SDI in predicting human essential genes and drug targets. The ROC curves show the results of 10-fold cross-validation of logistic regression models predicting human essential genes (A) and drug targets (B). Each feature was used in a separate regression model, and a model including all available features was also fitted for both essential genes and drug targets. The ROC analyses were performed on 11,355 genes. The AUC scores are given in brackets. ROC, receiver operating characteristic; AUC, Area Under Curve.
Figure 5
The relationship between SDI and the potential side effects associated with drug targets. The scatter plots show the correlation between the SDI of drug targets and the number of side-effect terms from the SIDER database (A) and from VigiAccess (B). The paired scores were smoothed using a sliding window with a fixed window and step size. The drug target genes involved were ranked by SDI and divided into ten equal-sized groups, and the average number of side-effect terms from SIDER (C) and VigiAccess (D) was calculated for each group. SDI, Subcellular Diversity Index.
Figure 6
SDI based on non-IEA data proved to be less powerful than SDI based on IEA-included data. The ROC curves show the performance of SDI based on non-IEA and IEA-included data in validating human essential genes and drug targets. The tests were performed on 19,341 human genes. The AUC scores are given in brackets. SDI, Subcellular Diversity Index; ROC, receiver operating characteristic; AUC, Area Under Curve; IEA, Inferred from Electronic Annotation.
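
Two data-preparation steps recur in the captions for Figures 3 and 5: ranking genes by SDI and splitting them into ten equal-sized groups, and smoothing paired scores with a sliding window. The sketch below illustrates one plausible reading of both steps. It is not the authors' implementation: the function names are hypothetical, and the window and step sizes are placeholders, since the captions do not state the values used.

```python
from typing import List, Sequence, Tuple


def split_into_ranked_groups(values: Sequence[float], n_groups: int = 10) -> List[List[int]]:
    """Rank items by value (ascending) and split their indices into
    n_groups consecutive groups of near-equal size."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = []
    for g in range(n_groups):
        start = g * len(order) // n_groups
        end = (g + 1) * len(order) // n_groups
        groups.append(order[start:end])
    return groups


def sliding_window_means(
    x: Sequence[float], y: Sequence[float], window: int, step: int
) -> List[Tuple[float, float]]:
    """Sort the (x, y) pairs by x, then average x and y inside each window
    of `window` consecutive pairs, advancing by `step` pairs each time."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    smoothed = []
    for start in range(0, len(order) - window + 1, step):
        idx = order[start:start + window]
        mean_x = sum(x[i] for i in idx) / window
        mean_y = sum(y[i] for i in idx) / window
        smoothed.append((mean_x, mean_y))
    return smoothed


# Hypothetical toy data: 20 made-up SDI-like scores and side-effect counts.
toy_sdi = [float(i) for i in range(20)]
toy_counts = [2.0 * i for i in range(20)]

toy_groups = split_into_ranked_groups(toy_sdi, n_groups=10)
# Placeholder window and step sizes, chosen only for this illustration.
toy_smoothed = sliding_window_means(toy_sdi, toy_counts, window=4, step=2)
```

Per-group averages such as those in panels C and D of Figure 5 would then follow by averaging the side-effect counts over the indices in each group.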
