Computational functional genomics-based reduction of disease-related gene sets to their key components

Bioinformatics. 2019 Jul 15;35(14):2362-2370. doi: 10.1093/bioinformatics/bty986.


Motivation: The genetic architecture of diseases becomes increasingly known. This raises difficulties in picking suitable targets for further research among an increasing number of candidates. Although expression based methods of gene set reduction are applied to laboratory-derived genetic data, the analysis of topical sets of genes gathered from knowledge bases requires a modified approach as no quantitative information about gene expression is available.

Results: We propose a computational functional genomics-based approach at reducing sets of genes to the most relevant items based on the importance of the gene within the polyhierarchy of biological processes characterizing the disease. Knowledge bases about the biological roles of genes can provide a valid description of traits or diseases represented as a directed acyclic graph (DAG) picturing the polyhierarchy of disease relevant biological processes. The proposed method uses a gene importance score derived from the location of the gene-related biological processes in the DAG. It attempts to recreate the DAG and thereby, the roles of the original gene set, with the least number of genes in descending order of importance. This obtained precision and recall of over 70% to recreate the components of the DAG charactering the biological functions of n=540 genes relevant to pain with a subset of only the k=29 best-scoring genes.

Conclusions: A new method for reduction of gene sets is shown that is able to reproduce the biological processes in which the full gene set is involved by over 70%; however, by using only ∼5% of the original genes.

Availability and implementation: The necessary numerical parameters for the calculation of gene importance are implemented in the R package dbtORA at

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computational Biology
  • Gene Expression
  • Genomics*
  • Knowledge Bases*
  • Software