Gene prioritization through genomic data fusion

Nat Biotechnol. 2006 May;24(5):537-44. doi: 10.1038/nbt1203.


The identification of genes involved in health and disease remains a challenge. We describe a bioinformatics approach, together with freely accessible, interactive and flexible software termed Endeavour, to prioritize candidate genes underlying biological processes or diseases, based on their similarity to known genes involved in these phenomena. Unlike previous approaches, ours generates distinct prioritizations for multiple heterogeneous data sources, which are then integrated, or fused, into a global ranking using order statistics. In addition, it offers the flexibility of including additional data sources. Validation of our approach revealed that it was able to efficiently prioritize 627 genes in disease data sets and 76 genes in biological pathway sets, identify candidates for 16 mono- or polygenic diseases, and discover regulatory genes of myeloid differentiation. Furthermore, the approach identified a novel gene involved in craniofacial development from a 2-Mb chromosomal region that is deleted in some patients with DiGeorge-like birth defects. The approach described here offers an alternative integrative method for gene discovery.
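
The fusion step can be illustrated with a minimal sketch. The abstract states only that per-source rankings are combined "using order statistics"; the specific formula below (the joint cumulative distribution of uniform order statistics, computed by a recursion) is an assumption about how such a fusion might be done, not a confirmed description of Endeavour's implementation. The function names `q_statistic` and `fuse_rankings`, the toy gene labels, and the data source names are all hypothetical, and every source is assumed to rank the same set of candidates.

```python
from math import factorial


def q_statistic(rank_ratios):
    """Joint order-statistic score for one gene (assumed formula).

    rank_ratios: the gene's rank in each data source divided by the number
    of ranked genes, so every value lies in (0, 1].
    Returns the probability that N independent uniform draws would give
    sorted values that are all at least as small as the observed ones.
    Smaller means the gene ranks consistently well across sources.
    """
    r = sorted(rank_ratios)          # r[0] <= r[1] <= ... <= r[N-1]
    n = len(r)
    v = [1.0]                        # V_0 = 1
    for k in range(1, n + 1):
        x = r[n - k]                 # r_{N-k+1} in 1-based notation
        v_k = sum(
            (-1) ** (i - 1) * v[k - i] / factorial(i) * x ** i
            for i in range(1, k + 1)
        )
        v.append(v_k)
    return factorial(n) * v[n]


def fuse_rankings(rankings):
    """Fuse per-source rankings into one global ranking.

    rankings: dict mapping a source name to a list of gene ids, best first.
    Assumes (for simplicity) that every source ranks the same genes.
    """
    ratios = {}
    for ordered_genes in rankings.values():
        total = len(ordered_genes)
        for position, gene in enumerate(ordered_genes, start=1):
            ratios.setdefault(gene, []).append(position / total)
    # Lower Q means a better combined rank.
    return sorted(ratios, key=lambda gene: q_statistic(ratios[gene]))


# Toy example with three hypothetical data sources and three candidates.
toy_rankings = {
    "expression": ["A", "B", "C"],
    "literature": ["B", "A", "C"],
    "sequence": ["A", "C", "B"],
}
global_ranking = fuse_rankings(toy_rankings)
print(global_ranking)
```

A real pipeline would also need to handle candidates missing from some sources and to calibrate the resulting scores; both are omitted here to keep the sketch short.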

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Cell Differentiation
  • Chromosome Mapping
  • Computational Biology / methods*
  • Gene Expression Regulation*
  • Genetic Predisposition to Disease*
  • Humans
  • Models, Genetic
  • Models, Statistical
  • ROC Curve
  • Sensitivity and Specificity
  • Software
  • Zebrafish