Inheritance-mode specific pathogenicity prioritization (ISPP) for human protein coding genes

Bioinformatics. 2016 Oct 15;32(20):3065-3071. doi: 10.1093/bioinformatics/btw381. Epub 2016 Jun 26.

Abstract

Motivation: Exome sequencing studies have facilitated the detection of causal genetic variants in yet-unsolved Mendelian diseases. However, the identification of disease causal genes among a list of candidates in an exome sequencing study is still not fully settled, and it is often difficult to prioritize candidate genes for follow-up studies. The inheritance mode provides crucial information for understanding Mendelian diseases, but none of the existing gene prioritization tools fully utilize this information.

Results: We examined the characteristics of Mendelian disease genes under different inheritance modes. The results suggest that Mendelian disease genes with autosomal dominant (AD) inheritance mode are more haploinsufficiency and de novo mutation sensitive, whereas those autosomal recessive (AR) genes have significantly more non-synonymous variants and regulatory transcript isoforms. In addition, the X-linked (XL) Mendelian disease genes have fewer non-synonymous and synonymous variants. As a result, we derived a new scoring system for prioritizing candidate genes for Mendelian diseases according to the inheritance mode. Our scoring system assigned to each annotated protein-coding gene (N = 18 859) three pathogenic scores according to the inheritance mode (AD, AR and XL). This inheritance mode-specific framework achieved higher accuracy (area under curve = 0.84) in XL mode.

Conclusion: The inheritance-mode specific pathogenicity prioritization (ISPP) outperformed other well-known methods including Haploinsufficiency, Recessive, Network centrality, Genic Intolerance, Gene Damage Index and Gene Constraint scores. This systematic study suggests that genes manifesting disease inheritance modes tend to have unique characteristics.

Availability and implementation: ISPP is included in KGGSeq v1.0 (http://grass.cgs.hku.hk/limx/kggseq/), and source code is available from (https://github.com/jacobhsu35/ISPP.git).

Contact: mxli@hku.hkSupplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Area Under Curve
  • Databases, Genetic
  • Genes, Dominant*
  • Genes, Recessive*
  • Genetic Variation
  • Humans
  • Mutation*
  • Proteins / genetics*

Substances

  • Proteins