Mutation severity spectrum of rare alleles in the human genome is predictive of disease type

PLoS Comput Biol. 2020 May 15;16(5):e1007775. doi: 10.1371/journal.pcbi.1007775. eCollection 2020 May.

Abstract

The human genome harbors a variety of genetic variations. Single-nucleotide changes that alter amino acids in protein-coding regions are a major cause of human phenotypic variation and disease. These single-amino acid variations (SAVs) are routinely found in whole-genome and exome sequencing. Evaluating the functional impact of such genomic alterations is crucial for the diagnosis of genetic disorders. We developed DeepSAV, a deep-learning convolutional neural network that differentiates disease-causing from benign SAVs based on a variety of protein sequence, structural, and functional properties. Our method outperforms most stand-alone programs, and the version incorporating population and gene-level information (DeepSAV+PG) has predictive power similar to that of some of the best available methods. We transformed DeepSAV scores of rare SAVs in the human population into a quantity termed the "mutation severity measure" for each human protein-coding gene. It reflects a gene's tolerance to deleterious missense mutations and serves as a useful tool to study gene-disease associations. Genes implicated in cancer, autism, and viral interaction are scored by this measure as intolerant to mutations, while genes associated with a number of other diseases are scored as tolerant. Among known disease-associated genes, those that are mutation-intolerant are likely to function in development and signal transduction pathways, while those that are mutation-tolerant tend to encode metabolic and mitochondrial proteins.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alleles
  • Amino Acid Sequence / genetics
  • Computational Biology / methods
  • Deep Learning
  • Disease / genetics*
  • Exome Sequencing / methods
  • Forecasting / methods*
  • Gene Regulatory Networks / genetics
  • Genome, Human / genetics*
  • Humans
  • Mutation / genetics
  • Mutation, Missense / genetics
  • Nerve Net
  • Open Reading Frames / genetics
  • Sequence Analysis / methods