Many amino acid substitution variants identified in DNA repair genes during human population screenings are predicted to impact protein function

Genomics. 2004 Jun;83(6):970-9. doi: 10.1016/j.ygeno.2003.12.016.

Abstract

Over 520 different amino acid substitution variants have been previously identified in the systematic screening of 91 human DNA repair genes for sequence variation. Two algorithms were employed to predict the impact of these amino acid substitutions on protein activity. Sorting Intolerant from Tolerant (SIFT) classified 226 of 508 variants (44%) as "Intolerant." Polymorphism Phenotyping (PolyPhen) classified 165 of 489 amino acid substitutions (34%) as "Probably or possibly damaging." Another 9-15% of the variants were classified as "Potentially intolerant or damaging." The results from the two algorithms are highly associated, with concordance in predicted impact observed for approximately 62% of the variants. Twenty-one to thirty-one percent of the variant proteins are predicted by both algorithms to exhibit reduced activity. These variants occur at slightly lower individual allele frequencies than do the variants classified as "Tolerant" or "Benign." With one exception, both algorithms correctly predicted the impact on biochemical activity of 26 functionally characterized amino acid substitutions in the APE1 protein. It is concluded that a substantial fraction of the missense variants observed in the general human population are functionally relevant. These variants are expected to be the molecular genetic and biochemical basis for the associations of reduced DNA repair capacity phenotypes with elevated cancer risk.
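
The proportions quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the two count pairs (226 of 508, 165 of 489) come from the abstract, while the function names and the five-variant toy calls used to show what "concordance" means are hypothetical and are not data from the study.

```python
from typing import Sequence


def percent(part: int, total: int) -> float:
    """Return part/total expressed as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


def concordance(calls_a: Sequence[bool], calls_b: Sequence[bool]) -> float:
    """Percentage of variants on which two predictors make the same call.

    Each element is True when that predictor flags the variant as
    deleterious. This is a simple agreement rate, assumed here for
    illustration; the abstract does not specify the exact calculation.
    """
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("call lists must be non-empty and of equal length")
    agree = sum(a == b for a, b in zip(calls_a, calls_b))
    return percent(agree, len(calls_a))


# Counts reported in the abstract.
sift_pct = percent(226, 508)       # SIFT "Intolerant"
polyphen_pct = percent(165, 489)   # PolyPhen "Probably or possibly damaging"

print(f"SIFT: {sift_pct:.1f}%")          # prints "SIFT: 44.5%"
print(f"PolyPhen: {polyphen_pct:.1f}%")  # prints "PolyPhen: 33.7%"

# Hypothetical toy calls for five variants (NOT data from the study).
toy_sift = [True, False, True, False, True]
toy_polyphen = [True, False, False, False, True]
toy_concordance = concordance(toy_sift, toy_polyphen)
print(f"Toy concordance: {toy_concordance:.0f}%")  # prints "Toy concordance: 80%"
```

Rounded to whole numbers, the first two values match the 44% and 34% stated in the abstract. The roughly 62% concordance reported there cannot be recomputed here because the per-variant calls are not given.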

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms
  • Amino Acid Substitution / genetics*
  • Computational Biology / methods
  • DNA Repair / genetics*
  • Gene Frequency / genetics
  • Genetic Testing*
  • Genetics, Population
  • Humans
  • Nuclear Proteins / genetics*
  • Nuclear Proteins / physiology
  • Polymorphism, Genetic
  • Sequence Analysis, Protein / methods*
  • Software

Substances

  • Nuclear Proteins