J Biomed Sci. 2013 Apr 28;20(1):25.
doi: 10.1186/1423-0127-20-25.

CoDP: Predicting the Impact of Unclassified Genetic Variants in MSH6 by the Combination of Different Properties of the Protein

Free PMC article

Hiroko Terui et al. J Biomed Sci. 2013.

Abstract

Background: Lynch syndrome is a hereditary cancer predisposition syndrome caused by a mutation in one of the DNA mismatch repair (MMR) genes. About 24% of the mutations identified in Lynch syndrome are missense substitutions, and the frequency of missense variants is highest in MSH6 among the MMR genes. Because of this high frequency, genetic testing has not so far been used effectively for MSH6. We therefore developed CoDP (Combination of the Different Properties), a bioinformatics tool to predict the impact of missense variants in MSH6.

Methods: We integrated the prediction results of three methods, namely MAPP, PolyPhen-2 and SIFT. Two further structural properties, namely solvent accessibility and the change in the number of heavy atoms of amino acids in the MSH6 protein, were combined with them explicitly. MSH6 germline missense variants classified by their associated clinical and molecular data were used to fit the parameters of the logistic regression model and to assess the prediction. The performance of CoDP was compared with that of other conventional tools, namely MAPP, SIFT, PolyPhen-2 and PON-MMR.
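The combination step described in Methods can be illustrated with a minimal sketch of a logistic regression score over the five properties. The coefficient values, the intercept, the feature names and the example inputs below are all hypothetical placeholders chosen for illustration; the abstract does not report the fitted CoDP parameters, and this is not the authors' implementation.

```python
import math

# Hypothetical coefficients for illustration only; the fitted CoDP
# parameters are not given in the abstract.
WEIGHTS = {
    "mapp": 0.08,               # assumed: higher MAPP score = more damaging
    "polyphen2": 2.0,           # assumed: higher PolyPhen-2 score = more damaging
    "sift": -3.0,               # assumed: lower SIFT score = more damaging
    "accessibility": -1.5,      # assumed: buried residues are more sensitive
    "heavy_atom_change": 0.4,   # assumed: larger size change = more damaging
}
INTERCEPT = -1.0  # hypothetical


def logistic(z):
    """Standard logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def combined_score(features):
    """Weighted sum of the five properties passed through the logistic function."""
    z = INTERCEPT + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return logistic(z)


# Made-up feature values for a single variant.
example = {
    "mapp": 25.0,
    "polyphen2": 0.98,
    "sift": 0.01,
    "accessibility": 0.05,
    "heavy_atom_change": 4,
}
score = combined_score(example)
print(f"combined score: {score:.3f}")
```

In a real fit, the coefficients would be estimated from the classified variants rather than set by hand, and the score would be thresholded to call a variant pathogenic or non-pathogenic.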

Results: A total of 294 germline missense variants were collected from variant databases and the literature. Of these, 34 variants were available for parameter training and the prediction performance test. We integrated the prediction results of MAPP, PolyPhen-2 and SIFT, and explicitly combined them with two further structural properties, namely solvent accessibility and the change in the number of heavy atoms of amino acids in the MSH6 protein. Variant data classified by their associated clinical and molecular data were used to fit the parameters of the logistic regression model and to assess the prediction. The positive predictive value (PPV), negative predictive value (NPV), sensitivity, specificity and accuracy of the tools were compared on the whole data set. For CoDP, the PPV was 93.3% (14/15), the NPV was 94.7% (18/19), specificity was 94.7% (18/19), sensitivity was 93.3% (14/15) and accuracy was 94.1% (32/34). The area under the curve (AUC) was 0.954 for CoDP, 0.919 for MAPP for MSH6, 0.864 for SIFT and 0.819 for PolyPhen-2 HumVar. The power of these methods to distinguish between pathogenic and non-pathogenic variants was tested by the Wilcoxon rank sum test (p < 8.9 × 10^-6 for CoDP, p < 3.3 × 10^-5 for MAPP, p < 3.1 × 10^-4 for SIFT and p < 1.2 × 10^-3 for PolyPhen-2 HumVar), and CoDP was shown to outperform the other conventional methods.
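The reported fractions are mutually consistent with a single confusion matrix of 14 true positives, 1 false positive, 18 true negatives and 1 false negative. These four counts are inferred here from the fractions in the text rather than stated in the abstract. A short sketch recomputing the five metrics from them:

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return the five standard metrics from confusion-matrix counts."""
    return {
        "ppv": tp / (tp + fp),                   # positive predictive value
        "npv": tn / (tn + fn),                   # negative predictive value
        "sensitivity": tp / (tp + fn),           # true positive rate
        "specificity": tn / (tn + fp),           # true negative rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Counts inferred from the fractions reported for CoDP (14/15, 18/19, 32/34).
metrics = confusion_metrics(tp=14, fp=1, tn=18, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

With these counts, PPV and sensitivity share the fraction 14/15, and NPV and specificity share 18/19, which is why the paired percentages in the text coincide.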

Conclusion: In this paper, we provide a human-curated data set of MSH6 missense variants and CoDP, a prediction tool that achieved better accuracy in predicting the impact of missense variants in MSH6 than any other known tool. CoDP is available at http://cib.cf.ocha.ac.jp/CoDP/.

Figures

Figure 1
Domain organization of human MSH6 and the additional sequence set used for optimizing MAPP parameters for MSH6. The MSH6 protein is depicted as a box diagram. A box indicates a distinct domain structure, and a line connecting the boxes indicates an inter-domain sequence. The range of each domain is shown above or beneath the box. “−” denotes non-vertebrate sequences in the secondary sequence set added to the initial set. For details, see the Optimization of MAPP for MSH6 section in Results and Discussion.
Figure 2
The number of changes in heavy atoms between the original and the substituted amino acid. For instance, in change 0–1 (no change or a change of one in the number of heavy atoms on substitution), ULS cases are more frequent than LLS cases. The I-shaped line on each bar denotes the standard deviation obtained by the bootstrap method with 1,000 resamplings. The distributions do not overlap for changes of 0–1 and 2–3.
Figure 3
Box and whisker plots for distributions of prediction scores of in silico tools in LLS and ULS variants. The top and the bottom of the box are the 75th and 25th percentile, respectively, and the black line in the box is the median. × denotes an outlier. The distributions of LLS and ULS in CoDP (a) are better separated than those of MAPP for MSH6 (b), SIFT (c) and PolyPhen-2 (d).
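How well two score distributions separate is what the area under the curve summarizes: it equals the fraction of cross-group pairs in which the score from one group exceeds the score from the other, with ties counted as half. This is the same pairwise statistic that underlies the Wilcoxon rank sum test. The sketch below uses made-up toy scores, not data from the study, and the function and variable names are hypothetical.

```python
def auc_from_scores(higher_group, lower_group):
    """AUC as the fraction of cross-group pairs ranked in the expected order.

    Ties contribute half a pair, matching the Mann-Whitney U statistic
    divided by the number of pairs.
    """
    wins = 0.0
    for h in higher_group:
        for l in lower_group:
            if h > l:
                wins += 1.0
            elif h == l:
                wins += 0.5
    return wins / (len(higher_group) * len(lower_group))


# Toy prediction scores standing in for the two variant groups.
group_expected_high = [0.91, 0.85, 0.77, 0.60]
group_expected_low = [0.10, 0.22, 0.35, 0.65]

auc = auc_from_scores(group_expected_high, group_expected_low)
print(f"AUC: {auc:.4f}")
```

An AUC of 1.0 would mean the two groups are perfectly separated by score, while 0.5 would mean the scores carry no ranking information.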
