The pace of high-throughput sequencing data generation demands methods for interpreting the effects of genomic variants. Numerous computational methods have been developed to assess the impact of variations because experimental methods cannot keep up with the speed and volume of data generation. To harness the strength of currently available predictors, the Pathogenic-or-Not-Pipeline (PON-P) integrates five predictors to predict the probability that nonsynonymous variations affect protein function and may consequently be disease related. PON-P, which is based on random forest methodology, shows consistently improved performance in cross-validation tests and on independent test sets, and provides a ternary classification together with a statistical reliability estimate of its results. Applied to missense variants in a melanoma cell line, PON-P predicts variants in 17 genes to affect protein function. Previous studies implicate nine of these genes in the pathogenesis of various forms of cancer. PON-P may thus be used as a first step in screening and prioritizing variants to identify deleterious ones for further experimentation.
© 2012 Wiley Periodicals, Inc.
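
The abstract does not say how the ternary classification and the reliability estimate are combined, so the following is only a minimal sketch of one plausible scheme, not the PON-P implementation. It assumes that a set of resampled (for example, bootstrap) probability estimates is available for each variant, and that a variant is left unclassified when the interval spanned by those estimates straddles the decision threshold. The function name `ternary_call`, the 0.5 threshold, the 95% coverage, and the class labels are all hypothetical choices made for illustration.

```python
import math
from statistics import mean


def ternary_call(resampled_probs, threshold=0.5, coverage=0.95):
    """Return (label, mean probability, (low, high)) for one variant.

    ASSUMPTION: `resampled_probs` holds probability-of-pathogenicity
    estimates from repeated resampling; the abstract does not specify
    how PON-P derives its reliability estimate.
    """
    if not resampled_probs:
        raise ValueError("need at least one probability estimate")

    ordered = sorted(resampled_probs)
    n = len(ordered)
    tail = (1.0 - coverage) / 2.0

    # Conservative percentile interval: round the lower index down
    # and the upper index up so the interval is never too narrow.
    low = ordered[math.floor(tail * (n - 1))]
    high = ordered[math.ceil((1.0 - tail) * (n - 1))]

    if low > threshold:
        label = "pathogenic"      # whole interval above the threshold
    elif high < threshold:
        label = "neutral"         # whole interval below the threshold
    else:
        label = "unclassified"    # interval straddles the threshold

    return label, mean(ordered), (low, high)


if __name__ == "__main__":
    # Made-up estimates, for illustration only.
    print(ternary_call([0.80, 0.90, 0.85, 0.95, 0.70]))
    print(ternary_call([0.40, 0.60, 0.50]))
```

Under this scheme the third class absorbs the cases where the estimate is too uncertain to call, which is one way a predictor can trade coverage for reliability when prioritizing variants.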