Evaluation of performance of leading algorithms for variant pathogenicity predictions and designing a combinatory predictor method: application to Rett syndrome variants

PeerJ. 2019 Nov 27:7:e8106. doi: 10.7717/peerj.8106. eCollection 2019.


Background: Genomic diagnostic tests are performed for a wide spectrum of complex genetic conditions such as autism and cancer. Advances in technology have aided in successfully decoding the genetic variants that cause or trigger these disorders. However, interpreting these variants is not a trivial task, even at the level of distinguishing pathogenic from benign variants.

Methods: We used clinically significant variants from the ClinVar database to evaluate the performance of the 14 most popular in-silico predictors using supervised learning methods. We implemented feature selection and a random forest classification algorithm to identify the best combination of predictors for evaluating the pathogenicity of a variant. Finally, we used this combination of predictors to reclassify the variants of unknown significance (VUS) in the MECP2 gene that are associated with Rett syndrome.
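
As an illustration of the benchmarking step only, the following minimal sketch scores several predictors against known pathogenic/benign labels and ranks them. It is not the study's pipeline: the predictor names, the scores, the 0.5 cutoff, and the choice of the Matthews correlation coefficient (MCC) as the ranking metric are all assumptions made for the example, and the feature selection and random forest steps are not reproduced here.

```python
from math import sqrt


def confusion_counts(calls, labels):
    """Return (tp, tn, fp, fn) for boolean calls against 1/0 truth labels."""
    tp = tn = fp = fn = 0
    for call, label in zip(calls, labels):
        if call and label:
            tp += 1
        elif call and not label:
            fp += 1
        elif not call and label:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def evaluate(scores, labels, threshold=0.5):
    """Binarize scores at a hypothetical cutoff and compute summary metrics."""
    calls = [score >= threshold for score in scores]
    tp, tn, fp, fn = confusion_counts(calls, labels)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Toy truth labels: 1 = pathogenic, 0 = benign (made-up data, not ClinVar).
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Hypothetical predictor scores in [0, 1]; higher means "more likely pathogenic".
predictor_scores = {
    "predictor_A": [0.9, 0.8, 0.7, 0.6, 0.1, 0.2, 0.3, 0.4],
    "predictor_B": [0.9, 0.8, 0.4, 0.3, 0.1, 0.2, 0.6, 0.7],
    "predictor_C": [0.9, 0.8, 0.7, 0.2, 0.1, 0.2, 0.3, 0.6],
}

results = {name: evaluate(s, labels) for name, s in predictor_scores.items()}

# Rank predictors from best to worst by MCC.
ranking = sorted(results, key=lambda name: results[name]["mcc"], reverse=True)

for name in ranking:
    print(name, results[name])
```

In practice each predictor has its own score scale and recommended cutoff, so a single shared threshold like the one above would need to be replaced with per-predictor values.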

Results: Our analysis yielded an optimized selection of prediction algorithms and a combinatory predictor method. The combinatory approach, which uses both the best-performing independent predictors and the best-performing ensemble predictors, reduces algorithm-specific bias in variant characterization. The reclassification of variants (such as VUS) in the MECP2 gene associated with Rett syndrome suggests that the combinatory in-silico predictor approach had a higher success rate in categorizing their pathogenicity.

Keywords: DNA variants; Disease-causing mutations; Machine learning; Pathogenic mutations classification; Rett syndrome; Variant pathogenicity predictors.