The use of next-generation sequencing for biomedical and clinical purposes has fuelled the demand for tools that assess the functional impact of sequence variants. For single amino acid variants, general methods (GM), based on biophysical and evolutionary principles and trained by pooling variants from many proteins, are already available. Until now, their accuracy, in the range of ∼80%, has limited their use in clinical applications. In parallel, a series of studies indicates that protein-specific predictors (PSP), which use only information from the protein of interest, could frequently surpass the performance of GM. However, two reasons suggest that this may not always be the case: the existence of a performance threshold affecting both GM and PSP, and the effect of training data scarcity. Here, we characterize the relationship between the two approaches by deriving 82 PSP and comparing them with several GM (PolyPhen-2, SIFT, PON-P2, MutationTaster2, CADD). We find a complementary relationship between PSP and GM, with neither approach always outperforming the other. The relationship varies between two limiting situations: PSP are frequently outperformed by PON-P2, the best GM, whereas the opposite happens when PSP are compared with SIFT. Finally, we explore how the observed complementarity could lead to increased success rates in pathogenicity prediction.
Keywords: amino acid variants; in silico pathogenicity predictions; missense variants; molecular diagnostics; next-generation sequencing.
© 2016 WILEY PERIODICALS, INC.