The success of methods that predict the redox state of cysteine residues from their sequence environment seemed to validate the basic assumption that this state is mainly determined locally. However, prediction accuracy remained high on randomized sequences and for non-cysteine residues, suggesting that these methods instead capture global features of proteins, such as subcellular localization, which depends on amino acid composition. This illustrates that even high prediction accuracy is insufficient to validate implicit assumptions about a biological phenomenon. Correctly identifying the biochemical reasons underlying the success of a method is essential to gain proper biological insights and to develop more accurate and novel bioinformatics tools.
Abbreviations: Acc, prediction accuracies; FN, false negatives; FP, false positives; PDB, Protein Data Bank; TN, true negatives; TP, true positives.
Keywords: Biological inference; Cysteine redox state; Prediction accuracies; Protein prediction; Protein structure.
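The randomization control described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' protocol: the function names `accuracy` and `shuffle_flanks`, the window-based representation, and the choice to shuffle only the flanking residues are assumptions made for illustration. The sketch computes Acc from TP, TN, FP and FN using the standard definition, and scrambles the residues around a central position while preserving composition, so that a predictor that stays accurate on the scrambled windows cannot be relying on local residue order.

```python
import random


def accuracy(tp, tn, fp, fn):
    """Standard accuracy: Acc = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def shuffle_flanks(window, rng):
    """Shuffle the residues flanking the central position of a window.

    The central residue (e.g. the cysteine being predicted) stays in place
    and the amino acid composition is unchanged; only the order of the
    flanking residues is randomized. This is one possible randomization,
    assumed here for illustration.
    """
    center = len(window) // 2
    flanks = list(window[:center] + window[center + 1:])
    rng.shuffle(flanks)
    return "".join(flanks[:center]) + window[center] + "".join(flanks[center:])


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the control is reproducible
    window = "AKLCGHE"      # hypothetical 7-residue window centred on Cys
    print(shuffle_flanks(window, rng))
    print(accuracy(tp=40, tn=45, fp=5, fn=10))
```

Comparing `accuracy` on the original windows with `accuracy` on the shuffled ones gives a simple check: if the two values are similar, the predictor is probably exploiting composition rather than the local sequence pattern.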