RF-Hydroxysite: a random forest based predictor for hydroxylation sites

Hamid D Ismail; Robert H Newman; Dukka B Kc

doi:10.1039/c6mb00179c

Mol Biosyst. 2016 Jul 19;12(8):2427-35. doi: 10.1039/c6mb00179c.

Authors

Hamid D Ismail¹, Robert H Newman², Dukka B Kc¹

Affiliations

¹ Department of Computational Science and Engineering, NCA&T State University, Greensboro, NC 27411, USA. dbkc@ncat.edu.
² Department of Biology, NCA&T State University, Greensboro, NC 27411, USA.

Abstract

Protein hydroxylation is an emerging posttranslational modification involved in both normal cellular processes and a growing number of pathological states, including several cancers. Protein hydroxylation is mediated by members of the hydroxylase family of enzymes, which catalyze the conversion of an alkyl group at select lysine or proline residues on their target substrates to a hydroxyl. Traditionally, hydroxylation sites have been identified using expensive and time-consuming experimental methods, such as tandem mass spectrometry. To facilitate the identification of putative hydroxylation sites and to complement existing experimental approaches, computational methods that predict hydroxylation sites in protein sequences have recently been developed. Building on these efforts, we have developed a new method, termed RF-Hydroxysite, that uses a random forest to identify putative hydroxylysine and hydroxyproline residues in proteins using only the primary amino acid sequence as input. RF-Hydroxysite integrates features previously shown to contribute to hydroxylation site prediction with several new features that we found to improve performance markedly. These include features that capture physicochemical, structural, sequence-order and evolutionary information from the protein sequences. The features used in the final model were selected based on their contribution to the prediction. Physicochemical information was found to contribute the most to the model. The present study also sheds light on the contribution of evolutionary, sequence-order, and protein disordered-region information to hydroxylation site prediction. The web server for RF-Hydroxysite is available online at .
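As an illustration of the sequence-only setting described in the abstract, the sketch below shows one plausible first step: cutting fixed-length windows around every lysine (K) and proline (P) in a primary sequence and encoding each window with a single physicochemical property. It is not the authors' pipeline. The function names, the flank size of 6, the padding character `X`, and the choice of the Kyte-Doolittle hydropathy scale are all assumptions made for illustration; the abstract does not state which window size or which physicochemical descriptors RF-Hydroxysite uses.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy scale (a standard published scale; its use here
# is an illustrative assumption, not a feature confirmed by the abstract).
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def candidate_windows(
    seq: str, flank: int = 6, targets: str = "KP", pad: str = "X"
) -> List[Tuple[int, str, str]]:
    """Return (1-based position, residue, window) for each target residue.

    Windows have length 2 * flank + 1 and are centred on the candidate
    residue; sequence ends are padded with `pad` (hypothetical choice).
    """
    seq = seq.upper()
    padded = pad * flank + seq + pad * flank
    windows = []
    for i, aa in enumerate(seq):
        if aa in targets:
            # Position i in seq sits at i + flank in padded, so the slice
            # starting at i is centred on the candidate residue.
            windows.append((i + 1, aa, padded[i : i + 2 * flank + 1]))
    return windows


def hydropathy_features(window: str) -> List[float]:
    """Encode a window as per-position hydropathy; unknown residues get 0.0."""
    return [KD_HYDROPATHY.get(aa, 0.0) for aa in window]


if __name__ == "__main__":
    # Toy sequence, not taken from the paper.
    for pos, aa, win in candidate_windows("MKAPGLKP", flank=3):
        print(pos, aa, win, hydropathy_features(win))
```

Feature vectors of this kind, labelled as hydroxylated or not, could then be passed to any random forest implementation. Structural, sequence-order and evolutionary features would need additional inputs that this sketch does not attempt to model.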

Publication types

Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.
Research Support, Non-U.S. Gov't

MeSH terms

Amino Acid Sequence
Amino Acids / chemistry
Amino Acids / metabolism
Computational Biology / methods*
Hydrophobic and Hydrophilic Interactions
Hydroxylation
Lysine / chemistry*
Lysine / metabolism
Proline / chemistry*
Proline / metabolism
Proteins / chemistry*
Proteins / metabolism
ROC Curve

Substances

Amino Acids
Proteins
Proline
Lysine

Grants and funding

SC2 GM113784/GM/NIGMS NIH HHS/United States