Identification of new homologs of PD-(D/E)XK nucleases by support vector machines trained on data derived from profile-profile alignments

Nucleic Acids Res. 2011 Mar;39(4):1187-96. doi: 10.1093/nar/gkq958. Epub 2010 Oct 20.

Abstract

PD-(D/E)XK nucleases, initially represented by only Type II restriction enzymes, now comprise a large and extremely diverse superfamily of proteins. They participate in many different nucleic acid transactions, including DNA degradation, recombination, repair and RNA processing. Different PD-(D/E)XK families, although sharing a structurally conserved core, typically display little or no detectable sequence similarity except for the active site motifs. This makes the identification of new superfamily members using standard homology search techniques challenging. To tackle this problem, we developed a method for the detection of PD-(D/E)XK families based on the binary classification of profile-profile alignments using support vector machines (SVMs). Using a number of both superfamily-specific and general features, SVMs were trained to identify true positive alignments of PD-(D/E)XK representatives. With this method we identified several PFAM families of uncharacterized proteins as putative new members of the PD-(D/E)XK superfamily. In addition, we assigned several unclassified restriction enzymes to the PD-(D/E)XK type. The results show that the new method is able to make confident assignments even for alignments that have statistically insignificant scores. We also implemented the method as a freely accessible web server at http://www.ibt.lt/bioinformatics/software/pdexk/.
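
The idea of classifying alignments as true or false with an SVM can be illustrated with a minimal sketch. It is not the authors' implementation: the two features, the toy training values, the linear kernel, and the hyperparameters (`lam`, `lr`, `epochs`) are all assumptions chosen for illustration, and the abstract does not specify the actual feature set or kernel. The sketch trains a linear SVM by sub-gradient descent on the regularized hinge loss, then classifies a hypothetical alignment with a weak score but a well-matched active-site motif.

```python
# Toy illustration only: the features, values, and hyperparameters below are
# hypothetical and are not taken from the paper.
# Each alignment is described by two made-up features:
#   (normalized profile-profile alignment score,
#    fraction of active-site motif positions matched)
# Label +1 marks a true PD-(D/E)XK alignment, -1 a false one.
TRAIN = [
    ((0.9, 1.0), 1),
    ((0.7, 0.8), 1),
    ((0.3, 1.0), 1),   # weak score, motif fully matched
    ((0.4, 0.9), 1),
    ((0.8, 0.1), -1),  # strong score, motif missing
    ((0.3, 0.2), -1),
    ((0.2, 0.0), -1),
    ((0.5, 0.3), -1),
]


def decision_value(w, b, x):
    """Signed score w.x + b; a positive value means 'true alignment'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(data, lam=0.01, lr=0.01, epochs=5000):
    """Fit a linear SVM by sub-gradient descent on the L2-regularized hinge loss."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            # Check the margin before updating the weights.
            violated = y * decision_value(w, b, x) < 1
            # Regularization step: shrink the weights slightly.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if violated:
                # Hinge-loss step: push this example's margin up.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def classify(w, b, x):
    """Return +1 for a predicted true alignment, -1 otherwise."""
    return 1 if decision_value(w, b, x) >= 0 else -1


w, b = train_linear_svm(TRAIN)

# Hypothetical query: low alignment score, but the motif is almost fully matched.
weak_score_query = (0.25, 0.95)
print(classify(w, b, weak_score_query), decision_value(w, b, weak_score_query))
```

In this toy setting the classifier can accept an alignment whose score alone would look unconvincing, because the motif feature carries most of the weight. This mirrors, in a very simplified way, the claim that assignments remain possible for alignments with statistically insignificant scores.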

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Artificial Intelligence*
  • Catalytic Domain
  • Conserved Sequence
  • DNA Restriction Enzymes / chemistry
  • DNA Restriction Enzymes / classification
  • Endonucleases / chemistry
  • Endonucleases / classification*
  • Exonucleases / classification
  • Holliday Junction Resolvases / chemistry
  • Molecular Sequence Data
  • Protein Structure, Tertiary
  • Sequence Alignment / methods*
  • Sequence Homology, Amino Acid
  • Software

Substances

  • Endonucleases
  • Exonucleases
  • DNA Restriction Enzymes
  • Holliday Junction Resolvases