Novel function discovery through sequence and structural data mining

Curr Opin Struct Biol. 2016 Jun;38:53-61. doi: 10.1016/ Epub 2016 Jun 10.


Large-scale sequence and structural data is a goldmine of novel proteins, but how can this data be effectively mined for new functions? Here, we review protein function prediction methods and recent studies that apply these methods to discover new functionality. Core approaches include sequence-based homology detection, phylogenetic analysis, structural bioinformatics, and inference of functional associations using genomic context and related methods. With such a wide range of approaches, sequences may reveal new functionality regardless of their similarity to a characterized reference. Homologs of known function may be identified in unexpected species or associations. Detection of functional shifts in sequences may reveal new activities and specificities. New protein functions may also be predicted in uncharacterized sequences and structures. Finally, methods and data may be integrated and applied at increasingly large scales due to improved protein domain knowledge and structural coverage, which amplifies the ability to predict and discover novel protein functions.

Publication types

  • Review
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Motifs
  • Computational Biology / methods*
  • Data Mining / methods*
  • Protein Domains
  • Proteins / chemistry*
  • Proteins / metabolism*


Substances

  • Proteins