Motivation: Computational approaches that can predict protein functions are essential to bridge the widening function annotation gap especially since <1.0% of all proteins in UniProtKB have been experimentally characterized. We present a domain-based method for protein function classification and prediction of functional sites that exploits functional sub-classification of CATH superfamilies. The superfamilies are sub-classified into functional families (FunFams) using a hierarchical clustering algorithm supervised by a new classification method, FunFHMMer.
Results: FunFHMMer generates more functionally coherent groupings of protein sequences than other domain-based protein classifications. This has been validated using known functional information. The conserved positions predicted by the FunFams are also found to be enriched in known functional residues. Moreover, the functional annotations provided by the FunFams are found to be more precise than other domain-based resources. FunFHMMer currently identifies 110,439 FunFams in 2735 superfamilies which can be used to functionally annotate>16 million domain sequences.
Availability and implementation: All FunFam annotation data are made available through the CATH webpages (http://www.cathdb.info). The FunFHMMer webserver (http://www.cathdb.info/search/by_funfhmmer) allows users to submit query sequences for assignment to a CATH FunFam.
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author 2015. Published by Oxford University Press.