The InterPro protein families database: the classification resource after 15 years

Alex Mitchell; Hsin-Yu Chang; Louise Daugherty; Matthew Fraser; Sarah Hunter; Rodrigo Lopez; Craig McAnulla; Conor McMenamin; Gift Nuka; Sebastien Pesseat; Amaia Sangrador-Vegas; Maxim Scheremetjew; Claudia Rato; Siew-Yit Yong; Alex Bateman; Marco Punta; Teresa K Attwood; Christian J A Sigrist; Nicole Redaschi; Catherine Rivoire; Ioannis Xenarios; Daniel Kahn; Dominique Guyot; Peer Bork; Ivica Letunic; Julian Gough; Matt Oates; Daniel Haft; Hongzhan Huang; Darren A Natale; Cathy H Wu; Christine Orengo; Ian Sillitoe; Huaiyu Mi; Paul D Thomas; Robert D Finn

doi:10.1093/nar/gku1243

Nucleic Acids Res. 2015 Jan;43(Database issue):D213-21. doi: 10.1093/nar/gku1243. Epub 2014 Nov 26.

Authors

Alex Mitchell¹, Hsin-Yu Chang¹, Louise Daugherty¹, Matthew Fraser¹, Sarah Hunter¹, Rodrigo Lopez¹, Craig McAnulla¹, Conor McMenamin¹, Gift Nuka¹, Sebastien Pesseat¹, Amaia Sangrador-Vegas¹, Maxim Scheremetjew¹, Claudia Rato¹, Siew-Yit Yong¹, Alex Bateman¹, Marco Punta¹, Teresa K Attwood², Christian J A Sigrist³, Nicole Redaschi³, Catherine Rivoire³, Ioannis Xenarios⁴, Daniel Kahn⁵, Dominique Guyot⁵, Peer Bork⁶, Ivica Letunic⁶, Julian Gough⁷, Matt Oates⁷, Daniel Haft⁸, Hongzhan Huang⁹, Darren A Natale⁹, Cathy H Wu¹⁰, Christine Orengo¹¹, Ian Sillitoe¹¹, Huaiyu Mi¹², Paul D Thomas¹², Robert D Finn¹³

Affiliations

¹ European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
² Faculty of Life Science and School of Computer Science, The University of Manchester, Manchester, M13 9PL, UK.
³ Swiss Institute of Bioinformatics (SIB), CMU - Rue Michel-Servet, 1211 Geneva 4, Switzerland.
⁴ Swiss Institute of Bioinformatics (SIB), CMU - Rue Michel-Servet, 1211 Geneva 4, Switzerland; Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland; Department of Biochemistry, University of Geneva, 1211 Geneva, Switzerland.
⁵ Pôle Rhône-Alpin de Bio-Informatique (PRABI), Bâtiment G. Mendel, Université Claude Bernard, 43 bd du 11 novembre 1918, 69622 Villeurbanne Cedex, France.
⁶ European Molecular Biology Laboratory (EMBL), Meyerhofstrasse 1, 69117 Heidelberg, Germany.
⁷ Department of Computer Science, University of Bristol, Woodland Road, Bristol, BS8 1UB, UK.
⁸ J. Craig Venter Institute (JCVI), 9704 Medical Center Drive, Rockville, MD 20850, USA.
⁹ Protein Information Resource (PIR), Georgetown University Medical Center, Washington, DC 20007, USA.
¹⁰ Protein Information Resource (PIR), Georgetown University Medical Center, Washington, DC 20007, USA; Center for Bioinformatics and Computational Biology, University of Delaware, Newark, DE 19711, USA.
¹¹ Structural and Molecular Biology Department, University College London, University of London, London, WC1E 6BT, UK.
¹² Division of Bioinformatics, Department of Preventive Medicine, University of Southern California, Los Angeles, CA 90089, USA.
¹³ European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK; rdf@ebi.ac.uk.

Abstract

The InterPro database (http://www.ebi.ac.uk/interpro/) is a freely available resource that can be used to classify sequences into protein families and to predict the presence of important domains and sites. Central to the InterPro database are predictive models, known as signatures, from a range of different protein family databases that have different biological focuses and use different methodological approaches to classify protein families and domains. InterPro integrates these signatures, capitalizing on the respective strengths of the individual databases, to produce a powerful protein classification resource. Here, we report on the status of InterPro as it enters its 15th year of operation, and give an overview of new developments in the database and its associated Web interfaces and software. In particular, the new domain architecture search tool is described and the process of mapping Gene Ontology terms to InterPro is outlined. We also discuss the challenges faced by the resource given the explosive growth in sequence data in recent years. InterPro (version 48.0) contains 36,766 member database signatures integrated into 26,238 InterPro entries, an increase of 3993 entries (5081 signatures) since 2012.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Bacteria / metabolism
Databases, Protein*
Gene Ontology
Protein Structure, Tertiary
Proteins / classification*
Proteins / genetics
Sequence Analysis, Protein
Software

Substances

Proteins
