Nucleic Acids Res. 2017 Jan 4;45(D1):D190-D199.
doi: 10.1093/nar/gkw1107. Epub 2016 Nov 29.

InterPro in 2017-beyond Protein Family and Domain Annotations

Free PMC article

Robert D Finn et al. Nucleic Acids Res. 2017.


InterPro is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new member databases (SFLD and CDD) and new functionality for residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences.
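As a rough illustration of what "number of residues annotated" can mean, the sketch below counts the residues of one protein covered by at least one match, merging overlapping regions so that no residue is counted twice. The coordinates, the function names and the choice to treat adjacent regions as one are assumptions made for this example; they are not taken from InterPro or InterProScan.

```python
from typing import Iterable, List, Tuple

# A match region on a protein: 1-based, inclusive (start, end) coordinates.
Interval = Tuple[int, int]


def merge_intervals(matches: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or directly adjacent regions into disjoint intervals."""
    merged: List[Interval] = []
    for start, end in sorted(matches):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps (or touches) the previous interval: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def residues_annotated(matches: Iterable[Interval]) -> int:
    """Count residues covered by at least one match."""
    return sum(end - start + 1 for start, end in merge_intervals(matches))


# Hypothetical matches for a single protein (illustrative values only).
family_and_domain = [(10, 120), (100, 180)]
with_disorder = family_and_domain + [(300, 350)]

print(residues_annotated(family_and_domain))  # 171
print(residues_annotated(with_disorder))      # 222
```

In this toy case, adding a non-overlapping region raises the count from 171 to 222 residues, which is the kind of coverage gain that an additional annotation type can contribute.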


Figure 1.
Example of an InterPro family hierarchical relationship. The FGGY carbohydrate kinases entry (IPR000577) is the parent of a series of child entries that match smaller, more functionally specific sets of proteins.
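A parent/child hierarchy of this kind can be sketched as a simple child-to-parent mapping. Only IPR000577 comes from the caption; the other identifiers (`CHILD_A`, `CHILD_B`, `GRANDCHILD_A1`) are hypothetical placeholders, and the structure is an illustration rather than InterPro's actual data model.

```python
from typing import Dict, List, Optional

# Child accession -> parent accession; None marks the root of the hierarchy.
# Only IPR000577 is a real accession (from the caption); the rest are
# hypothetical placeholders used for illustration.
PARENT_OF: Dict[str, Optional[str]] = {
    "IPR000577": None,          # FGGY carbohydrate kinases (root entry)
    "CHILD_A": "IPR000577",     # hypothetical, more specific family
    "CHILD_B": "IPR000577",     # hypothetical, more specific family
    "GRANDCHILD_A1": "CHILD_A",  # hypothetical, most specific family
}


def lineage(accession: str) -> List[str]:
    """Return the path from the root entry down to the given entry."""
    path: List[str] = []
    current: Optional[str] = accession
    while current is not None:
        path.append(current)
        current = PARENT_OF[current]
    return path[::-1]


def children(accession: str) -> List[str]:
    """Return the direct child entries of the given entry, sorted by name."""
    return sorted(c for c, p in PARENT_OF.items() if p == accession)


print(lineage("GRANDCHILD_A1"))  # ['IPR000577', 'CHILD_A', 'GRANDCHILD_A1']
print(children("IPR000577"))     # ['CHILD_A', 'CHILD_B']
```

Reading a lineage from left to right goes from the broadest entry to the most specific one, mirroring the parent-to-child direction described in the caption.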
Figure 2.
Timeline showing the member databases that have joined InterPro since version 1.0, released in 2000.
Figure 3.
Examples of the CDD and SFLD hierarchies (A and B). (A) CDD models for related domains are organized hierarchically, reflecting major events in the domain family's molecular evolution and functional diversification. The hierarchy usually follows a tree structure obtained from (C) phylogenetic analysis of multiply aligned sequences. The relationship between the CDD entries in panel A and the sequences in panel B is indicated by colour. The top ‘parent’ entry (isoprenoid biosynthesis enzymes, Class 1 superfamily) would be less specific than the ‘leaf’ node entry (trans-isoprenyl diphosphate synthase, head-to-head). (B) The corresponding superfamily, Isoprenoid Synthase Type I, from SFLD. The specificity relationships between the entries are arranged similarly to those in panel A. (D) SFLD network analysis graph showing the sequence identity relationships between the Isoprenoid Synthase Type I superfamily members. The E-value threshold for the network is 1e-10, and sequences within nodes share 50% or more sequence identity, calculated using CD-HIT. Note that panels C and D are visualizations from the respective source databases and are not available from the InterPro website. These figures demonstrate the different approaches for visualizing and defining relationships between families.
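The thresholding step behind a similarity network like the one in panel D can be sketched as follows: pairwise hits between nodes are kept as edges only when their E-value is at or below the cutoff, and the resulting graph is split into connected groups. This is a minimal sketch, not SFLD's pipeline; the node names and E-values are invented, the inclusive comparison (`<=`) is an assumption, and the 50% identity clustering into nodes (done with CD-HIT in the caption) is assumed to have happened beforehand.

```python
from typing import Dict, Iterable, List, Set, Tuple

# One pairwise comparison between two nodes: (node_a, node_b, E-value).
Hit = Tuple[str, str, float]


def build_network(hits: Iterable[Hit], max_evalue: float = 1e-10) -> Dict[str, Set[str]]:
    """Build an undirected graph, keeping edges with E-value <= max_evalue."""
    graph: Dict[str, Set[str]] = {}
    for a, b, evalue in hits:
        # Register both nodes even if the edge itself is filtered out.
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if a != b and evalue <= max_evalue:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph: Dict[str, Set[str]]) -> List[Set[str]]:
    """Return groups of nodes linked by retained edges (depth-first search)."""
    seen: Set[str] = set()
    components: List[Set[str]] = []
    for node in sorted(graph):
        if node in seen:
            continue
        component: Set[str] = set()
        stack = [node]
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(graph[current] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical hits between five nodes (illustrative values only).
hits = [
    ("n1", "n2", 1e-30),
    ("n2", "n3", 1e-12),
    ("n3", "n4", 1e-5),   # weaker than the cutoff, so no edge is drawn
    ("n4", "n5", 1e-20),
]
network = build_network(hits, max_evalue=1e-10)
groups = connected_components(network)
```

With these invented values, the weak `n3`/`n4` hit is dropped, so the five nodes fall into two groups: `n1`, `n2`, `n3` and `n4`, `n5`. Tightening or loosening `max_evalue` changes how the network fragments, which is why the cutoff is reported alongside the figure.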
Figure 4.
Integration of MobiDB Lite annotation within InterPro, enabling annotation of intrinsically disordered (ID) regions within proteins. Top - InterPro annotations for the human mediator of RNA polymerase II transcription subunit 1 protein (UniProtKB accession Q15648). Middle - Zoomed-in view of the consensus long-range ID predictions provided by MobiDB Lite. InterPro captures only the consensus output for each sequence, but the graphical representations of the ID regions link to the source website, MobiDB (bottom), where the individual predictions can be viewed.



