Curr Protoc Bioinformatics. 2015 Dec 17;52:5.8.1-5.8.15.
doi: 10.1002/0471250953.bi0508s52.

Protein Structure and Function Prediction Using I-TASSER

Jianyi Yang et al. Curr Protoc Bioinformatics.

Abstract

I-TASSER is a hierarchical protocol for automated protein structure prediction and structure-based function annotation. Starting from the amino acid sequence of a target protein, I-TASSER first generates full-length atomic structural models from multiple threading alignments and iterative structural assembly simulations, followed by atomic-level structure refinement. The biological functions of the protein, including ligand-binding sites, enzyme commission number, and gene ontology terms, are then inferred from known protein function databases based on sequence and structure profile comparisons. I-TASSER is freely available as both an on-line server and a stand-alone package. This unit describes how to use the I-TASSER protocol to generate structure and function predictions and how to interpret the prediction results, as well as alternative approaches for further improving the I-TASSER modeling quality for distant-homologous and multi-domain protein targets.

Keywords: I-TASSER; protein function annotation; protein structure prediction; threading.

Figures

Figure 5.8.1
The I-TASSER protocol for protein structure and function prediction.
Figure 5.8.2
Screenshot of an illustrative job submission on the I-TASSER server.
Figure 5.8.3
The submitted sequence with its predicted secondary structure and solvent accessibility. The submitted sequence, consisting of 122 residues, is listed at the top of the figure. The predicted secondary structure, shown in the middle, suggests that this is an alpha-beta protein containing three alpha-helices (in red) and four beta-strands (in blue). “H,” “S,” and “C” indicate helix, strand, and coil, respectively. The predicted solvent accessibility, shown at the bottom, is presented in 10 levels, from buried (0) to highly exposed (9).
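A per-residue H/S/C string like the one described in this caption can be summarized by counting its contiguous runs. The following is a minimal sketch, not part of I-TASSER; the function name and the example string are hypothetical and do not reproduce the prediction shown in the figure.

```python
from itertools import groupby


def count_segments(ss: str) -> dict:
    """Count contiguous runs of each state in a per-residue H/S/C string."""
    counts = {"H": 0, "S": 0, "C": 0}
    for state, _run in groupby(ss):
        # Each group is one uninterrupted stretch of the same state.
        counts[state] = counts.get(state, 0) + 1
    return counts


# Hypothetical prediction string (illustration only).
example_ss = "CCHHHHHHCCSSSSCCCHHHHCCSSSCC"
print(count_segments(example_ss))
```

For an alpha-beta protein such as the one in the figure, both the "H" and the "S" counts would be nonzero.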
Figure 5.8.4
Prediction of the normalized B-factor. In this example, the N- and C-terminal regions and most of the loop regions are predicted to have positive normalized B-factors, indicating that these regions are structurally more flexible than the rest of the protein. In contrast, the predicted normalized B-factors for the alpha-helix and beta-strand regions are negative or close to zero, suggesting that these regions are structurally more stable.
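The caption does not state how the B-factor is normalized. The sketch below assumes a z-score normalization (subtract the mean, divide by the standard deviation over all residues), under which positive values mark residues that are more flexible than average. The function names and input values are hypothetical.

```python
from statistics import mean, pstdev


def normalize_bfactors(raw):
    """Z-score normalization of raw B-factors (an assumed definition)."""
    mu = mean(raw)
    sigma = pstdev(raw)
    if sigma == 0:
        # All residues identical: nothing is more flexible than anything else.
        return [0.0 for _ in raw]
    return [(b - mu) / sigma for b in raw]


def flexible_positions(normalized):
    """Return 1-based residue positions with a positive normalized B-factor."""
    return [i + 1 for i, z in enumerate(normalized) if z > 0]


# Hypothetical raw values: high at both termini, low in the core.
raw_b = [40.0, 20.0, 10.0, 10.0, 20.0, 40.0]
print(flexible_positions(normalize_bfactors(raw_b)))
```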
Figure 5.8.5
The top 10 threading templates used by I-TASSER. The Z-score, which is widely used to estimate the significance and quality of template alignments, equals the difference between the raw alignment score and the mean score, expressed in units of standard deviation. However, because LOMETS collects templates from multiple threading programs whose Z-scores are not directly comparable, I-TASSER uses a normalized Z-score (highlighted by the orange box) to specify the quality of each template, defined as the Z-score divided by the program-specific Z-score cutoff. Thus, a normalized Z-score >1 indicates a high-confidence alignment. In this example, because multiple templates have a normalized Z-score above 1, the target is categorized by I-TASSER as an ‘Easy’ target. The multiple alignments between the query and the templates are marked by the blue box; the residue numbers of each template are available by clicking on the corresponding ‘Download’ link. The multiple sequence alignment shows that, except for a few residues at the N- and C-termini of the query (i.e., those aligned to gaps ‘-’), the residues are well aligned with the templates. This usually indicates a high level of conservation between the target and the templates.
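The two definitions in this caption can be written out as a short sketch. The program names, cutoff values, and the `min_confident=2` reading of "multiple templates" are assumptions made for illustration; they are not the actual LOMETS settings or the exact I-TASSER classification rule.

```python
from statistics import mean, pstdev


def z_score(raw_score, background_scores):
    """(raw score - mean) in units of standard deviation."""
    sigma = pstdev(background_scores)
    if sigma == 0:
        raise ValueError("background scores have zero variance")
    return (raw_score - mean(background_scores)) / sigma


def normalized_z(z, program_cutoff):
    """Z-score divided by the program-specific cutoff; >1 means confident."""
    return z / program_cutoff


def is_easy_target(hits, cutoffs, min_confident=2):
    """True if at least `min_confident` hits have a normalized Z-score > 1."""
    confident = sum(
        1 for program, z in hits if normalized_z(z, cutoffs[program]) > 1.0
    )
    return confident >= min_confident


# Hypothetical programs, cutoffs, and template hits.
cutoffs = {"program_a": 6.0, "program_b": 8.0}
hits = [("program_a", 12.0), ("program_b", 10.0), ("program_b", 5.0)]
print(is_easy_target(hits, cutoffs))
```

Dividing by a per-program cutoff puts every hit on a common scale where 1 is the confidence threshold, which is what makes hits from different threading programs comparable.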
Figure 5.8.6
The top five models predicted by I-TASSER, with global and local accuracy estimations. (A) The top five models. In this example, five models are generated and visualized in rainbow cartoon on the results page by JSmol, where blue to red runs from the N- to the C-terminus. Since the C-score is high (0.56), the first model is expected to be of good quality, with an estimated TM-score = 0.79 and RMSD = 3.3 Å relative to the native structure (highlighted in the blue box). The residue-specific accuracy estimation (in Å) for each model can be viewed by clicking on the ‘Local structure accuracy profile of the top five models’ link, highlighted in the orange box. (B) The local accuracy estimation for the first model. This example shows that the majority of residues are modeled accurately, with an estimated distance to native below 2 Å. However, the N- and C-terminal residues have larger estimated distances, probably because of the poor alignments with templates for these residues, as shown in Figure 5.8.5.
Figure 5.8.7
Ten PDB structures close to the target. The structure of the first I-TASSER model (model 1, shown in rainbow cartoon) is superimposed on the analogous structures from the PDB (shown in medium-purple backbone trace). The 10 closest proteins are ranked by their structural similarity to the target model, measured by TM-score (highlighted in the orange box). The coordinate file of the superimposed structures can be downloaded through the ‘Download’ link for local visualization. In this example, multiple analogous structures from the PDB have a high TM-score (>0.9), including 4co7A, 3m95A, and 3dowA. However, it is also possible that no similar structure can be found in the PDB; this usually indicates either that the target protein has a new fold or that the fold predicted by I-TASSER is incorrect.
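The caption does not define the TM-score used for ranking. The sketch below uses the standard published definition, TM = (1/L) Σ 1/(1 + (d_i/d0)²) with d0 = 1.24·(L − 15)^(1/3) − 1.8, and assumes the per-residue distances come from one fixed superposition. Real TM-score programs additionally search for the superposition that maximizes the score. The function names, the 0.5 Å floor on d0, and the input distances are illustrative assumptions.

```python
def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    distances: distance (in Å) for each aligned residue pair.
    target_length: number of residues in the target (normalization length).
    """
    if target_length <= 15:
        raise ValueError("d0 is undefined for targets of 15 residues or fewer")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    d0 = max(d0, 0.5)  # assumed floor for very short chains
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


def rank_by_tm_score(hits, target_length):
    """Sort (name, distances) pairs by decreasing TM-score."""
    scored = [(name, tm_score(d, target_length)) for name, d in hits]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical structures and distances for a 122-residue target.
hits = [
    ("struct_x", [1.0] * 110),
    ("struct_y", [4.0] * 122),
]
for name, score in rank_by_tm_score(hits, 122):
    print(name, round(score, 3))
```

Because the sum is divided by the target length rather than by the number of aligned pairs, unaligned residues lower the score, so a short but tight alignment does not outrank a full-length one by default.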
Figure 5.8.8
Illustration of ligand binding site prediction. The binding site prediction shown in the table is made by COACH, which combines the prediction results from five complementary algorithms: COFACTOR (Roy et al., 2012), TM-SITE, S-SITE (Yang et al., 2013b), FindSite (Brylinski and Skolnick, 2008), and ConCavity (Capra et al., 2009). The predicted binding ligand is highlighted in yellow-green spheres, with the corresponding binding residues shown as blue ball-and-stick illustrations in the picture of the 3-D model. In this example, the first functional template (PDB ID: 3dowA) is predicted with high confidence (C-score = 0.98) to bind a peptide ligand. In addition to the predicted peptide, the protein may also bind other ligands, which are available in a PDB file at the ‘Mult’ link. The ligands, separated by ‘TER’ records, are placed at the end of this file.
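A file laid out as the caption describes could be split into individual ligands as sketched below. The exact layout of the ‘Mult’ file is an assumption here: the sketch supposes that ligand atoms are HETATM records and that each ligand ends with a TER record. The function name and the toy records are hypothetical.

```python
def split_ligands(pdb_text):
    """Group HETATM lines into ligands, using TER records as separators."""
    ligands, current = [], []
    for line in pdb_text.splitlines():
        record = line[:6].strip()  # PDB record name occupies columns 1-6
        if record == "HETATM":
            current.append(line)
        elif record == "TER" and current:
            ligands.append(current)
            current = []
    if current:
        # Last ligand without a closing TER record.
        ligands.append(current)
    return ligands


# Toy input (truncated records, illustration only).
toy_pdb = "\n".join([
    "ATOM      1  CA  MET A   1",
    "TER",
    "HETATM    1  C1  LG1 L   1",
    "HETATM    2  C2  LG1 L   1",
    "TER",
    "HETATM    3  ZN   ZN L   2",
    "TER",
])
print([len(ligand) for ligand in split_ligands(toy_pdb)])
```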
Figure 5.8.9
Illustration of enzyme commission (EC) number and active site predictions. In this example, the first model is predicted based on the template of PDB ID: 2j0mA, which is a nonspecific protein-tyrosine kinase with EC number 2.7.10.2. The predicted active-site residues are I8 and L12, shown in colored ball-and-sticks in the right column. Models from other templates can be found by clicking on the radio buttons.
Figure 5.8.10
Illustration of gene ontology (GO) term prediction. The GO term predictions are presented in two parts. The first part lists the top 10 template proteins ranked by CscoreGO (Roy et al., 2012). The most frequently occurring GO terms in each of the three functional aspects (molecular function, biological process, and cellular component) are reconciled, with the consensus GO terms presented in the second part along with the confidence score for each predicted GO term (i.e., the ‘GO-Score’ in the table). In this example, the predicted top GO terms for the molecular function, biological process, and cellular component are beta-tubulin binding (GO:0048487), autophagosome assembly (GO:0000045), and autophagosome membrane (GO:0000421), respectively.
Figure 5.8.11
Illustration of domain parsing for multi-domain proteins. The query sequence is shown as a blue line, and the aligned template sequences from LOMETS are shown as black lines. Gaps in the templates are blank. (A) The N- and C-terminal domains are well aligned with templates (indicating conserved domains), while the residues in the middle region are aligned to gaps (probably belonging to another domain that is missing from the templates). The sequence is parsed into three domains, as shown by the two scissors. (B) The C-terminal domain is well aligned with multiple templates, while the residues in the N-terminal domain are aligned to gaps. The sequence is parsed into two putative domains, as shown by the scissors. (C) Only the residues in the middle region are well aligned with multiple templates. The sequence is parsed into three domains, as shown by the two scissors.
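The visual parsing described in this caption can be approximated by cutting the query wherever template coverage switches between aligned and gapped. This is a simplified heuristic, not the procedure used by I-TASSER; the function name, the `min_templates` threshold, and the coverage values are hypothetical, and real alignments would need short runs smoothed before cutting.

```python
from itertools import groupby


def parse_domains(coverage, min_templates=1):
    """Split a query into putative domains by template coverage.

    coverage: number of templates aligned at each query position.
    Returns (start, end, is_aligned) tuples with 1-based inclusive positions.
    """
    segments = []
    position = 1
    runs = groupby(coverage, key=lambda count: count >= min_templates)
    for is_aligned, run in runs:
        length = len(list(run))
        segments.append((position, position + length - 1, is_aligned))
        position += length
    return segments


# Hypothetical coverage resembling panel (A): aligned ends, gapped middle.
coverage_a = [3] * 5 + [0] * 4 + [2] * 6
print(parse_domains(coverage_a))
```

A pattern like panel (B) would yield two segments, and patterns like panels (A) and (C) would yield three, matching the number of scissors plus one.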

References

    1. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–3402. doi: 10.1093/nar/25.17.3389.
    2. Battey JN, Kopp J, Bordoli L, Read RJ, Clarke ND, Schwede T. Automated server predictions in CASP7. Proteins. 2007;69:68–82. doi: 10.1002/prot.21761.
    3. Biasini M, Bienert S, Waterhouse A, Arnold K, Studer G, Schmidt T, Kiefer F, Cassarino TG, Bertoni M, Bordoli L, Schwede T. SWISS-MODEL: Modelling protein tertiary and quaternary structure using evolutionary information. Nucleic Acids Res. 2014;42:W252–258. doi: 10.1093/nar/gku340.
    4. Blake JA, Harris MA. The gene ontology (GO) project: Structured vocabularies for molecular biology and their application to genome and expression analysis. Curr Protoc Bioinformatics. 2008;23(7.2):7.2.1–7.2.9.
    5. Brylinski M, Skolnick J. A threading-based method (FINDSITE) for ligand-binding site prediction and functional annotation. Proc Natl Acad Sci USA. 2008;105:129–134. doi: 10.1073/pnas.0707684105.
