Comparative Study
Nature. 2024 Apr;628(8007):450-457.
doi: 10.1038/s41586-024-07215-4. Epub 2024 Feb 26.

Automated model building and protein identification in cryo-EM maps

Kiarash Jamali et al. Nature. 2024 Apr.

Abstract

Interpreting electron cryo-microscopy (cryo-EM) maps with atomic models requires high levels of expertise and labour-intensive manual intervention in three-dimensional computer graphics programs [1,2]. Here we present ModelAngelo, a machine-learning approach for automated atomic model building in cryo-EM maps. By combining information from the cryo-EM map with information from protein sequence and structure in a single graph neural network, ModelAngelo builds atomic models for proteins that are of similar quality to those generated by human experts. For nucleotides, ModelAngelo builds backbones with similar accuracy to those built by humans. By using its predicted amino acid probabilities for each residue in hidden Markov model sequence searches, ModelAngelo outperforms human experts in the identification of proteins with unknown sequences. ModelAngelo will therefore remove bottlenecks and increase objectivity in cryo-EM structure determination.
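The idea of identifying a protein from predicted per-residue amino acid probabilities can be illustrated with a toy example. This is a deliberate simplification, not ModelAngelo's implementation: the paper uses the probabilities in hidden Markov model sequence searches, whereas the sketch below assumes a gap-free, position-specific log-odds profile with a uniform background and slides it along each candidate sequence. The helper names, the uniform background, the pseudocount and the toy sequences are all hypothetical.

```python
import math

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: uniform background frequency; real profile searches use empirical frequencies.
BACKGROUND = 1.0 / len(AMINO_ACIDS)
# Assumption: pseudocount so that amino acids given zero probability do not produce log(0).
PSEUDOCOUNT = 1e-3


def log_odds_profile(probabilities):
    """Turn per-residue amino acid probabilities into log-odds columns (hypothetical helper)."""
    profile = []
    norm = 1.0 + PSEUDOCOUNT * len(AMINO_ACIDS)
    for column in probabilities:
        profile.append({
            aa: math.log((column.get(aa, 0.0) + PSEUDOCOUNT) / norm / BACKGROUND)
            for aa in AMINO_ACIDS
        })
    return profile


def best_ungapped_score(profile, sequence):
    """Best total log-odds score over all gap-free placements of the profile in sequence."""
    width = len(profile)
    best = float("-inf")  # stays -inf if the sequence is shorter than the profile
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Non-standard letters (e.g. X) contribute a neutral score of 0.
        score = sum(column.get(aa, 0.0) for column, aa in zip(profile, window))
        best = max(best, score)
    return best


def rank_candidates(probabilities, candidates):
    """Return candidate names sorted from best to worst profile score."""
    profile = log_odds_profile(probabilities)
    scores = {name: best_ungapped_score(profile, seq) for name, seq in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy example: a four-residue chain whose predicted identities point to "WHER".
example_probabilities = [
    {"W": 0.9, "F": 0.1},
    {"H": 0.8, "Y": 0.2},
    {"E": 0.7, "D": 0.3},
    {"R": 0.6, "K": 0.4},
]
example_candidates = {
    "protein_A": "MKTAYIAKQR",
    "protein_B": "GGSWHERLLA",
    "protein_C": "WFHYDEKK",
}
print(rank_candidates(example_probabilities, example_candidates))
```

A real HMM search additionally models insertions and deletions and reports statistical significance (E-values), which this gap-free sketch does not attempt.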

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Atomic modelling in ModelAngelo.
a, ModelAngelo builds atomic models in three steps: (1) a CNN predicts protein and nucleic acid residue positions; (2) a GNN optimizes these positions and orientations (shown in b); (3) post-processing of the optimized graph leads to a complete atomic model. b, The GNN, which is arranged in eight layers with three modules, uses a feature vector per residue that is passed through a multilayer perceptron (MLP) and integrated with additional data through attention mechanisms that have query (Q), key (K) and value (V) vectors. The cryo-EM module also produces a feature vector (C) used for residue prediction. The IPA module uses query points (Qpoints) and their distances to the neighbouring residues (Dq) for attention. Stable gradient propagation is ensured by residual connections with layer norms (Add LN). Residue feature vectors are used to update residue positions and orientations. They are also used to predict torsion angles, confidence scores and residue identities at the end of each layer.
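The attention-plus-"Add LN" pattern in panel b can be sketched generically. The code below is not ModelAngelo's implementation: it assumes single-head scaled dot-product attention over a fixed neighbour list, followed by a residual addition and a layer norm without learned gain or bias, and it omits the MLP and the cryo-EM, sequence and IPA specifics. The projection matrices, neighbour lists and toy features are hypothetical inputs.

```python
import math


def layer_norm(vector, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned gain or bias)."""
    mean = sum(vector) / len(vector)
    var = sum((x - mean) ** 2 for x in vector) / len(vector)
    return [(x - mean) / math.sqrt(var + eps) for x in vector]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_update(features, neighbours, w_q, w_k, w_v):
    """One single-head attention step with a residual connection and layer norm ("Add LN").

    features:   one feature vector per residue (graph node)
    neighbours: for each residue, a non-empty list of indices of the residues it attends to
    w_q, w_k, w_v: hypothetical projection matrices; w_v must keep the feature width
    """
    if len(w_v) != len(features[0]):
        raise ValueError("value projection must preserve the feature width for the residual")
    queries = [matvec(w_q, f) for f in features]
    keys = [matvec(w_k, f) for f in features]
    values = [matvec(w_v, f) for f in features]
    scale = math.sqrt(len(queries[0]))
    updated = []
    for i, feature in enumerate(features):
        nbrs = neighbours[i]
        # Scaled dot-product scores between residue i's query and its neighbours' keys.
        scores = [sum(q * k for q, k in zip(queries[i], keys[j])) / scale for j in nbrs]
        weights = softmax(scores)
        # Weighted sum of neighbour values, one output channel at a time.
        message = [sum(w * values[j][d] for w, j in zip(weights, nbrs))
                   for d in range(len(feature))]
        # Add LN: residual addition followed by layer normalization.
        updated.append(layer_norm([x + m for x, m in zip(feature, message)]))
    return updated


# Toy graph: three residues with 2-D features on a chain (0-1-2).
identity = [[1.0, 0.0], [0.0, 1.0]]
example_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
example_neighbours = [[1], [0, 2], [1]]
print(attention_update(example_features, example_neighbours, identity, identity, identity))
```

The residual path lets each layer add a correction to the incoming features rather than replace them, which is the usual reason this pattern is credited with stable gradient propagation.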
Fig. 2. Performance of ModelAngelo for proteins.
a, The backbone r.m.s.d. and model completeness plotted as a function of the target model Q-scores. b, Histograms of the Q-scores of residues in the deposited models, comparing those built by ModelAngelo with those not built. c, Q-score comparison between ModelAngelo-predicted models and the deposited models. d, Model-to-map Fourier shell correlation (FSC), as calculated by Servalcat after refining both models and using only residues present in both ModelAngelo and deposited models. e, Model completeness for various automated model-building software for different local-resolution ranges in the maps. f, Model completeness for ModelAngelo and versions of ModelAngelo in which its sequence and/or IPA modules were ablated. For a–d, the data relate to the test set of 177 structures; for e and f, the data relate to the subset of 27 structures.
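Backbone r.m.s.d. and model completeness, as plotted in panel a, can be computed in several ways; one common scheme is sketched below under stated assumptions. Each deposited Cα atom is paired with its nearest built Cα atom, a deposited residue counts as built if that distance is within a cutoff (3 Å here, an assumed value not given in the caption), and the r.m.s.d. is taken over matched pairs only. The function name and the coordinates are hypothetical, and the paper's exact matching procedure may differ.

```python
import math


def completeness_and_rmsd(deposited_ca, built_ca, cutoff=3.0):
    """Backbone completeness and r.m.s.d. from matched C-alpha positions (in angstroms).

    Assumptions: each deposited C-alpha is paired with its nearest built C-alpha,
    pairs within `cutoff` count as built, and the r.m.s.d. uses matched pairs only.
    """
    matched = []
    for atom in deposited_ca:
        nearest = min((math.dist(atom, b) for b in built_ca), default=math.inf)
        if nearest <= cutoff:
            matched.append(nearest)
    completeness = len(matched) / len(deposited_ca)
    rmsd = math.sqrt(sum(d * d for d in matched) / len(matched)) if matched else math.nan
    return completeness, rmsd


# Toy example: four deposited residues 3.8 A apart; the model misses the last one.
deposited = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
built = [(0.5, 0.0, 0.0), (3.8, 1.0, 0.0), (7.6, 0.0, 0.0)]
print(completeness_and_rmsd(deposited, built))
```

Nearest-neighbour matching allows two deposited atoms to share one built atom; a stricter comparison would enforce one-to-one pairing, for example after aligning chains by sequence.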
Fig. 3. Performance of ModelAngelo for nucleic acids.
a, Escherichia coli ribosome built by ModelAngelo (with ribosomal RNA in green and proteins in blue) compared with the deposited model (PDB: 7S1G, black outline). b, Magnified view with nucleotide bases showing high accuracy compared with the deposited model (orange). c, ModelAngelo model of the V-K CAST transpososome from S. hofmanni compared with the deposited model (PDB: 8EA4). Sections that were not built by ModelAngelo (black outline) are in regions of low Q-score (as shown in g). d, Magnified view comparing the nucleotide bases of both models, showing a sequence that was incorrectly identified by ModelAngelo. e, Backbone r.m.s.d., backbone completeness and sequence completeness plotted against the deposited Q-score for six ribosome structures. f,g, Deposited models for the structures in a and c, respectively, coloured by Q-score, with low-Q-score regions indicated by boxes.
Fig. 4. Examples of protein identification using ModelAngelo.
a, The ModelAngelo model of the single-PBS–PSII–PSI–LHC supercomplex (grey) showing the positions, models and map densities of six newly identified proteins (green). Backbone traces in the deposited model (PDB: 7Y5E) are shown in orange. b, Atomic model of the central apparatus microtubule C1 showing the positions, models and map densities of two identified proteins—FAP92 and FAP374. The orange cartoons represent poly(UNK) chains deposited in the original model (PDB: 7SQC). c, An atomic model of radial spokes 1 and 2 (RS1 and RS2) bound to a doublet microtubule (grey) showing the positions, models and map densities of four proteins (RSP24–27, green) identified by ModelAngelo. Only RSP27 had a backbone trace in the deposited model (orange). C, C terminus; N, N terminus.
Extended Data Fig. 1. Identified proteins in the phycobilisome.
Atomic models built by ModelAngelo (green) for the six proteins that were identified by ModelAngelo. Side chain densities in the cryo-EM map (transparent grey) are in agreement with those of the atomic models.
Extended Data Fig. 2. Models by ModelAngelo and AlphaFold for identified proteins in the phycobilisome.
Models built by ModelAngelo (green) are shown next to AlphaFold predictions for the corresponding sequences (coloured by AlphaFold's confidence, from high (blue) to low (red)).
Extended Data Fig. 3. Performance around cofactors in the phycobilisome.
a, Cartoon representation of protein backbones (orange) and stick representation of a phycocyanobilin cofactor (pink) in the cryo-EM density (transparent grey) for the deposited phycobilisome structure. b, As in a, but for the model built by ModelAngelo (green). ModelAngelo leaves the cofactor density empty. c,d, As in a and b, but for a phycoerythrobilin cofactor.
Extended Data Fig. 4. Models by ModelAngelo and AlphaFold for identified proteins in the ciliary axoneme.
Models built by ModelAngelo (green) are shown next to AlphaFold predictions for the corresponding sequences (coloured by AlphaFold's confidence, from high (blue) to low (red)). These are split between a, the radial spoke proteins, and b, the central apparatus microtubule proteins.
Extended Data Fig. 5. Identified proteins in the ciliary axoneme.
Atomic models built by ModelAngelo (green) for the six proteins that were identified by ModelAngelo. Side chain densities in the cryo-EM map (transparent grey) are in agreement with those of the atomic models. These are split between a, the radial spoke proteins, and b, the central apparatus microtubule proteins.

References

1. Emsley P, Lohkamp B, Scott WG, Cowtan K. Features and development of Coot. Acta Crystallogr. D. 2010;66:486–501. doi: 10.1107/S0907444910007493.
2. Croll TI. ISOLDE: a physically realistic environment for model building into low-resolution electron-density maps. Acta Crystallogr. D. 2018;74:519–530. doi: 10.1107/S2059798318002425.
3. Nakane T, et al. Single-particle cryo-EM at atomic resolution. Nature. 2020;587:152–156. doi: 10.1038/s41586-020-2829-0.
4. Yip KM, Fischer N, Paknia E, Chari A, Stark H. Atomic-resolution protein structure determination by cryo-EM. Nature. 2020;587:157–161. doi: 10.1038/s41586-020-2833-4.
5. Lawson CL, et al. EMDataBank unified data resource for 3DEM. Nucleic Acids Res. 2016;44:D396–D403. doi: 10.1093/nar/gkv1126.