Nat Commun. 2022 Oct 18;13(1):6151.
doi: 10.1038/s41467-022-33879-5.

Rapid protein assignments and structures from raw NMR spectra with the deep learning technique ARTINA

Piotr Klukowski et al. Nat Commun. 2022.

Abstract

Nuclear Magnetic Resonance (NMR) spectroscopy is a major technique in structural biology with over 11,800 protein structures deposited in the Protein Data Bank. NMR can elucidate structures and dynamics of small and medium size proteins in solution, living cells, and solids, but has been limited by the tedious data analysis process. It typically requires weeks or months of manual work of a trained expert to turn NMR measurements into a protein structure. Automation of this process is an open problem, formulated in the field over 30 years ago. We present a solution to this challenge that enables the completely automated analysis of protein NMR data within hours after completing the measurements. Using only NMR spectra and the protein sequence as input, our machine learning-based method, ARTINA, delivers signal positions, resonance assignments, and structures strictly without human intervention. Tested on a 100-protein benchmark comprising 1329 multidimensional NMR spectra, ARTINA demonstrated its ability to solve structures with 1.44 Å median RMSD to the PDB reference and to identify 91.36% correct NMR resonance assignments. ARTINA can be used by non-experts, reducing the effort for a protein assignment or structure determination by NMR essentially to the preparation of the sample and the spectra measurements.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. The ARTINA workflow for automated NMR protein structure determination.
The flowchart presents the interplay between the main components of the automated protein structure determination workflow: Residual Neural Network (ResNet), FLYA automated chemical shift assignment, Graph Neural Network (GNN), Gradient Boosted Trees (GBT), and CYANA structure calculation.
Fig. 2. NMR benchmark dataset content.
PDB codes (or names, MH04, MDM2, KRAS4B, if PDB code unavailable) of the 100 benchmark proteins are ordered by the number of residues. The histogram shows the number of spectra for backbone assignment, side-chain assignment, and NOE measurement. Spectrum types in each data set are shown by light to dark blue circles indicating the number of individual spectra of the given type. The percentages of benchmark records that contain a given spectrum type are given at the top. Spectrum types present in less than 5% of the data sets have been omitted.
Fig. 3. 100 protein structures determined automatically by ARTINA (blue) overlaid with corresponding PDB depositions (orange).
The structures are aligned with the RMSD to reference range as indicated on the left and hexagonal frames color-coded by their size as indicated above. Structures with no corresponding PDB depositions are marked by an asterisk.
Fig. 4. Results of the automated structure determination of 100 proteins.
a Backbone RMSD to reference. b Number of distance restraints per residue. c Chemical shift assignment accuracy. Bars represent quantity values for benchmark proteins, identified by PDB codes (or protein names). Proteins are ordered by size, which is indicated by a color-coded circle. Values in the center of each panel are 10th, 50th, and 90th percentiles of values presented in the bar plot. Short/medium/long-range restraints are between residues i and j with |i − j| ≤ 1, 2 ≤ |i − j| ≤ 4, and |i − j| ≥ 5, respectively.
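The short/medium/long-range classes in the Fig. 4 caption depend only on the sequence separation of the two residues, so they can be sketched as a small function. This is a minimal illustration, assuming residues are identified by integer sequence numbers within a single chain; the name `restraint_range` is hypothetical and is not part of ARTINA or CYANA.

```python
def restraint_range(i: int, j: int) -> str:
    """Classify a distance restraint by the sequence separation of its residues.

    Hypothetical helper: the thresholds follow the Fig. 4 caption
    (short: separation <= 1, medium: 2 to 4, long: >= 5).
    """
    separation = abs(i - j)
    if separation <= 1:
        return "short"
    if separation <= 4:
        return "medium"
    return "long"
```

For example, a restraint between residues 10 and 12 would fall in the medium-range class, and one between residues 10 and 15 in the long-range class.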
Fig. 5. Actual and predicted RMSD between ARTINA and reference PDB structures.
The predicted RMSD to reference (pRMSD) is calculated from the ARTINA results without knowledge of the reference PDB structure (see “Methods”) and, by definition, always in the range of 0–4 Å. For comparability, actual RMSD values to reference are also truncated at 4 Å (protein 2M47 with RMSD 4.47 Å). The dotted lines represent deviations of ±1 Å between the two RMSD quantities.
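The comparison described in the Fig. 5 caption can be sketched as follows: the actual RMSD is capped at 4 Å so that it shares the range of pRMSD, and a point lies between the dotted lines when the two values differ by at most 1 Å. This is only an illustration of the plotting convention, not of how pRMSD itself is computed; the function names and the pRMSD value used in the usage note are assumptions made for the example.

```python
RMSD_CAP = 4.0  # Å; upper bound of the pRMSD range stated in the caption


def truncate_rmsd(rmsd: float, cap: float = RMSD_CAP) -> float:
    """Cap an actual RMSD value so it is comparable with pRMSD (hypothetical helper)."""
    return min(rmsd, cap)


def within_band(actual: float, predicted: float, tolerance: float = 1.0) -> bool:
    """Return True if the truncated actual RMSD is within +/- tolerance Å of pRMSD."""
    return abs(truncate_rmsd(actual) - predicted) <= tolerance
```

With these helpers, the 4.47 Å value reported for protein 2M47 is truncated to 4.0 Å, and `within_band(4.47, 3.2)` would be true for an illustrative (not reported) pRMSD of 3.2 Å.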
Fig. 6. Commonly occurring challenges in visual spectrum analysis.
A fragment of a 15N-HSQC spectrum of the protein 1T0Y is shown. Initial signal positions identified by the peak picking model pp-ResNet (black dots) are deconvolved by deconv-ResNet, yielding the final coordinates used for automated assignment and structure determination (blue crosses). a1, a2 Initial peak picking marker position is refined by the deconvolution model. b1, b2 pp-ResNet output is deconvolved into two components. c The deconvolution model supports maximally 3 components per initial signal. d Two peak picking markers are merged by the deconvolution model. e Peak picking output deconvolved into three components.
Fig. 7. Performance of the peak picking model on a spectrum fragment with high peak overlap.
A fragment of the 13C-HSQC spectrum of protein 2K0M is shown. Initial signal positions identified by the peak picking model pp-ResNet (black dots) are deconvolved by deconv-ResNet, yielding the final coordinates used for automated assignment and structure determination (blue crosses).

