2020 Jan 10
eCollection Jan 2020
Beyond Mass Spectrometry, the Next Step in Proteomics
Proteins can be the root cause of a disease, and they can be used to cure it. The need to identify these critical actors was recognized early (1951) by Sanger; the first biopolymer sequenced was a peptide, insulin. With the advent of scalable, single-molecule DNA sequencing, genomics and transcriptomics have since propelled medicine through improved sensitivity and lower costs, but proteomics has lagged behind. Currently, proteomics relies mainly on mass spectrometry (MS), but instead of truly sequencing, it classifies a protein and typically requires about a billion copies of a protein to do it. Here, we offer a survey that illuminates a few alternatives with the brightest prospects for identifying whole proteins and displacing MS for sequencing them. These alternatives all boast sensitivity superior to MS and promise to be scalable and seem to be adaptable to bioinformatics tools for calling the sequence of amino acids that constitute a protein.
Copyright © 2020 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution License 4.0 (CC BY).
Fig. 1. Inferring the primary structure of protein by MS.
MS uses two approaches to infer the primary structure of a protein: (A) a bottom-up (BU-MS) approach that is used prevalently and (B) a top-down (TD-MS) approach that analyzes intact proteins <70 kDa. In the BU-MS process flow, the proteins are first digested by trypsin, and the resulting peptides, 0.8 to 3 kDa in size on average, are analyzed in the gas phase by MS. First, the masses of the peptides are determined, and then peptide ions are fragmented to inform on the sequence using tandem MS (MS/MS). BU-MS does not inform on the entire sequence but only on fragments, as represented by the incomplete word “protein.” In contrast, in TD-MS, intact protein ions are introduced in the gas phase, fragmented (10 kDa in size on average), and analyzed by MS to identify the mass of the protein and of the protein ion fragments, which are then puzzled out to reveal the primary structure of the protein. The entire sequence can be revealed this way along with PTMs (represented by the capital letter in the word “pRoteins”). Both methods subsequently rely extensively on database searches to identify the protein. CID, collision-induced dissociation; ECD, electron-capture dissociation; ETD, electron-transfer dissociation.
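The digestion step of the bottom-up workflow can be illustrated in silico. The following is a minimal sketch, not the procedure used in the article: it assumes the commonly quoted simplified trypsin rule (cleave after K or R unless the next residue is P), ignores missed cleavages, and estimates peptide mass with a rough average of 110 Da per residue. The function names and the toy sequence are hypothetical.

```python
import re
from typing import List

# Assumption: simplified trypsin specificity, i.e. cleave C-terminal to K or R
# unless the following residue is P. Missed cleavages are ignored.
_TRYPSIN_SITE = re.compile(r"(?<=[KR])(?!P)")


def tryptic_digest(protein: str) -> List[str]:
    """Split a protein sequence into idealized tryptic peptides."""
    # Splitting on a zero-width match needs Python 3.7+; empty pieces are dropped.
    return [piece for piece in _TRYPSIN_SITE.split(protein) if piece]


def approx_mass_kda(peptide: str, avg_residue_da: float = 110.0) -> float:
    """Rough mass estimate in kDa from an assumed average residue mass."""
    return len(peptide) * avg_residue_da / 1000.0


if __name__ == "__main__":
    toy_protein = "MKRPAKLLR"  # hypothetical sequence, for illustration only
    for peptide in tryptic_digest(toy_protein):
        print(peptide, round(approx_mass_kda(peptide), 2), "kDa (approx.)")
```

Real search engines work with exact monoisotopic residue masses and allow missed cleavages; the average-mass estimate here only indicates which peptides would fall in the 0.8 to 3 kDa range mentioned in the caption.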
Fig. 2. Long-read transcriptomics.
(A) Direct RNA sequencing reads from GM12878 are shown mapped to the CDKN2A genomic locus. Plotted by Integrative Genomics Viewer, individual RNA reads aligned against the reference human genome (hg38) in this region are shown with the reads matching the p14 (ARF) isoform colored red and the p16 (INK4a) isoform colored blue. Note that exons 2 and 3 are in common, while exon 1α is specific to p16 and exon 1β is specific to p14. (B and C) Zoomed-in regions are shown with predicted translations at the 5′ exon 2 boundary. Integrative Genomics Viewer reads aligned against the transcriptome (Gencode v27) are shown with p16 on the top and p14 on the bottom. Although insertions/deletions (indels) and mismatches with the reference are observed, the consensus agrees 99% of the time with the reference. The translated codons are shown between the aligned reads; note that although exon 2 is the same RNA sequence in both, the resulting protein is completely different because of the shifted reading frame from the splice variation.
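The effect of a shifted reading frame can be shown with a short translation sketch. This is an illustration only: the RNA string below is a hypothetical toy sequence, not part of the CDKN2A transcripts, and `translate` is a name introduced here. The codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def translate(rna: str, frame: int = 0) -> str:
    """Translate an RNA (or DNA) string starting at offset `frame` (0, 1, or 2)."""
    dna = rna.upper().replace("U", "T")
    # Step through complete codons only; a trailing partial codon is ignored.
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


if __name__ == "__main__":
    shared_exon = "AUGGCCAAGUUU"  # hypothetical toy sequence
    print(translate(shared_exon, frame=0))  # read in one frame
    print(translate(shared_exon, frame=1))  # same bases, shifted by one
```

The same nucleotides yield unrelated amino acid strings in the two frames, which is the point made in the caption about exon 2.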
Fig. 3. CITE-seq enables simultaneous detection of single-cell transcriptomes and protein markers.
(A) Illustration of the DNA-barcoded antibodies used in CITE-seq. (B) Schematic representation of CITE-seq in combination with Drop-seq. Cells are incubated with antibodies, washed, and passed through a microfluidic chip where a single cell and one bead are occasionally encapsulated in the same droplet. After cell lysis, mRNAs and antibody-oligos anneal to oligos on Drop-seq beads, linking cell barcodes with cellular transcripts and antibody-derived oligos [adapted from Stoeckius et al. (70)].
Fig. 4. Peptide fingerprinting with fluorosequencing.
(A) A schematic that represents millions of peptides that are each covalently labeled with (two) different AA-specific fluorescent dyes and immobilized at their C termini using amide linkage to aminosilanes on a glass coverslip mounted on a TIRF microscope stage perfusion chamber [adapted from Swaminathan et al. (71)]. Through TIRF, each peptide is imaged, and its N-terminal AA is chemically removed via Edman degradation, thus leaving each peptide one AA shorter and regenerating its free N terminus. Repeated cycles of chemistry and fluorescent imaging reveal the positions of fluorescent dyes within each molecule. The pattern of drops in fluorescence intensity is interpreted to provide a partial sequence annotation for each peptide, which can be matched and scored against a protein sequence database to infer the most likely set of proteins present in the sample. (B) Schematic of the single-molecule peptide fingerprinting platform leveraging ClpXP translocation developed by van Ginkel et al. (72). Donor-labeled ClpXP is immobilized on a polyethylene glycol–coated slide via biotin-streptavidin conjugation. ClpX6 recognizes an acceptor-labeled substrate and translocates it into the ClpP14 chamber, upon which FRET occurs. A typical fluorescence time trace is shown below for each cycle. High FRET reports on the presence of the substrate in ClpP14, whereas the loss of fluorescence signal indicates the release of the substrate [adapted from van Ginkel et al. (72)].
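The matching step described in (A) can be sketched for an idealized, noise-free case. The code below assumes that lysine (K) and cysteine (C) are the two labeled residue types and that every dye is removed exactly at the Edman cycle equal to its position from the N terminus; both are simplifying assumptions, and the function names and the toy database are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = Tuple[Tuple[int, str], ...]

# Assumption: two labeled residue types, K and C, each with its own dye.
DEFAULT_LABELS: FrozenSet[str] = frozenset("KC")


def fingerprint(peptide: str, labeled: FrozenSet[str] = DEFAULT_LABELS) -> Fingerprint:
    """Return (Edman cycle, residue type) pairs at which fluorescence drops.

    Cycle n removes the n-th residue from the N terminus, so a dye on
    residue n disappears at cycle n in this idealized model.
    """
    return tuple(
        (cycle, aa) for cycle, aa in enumerate(peptide, start=1) if aa in labeled
    )


def candidates(observed: Fingerprint, database: Dict[str, str],
               labeled: FrozenSet[str] = DEFAULT_LABELS) -> List[str]:
    """Names of database peptides whose ideal fingerprint equals the observed one."""
    return sorted(
        name for name, seq in database.items()
        if fingerprint(seq, labeled) == observed
    )


if __name__ == "__main__":
    toy_db = {"pepA": "GKAGCK", "pepB": "AKLLCK", "pepC": "KGGGGC"}  # hypothetical
    observed = fingerprint("GKAGCK")
    print(observed)
    print(candidates(observed, toy_db))
```

Two different peptides can share one fingerprint, which is why the caption speaks of a partial sequence annotation that is scored against a database rather than a full read. A practical scorer would also have to tolerate missed cleavages, dye loss, and incomplete labeling.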
Fig. 5. “5D” fingerprinting with a nanopore.
The approximate shape, dipole moment, and rotational diffusion coefficient are extracted from current modulations within individual current blockades from the translocation of a single protein through a 30-nm-diameter pore. (A) The approximate shape of the antibody immunoglobulin G1 (IgG1) protein as determined by analysis of individual resistive pulses (blue) with crystal structures in red (blue spheroids show the median values of m and volume from single event analyses of the protein). (B) Top and side views of a 30-nm-diameter nanopore illustrating the two extreme orientations of a spheroidal protein that is anchored to a fluid lipid coating on the pore wall. A crosswise orientation disturbs the field lines inside the pore more than a lengthwise orientation due to the angle-dependent electrical shape factor. Rotational dynamics of individual proteins inside a nanopore reveal a spheroidal approximation of the protein’s shape. (C) Current blockade from the translocation of a single IgG1 molecule. Red dots mark the beginning and end of the resistive pulse. (D) Distribution of all current values within this one blockade. The dark blue curve shows the solution of the model, p(ΔI), after a nonlinear least-squares fitting procedure, and the red dashed curve shows the estimated distribution of the blockade current, ΔI, values due to the distribution of shape factors, p(ΔIγ) [adapted from Yusko et al. (74)]. (E and F) Arrays of nanopores can be used to boost throughput. A transmission electron micrograph of a nanopore array fabricated using electron beam lithography and reactive-ion etching through a freestanding SiN membrane is shown in (E). The average pore diameter was 29 ± 3 nm [adapted from Verschueren et al. (146)]. A transmission electron micrograph of a 1.7 × 2.8–μm² silicon nitride membrane 11.3 nm thick with an array of 2-nm-diameter nanopores sputtered on the same 200-nm pitch using STEM is shown in (F).
Fig. 6. Ionic current blockades produced by an (FDFD)₁₂ peptide translocating through a nanopore spanning a 2D MoS₂ membrane.
(A) i to iv: Snapshots of the representative conformations of the (FDFD)₁₂ peptide translocating through a 2.2-nm-diameter pore at a 600-mV bias at 26, 80, 145, and 200 ns, corresponding to the first, second, third, and fourth translocation pauses indicated by boxes highlighted in cyan, green, yellow, and orange, respectively. The AA phenylalanine (F) is shown in magenta, and aspartic acid (D) is shown in red. (B) Top: A tally of the number of AA residues of the (FDFD)₁₂ peptide that have translocated through the nanopore. The colored horizontal lines highlight the pauses in the translocation. The corresponding AA fragments in the nanopore are indicated above the colored lines. Bottom: In correspondence with the top panel, the ionic current through the pore is shown. The gray line represents the actual fluctuating current through the pore, while the colored horizontal lines denote the average ionic current at each translocation pause. The color and the length of each line match those of the translocation trace shown in the top panel [adapted from Chen et al. (80)].
Fig. 7. Protein sequence analysis using a subnanopore spanning a silicon nitride membrane.
(A) (i) The topography of a subnanopore is revealed by an HAADF-STEM image acquired with an aberration-corrected microscope. (ii) The corresponding line plot through the subnanopore is shown associated with the white dashed line in (i), which indicates the mass density under the beam. The shot noise between the red dashed lines indicates the subnanopore diameter. The subnanopore has a (geometric) mean diameter at the waist of 0.28 nm. (iii) A 2D projection from the top through the model that indicates the atomic distribution near the pore waist. The atoms are depicted by space-filling models in which each Si is represented by a blue sphere with a 0.235-nm diameter and each N is a pink sphere with a 0.130-nm diameter. (iv) A 3D perspective of space-filled representations of the pore model of (iii). For clarity, only atoms on the pore surface are depicted. (B) Finite-element simulations of the electric field distribution along the vertical z axis of a pore with a 0.4-nm diameter at the waist and a biconical structure with a 10°/20° cone angle through a nominally 10-nm-thick silicon nitride membrane immersed in 250 mM NaCl at 0.6-V bias. The field is focused over an extent of 1.5 nm near the pore waist. Inset: Superimposed on a model of the pore topography is shown a heat map of the field distribution with a 20° cone angle [adapted from Rigo et al. (78)]. (C) A schematic representation of the translocation of a protein through a subnanopore. The denatured protein is supposed to be rod-like. (D) Consecutive current traces are shown that illustrate the distribution of the duration and fractional blockade currents associated with translocations of single molecules of CCL5 through a 0.5 × 0.6–nm² pore at 1 V. In the figure, higher values correspond to larger blockade currents. (E) A 400-blockade consensus (red) for CCL5 through a pore with a 0.5 × 0.6–nm² cross section, juxtaposed with an AA volume model (assuming k = 3; black) and a single highly correlated blockade [Pearson correlation coefficient (PCC) = 0.67; blue]. The error map above the plot indicates the read accuracy. (F) A grayscale error map of 400 partitioned blockades illustrating correct reads and misreads [adapted from Kennedy et al. (75)]. (G) A comparison between the signed error for AAs constituting H3.2 protein, in order of increasing volume, for the naïve volume model (top) and the random-forest (RF) regression model (bottom) for CCL5. The volume model underestimates signals associated with small volumes, whereas the RF model shows no bias. (H) The median P value is shown as a function of the number of blockades in a cluster for H4 and H3.3 trained on H3.2. The solid lines represent exponential fits. The decoy database size is 10⁵ for H4 and 5 × 10⁶ for H3.3. The P value approaches zero for a consensus >10 [adapted from Kolmogorov et al. (77)].
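The comparison in (E) between a measured blockade trace and an AA volume model can be sketched as follows. This is only an illustration under stated assumptions: k is taken to be the number of adjacent residues whose volumes jointly set the blockade level, the model signal is the sum of volumes in each k-residue window, and the volume table holds placeholder numbers rather than measured residue volumes. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def sliding_volume_model(sequence: str, volumes: Dict[str, float], k: int = 3) -> List[float]:
    """Summed residue volume in each window of k consecutive residues."""
    if k < 1 or k > len(sequence):
        raise ValueError("k must be between 1 and the sequence length")
    per_residue = [volumes[aa] for aa in sequence]
    return [sum(per_residue[i:i + k]) for i in range(len(sequence) - k + 1)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Placeholder volumes (arbitrary units), NOT measured residue volumes.
    toy_volumes = {"G": 1.0, "A": 2.0, "W": 5.0}
    model = sliding_volume_model("GAWAG", toy_volumes, k=3)
    hypothetical_trace = [7.5, 9.4, 8.2]  # made-up blockade levels
    print(model, round(pearson(model, hypothetical_trace), 3))
```

In practice the trace would first have to be partitioned into as many segments as the model has windows, and the caption notes that a regression model trained on data removes the bias that a plain volume model shows for small residues.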
Boersma S., Khuperkar D., Verhagen B. M. P., Sonneveld S., Grimm J. B., Lavis L. D., Tanenbaum M. E., Multi-color single-molecule imaging uncovers extensive heterogeneity in mRNA decoding. Cell 178, 458–472.e19 (2019).
Sanger F., Tuppy H., The amino-acid sequence in the phenylalanyl chain of insulin. I. The identification of lower peptides from partial hydrolysates. Biochem. J. 49, 463–481 (1951).
Edman P., Högfeldt E., Sillén L. G., Kinell P.-O., Method for determination of the amino acid sequence in peptides. Acta Chem. Scand. 4, 283–293 (1950).
Holley R. W., Apgar J., Everett G. A., Madison J. T., Marquisee M., Merrill S. H., Penswick J. R., Zamir A., Structure of a ribonucleic acid. Science 147, 1462–1465 (1965).
Sanger F., Brownlee G. G., Barrell B. G., A two-dimensional fractionation procedure for radioactive nucleotides. J. Mol. Biol. 13, 373–398 (1965).
Research Support, Non-U.S. Gov't