, 50 (6), 874-882

Multiplex Assessment of Protein Variant Abundance by Massively Parallel Sequencing



Kenneth A Matreyek et al. Nat Genet.


Determining the pathogenicity of genetic variants is a critical challenge, and functional assessment is often the only option. Experimentally characterizing millions of possible missense variants in thousands of clinically important genes requires generalizable, scalable assays. We describe variant abundance by massively parallel sequencing (VAMP-seq), which measures the effects of thousands of missense variants of a protein on intracellular abundance simultaneously. We apply VAMP-seq to quantify the abundance of 7,801 single-amino-acid variants of PTEN and TPMT, proteins in which functional variants are clinically actionable. We identify 1,138 PTEN and 777 TPMT variants that result in low protein abundance, and may be pathogenic or alter drug metabolism, respectively. We observe selection for low-abundance PTEN variants in cancer, and show that p.Pro38Ser, which accounts for ~10% of PTEN missense variants in melanoma, functions via a dominant-negative mechanism. Finally, we demonstrate that VAMP-seq is applicable to other genes, highlighting its generalizability.

Conflict of interest statement


The authors declare that the variant functional data presented herein are copyrighted, and may be freely used for noncommercial purposes. Licensing for commercial use may benefit the authors. The authors declare no additional competing financial interests.


Figure 1. Overview of Variant Abundance by Massively Parallel Sequencing (VAMP-seq)
A mixed population of cells each expressing one protein variant fused to EGFP is created. The variant dictates the abundance of the variant-EGFP fusion protein, resulting in a range of cellular EGFP fluorescence levels. Cells are then sorted into bins based on their level of fluorescence, and high throughput sequencing is used to quantify every variant in each bin. VAMP-seq scores are calculated from the scaled, weighted average of variants across bins. The resulting sequence-function maps describe the relative intracellular abundance of thousands of protein variants.
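The scoring step described in the Figure 1 caption (a scaled, weighted average of each variant across sorted bins) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the four bin weights, the function names, and the choice to rescale between the median of a low-abundance reference set and the median of a WT-like reference set are all assumptions made here for the example.

```python
from statistics import median

# Hypothetical bin weights (lowest to highest fluorescence bin).
# These values are an assumption for illustration; the caption does not state them.
BIN_WEIGHTS = (0.25, 0.5, 0.75, 1.0)


def weighted_average(bin_freqs, weights=BIN_WEIGHTS):
    """Weighted average of one variant's frequencies across the sorted bins."""
    if len(bin_freqs) != len(weights):
        raise ValueError("need one frequency per bin")
    total = sum(bin_freqs)
    if total == 0:
        raise ValueError("variant was not observed in any bin")
    return sum(f * w for f, w in zip(bin_freqs, weights)) / total


def scale_scores(raw_scores, low_reference, high_reference):
    """Rescale raw scores so the low reference median maps to 0 and the high one to 1.

    Which variants serve as references (for example nonsense and synonymous
    variants) is an assumption of this sketch.
    """
    low = median(low_reference)
    high = median(high_reference)
    if high == low:
        raise ValueError("reference medians must differ")
    return {name: (value - low) / (high - low) for name, value in raw_scores.items()}


# Made-up frequencies for three hypothetical variants across four bins.
raw = {
    "mostly_low_bin": weighted_average([0.7, 0.2, 0.1, 0.0]),
    "evenly_spread": weighted_average([0.25, 0.25, 0.25, 0.25]),
    "mostly_high_bin": weighted_average([0.0, 0.1, 0.2, 0.7]),
}
scaled = scale_scores(raw, low_reference=[raw["mostly_low_bin"]],
                      high_reference=[raw["mostly_high_bin"]])
```

With this kind of scaling, a score near 0 corresponds to the low-abundance reference, a score near 1 to the WT-like reference, and scores above 1 to variants more abundant than the reference.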
Figure 2. VAMP-seq abundance scores for PTEN and TPMT
a, Flow cytometry profiles for PTEN (left) and TPMT (right), with WT (red), known low-abundance variant controls (blue), and the variant libraries (gray) overlaid. Bin thresholds used to sort the library are shown above the plots. Each smoothed histogram was generated from at least 1,500 recombined cells from control constructs, and at least 6,000 recombined cells from the library. b, VAMP-seq abundance score density plots for PTEN (left) and TPMT (right) nonsense variants (blue dotted line), synonymous variants (red dotted line), and missense variants (filled, solid line). The missense variant densities are colored as gradients between the lowest 10% of abundance scores (blue), the WT abundance score (white), and abundance scores above WT (red). c, d, Heatmap of PTEN (c) and TPMT (d) abundance scores, colored according to the scale in b. Variants that were not scored are colored gray. e, f, Number of amino acid substitutions scored at each position for PTEN and TPMT. g, h, Positional median PTEN and TPMT abundance scores, computed for positions with a minimum of 5 variants, are shown as dots. The gray line represents the mean abundance score in a three-residue sliding window. i, j, PTEN and TPMT position-specific PSIC conservation scores are shown as dots, and the gray line represents the mean PSIC score within a three-residue sliding window. k, l, PTEN and TPMT domain architecture is shown, with positions in alpha helices and beta sheets colored cyan and pink, respectively.
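The positional summaries in panels g and h (a median per position, computed only where at least 5 variants were scored, plus a three-residue sliding-window mean) could be computed along these lines. This is a sketch under stated assumptions: the window is taken to be centered on each position, positions without a usable median are skipped rather than imputed, and all names and numbers are hypothetical.

```python
from statistics import mean, median


def positional_medians(scores_by_position, min_variants=5):
    """Median score per position, keeping only positions with enough scored variants."""
    return {
        pos: median(scores)
        for pos, scores in scores_by_position.items()
        if len(scores) >= min_variants
    }


def sliding_window_mean(value_by_position, window=3):
    """Mean over a centered window of positions (assumed centering).

    Positions missing from the input are skipped instead of being filled in.
    """
    half = window // 2
    smoothed = {}
    for pos in sorted(value_by_position):
        neighbors = [
            value_by_position[p]
            for p in range(pos - half, pos + half + 1)
            if p in value_by_position
        ]
        smoothed[pos] = mean(neighbors)
    return smoothed


# Made-up abundance scores for four hypothetical positions.
scores = {
    1: [0.1, 0.2, 0.3, 0.4, 0.5],
    2: [1.0, 1.0, 1.0, 1.0, 1.0],
    3: [0.5, 0.5, 0.5, 0.5],       # only 4 variants, so no median is reported
    4: [0.0, 0.0, 0.0, 0.0, 1.0],
}
medians = positional_medians(scores)
smoothed = sliding_window_mean(medians)
```

Skipping sparse positions keeps a single poorly sampled residue from dominating its neighbors in the smoothed line; a different policy, such as interpolation, would give a different curve.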
Figure 3. Biochemical features influencing intracellular protein abundance
a, Scatterplots of variant abundance scores averaged over all twenty WT residues (left) or mutant residues (right) for PTEN (x-axis) and TPMT (y-axis). b, A scatterplot of Spearman’s rho values for PTEN (x-axis) or TPMT (y-axis) abundance score correlations with various evolutionary (red), structural (blue), or primary protein sequence (cyan) features (n = 3,411 for PTEN, n = 3,230 for TPMT). See legend of Supplementary Table 2 for information regarding these features. c, d, PTEN (c, PDB: 1d5r) and TPMT (d, PDB: 2h11) crystal structures are shown. Chains are colored according to positional median abundance scores using a gradient between the lowest 10% of positional median abundance scores (blue), the WT abundance score (white), and abundance scores above WT (red). The 20% of positions with the lowest scores are shown as a semi-transparent surface. The substrate-mimicking compounds tartrate and S-adenosyl-L-homocysteine are displayed as magenta spheres. e, Low-abundance PTEN residues with predicted hydrogen bonds or salt bridges are shown as sticks with a semi-transparent surface representation. Residues within 11 Å of each other are clustered and colored as discrete groups. The residues in each group are identified by number, followed, in parentheses, by the number of times any variant at the residue is found in the COSMIC database. f, Residues with high abundance scores are shown as semi-transparent red spheres, and known membrane-interacting side chains are shown as opaque cyan spheres. Residues that are both membrane-interacting and have high abundance scores are shown in gray.
Figure 4. PTEN variant abundance classes across PHTS and cancer
a, A histogram of PTEN abundance scores for all missense variants observed in the experiment, with bars colored according to abundance classification. Abundance scores for three possibly benign variants present in the gnomAD database are shown as dots colored by classification. b, c, d, Abundance score histograms, colored by abundance classification, for PTEN germline variants listed in ClinVar as known pathogenic (b), likely pathogenic (c), or variants of uncertain significance (d). e, PTEN missense and nonsense variants in TCGA and the AACR GENIE project databases are arranged by cancer type. The top bar in each cancer type panel shows the observed frequency of variants in each abundance class as determined using VAMP-seq data. The bottom bar in each cancer type panel shows the expected abundance class frequencies based on cancer type-specific nucleotide substitution rates. Abundance classes are colored blue (low-abundance), light blue (possibly low-abundance), pink (possibly WT-like), or red (WT-like). The p.Pro38Ser variant is additionally colored with yellow stripes. The four known PTEN dominant negative variants are colored yellow. Variants not scored in the experiment are colored gray. n is the number of instances of PTEN variants observed in the indicated cancer type and also scored in our experiments. f, A western blot analysis of cells stably expressing WT or missense variants of N-terminally HA-tagged PTEN. This experiment was independently performed twice with similar results (see Supplementary Figure 5e).
Figure 5. TPMT variant abundance classes across pharmacogenomics phenotypes
a, A histogram of TPMT abundance scores for all missense variants observed in the experiment, with bars colored according to abundance classification (top; n = 1,529 data points). Abundance scores for variants previously identified and characterized in patients are shown as dots colored by classification. Variants found in gnomAD at frequencies higher than 4×10⁻⁶ are also shown (bottom; n = 118 data points). b, A scatterplot of abundance score and mean 6-MP dose tolerated by individuals heterozygous for each variant. Dose intensity is the dose at which 6-MP becomes toxic to the patient before the 100% protocol dose of 75 mg/m². r and ρ denote Pearson’s and Spearman’s correlation coefficients, respectively.
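Panel b reports both Pearson's r and Spearman's ρ. As a rough illustration of how the two differ, the sketch below computes Pearson's coefficient directly and obtains Spearman's by applying the same formula to average ranks. The function names are hypothetical and the data are made up; none of the numbers come from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of the data."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up example: a monotonic but non-linear relationship.
abundance = [0.1, 0.4, 0.7, 1.0]
dose = [1.0, 2.0, 4.0, 8.0]
r = pearson(abundance, dose)
rho = spearman(abundance, dose)
```

Because the made-up relationship is monotonic but curved, ρ reaches 1 while r stays below it, which is one reason to report both coefficients.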
Figure 6. Additional drug- and disease-related genes are compatible with VAMP-seq
Representative flow cytometry EGFP:mCherry smoothed histogram plots for WT (red) and known or predicted destabilized variants (blue) for VKOR, CYP2C9, CYP2C19, MLH1, PMS2, and LMNA. Each smoothed histogram was generated from at least 1,000 recombined cells. This experiment was independently performed three times with similar results.



