Genome Res. 2016 Dec;26(12):1639-1650.
doi: 10.1101/gr.205070.116. Epub 2016 Sep 19.

Translational plasticity facilitates the accumulation of nonsense genetic variants in the human population


Sujatha Jagannathan et al. Genome Res. 2016 Dec.

Abstract

Genetic variants that disrupt protein-coding DNA are ubiquitous in the human population, with about 100 such loss-of-function variants per individual. While most loss-of-function variants are rare, a subset have risen to high frequency and occur in a homozygous state in healthy individuals. It is unknown why these common variants are well tolerated, even though some affect essential genes implicated in Mendelian disease. Here, we combine genomic, proteomic, and biochemical data to demonstrate that many common nonsense variants do not ablate protein production from their host genes. We provide computational and experimental evidence for diverse mechanisms of gene rescue, including alternative splicing, stop codon readthrough, alternative translation initiation, and C-terminal truncation. Our results suggest a molecular explanation for the mild fitness costs of many common nonsense variants and indicate that translational plasticity plays a prominent role in shaping human genetic diversity.


Figures

Figure 1.
Putatively disabled genes exhibit normal levels of the encoded proteins. (A) Histogram of alternate allele frequencies for synonymous and nonsense variants identified by the 1000 Genomes Project (1000 Genomes Project Consortium 2012). (B) Genome-wide measurements of mRNA abundance (Lappalainen et al. 2013), mRNA:ribosome association (Battle et al. 2015), and protein abundance (Battle et al. 2015) in lymphoblastoid cell lines (LCLs) that we analyzed here. These data were available for 462, 72, and 62 individuals, respectively; however, we restricted our analyses to the 421, 62, and 50 individuals that were genotyped by sequencing. The represented populations are British in England and Scotland, Finnish in Finland, Toscani in Italy, Utah residents with Northern and Western European ancestry, and Yoruba in Ibadan, Nigeria. (SILAC) Stable isotope labeling by amino acids in cell culture (Ong et al. 2002). (C) Median RNA levels of CDSs containing nonsense variants, averaged across samples that are homozygous for the reference (0/0) or alternate (1/1) alleles. Error bars, first and fourth quartiles of expression across all 0/0 or 1/1 samples. Units are fragments per kilobase per million mapped reads (FPKM) of the CDS containing each variant. (N) Number of analyzed variants. Analysis based on the 194 nonsense variants stated in B but restricted to variants within Ensembl coding genes that were present as both 0/0 and 1/1 in samples with RNA-seq data. (D) As in C but illustrates mRNA:ribosome association as measured by ribosome profiling. Units are FPKM of the CDS containing each variant. Analysis based on the 118 nonsense variants stated in B but restricted to variants within Ensembl coding genes that were present as both 0/0 and 1/1 in samples with ribosome profiling data. (E) As in C but illustrates relative levels of proteins encoded by CDSs containing nonsense variants as measured by SILAC mass spectrometry. 
Units are protein levels relative to the sample standard (e.g., 2⁰ indicates no change in protein levels relative to the standard), whose genotype for each variant is indicated by the point color. Protein levels were taken directly from Battle et al. (2015), who estimated protein levels as the median sample:standard ratio across all peptides arising from a single parent gene in each sample. Analysis based on the 106 nonsense variants stated in B but restricted to variants within Ensembl coding genes that were present as both 0/0 and 1/1 in samples with SILAC data. Fewer variants can be analyzed here than in C,D due to the low coverage of mass spectrometry data relative to RNA-seq or ribosome profiling. (F) Possible mechanisms to enable protein production from genes containing nonsense variants. Nonsense variants may be isoform specific, may result in N- or C-terminal truncation of the encoded protein, or may be subject to readthrough during translation.
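The gene-level protein estimate in panel E and the genotype-stratified medians in panels C to E can be sketched in Python as follows. This is a minimal sketch, not the authors' code: the function names and the input layout (per-sample values and genotype strings keyed by hypothetical sample IDs) are assumptions, and the quartile error bars are omitted.

```python
from __future__ import annotations

from statistics import median


def protein_level(peptide_ratios: list[float]) -> float:
    """Gene-level protein estimate for one sample: the median sample:standard
    ratio over all peptides from the parent gene (1.0 = 2**0 = no change)."""
    if not peptide_ratios:
        raise ValueError("no quantified peptides for this gene in this sample")
    return median(peptide_ratios)


def median_by_genotype(
    levels: dict[str, float], genotypes: dict[str, str]
) -> dict[str, float] | None:
    """Median level of one CDS among homozygous reference (0/0) and
    homozygous alternate (1/1) samples; heterozygotes are ignored.

    Returns None unless the variant occurs as both 0/0 and 1/1 among
    samples with data, mirroring the restriction stated in the caption.
    """
    groups: dict[str, list[float]] = {"0/0": [], "1/1": []}
    for sample, genotype in genotypes.items():
        if genotype in groups and sample in levels:
            groups[genotype].append(levels[sample])
    if not groups["0/0"] or not groups["1/1"]:
        return None
    return {genotype: median(values) for genotype, values in groups.items()}
```

For example, with two 0/0 samples at 1.0 and 1.5 and two 1/1 samples at 0.5 and 1.0, `median_by_genotype` returns `{"0/0": 1.25, "1/1": 0.75}`; a variant seen only as 0/0 returns `None` and is excluded.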
Figure 2.
Alternative splicing, promoter usage, and polyadenylation remove nonsense genetic variants from mature mRNA. (A) Percentages of variants that are isoform specific. Isoform-specific variants are defined as those that do not induce the indicated coding change within at least one RefSeq CDS of the parent gene or that lie within known alternatively spliced mRNA sequence. Plot restricted to variants that lie within genes containing nonsense variants. (N) Number of analyzed variants. Error bars, 95% confidence intervals as estimated by the binomial proportion test. (B) Median inclusion of variants lying within known alternatively spliced mRNA sequence across 16 human tissues. An inclusion level of 75% indicates that 75% of mRNAs transcribed from the parent gene contain the variant, while 25% do not. Inclusion was computed using the Body Map 2.0 data. (Notches) Approximate 95% confidence interval for the median. Plot restricted to variants that lie within genes containing nonsense variants. (C) Histogram of observed versus expected coverage of variants by ribosome footprints. Observed coverage was computed as the number of footprints overlapping each variant; expected coverage was computed as the total number of footprints overlapping each variant's host CDS, normalized such that the median of the ratio observed:expected over all variants was equal to one. All footprint coverage calculations were restricted to 0/0 samples to avoid potentially confounding effects of nonsense variants, and the plotted values indicate medians over those samples. (Black line) Best-fit normal distribution estimated from synonymous variants. Plot restricted to variants that lie within genes containing nonsense variants and that are present in samples for which ribosome profiling data were available. RNA-seq (D) and ribosome profiling (E) read coverage of a nonsense variant lying within a cassette exon of LGALS8, stratified by sample genotype. 
Coverage was normalized per sample to control for sequencing depth and then averaged over all samples with the indicated genotypes. Units are reads per million. (Red triangle) Location of nonsense variant. (F) LGALS8 protein levels relative to a sample standard with genotype 0/1. RNA-seq (G) and ribosome profiling (H) read coverage of a nonsense variant lying within an alternate 5′ exon of MOB3C. (I) MOB3C protein levels relative to a sample standard with genotype 0/1.
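The observed:expected normalization in panel C and the per-sample depth normalization used for the coverage tracks in panels D, E, G, and H can be sketched as below. This is a minimal illustration under stated assumptions: the function names are hypothetical, per-variant footprint counts are assumed to be already summarized (e.g., as medians over 0/0 samples), and "reads per million" is assumed to mean coverage scaled by each sample's total mapped reads.

```python
from __future__ import annotations

from statistics import mean, median


def observed_expected(
    observed: dict[str, float], cds_total: dict[str, float]
) -> dict[str, float]:
    """Observed:expected footprint coverage for each variant.

    observed[v]: footprints overlapping variant v.
    cds_total[v]: footprints overlapping v's host CDS (the unscaled expectation).
    The expectation is rescaled so that the median ratio over all variants is 1.
    """
    raw = {v: observed[v] / cds_total[v] for v in observed if cds_total.get(v, 0) > 0}
    if not raw:
        raise ValueError("no variants with nonzero host-CDS coverage")
    scale = median(raw.values())
    if scale == 0:
        raise ValueError("median raw ratio is 0; cannot normalize")
    return {v: ratio / scale for v, ratio in raw.items()}


def reads_per_million(coverage: list[int], total_mapped_reads: int) -> list[float]:
    """Per-position coverage scaled to reads per million mapped reads."""
    return [c * 1e6 / total_mapped_reads for c in coverage]


def mean_track(tracks: list[list[float]]) -> list[float]:
    """Position-wise mean of equal-length coverage tracks (one per sample)."""
    return [mean(column) for column in zip(*tracks)]
```

Dividing every raw ratio by their median is equivalent to scaling the expectation so that the median observed:expected ratio equals one; variants whose ratio falls well below the synonymous-variant distribution would then stand out as poorly covered by footprints.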
Figure 3.
Stop codon readthrough likely enables translation of nonsense variant–containing mRNAs. Ribosome profiling read coverage of nonsense variants lying within PVRIG (A) and SLFN13 (B). Units and data normalization are as in Figure 2. Bottom plots are zoomed-in versions of the top plots. (Green bars) ATG start codons. (Full/half-height green bars) ATG codons that are/are not within a Kozak consensus context, defined as RnnATGG (R = A/G).
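The Kozak classification behind the full- versus half-height bars can be sketched as a simple sequence scan. This is a minimal sketch assuming a plain nucleotide string for the plotted region; the function name is hypothetical, and scanning every position (not only the annotated reading frame) is an assumption. Passing `codons=("CTG", "GTG")` gives the noncanonical RnnSTGG context used in Figure 4.

```python
from __future__ import annotations


def start_codons(seq: str, codons: tuple[str, ...] = ("ATG",)) -> list[tuple[int, bool]]:
    """Find candidate start codons and flag those in a Kozak consensus context.

    With codons=("ATG",) the context is RnnATGG; with codons=("CTG", "GTG")
    it is RnnSTGG. That is, R (A/G) at position -3 and G at +4 relative to
    the codon's first base. Every position is scanned, in all reading frames.
    Returns (0-based position, in_kozak_context) pairs.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2):
        if seq[i:i + 3] in codons:
            kozak = i >= 3 and i + 3 < len(seq) and seq[i - 3] in "AG" and seq[i + 3] == "G"
            hits.append((i, kozak))
    return hits
```

For example, `start_codons("GCCATGGCCATGA")` reports the first ATG as in context (G at -3, G at +4) and the second as not (A at +4).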
Figure 4.
N- and C-terminal protein truncation enable translation of nonsense variant–containing mRNAs. (A) Positional distribution of variants within their host transcripts. Plot indicates the percentages of variants lying within the indicated deciles of their host CDSs’ lengths. Error bars, 95% confidence intervals as estimated by the binomial proportion test. Percentages of variants lying within the first 10% (B) or middle 10% (C) of their host CDS's length that have a downstream methionine (M) within 50 amino acids. Plot restricted to variants that lie within genes containing nonsense variants. Error bars, 95% confidence intervals as estimated by the binomial proportion test. (D) Percentages of variants lying within the last 10% of their host CDS's length. Plot restricted to variants that lie within genes containing nonsense variants. Ribosome profiling read coverage of nonsense variants lying within CCHCR1 (E) and ABHD14B (F). Units, data normalization, and notation are as in Figure 3. (Blue bars) CTG or GTG noncanonical start codons (no downstream ATG codons are present within the plot region). (Full/half-height blue bars) CTG or GTG codons that are/are not within a Kozak consensus context, defined as RnnSTGG (R = A/G, S = C/G).
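The positional statistics in panels A to D reduce to two small calculations: which decile of the CDS a variant falls in, and whether an in-frame ATG (methionine) lies within 50 codons downstream, where translation could plausibly reinitiate to yield an N-terminally truncated protein. A minimal sketch, assuming a 0-based nucleotide offset of the variant within a CDS string that begins at its annotated start codon; the function names are hypothetical, and the binomial confidence intervals are not reproduced.

```python
from __future__ import annotations


def cds_decile(offset: int, cds_length: int) -> int:
    """Decile (0 = first 10%, 9 = last 10%) of a 0-based offset within a CDS."""
    if not 0 <= offset < cds_length:
        raise ValueError("offset must lie within the CDS")
    return offset * 10 // cds_length


def downstream_met_within(cds: str, offset: int, max_aa: int = 50) -> bool:
    """True if an in-frame ATG occurs within max_aa codons after the codon
    containing the variant at `offset` (the CDS is read in frame from 0)."""
    cds = cds.upper()
    variant_codon = offset // 3
    for k in range(variant_codon + 1, variant_codon + 1 + max_aa):
        start = 3 * k
        if start + 3 > len(cds):
            break  # ran off the end of the CDS
        if cds[start:start + 3] == "ATG":
            return True
    return False
```

For example, in the toy CDS `"ATGTGGCCCATGAAA"`, a variant in the second codon (offset 3) has an in-frame ATG two codons downstream, so `downstream_met_within` returns `True`; a variant in the last codon has no downstream codons and returns `False`.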
Figure 5.
A reporter system recapitulates translation of N- and C-terminally truncated proteins from mRNAs containing nonsense variants. (A) Design of reporter constructs carrying the reference (sense) or alternate (nonsense) alleles of selected genes. (B) Cartoon depicting how permissive translation may enable productive translation of mRNAs containing nonsense variants, as well as the expected protein products from our reporter constructs for each mechanism of permissive translation. (C) Western blot of total protein from HEK293 cells transfected with constructs containing the reference alleles of all candidate genes. The FLAG and HA tags are illustrated in green and red. (White arrows) Constructs that produced the expected protein with both FLAG and HA tags. (D) As in C, but for constructs containing either the reference (Ref) or alternate (Alt) alleles of the genes marked with white arrows in C. For CCHCR1, the “Alt1” and “Alt2” variants correspond to SNVs at genomic positions 31,124,849 and 31,125,257 on Chromosome 6.
Figure 6.
Stop codon readthrough enables protein production from a gene containing a nonsense variant. (A) LCLs used in this study and their genotype for PVRIG and SLFN13. (B) Cartoon depicting the experimental setup for immunoprecipitation and subsequent Western blotting for PVRIG and SLFN13 to measure endogenous protein in the LCLs listed in A. (C) Western blots for PVRIG and SLFN13 on immunoprecipitates that were enriched for the corresponding proteins from the illustrated LCLs. Two distinct antibodies were separately used for protein enrichment, and the antibodies were used together for Western blot detection. (*) Expected molecular weight for PVRIG (detected by both antibodies) and SLFN13 (detected by neither antibody). (M) Marker lane containing the ladder. (+) Positive control for PVRIG (PVRIG overexpression lysate). (+, high exposure) The positive-control lane (green box) shown at higher exposure to illustrate the specificity of the assay. Note that even though nondenaturing elution limits the bulk release of the Protein G–bound antibodies that were used to enrich for the antigen, some antibody release is unavoidable and is visible as heavy and light chains, which migrate at 50 and 25 kDa. These bands do not interfere with the detection of PVRIG and SLFN13, which are expected to migrate at 32 and 102 kDa, respectively. (D) Higher exposure of the relevant portion (red box) of the PVRIG blot in C. PVRIG was detected by both antibodies at the expected molecular weight in all cell lysates.


References

1000 Genomes Project Consortium. 2012. An integrated map of genetic variation from 1,092 human genomes. Nature 491: 56–65.
Andrés AM, Dennis MY, Kretzschmar WW, Cannons JL, Lee-Lin S-Q, Hurle B; NISC Comparative Sequencing Program, Schwartzberg PL, Williamson SH, Bustamante CD, et al. 2010. Balancing selection maintains a form of ERAP2 that undergoes nonsense-mediated decay and affects antigen presentation. PLoS Genet 6: e1001157.
Ayadi A, Birling M-C, Bottomley J, Bussell J, Fuchs H, Fray M, Gailus-Durner V, Greenaway S, Houghton R, Karp N, et al. 2012. Mouse large-scale phenotyping initiatives: overview of the European Mouse Disease Clinic (EUMODIC) and of the Wellcome Trust Sanger Institute Mouse Genetics Project. Mamm Genome 23: 600–610.
Battle A, Khan Z, Wang SH, Mitrano A, Ford MJ, Pritchard JK, Gilad Y. 2015. Genomic variation. Impact of regulatory variation from RNA to protein. Science 347: 664–667.
Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM. 2012. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 6: 80–92.
