Genome Biol. 2016 Jan 29:17:14.
doi: 10.1186/s13059-016-0873-8.

Long non-coding RNAs display higher natural expression variation than protein-coding genes in healthy humans

Aleksandra E Kornienko et al. Genome Biol. 2016.

Abstract

Background: Long non-coding RNAs (lncRNAs) are increasingly implicated as gene regulators and may ultimately be more numerous than protein-coding genes in the human genome. Despite large numbers of reported lncRNAs, reference annotations are likely incomplete due to their lower and tighter tissue-specific expression compared to mRNAs. An unexplored factor potentially confounding lncRNA identification is inter-individual expression variability. Here, we characterize lncRNA natural expression variability in human primary granulocytes.

Results: We annotate granulocyte lncRNAs and mRNAs in RNA-seq data from 10 healthy individuals, identifying multiple lncRNAs absent from reference annotations, and use this annotation to investigate three known features of lncRNAs relative to mRNAs (higher tissue-specificity, lower expression, and reduced splicing efficiency). Expression variability was examined in seven individuals sampled three times at intervals of 1 month or more. We show that lncRNAs display significantly more inter-individual expression variability than mRNAs. We confirm this finding in two independent human datasets by analyzing multiple tissues from the GTEx project and lymphoblastoid cell lines from the GEUVADIS project. Using the latter dataset we also show that including more human donors in the transcriptome annotation pipeline allows identification of an increasing number of lncRNAs, but minimally affects mRNA gene number.

Conclusions: A comprehensive annotation of lncRNAs is known to require an approach that is sensitive to low and tight tissue-specific expression. Here we show that increased inter-individual expression variability is an additional general lncRNA feature to consider when creating a comprehensive annotation of human lncRNAs or proposing their use as prognostic or disease markers.


Figures

Fig. 1
Defining the lncRNA transcriptome of human primary granulocytes. a Sample processing overview. b LncRNA identification overview. Granulocyte PolyA+ RNA-seq data from 10 donors was used for transcriptome assembly and filtered to create an annotation with 1,591 lncRNA loci containing 6,249 lncRNA transcripts (Additional file 1: Figures S1-3). c Positional classification of lncRNA loci relative to the nearest protein-coding gene. Twenty-five percent (402) are bidirectional (light gray), 33 % (530) are antisense (medium gray), and 42 % (659) are intergenic (dark gray). Positional classes are illustrated underneath (blue: protein-coding gene, green: lncRNA). d Example of a novel granulocyte antisense lncRNA locus. Top: 3' part of AJAP1 protein-coding gene (blue) and the novel antisense gra1110 lncRNA locus (green). Underneath: normalized to read number RNA-seq signal from sample D2-2_pa_100ss (Additional file 2B); GENCODE-v19 protein-coding genes (blue lines) and de novo annotated mRNAs (blue) and lncRNAs (green) showing lncRNA transcripts in locus gra1110 (Additional files 3, 4, and 6). e Overlap of granulocyte de novo lncRNA annotations (green) with commonly used public lncRNA annotations (gray) (RefSeq: 8,236 lncRNA transcripts, GENCODE-v19: 23,898 lncRNA transcripts, Cabili [14]: 21,630 lncRNA transcripts) and the ‘MiTranscriptome’ annotation (brown) [29]. f Validation of granulocyte de novo lncRNAs by cloning. Three de novo lncRNA loci (84, 152, 187) are shown (see also Additional file 1: Figures S4-S8). Top to bottom for each: scale and chromosome, de novo lncRNA transcript annotation in each locus (green isoforms), cloning result (black lines) showing BLAT alignment of the Sanger sequenced cloned cDNA
Fig. 2
LncRNAs not in public annotations show less mRNA-like features. a Distribution of 6,249 granulocyte de novo annotated lncRNA transcripts according to coverage by three commonly used public annotations (PA): RefSeq, GENCODE-v19, Cabili [14, 58, 59]. Known lncRNA loci contain two transcript types: ‘PA transcripts’ that show full exonic overlap with an annotated lncRNA transcript (32 %, 2,003 transcripts, dark gray), or ‘isoform not in PA’ transcripts that can share exons but contain one or more additional exons not present in public annotation (37 %, 2,331 transcripts, medium gray). New lncRNA loci contain 1,921 ‘not in PA’ transcripts (31 % of lncRNA transcripts identified in granulocytes, light gray). b An example of a publicly-annotated lncRNA locus (GENCODE-v19 AC007950.1) that contains additional upstream exons not in PA, from sample D2-2_pa_100ss (Additional file 2B). The annotation identifies locus gra912 (thick green bar). The annotated lncRNA isoforms of locus gra912 with alternative transcription start sites (TSS) are shown underneath as gray lines (the shorter PA transcript is shown in black for comparison). c Granulocyte-specificity analysis. Bar plot shows the percentage of granulocyte-specific (purple) and not-specific (light gray) transcripts de novo annotated in granulocytes. Each bar shows the percentage of granulocyte-specific transcripts for each transcript class while the dashed green line shows the percentage for all lncRNAs together. d Average expression level (RPKM) in granulocyte PolyA+ RNA-seq samples used for annotation. The median values are: all mRNA transcripts (blue): 6.14, all lncRNA transcripts (green dashed line): 0.65, lncRNA transcripts ‘in PA’ (dark gray): 1.00, lncRNA transcripts ‘isoform not in PA’ (medium gray): 0.68, lncRNA transcripts ‘not in PA’ (light gray): 0.47.
e PolyA+ enrichment of de novo granulocyte annotated transcripts calculated as the ratio between the abundance of a transcript in PolyA+ RNA and its abundance in total ribosome-depleted RNA. Transcript abundance (RPKM) is averaged among all PolyA+ RNA-seq samples or all ribosome-depleted total RNA-seq samples. Transcripts not detected in total RNA-seq data (average RPKM <0.2) were not analyzed. The median values are: all mRNA transcripts (blue): 2.62, all lncRNA transcripts (dashed green line): 1.56, lncRNA transcripts ‘in PA’ (dark gray): 1.80, lncRNA transcripts ‘isoform not in PA’ (medium gray): 1.54, lncRNA transcripts ‘not in PA’ (light gray): 1.29. f Splicing efficiency of de novo granulocyte annotated transcripts. Only transcripts with average RPKM >0.2 in 21 ribosome-depleted RNA-seq samples were analyzed and the efficiency of the most efficiently-spliced site in each transcript is plotted. The median values are: all mRNA transcripts: 99.02 %, all lncRNA transcripts: 88.13 %, lncRNA transcripts ‘in PA’: 87.18 %, lncRNA transcripts ‘isoform not in PA’: 90.90 %, lncRNA transcripts ‘not in PA’: 77.97 %. Remarks to boxplots d, e, and f: the box plot displays the full population but P values are calculated using Mann–Whitney U test on equalized population sizes. *10⁻⁵ < P < 0.001, **10⁻¹⁰ < P < 10⁻⁵, ***P < 10⁻¹⁶. Green asterisks indicate the significance of the difference between mRNAs and all lncRNAs (only the median level is plotted as a dashed green line). Outliers are not displayed
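The PolyA+ enrichment in panel e is a ratio of two averaged abundances with a detection filter. A minimal Python sketch of that arithmetic follows; the function name, the transcript names, and the RPKM values are hypothetical, and only the 0.2 RPKM cutoff comes from the caption.

```python
from statistics import mean

DETECTION_THRESHOLD = 0.2  # RPKM cutoff quoted in the caption


def polya_enrichment(polya_rpkm, total_rpkm, threshold=DETECTION_THRESHOLD):
    """Ratio of mean PolyA+ RPKM to mean total-RNA RPKM.

    Returns None when the transcript is not detected in total RNA
    (average RPKM below the threshold), mirroring the caption's filter.
    """
    total_mean = mean(total_rpkm)
    if total_mean < threshold:
        return None
    return mean(polya_rpkm) / total_mean


# Hypothetical transcripts: (PolyA+ RPKM per sample, total RNA RPKM per sample)
transcripts = {
    "mRNA_a": ([12.0, 10.0, 14.0], [4.0, 5.0, 3.0]),
    "lncRNA_b": ([0.9, 1.1, 1.0], [0.5, 0.5, 0.5]),
    "lncRNA_c": ([0.3, 0.2, 0.1], [0.1, 0.1, 0.1]),  # below cutoff in total RNA
}

ratios = {name: polya_enrichment(p, t) for name, (p, t) in transcripts.items()}
```

Transcripts that return None are excluded before medians are taken, which is why the sketch does not substitute a zero for them.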
Fig. 3
Reproducibility of de novo lncRNA and mRNA expression. a Study overview. Top: the granulocyte de novo transcriptome annotation was generated from 10 healthy donors. Bottom: seven donors were sampled at three time points spaced by ≥1 month (Additional file 2A) and RNA was sequenced to assess intra-individual (using three time points from one donor) and inter-individual (using samples from seven different donors) expression reproducibility. b Granulocyte intra-individual (top) and inter-individual (bottom) expression reproducibility for de novo annotated lncRNAs (green) and mRNAs (blue). Transcripts detectable (RPKM >0.2) at each of three time points or not detected (RPKM <0.2) at any time point in all seven donors show intra-individual reproducibility. Transcripts detectable in each of seven donors (average RPKM of three replicates >0.2) show inter-individual reproducibility. Five expression bins were used: (1) 0.5 < RPKM ≤1; (2) 1 < RPKM ≤2; (3) 2 < RPKM ≤4; (4) 4 < RPKM ≤8; and (5) RPKM >8 (n = transcript number per bin). Chromosomes X, Y were discarded
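The binning and detection rules in panel b can be written out as follows. This is a sketch of one reading of the caption, not the authors' code: it assumes that intra-individual reproducibility means each donor shows the transcript either at all three time points or at none, and that a value of exactly 0.2 RPKM counts as not detected. All function names are hypothetical.

```python
from statistics import mean

DETECTION_THRESHOLD = 0.2  # RPKM cutoff quoted in the caption


def expression_bin(rpkm):
    """Return the caption's bin number (1 to 5), or None if RPKM <= 0.5."""
    if rpkm <= 0.5:
        return None
    for bin_number, upper in enumerate([1, 2, 4, 8], start=1):
        if rpkm <= upper:
            return bin_number
    return 5  # RPKM > 8


def intra_reproducible(donor_timepoints, threshold=DETECTION_THRESHOLD):
    """True if, within every donor, detection is all-or-none across time points."""
    for timepoints in donor_timepoints:
        detected = [value > threshold for value in timepoints]
        if any(detected) and not all(detected):
            return False
    return True


def inter_reproducible(donor_timepoints, threshold=DETECTION_THRESHOLD):
    """True if the replicate-averaged RPKM exceeds the threshold in every donor."""
    return all(mean(timepoints) > threshold for timepoints in donor_timepoints)
```

A transcript can satisfy the first test and fail the second, for example when it is consistently present in some donors and consistently absent in others.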
Fig. 4
LncRNAs are more variably expressed than mRNAs. a, b Genome-wide inter-individual variability (normalized standard deviation between expression of each transcript/locus in granulocytes from seven donors) of de novo granulocyte lncRNA (green) and mRNA (blue) transcripts (a) and loci (b). Donor expression level is averaged from three replicates (***P < 10⁻¹⁶). Median values: lncRNA transcripts: 0.29, mRNA transcripts: 0.15, lncRNA loci: 0.26, mRNA loci: 0.15. c LncRNA inter-individual expression variability allows correct clustering (normalized level among seven donors) of three time points per donor. Maximum transcript expression among all 21 samples is set to 1 (red), minimum is 0 (white). Clustering was performed using pheatmap function in R (clustering_distance_rows = ‘euclidean’, clustering_distance_cols = ‘correlation’). Only transcripts detected (RPKM >0.2) in at least one of the total RNA-seq samples were analyzed. Chromosomes X, Y were discarded. d Significance of granulocyte de novo lncRNA and mRNA expression variability in seven donors assessed by ANOVA test (the three time points are used as replicates). Bars show the percentage of significantly variable lncRNA (green) and mRNA (blue) transcripts (left) and loci (right). Criteria for calling a transcript/locus ‘significantly variable’: ANOVA test P value <0.01, FDR (Benjamini-Hochberg correction) <0.05, fold change between highest and lowest expression in seven donors >3. Only transcripts/loci with RPKM >0.2 in at least one donor are included. Chromosomes X and Y were discarded from the analysis. Total number analyzed: lncRNA transcripts 4,464, mRNA transcripts 119,412, lncRNA loci 658, mRNA loci 5,797. e Example of a significantly variable transcript from lncRNA locus gra896. Top: an alternative gra896 TSS overlaps the publicly-annotated lncRNA RP11-1008C21.1 locus.
Underneath: normalized total RNA-seq signal for three replicates of four donors scaling from -0.001 (reverse strand, light gray) to 0.004 (forward strand, black). Calculated expression level of the annotated lncRNA transcript marked with * is shown for each RNA-seq track. Significance result for this transcript among seven donors: ANOVA test P = 10⁻⁷, FDR (Benjamini-Hochberg) = 10⁻⁶, expression fold change = 5.2. f Bidirectional lncRNA transcripts show reduced expression variability. Boxplots show inter-individual variability of lncRNA transcripts split according to their position relative to protein-coding genes as in Fig. 1c. Median normalized standard deviation values: bidirectional: 0.22, antisense: 0.29, intergenic: 0.30. Dashed blue line indicates median expression variability of all de novo mRNA transcripts. g Inter-individual expression variability is lower for known ‘in PA’ lncRNA transcripts compared to those newly annotated in granulocytes (‘not in PA’ and ‘isoform not in PA’). Median normalized standard deviation values: ‘not in PA’: 0.33, ‘isoform not in PA’: 0.30, ‘in PA’: 0.24. Dashed blue line indicates median expression variability of all de novo mRNA transcripts. Remarks to boxplots a, b, c, g: Transcripts/loci not expressed (RPKM <0.2) in any of seven donors (total RNA-seq data) and data from chromosomes X, Y were discarded and outliers are not displayed. The box plot displays the full population but P value is calculated using Mann–Whitney U test on equalized sample size. n.s. not significant, ***P < 10⁻¹⁶
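The variability measure in panels a and b and the calling criteria in panel d can be sketched in Python as below. This is an illustration under stated assumptions rather than the authors' pipeline: the sample standard deviation (n - 1 denominator) is assumed because the caption does not say which estimator was used, the ANOVA P value is taken as an input (it could come from a routine such as `scipy.stats.f_oneway`), and all function names are hypothetical.

```python
from statistics import mean, stdev


def donor_means(replicates_by_donor):
    """Average the replicate RPKM values of each donor."""
    return [mean(replicates) for replicates in replicates_by_donor]


def normalized_sd(donor_levels):
    """Standard deviation across donors divided by the mean across donors.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    level_mean = mean(donor_levels)
    if level_mean <= 0:
        return None
    return stdev(donor_levels) / level_mean


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted


def significantly_variable(p_value, fdr, donor_levels):
    """Apply the three caption criteria: P < 0.01, FDR < 0.05, fold change > 3."""
    lowest, highest = min(donor_levels), max(donor_levels)
    fold_change = highest / lowest if lowest > 0 else float("inf")
    return p_value < 0.01 and fdr < 0.05 and fold_change > 3
```

In use, `benjamini_hochberg` would be applied once to the P values of all tested transcripts or loci, and `significantly_variable` then evaluated per transcript with its own adjusted value and replicate-averaged donor levels.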
Fig. 5
GEUVADIS RNA-seq data confirm increased lncRNA expression variability. a Sample processing overview: 462 lymphoblastoid cell lines (LCL) established from healthy donors by EBV transformation were processed by the GEUVADIS RNA-seq Project [50]. b LncRNA identification overview. We picked 20 unrelated donors (total of 522 million uniquely mapped reads) from 462 donors and processed the raw RNA-seq data through the same pipeline used to annotate lncRNAs in granulocytes (Additional file 1: Figure S24). The resulting LCL lncRNA transcriptome contained 2,611 lncRNA loci formed by 8,560 lncRNA transcripts. c Top: overlap between LCL and granulocyte de novo transcriptome annotations created in the study. A total of 536 of 2,611 LCL lncRNA loci overlap granulocyte loci. A total of 9,357 of 12,241 LCL de novo mRNA loci overlap granulocyte loci. Bottom: overlap of de novo lncRNA annotation in LCL with commonly used public annotations (PA): RefSeq, GENCODE-v19, and Cabili [14, 58, 59], and the MiTranscriptome annotation [29] identifies 295 new lncRNA loci. Of these, only 18 loci overlap the de novo lncRNA granulocyte annotation. d, e LncRNAs show higher expression variability than mRNAs in LCL. The boxplots show inter-individual variability of LCL lncRNA (green) and mRNA (blue) transcripts (d) and loci (e). Inter-individual variability is estimated by calculating standard deviation between expression of each transcript/locus in 462 donors normalized to the mean expression. The variability of both transcripts and loci differs significantly (***P < 10⁻¹⁶) between lncRNAs and mRNAs. Median values: lncRNA transcripts: 0.56, mRNA transcripts: 0.24, lncRNA loci: 0.51, mRNA loci: 0.25. f Inter-individual expression variability is higher for newly annotated lncRNA transcripts in LCL.
Boxplot shows inter-individual expression variability of LCL lncRNA transcripts split according to coverage by public annotations (PA), which is higher for ‘not in PA’ and ‘isoform not in PA’ lncRNA transcripts compared to ‘in PA’. Median normalized standard deviation values: not in PA: 0.66, isoform not in PA: 0.58, in PA: 0.46. Blue dashed line indicates median expression variability of all de novo mRNA transcripts in (d). Remarks to boxplots d, e, f: transcripts or loci not expressed (RPKM <0.2) in any of the 462 donors were discarded. The box plot displays the full population but P value is calculated using Mann–Whitney U test on equalized sample size (**P < 10⁻¹⁰, ***P < 10⁻¹⁶). Data from chromosomes X, Y were discarded and outliers are not displayed
Fig. 6
GTEx RNA-seq data show increased lncRNA expression variability in multiple human tissues. Inter-individual variability of multi-exonic MiTranscriptome lncRNA (green) and mRNA (blue) transcripts analyzed in GTEx RNA-seq dataset [64]. Twenty donors per tissue are analyzed (Additional file 2J). Standard deviation is normalized to the mean expression among all 20 analyzed donors for each tissue. Only transcripts expressed in the given tissue in at least one donor (RPKM >0.2) are displayed. Number of transcripts in each box from left to right: LCL (lncRNAs: 28,571; mRNAs: 102,449), adipose (lncRNAs: 38,060; mRNAs: 113,688), artery (lncRNAs: 29,965; mRNAs: 108,082), cerebellum (lncRNAs: 44,912; mRNAs: 115,039), heart (lncRNAs: 32,827; mRNAs: 111,564), lung (lncRNAs: 39,909; mRNAs: 117,901), muscle (lncRNAs: 31,507; mRNAs: 106,099), nerve (lncRNAs: 39,167; mRNAs: 115,038), and thyroid (lncRNAs: 40,099; mRNAs: 116,206). Median expression variability values from left to right (lncRNAs, mRNAs): LCL: 0.55, 0.27, adipose: 0.66, 0.32, artery: 0.59, 0.30, cerebellum: 0.60, 0.33, heart: 0.66, 0.36, lung: 0.63, 0.31, muscle: 0.85, 0.41, nerve: 0.54, 0.26, and thyroid: 0.56, 0.27. The box plots display the full population but P values are calculated using Mann–Whitney U test on equalized sample size (***P < 10⁻¹⁶). Data from chromosomes X, Y were discarded and outliers are not displayed
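The captions repeatedly mention a Mann–Whitney U test on equalized sample sizes without describing how the sizes were equalized. The Python sketch below assumes random downsampling of the larger group to the size of the smaller one, and uses a two-sided normal approximation without tie or continuity correction; both choices are assumptions made for illustration, and the function names are hypothetical.

```python
import math
import random


def equalize(group_a, group_b, seed=0):
    """Randomly downsample the larger group to the size of the smaller one."""
    rng = random.Random(seed)
    size = min(len(group_a), len(group_b))
    a = rng.sample(list(group_a), size) if len(group_a) > size else list(group_a)
    b = rng.sample(list(group_b), size) if len(group_b) > size else list(group_b)
    return a, b


def mann_whitney_u(group_a, group_b):
    """Return (U statistic for group_a, two-sided P value).

    Ties receive average ranks; the P value uses the normal
    approximation without tie or continuity correction.
    """
    combined = sorted([(v, 0) for v in group_a] + [(v, 1) for v in group_b])
    rank_sum_a = 0.0
    i = 0
    while i < len(combined):
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        rank_sum_a += average_rank * sum(1 for k in range(i, j) if combined[k][1] == 0)
        i = j
    n1, n2 = len(group_a), len(group_b)
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_a - mu) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_a, p_value
```

A typical call would be `mann_whitney_u(*equalize(lncrna_variability, mrna_variability))`, where the two argument lists are hypothetical per-transcript variability values. Equalizing first keeps the much larger mRNA population from driving the P value through sample size alone.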
Fig. 7
Increasing donor number identifies more lncRNA loci. a Example of a highly variable LCL lncRNA locus lcl1580 not in public annotations. GENCODE-v19 annotates lncRNA RP11-555G19.1 and protein-coding gene AP003062.1 transcribed in antisense direction to lcl1580 (top). Normalized non-strand-specific PolyA+ RNA-seq signal for three donors is displayed (scaling from 0 to 0.6). RPKM of the transcript isoform marked with * is shown for each sample. b Analysis overview. GEUVADIS project LCL RNA-seq data from 120 donors was used to create 30 data pools (each with 100 million reads from two female (red) and two male (blue) donors) and to assemble 30 transcriptomes (Methods). An increasing number of assemblies (corresponding to between 4 and 120 donors) was merged to serve as input into the de novo lncRNA and mRNA identification pipeline (Additional file 1: Figure S1A). This created a series of LCL de novo lncRNA and mRNA annotations from an increasing number of donors. c Number of LCL de novo lncRNA (green) and mRNA (blue) loci annotated using an increasing number of donors. Left: Y-axis for lncRNA loci (green). Right: Y-axis for mRNA loci (blue). The range of values is set to 3,500 on both Y-axes. Maximum number of lncRNA / mRNA loci annotated (at 120 donors): 4,166 / 12,857. Error bars: standard deviation of loci number between three replicates of random picking for each number of assemblies used (Additional file 11C)
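The pooling design in panel b (30 pools of two female and two male donors, merged in increasing numbers with three random picks per step) can be sketched as follows. The donor identifiers, the particular step sizes, and the function names are hypothetical; the split into 60 female and 60 male donors is inferred from 30 pools of two each. The sketch only plans which assemblies to merge and does not perform any assembly.

```python
import random


def build_pools(females, males, rng):
    """Shuffle donors and group them into pools of two females and two males."""
    f, m = females[:], males[:]
    rng.shuffle(f)
    rng.shuffle(m)
    return [f[i:i + 2] + m[i:i + 2] for i in range(0, len(f), 2)]


def merge_schedule(n_pools, steps, replicates, rng):
    """For each step size k, draw `replicates` random sets of k pool indices."""
    return {
        k: [sorted(rng.sample(range(n_pools), k)) for _ in range(replicates)]
        for k in steps
    }


rng = random.Random(0)
females = [f"F{i:02d}" for i in range(60)]  # hypothetical donor IDs
males = [f"M{i:02d}" for i in range(60)]    # hypothetical donor IDs

pools = build_pools(females, males, rng)
# Step sizes are illustrative; 1 pool = 4 donors, 30 pools = 120 donors.
schedule = merge_schedule(len(pools), steps=[1, 5, 10, 20, 30], replicates=3, rng=rng)
```

Each entry of `schedule` lists the pool indices whose assemblies would be merged for one replicate, so the spread of loci counts across the three replicates at a given step corresponds to the error bars described in panel c.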
