PLoS One. 2014 Apr 3;9(4):e93972.
doi: 10.1371/journal.pone.0093972. eCollection 2014.

Characterization of human pseudogene-derived non-coding RNAs for functional potential

Xingyi Guo et al. PLoS One. 2014.

Abstract

Thousands of pseudogenes exist in the human genome and many are transcribed, but their functional potential remains elusive and understudied. To explore these issues systematically, we first developed a computational pipeline to identify transcribed pseudogenes from RNA-Seq data. Applying the pipeline to datasets from 16 distinct normal human tissues identified ∼3,000 pseudogenes that could produce non-coding RNAs at low abundance but with high tissue specificity under normal physiological conditions. Cross-tissue comparison revealed that the transcriptional profiles of pseudogenes and their parent genes showed mostly positive correlations, suggesting that pseudogene transcription could have a positive effect on the expression of their parent genes, perhaps by functioning as competing endogenous RNAs (ceRNAs), as previously suggested and demonstrated with the PTEN pseudogene, PTENP1. Our analysis of the ENCODE project data also found many transcriptionally active pseudogenes in the GM12878 and K562 cell lines; moreover, it showed that many human pseudogenes produced small RNAs (sRNAs) and that some pseudogene-derived sRNAs, especially those from antisense strands, exhibited evidence of interfering with gene expression. Further integrated analysis of transcriptomics and epigenomics data, however, demonstrated that trimethylation of histone 3 at lysine 9 (H3K9me3), a posttranslational modification typically associated with gene repression and heterochromatin, was enriched at many transcribed pseudogenes in a transcription-level dependent manner in the two cell lines. The H3K9me3 enrichment was more prominent in pseudogenes that produced sRNAs at pseudogene loci and their adjacent regions, an observation further supported by the co-enrichment of SETDB1 (an H3K9 methyltransferase), suggesting that pseudogene sRNAs may have a role in regional chromatin repression. Taken together, our comprehensive and systematic characterization of pseudogene transcription uncovers a complex picture of how pseudogene ncRNAs could influence gene and pseudogene expression, at both epigenetic and post-transcriptional levels.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Identification of transcribed pseudogenes from RNA-Seq data.
A) A schematic illustration of the key concept of filtering out reads not uniquely matched to pseudogenes. Black and gray arrows represent perfectly matched and mismatched RNA-Seq reads, respectively, and the matched locations were kept. Yellow arrows represent a read initially put on a processed pseudogene but mapped back to the parent, based on aligning reads to coding sequences, because it is from an exon-exon junction. Green lines denote identical short sequences shared between gene and pseudogene. The left and right cartoons represent processed and duplicated pseudogenes, respectively. The bottom plot shows the final read coverage on a pseudogene (red) and its parent (black), indicating that RNA-Seq signals have largely been resolved. B) Filtering effectively reduces the correlation between the number of mapped reads and the sequence identity of a pseudogene to its parental gene. The number of mapped reads (y-axis) within every 200-bp region of a pseudogene is plotted against this region's sequence identity (x-axis) to the parental gene. Representative data for two tissues (brain and heart) are shown (top, before filtering; bottom, after filtering). C) Distributions of transcription values (i.e., FPKMs) of pseudogenes in all 16 tissues (the two vertical dashed lines mark 1 and 10 FPKM, respectively). D) Distributions of the maximal FPKMs for lincRNAs, pseudogenes, their parents, and the rest of coding genes.
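Panels C and D report transcription levels as FPKM (fragments per kilobase of transcript per million mapped fragments). As a reading aid, the standard definition can be sketched as below. The function name and the example numbers are hypothetical and are not taken from the paper, whose own quantification software is not named in this caption.

```python
def fpkm(fragments: int, length_bp: int, total_mapped_fragments: int) -> float:
    """Standard FPKM: fragments per kilobase of transcript per million mapped fragments.

    Illustrative helper only; the name and signature are assumptions.
    """
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length_bp and total_mapped_fragments must be positive")
    # 1e9 = 1e3 (bp per kb) * 1e6 (fragments per million)
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


# Hypothetical numbers: 50 fragments on a 1,000-bp pseudogene
# in a library of 20 million mapped fragments.
example_fpkm = fpkm(50, 1_000, 20_000_000)
```

With these made-up inputs the value falls between the 1 and 10 FPKM marks drawn in panel C.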
Figure 2. High tissue specificity of pseudogene transcription.
A) Heatmap of the transcription levels of 982 highly transcribed pseudogenes (maximal FPKM >10). B) Violin plots showing tissue-specificity JS scores of lincRNAs, transcribed pseudogenes, their parents, and the coding genes without pseudogenes. C) Comparison of JS scores at different transcription levels. The white dots mark the median and the thick boxes mark the first and third quartile values.
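The caption does not spell out how the JS score is computed. A minimal sketch follows, assuming it is the Jensen-Shannon-based tissue-specificity score commonly used for lincRNAs: one minus the JS distance between a gene's normalized expression profile and an idealized profile expressed in a single tissue, maximized over tissues. All function names here are hypothetical.

```python
import math


def _entropy(p):
    """Shannon entropy in bits; zero-probability entries contribute nothing."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_distance(p, q):
    """Square root of the Jensen-Shannon divergence between two distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    divergence = _entropy(m) - (_entropy(p) + _entropy(q)) / 2
    return math.sqrt(max(divergence, 0.0))  # guard against tiny negative rounding


def js_specificity(expression):
    """Assumed definition: max over tissues of 1 - JS distance to a one-tissue profile."""
    total = sum(expression)
    if total <= 0:
        raise ValueError("gene must be expressed in at least one tissue")
    profile = [x / total for x in expression]
    n = len(profile)
    best = 0.0
    for t in range(n):
        ideal = [1.0 if i == t else 0.0 for i in range(n)]
        best = max(best, 1.0 - js_distance(profile, ideal))
    return best
```

Under this assumed definition, a transcript detected in only one of the 16 tissues scores 1, while a transcript with equal FPKM in all 16 tissues scores close to 0.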
Figure 3. Transcriptional correlations (ρpg:g) between pseudogenes and their parents.
A) A heatmap of the distribution of ρpg:g, including data from separating processed and duplicated pseudogenes into two groups based on the presence of a coding gene within 20 kb. The coefficients between transcribed pseudogenes and randomly chosen coding genes (top) were used as a control for p-value estimation. Colors represent relative numbers of pseudogenes in each ρpg:g range (in Z-score transformation). B) Pseudogenes transcribed in the sense direction (S) exhibited higher ρpg:g than those in the antisense direction (A). C) The transcriptional correlation between pseudogenes and their parents (ρpg:g) is inversely correlated with the transcriptional correlation between miRNAs and their putative targets (ρmiRNA:g). Genes were binned on their ρmiRNA:g values (x-axis) and then the mean and standard deviation of ρpg:g (y-axis) for each group of genes were plotted. D) Expression of parental genes targeted by miRNAs was less affected by miRNA KD than that of targeted genes without pseudogenes. Only genes responding to KD (up >1.3 fold) were analyzed here. Y-axis shows the fold change of KD over control. The miRNA targets were experimentally determined by the CLASH analysis. The middle line in the boxplots marks the median and the box lines mark the first and third quartile values (same for boxplots below).
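The correlation ρpg:g compares a pseudogene's expression profile with its parent's across tissues. The caption does not say whether the coefficient is Pearson or Spearman; the sketch below assumes a Spearman rank correlation purely for illustration, and every function name in it is hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


def rho_pg_g(pseudogene_fpkm, parent_fpkm):
    """Assumed form of rho_pg:g: Spearman correlation across tissues."""
    if len(pseudogene_fpkm) != len(parent_fpkm):
        raise ValueError("profiles must cover the same tissues")
    return pearson(average_ranks(pseudogene_fpkm), average_ranks(parent_fpkm))
```

The control in panel A could be approximated in the same way by pairing each pseudogene with a randomly chosen coding gene instead of its parent and calling `rho_pg_g` on that pair.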
Figure 4. Pseudogene transcription increases the mean and variance of parental gene expression.
A) A cartoon illustrating the computational procedure. For each pseudogene, we computed the means (μh and μl) and variances (Sh and Sl) of its parental gene expression values in the 8 tissue samples with more pseudogene transcripts and the 8 with fewer pseudogene transcripts. B, C) Distributions of the mean (B) and variance (C) differences for all transcribed pseudogenes and for pseudogenes with positive (ρpg:g>0.2) or negative (ρpg:g<−0.2) transcriptional correlation with their parents.
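The procedure in panel A can be sketched as follows: rank the tissues by pseudogene expression, split them into a lower and an upper half, and compare the parent's mean and variance between the halves. This is a minimal illustration under stated assumptions: the function name is hypothetical, ties are broken by tissue order, the sample variance is used, and the differences are taken as high minus low, none of which the caption specifies.

```python
import statistics


def parent_mean_variance_shift(pseudogene_fpkm, parent_fpkm):
    """Return (mu_h - mu_l, S_h - S_l) for one pseudogene/parent pair.

    Tissues are split into halves by pseudogene expression (8 and 8 for 16 tissues).
    """
    n = len(pseudogene_fpkm)
    if n != len(parent_fpkm) or n % 2 or n < 4:
        raise ValueError("need two equal-length profiles with an even number (>= 4) of tissues")
    # Tissue indices ordered from lowest to highest pseudogene expression.
    order = sorted(range(n), key=lambda i: pseudogene_fpkm[i])
    half = n // 2
    low = [parent_fpkm[i] for i in order[:half]]
    high = [parent_fpkm[i] for i in order[half:]]
    mean_diff = statistics.mean(high) - statistics.mean(low)
    var_diff = statistics.variance(high) - statistics.variance(low)  # sample variance (assumption)
    return mean_diff, var_diff
```

Positive values of both differences for a pair would correspond to the pattern summarized in the figure title, a higher mean and a larger variance of parental expression where the pseudogene is more transcribed.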
Figure 5. Pseudogene-derived sRNAs and their relationship to parental gene repression.
A) Processed pseudogenes had higher sRNA read densities than any other annotated genomic elements and randomly chosen genomic regions in both GM12878 and K562 cell lines. B) Pseudogenes with mapped sRNA reads (≥5 reads per kb) were separated into two groups based on the abundance of sRNA reads in the adjacent non-pseudogene regions (±1 kb, orange). Group I was considered to produce sRNA interactively with their parents while group II produced sRNA independently. Venn diagrams show the data comparison between GM12878 (red) and K562 (green). C) The parental genes of group I pseudogenes showed significantly lower expression than either those of the pseudogenes without sRNA (control) or those of the group II pseudogenes, in both GM12878 (red) and K562 (green). The parents of antisense transcribed pseudogenes (>5 sRNA/kb) exhibited even lower expression. The same trends held when the analysis was carried out for pseudogenes with >10 sRNA/kb. Parents not expressed in the 16 normal tissues (i.e., FPKM = 0) were not included in these plots.
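The read-density threshold in panel B (≥5 sRNA reads per kb) is a simple length normalization. A minimal sketch is given below, with hypothetical function names. The caption does not state the cutoff used to split group I from group II based on the ±1 kb flanking regions, so that step is deliberately left out.

```python
def reads_per_kb(read_count: int, length_bp: int) -> float:
    """sRNA read density normalized to 1 kb of sequence."""
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    return read_count * 1000 / length_bp


def has_mapped_srna(read_count: int, length_bp: int, min_density: float = 5.0) -> bool:
    """True if the pseudogene meets the >=5 reads/kb threshold quoted in the caption."""
    return reads_per_kb(read_count, length_bp) >= min_density


# Hypothetical pseudogene: 12 sRNA reads over 2,400 bp gives exactly 5 reads per kb.
example_density = reads_per_kb(12, 2_400)
```

The stricter analysis mentioned in panel C (>10 sRNA/kb) would use a threshold of 10 with a strict rather than inclusive comparison.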
Figure 6. Enrichment of H3K9me3 modification at transcribed pseudogene loci.
A) Heatmap of H3K36me3 near the transcription start sites (TSS) and transcription end sites (TES) of transcribed (bottom) and non-transcribed pseudogenes (top). The color scheme is based on column-normalized data in GM12878, and each row is a pseudogene. B) Transcription-level dependent enrichment of H3K9me3 at transcribed pseudogenes. Y-axis shows the average number of H3K9me3 ChIP-Seq reads per 500 bp. C) & D) The level of H3K9me3 (red) but not H3K27me3 (green) was significantly higher at group II pseudogenes (Fig. 5) than at group I pseudogenes or at pseudogene loci producing no sRNAs (“C”, controls). The H3K9me3 level at a randomly selected set of LINEs (blue) was also plotted as a positive control. Y-axis plots ChIP-Seq reads at pseudogene bodies, normalized to 500 bp of sequence. E) The densities of H3K36me3, H3K27me3, and H3K9me3 ChIP-Seq reads and sRNA-Seq reads at a region with multiple pseudogenes derived from a gene encoding NADH dehydrogenase. F–H) The average ChIP-Seq profiles, anchored on pseudogene centers, of H3K9me3 in GM12878 (F) and in K562 (G) and of SETDB1 in K562 (H) for the three groups of pseudogenes. Y-axes show the average numbers of ChIP-Seq reads per 100 bp.
Figure 7. Selection constraints on transcribed pseudogenes.
Comparison of nucleotide diversities in the human population (A) and cross-species conservation (B) between non-transcribed (‘n’) and transcribed pseudogenes (‘y’). AluY, a young repeat family that emerged recently in primates, was used as a control. For duplicated pseudogenes, the median diversities for transcribed and non-transcribed are 0.00051 and 0.00054 (p<0.02, Wilcoxon test); the values for processed pseudogenes are 0.00055 and 0.00064 (p<3e-06, Wilcoxon test).
