Nucleic Acids Res. 2013 Dec;41(22):10044-61.
doi: 10.1093/nar/gkt818. Epub 2013 Sep 13.

Long non-coding RNA identification over mouse brain development by integrative modeling of chromatin and genomic features


Jie Lv et al. Nucleic Acids Res. 2013 Dec.

Abstract

In silico prediction of genomic long non-coding RNAs (lncRNAs) is a prerequisite for constructing and elucidating non-coding regulatory networks. Chromatin modifications marked by chromatin regulators are important epigenetic features that can be captured by prevailing high-throughput approaches such as ChIP sequencing. We demonstrate that the accuracy of lncRNA prediction can be greatly improved by incorporating high-throughput chromatin modification data, collected over the differentiation of mouse embryonic stem cells toward adult cerebellum, into a logistic regression model with LASSO regularization. The discriminating features include H3K9me3, H3K27ac, H3K4me1, open reading frames and several repeat elements. Importantly, chromatin information appears to be complementary to genomic sequence information, highlighting the importance of an integrated model. Applying the integrated model to uncharacterized fragments from a transcriptome assembly, we obtain a list of putative lncRNAs. We demonstrate by expression and Gene Ontology enrichment analysis that the putative lncRNAs have regulatory roles in the vicinity of known gene loci. We also show that lncRNA expression specificity can be efficiently modeled by chromatin data from the same developmental stage. The study not only supports the biological hypothesis that chromatin can regulate the expression of tissue-specific or developmental stage-specific lncRNAs but also reveals the features that discriminate lncRNAs from coding genes, which could guide further lncRNA identification and characterization.
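The core classifier named above, binomial logistic regression with LASSO (L1) regularization, can be illustrated with a minimal pure-Python sketch. It fits the penalized model by proximal gradient descent (ISTA), a solver chosen here only for brevity; the solver used in the study is not stated in this record. The toy data and the names `lasso_logistic`, `lam`, `lr` and `n_iter` are hypothetical, and features are assumed to be pre-scaled to roughly [-1, 1]. Feature 0 stands in for an informative feature and feature 1 for an uninformative one; raising `lam` drives weights to exactly zero, which is the selection mechanism behind the feature paths in Figure 1.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t*|w|: shrink z toward 0 by t, snapping to exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_logistic(X: Sequence[Sequence[float]], y: Sequence[int], lam: float,
                   lr: float = 0.1, n_iter: int = 2000) -> List[float]:
    """Minimize mean log-loss + lam * ||w||_1 by proximal gradient descent (ISTA).

    The intercept is not penalized. Returns [intercept, w_1, ..., w_p].
    """
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad_b, grad_w = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            # Residual of the predicted lncRNA probability against the 0/1 label.
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += r / n
            for j in range(p):
                grad_w[j] += r * xi[j] / n
        b -= lr * grad_b
        # Gradient step on the smooth loss, then the L1 proximal (shrinkage) step.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return [b] + w


# Hypothetical toy data: label 1 = lncRNA, 0 = protein-coding gene.
# Feature 0 separates the classes; feature 1 takes the same values in both classes.
TOY_X = [[0.9, 0.5], [0.7, -0.5], [0.8, 0.2], [0.6, -0.2],
         [-0.8, 0.5], [-0.6, -0.5], [-0.9, 0.2], [-0.7, -0.2]]
TOY_Y = [1, 1, 1, 1, 0, 0, 0, 0]

if __name__ == "__main__":
    # A coarse "coefficient path": weights shrink toward exactly 0 as lam grows.
    for lam in (0.01, 0.1, 0.35, 1.0):
        print(lam, [round(c, 3) for c in lasso_logistic(TOY_X, TOY_Y, lam)])
```

On these toy data, `lam=1.0` zeroes both weights, `lam=0.35` keeps only feature 0, and a small `lam` lets the informative weight grow much larger than the other. In practice lambda would be chosen by cross-validation, as the Figure 1 caption describes.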


Figures

Figure 1.
Selected informative features determined by the strict threshold using binomial logistic regression with LASSO regularization. Feature weights for predicting lncRNAs are plotted against log lambda, where lambda is the penalty that shrinks feature weights in the regression model. Weights of features with less power to discriminate lncRNAs from protein-coding genes shrink to 0 as lambda increases. Informative features are those whose weights remain non-zero at the lambda value determined by cross-validation.
Figure 2.
The genomic properties of putative lncRNAs with developmental stage specificity, compared with those of known lncRNAs and protein-coding genes with developmental stage specificity. (A) Putative lncRNAs have shorter transcripts than known lncRNAs and known protein-coding genes. (B) Putative lncRNAs have fewer exons than known lncRNAs and known protein-coding genes. (C) Putative lncRNAs have lower PhastCons conservation scores than known lncRNAs and known protein-coding genes. (D) Putative lncRNAs have ORF lengths comparable to those of known lncRNAs and shorter than those of known protein-coding genes.
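ORF length, the quantity compared in panel (D) and one of the discriminating genomic features, can be computed by scanning the three forward reading frames for the longest ATG-to-stop stretch. This is a minimal sketch under stated assumptions: transcript sequences are given in sense orientation, the standard stop codons apply, the first in-frame ATG opens an ORF, and ORFs lacking a stop codon are ignored. The study's exact ORF definition (minimum length, alternative start codons, both strands) is not given here, and `longest_orf_length` is a hypothetical name.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq: str) -> int:
    """Length in nt (stop codon included) of the longest ATG...stop ORF on this strand."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None  # position of the first ATG in the currently open ORF, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None  # close the ORF; look for the next ATG in this frame
    return best


if __name__ == "__main__":
    print(longest_orf_length("CCATGTTTTGACC"))
```

Internal ATGs inside an open ORF are skipped, so nested starts do not shorten the reported length.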
Figure 3.
Novel putative enhancer-related lncRNAs located between Dlk1 and Meg3 in the mouse genome, supported by histone modification patterns and the published literature. These lncRNAs lie ∼35 kb downstream of Dlk1 and ∼40 kb upstream of Meg3. Coverage plots of several histone modification ChIP-seq data sets in E14.5 brain are shown at the bottom. Data for each chromatin modification are shown as a ‘wiggle’ track of extended reads. Other genomic annotations derived from the UCSC Genome Browser database are also shown, with the direction of transcription indicated by arrows. The visualization is based on a local mirror of the UCSC Genome Browser. Chromosome coordinates (mm9) are shown at the top of the figure.
Figure 4.
The average profiles of PolII ChIP-seq tags and CAGE tags around the TSS and within the gene body for stage-specific lncRNAs in E14.5 brain. (A) The TSS of lncRNAs is enriched with PolII tags over basal levels; PolII density is aligned around the TSS with ±3000-bp extensions. The average signal represents the average number of reads per 100-bp interval. (B) The lncRNA gene body, normalized to a length of 3000 bp and extended 1000 bp upstream of the TSS and 1000 bp downstream of the TTS, is enriched with PolII tags over basal levels. (C) The TSS of lncRNAs is enriched with CAGE tags over basal levels; CAGE tag density is aligned around the TSS with ±3000-bp extensions. (D) The lncRNA gene body, normalized to a length of 3000 bp and extended 1000 bp upstream of the TSS and 1000 bp downstream of the TTS, is enriched with CAGE tags over basal levels. The gene bodies of all lncRNAs are scaled to 3000 bp for comparison (meta-gene as defined by the CEAS package).
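The meta-gene normalization in panels (B) and (D) can be sketched as follows: reads within 1000 bp upstream of the TSS or downstream of the TTS fall into fixed-width bins, while the gene body is rescaled to 3000 bp regardless of transcript length, and per-transcript profiles are then averaged bin by bin. This is a hypothetical re-implementation, not the CEAS code. The function names, 0-based half-open coordinates, single-chromosome read positions, the 100-bp bin size for the flanks and body, and the use of raw counts (body bins are not divided by their variable genomic width) are all assumptions.

```python
from typing import List, Sequence, Tuple


def metagene_bins(start: int, end: int, strand: str, reads: Sequence[int],
                  body_len: int = 3000, flank: int = 1000,
                  bin_size: int = 100) -> List[float]:
    """Bin read positions of one transcript onto a scaled meta-gene axis.

    Axis layout: flank bp upstream of the TSS | gene body rescaled to body_len |
    flank bp downstream of the TTS. Coordinates are 0-based, half-open [start, end).
    Assumes flank and body_len are multiples of bin_size.
    """
    n_flank = flank // bin_size
    n_body = body_len // bin_size
    counts = [0.0] * (2 * n_flank + n_body)
    length = end - start
    for pos in reads:
        # Offset from the TSS in the transcript's own orientation.
        off = pos - start if strand == "+" else (end - 1) - pos
        if off < -flank or off >= length + flank:
            continue                                   # outside the plotted window
        if off < 0:                                    # upstream flank, fixed-width bins
            idx = (off + flank) // bin_size
        elif off < length:                             # gene body, rescaled to n_body bins
            idx = n_flank + (off * n_body) // length
        else:                                          # downstream flank, fixed-width bins
            idx = n_flank + n_body + (off - length) // bin_size
        counts[idx] += 1
    return counts


def average_profile(transcripts: Sequence[Tuple[int, int, str, Sequence[int]]],
                    **kwargs) -> List[float]:
    """Average per-transcript meta-gene profiles bin by bin."""
    profiles = [metagene_bins(s, e, st, r, **kwargs) for s, e, st, r in transcripts]
    return [sum(col) / len(profiles) for col in zip(*profiles)]


if __name__ == "__main__":
    # Hypothetical reads: one at each TSS of two transcripts on opposite strands.
    demo = [(10000, 16000, "+", [10000]), (20000, 23000, "-", [22999])]
    print(average_profile(demo)[8:12])
```

With the defaults the axis has 50 bins: 10 upstream, 30 for the scaled body and 10 downstream, so transcripts of any length contribute to the same body bins.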
Figure 5.
Developmental stage-specific lncRNAs are positively associated with neighboring protein-coding genes with regard to gene expression level. (A) A hypothetical model in which stage-specific lncRNAs regulate neighboring protein-coding genes and would therefore show a positive coexpression relationship with them. (B) We analyze expression changes of known protein-coding genes from E14.5 brain to CB and find that 75% of the protein-coding genes closest to E14.5-specific lncRNAs are downregulated, whereas 71% of the protein-coding genes closest to CB-specific lncRNAs are upregulated. (C) As a control, we analyze expression changes from E14.5 brain to CB for neighboring lncRNA–lncRNA pairs and find that 55% of the lncRNAs neighboring E14.5-specific lncRNAs are downregulated, whereas 47% of the lncRNAs neighboring CB-specific lncRNAs are upregulated.
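Percentages like those in panel (B) can be obtained by assigning each stage-specific lncRNA to its closest protein-coding gene and counting how many of those genes change in the expected direction. A minimal sketch, assuming a single chromosome, position-to-TSS distance, a precomputed log2 fold change (CB relative to E14.5) per gene, and ties resolved toward the lower coordinate; the names and data are hypothetical, and the study's exact definition of "closest" may differ.

```python
import bisect
from typing import Dict, Sequence, Tuple


def closest_gene(pos: int, coords: Sequence[int], names: Sequence[str]) -> str:
    """Return the gene whose TSS (coords sorted ascending) is nearest to pos."""
    i = bisect.bisect_left(coords, pos)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(coords)]
    best = min(candidates, key=lambda j: abs(coords[j] - pos))  # ties -> lower coordinate
    return names[best]


def fraction_in_direction(lnc_positions: Sequence[int],
                          gene_tss: Sequence[Tuple[int, str]],
                          log2fc: Dict[str, float],
                          direction: str) -> float:
    """Fraction of lncRNAs whose closest gene is 'down' (log2fc < 0) or 'up' (log2fc > 0)."""
    ordered = sorted(gene_tss)
    coords = [c for c, _ in ordered]
    names = [g for _, g in ordered]
    hits = 0
    for pos in lnc_positions:
        fc = log2fc[closest_gene(pos, coords, names)]
        if (direction == "down" and fc < 0) or (direction == "up" and fc > 0):
            hits += 1
    return hits / len(lnc_positions)


if __name__ == "__main__":
    genes = [(1000, "A"), (5000, "B"), (9000, "C")]
    fold = {"A": -1.2, "B": 0.8, "C": -0.3}
    print(fraction_in_direction([1200, 4800, 8000, 600], genes, fold, "down"))
```

Sorting once and using binary search keeps each lookup logarithmic in the number of genes, which matters when thousands of lncRNAs are scanned against a genome-wide annotation.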
Figure 6.
GO enrichment analysis of genes close to developmental stage-specific lncRNAs in E14.5 brain. GO enrichment was performed with DAVID (FDR < 0.01), followed by clustering of the resulting function terms that share significant numbers of genes using Enrichment Map. Dense clusters of gene functions are circled, with function terms labeled alongside. Line thickness between connected nodes is proportional to the number of genes shared between terms. Twenty-three terms, shown in dark gray, pass the q < 0.021 filter. Many terms are related to Brain development, Neuron differentiation and Transcriptional regulation.
Figure 7.
GO enrichment analysis of genes close to developmental stage-specific lncRNAs in adult CB. Terms are related to Neuron differentiation, Transcriptional regulation and Synaptic transmission.
Figure 8.
Selected informative features of embryonic E14.5 brain expression specificity determined by the strict threshold using multinomial logistic regression with LASSO regularization. Feature weights for predicting the expression specificity of embryonic E14.5 brain lncRNAs are plotted against log lambda, where lambda is the penalty that shrinks feature weights in the regression model. Weights of features with less power to discriminate lncRNAs expressed in embryonic E14.5 brain shrink to 0 as lambda increases. Informative features of embryonic E14.5 brain lncRNAs are those whose weights remain non-zero at the lambda value determined by cross-validation.
