Population-level annotation of lncRNAs in Arabidopsis reveals extensive expression variation associated with transposable element-like silencing

Plant Cell. 2023 Sep 8;koad233. doi: 10.1093/plcell/koad233. Online ahead of print.


Long non-coding RNAs (lncRNAs) are understudied and underannotated in plants. In mammals, lncRNA loci are nearly as ubiquitous as protein-coding genes, and their expression is highly variable between individuals of the same species. Using Arabidopsis thaliana as a model, we aimed to elucidate the true scope of lncRNA transcription across plants from different regions and to study its natural variation. We used transcriptome deep sequencing datasets spanning hundreds of natural accessions and several developmental stages to create a population-wide annotation of lncRNAs, revealing thousands of previously unannotated lncRNA loci. While lncRNA transcription is ubiquitous in the genome, most loci appear to be actively silenced, and their expression is extremely variable between natural accessions. This high expression variability is largely caused by the high variability of repressive chromatin levels at lncRNA loci. High variability was particularly common for intergenic lncRNAs (lincRNAs), where pieces of transposable elements (TEs) present in 50% of these lincRNA loci are associated with increased silencing and variation, and such lncRNAs tend to be targeted by the TE silencing machinery. Our population-wide lncRNA annotation in Arabidopsis improves the understanding of plant lncRNA genome biology and raises fundamental questions about what causes transcription and silencing across the genome.

Keywords: Arabidopsis thaliana; epigenetics; gene expression; lncRNA annotation; long non-coding RNAs; natural variation; transposon silencing.