A Deep Neural Network for Predicting and Engineering Alternative Polyadenylation
- PMID: 31178116
- PMCID: PMC6599575
- DOI: 10.1016/j.cell.2019.04.046
Abstract
Alternative polyadenylation (APA) is a major driver of transcriptome diversity in human cells. Here, we use deep learning to predict APA from DNA sequence alone. We trained our model (APARENT, APA REgression NeT) on isoform expression data from over 3 million APA reporters. APARENT's predictions are highly accurate when tasked with inferring APA in synthetic and human 3'UTRs. Visualizing features learned across all network layers reveals that APARENT recognizes sequence motifs known to recruit APA regulators, discovers previously unknown sequence determinants of 3' end processing, and integrates these features into a comprehensive, interpretable, cis-regulatory code. We apply APARENT to forward engineer functional polyadenylation signals with precisely defined cleavage position and isoform usage and validate predictions experimentally. Finally, we use APARENT to quantify the impact of genetic variants on APA. Our approach detects pathogenic variants in a wide range of disease contexts, expanding our understanding of the genetic origins of disease.
Keywords: MPRA; SNV; alternative polyadenylation; cis-regulation; deep learning; generative model; mRNA processing; machine learning; massively parallel reporter assay; single nucleotide variant; synthetic biology.
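As a rough illustration of the variant-scoring idea in the abstract, the sketch below one-hot encodes a DNA sequence and compares predicted isoform usage between a reference and a variant allele as a log odds ratio. This is a minimal sketch under stated assumptions, not APARENT's actual code: the function names `one_hot` and `log_odds_ratio`, the all-zero encoding of ambiguous bases, and the choice of a log odds ratio as the effect measure are illustrative choices that the abstract does not specify. The isoform proportions passed in would come from whatever sequence-to-usage model is available.

```python
import math

BASES = "ACGT"  # channel order is an assumption made for this sketch


def one_hot(seq):
    """Encode a DNA string as rows of 4 floats (A, C, G, T).

    Bases outside ACGT (e.g. N) become all-zero rows.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def log_odds_ratio(p_ref, p_var, eps=1e-6):
    """Log odds ratio of isoform usage, variant relative to reference.

    p_ref and p_var are predicted isoform proportions in (0, 1);
    both are clipped to [eps, 1 - eps] to avoid division by zero.
    """
    p_ref = min(max(p_ref, eps), 1.0 - eps)
    p_var = min(max(p_var, eps), 1.0 - eps)
    return math.log(p_var / (1.0 - p_var)) - math.log(p_ref / (1.0 - p_ref))


if __name__ == "__main__":
    # Hypothetical sequences containing the canonical AATAAA signal and
    # an AATGAA variant; the proportions below are made-up inputs.
    print(one_hot("AATAAA")[:2])
    print(log_odds_ratio(p_ref=0.8, p_var=0.2))
```

A negative value here would mean the variant is predicted to reduce usage of the isoform in question; a value of zero means no predicted change.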
Copyright © 2019 Elsevier Inc. All rights reserved.
Conflict of interest statement
Declaration of Interests
The authors declare no competing interests.
