Deciphering the impact of genetic variation on human polyadenylation using APARENT2
- PMID: 36335397
- PMCID: PMC9636789
- DOI: 10.1186/s13059-022-02799-4
Abstract
Background: 3'-end processing by cleavage and polyadenylation is an important and finely tuned regulatory process during mRNA maturation. Numerous genetic variants are known to cause or contribute to human disorders by disrupting the cis-regulatory code of polyadenylation signals. Yet, due to the complexity of this code, variant interpretation remains challenging.
Results: We introduce a residual neural network model, APARENT2, that can infer 3'-cleavage and polyadenylation from DNA sequence more accurately than any previous model. This model generalizes to the case of alternative polyadenylation (APA) for a variable number of polyadenylation signals. We demonstrate APARENT2's performance on several variant datasets, including functional reporter data and human 3' aQTLs from GTEx. We apply neural network interpretation methods to gain insights into disrupted or protective higher-order features of polyadenylation. We fine-tune APARENT2 on human tissue-resolved transcriptomic data to elucidate tissue-specific variant effects. By combining APARENT2 with models of mRNA stability, we extend aQTL effect size predictions to the entire 3' untranslated region. Finally, we perform in silico saturation mutagenesis of all human polyadenylation signals and compare the predicted effects of [Formula: see text] million variants against gnomAD. While loss-of-function variants were generally selected against, we also find specific clinical conditions linked to gain-of-function mutations. For example, we detect an association between gain-of-function mutations in the 3'-end and autism spectrum disorder. To experimentally validate APARENT2's predictions, we assayed clinically relevant variants in multiple cell lines, including microglia-derived cells.
Conclusions: A sequence-to-function model based on deep residual learning enables accurate functional interpretation of genetic variants in polyadenylation signals and, when coupled with large human variation databases, elucidates the link between functional 3'-end mutations and human health.
Keywords: Deep learning; Explainable AI; Genomics; Neural networks; Polyadenylation; RNA; Untranslated region; Variant interpretation.
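The in silico saturation mutagenesis described in the Results can be illustrated with a minimal sketch. It enumerates every single-nucleotide variant of a sequence and scores each one against the reference. Everything named here is an assumption made for illustration: `toy_score` is a hypothetical stand-in that only checks for the canonical AATAAA hexamer and is not APARENT2, and reporting the effect as a log odds ratio of predicted usage is a common convention rather than a detail confirmed by this abstract.

```python
import math
from typing import Callable, Iterator, List, Tuple

NUCLEOTIDES = "ACGT"


def saturation_mutagenesis(seq: str) -> Iterator[Tuple[int, str, str, str]]:
    """Yield (position, ref, alt, mutated_sequence) for every possible SNV."""
    for pos, ref in enumerate(seq):
        for alt in NUCLEOTIDES:
            if alt != ref:
                yield pos, ref, alt, seq[:pos] + alt + seq[pos + 1:]


def log_odds_ratio(p_ref: float, p_var: float, eps: float = 1e-6) -> float:
    """Log odds ratio of variant vs. reference predicted usage (assumed metric)."""
    # Clip probabilities away from 0 and 1 so the logit stays finite.
    p_ref = min(max(p_ref, eps), 1.0 - eps)
    p_var = min(max(p_var, eps), 1.0 - eps)
    return math.log(p_var / (1.0 - p_var)) - math.log(p_ref / (1.0 - p_ref))


def toy_score(seq: str) -> float:
    """Hypothetical placeholder predictor, NOT APARENT2.

    Returns a high usage probability only if the canonical AATAAA
    hexamer is present anywhere in the sequence.
    """
    return 0.8 if "AATAAA" in seq else 0.2


def score_variants(
    seq: str, predict: Callable[[str], float]
) -> List[Tuple[int, str, str, float]]:
    """Score every SNV of `seq` with `predict`, relative to the reference."""
    p_ref = predict(seq)
    return [
        (pos, ref, alt, log_odds_ratio(p_ref, predict(mutated)))
        for pos, ref, alt, mutated in saturation_mutagenesis(seq)
    ]


if __name__ == "__main__":
    for pos, ref, alt, effect in score_variants("GGAATAAAGG", toy_score):
        print(f"{pos}\t{ref}>{alt}\t{effect:+.3f}")
```

A sequence of length n yields 3n variants, so the cost of a scan grows linearly with the total length of the scored regions. In practice the placeholder predictor would be swapped for a trained model, and the sequences would be scored in batches.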
© 2022. The Author(s).
Conflict of interest statement
A.K. is a scientific co-founder of Ravel Biotechnology Inc.; is on the SAB of PatchBio Inc., SerImmune Inc., AINovo Inc., TensorBio Inc., and OpenTargets; is a consultant with Illumina Inc.; and owns shares in DeepGenomics Inc., Immuni Inc. and Freenome Inc. G.S. is a co-founder of Parse Biosciences and is on the SAB of Modulus Therapeutics.
Similar articles
- A Deep Neural Network for Predicting and Engineering Alternative Polyadenylation. Cell. 2019 Jun 27;178(1):91-106.e23. doi: 10.1016/j.cell.2019.04.046. Epub 2019 Jun 6. PMID: 31178116. Free PMC article.
- Implications of polyadenylation in health and disease. Nucleus. 2014;5(6):508-19. doi: 10.4161/nucl.36360. Epub 2014 Oct 31. PMID: 25484187. Free PMC article. Review.
- Inference of the human polyadenylation code. Bioinformatics. 2018 Sep 1;34(17):2889-2898. doi: 10.1093/bioinformatics/bty211. PMID: 29648582. Free PMC article.
- Genome-wide identification and predictive modeling of tissue-specific alternative polyadenylation. Bioinformatics. 2013 Jul 1;29(13):i108-16. doi: 10.1093/bioinformatics/btt233. PMID: 23812974. Free PMC article.
- Emerging roles of alternative cleavage and polyadenylation (APA) in human disease. J Cell Physiol. 2022 Jan;237(1):149-160. doi: 10.1002/jcp.30549. Epub 2021 Aug 11. PMID: 34378793. Review.
Cited by
- TDP-43 nuclear loss in FTD/ALS causes widespread alternative polyadenylation changes. bioRxiv [Preprint]. 2024 Jan 22:2024.01.22.575730. doi: 10.1101/2024.01.22.575730. PMID: 38328059. Free PMC article. Preprint.
- Generative and predictive neural networks for the design of functional RNA molecules. bioRxiv [Preprint]. 2023 Jul 14:2023.07.14.549043. doi: 10.1101/2023.07.14.549043. PMID: 37503279. Free PMC article. Preprint.
- Quantifying 3'UTR length from scRNA-seq data reveals changes independent of gene expression. Nat Commun. 2024 May 14;15(1):4050. doi: 10.1038/s41467-024-48254-9. PMID: 38744866. Free PMC article.
- Active learning of enhancer and silencer regulatory grammar in photoreceptors. bioRxiv [Preprint]. 2023 Aug 22:2023.08.21.554146. doi: 10.1101/2023.08.21.554146. PMID: 37662358. Free PMC article. Preprint.
- Decoding biology with massively parallel reporter assays and machine learning. Genes Dev. 2024 Oct 16;38(17-20):843-865. doi: 10.1101/gad.351800.124. PMID: 39362779. Free PMC article. Review.
