An expectation-maximization framework for comprehensive prediction of isoform-specific functions

Bioinformatics. 2023 Apr 3;39(4):btad132. doi: 10.1093/bioinformatics/btad132.

Abstract

Motivation: Advances in RNA sequencing technologies have achieved an unprecedented accuracy in the quantification of mRNA isoforms, but our knowledge of isoform-specific functions has lagged behind. There is a need to understand the functional consequences of differential splicing, which could be supported by the generation of accurate and comprehensive isoform-specific gene ontology annotations.

Results: We present isoform interpretation, a method that uses expectation-maximization to infer isoform-specific functions based on the relationship between sequence and functional isoform similarity. We predicted isoform-specific functional annotations for 85 617 isoforms of 17 900 protein-coding human genes spanning a range of 17 430 distinct gene ontology terms. Comparison with a gold-standard corpus of manually annotated human isoform functions showed that isoform interpretation significantly outperforms state-of-the-art competing methods. We provide experimental evidence that functionally related isoforms predicted by isoform interpretation show a higher degree of domain sharing and expression correlation than functionally related genes. We also show that isoform sequence similarity correlates better with inferred isoform function than with gene-level function.

Availability and implementation: Source code, documentation, and resource files are freely available under a GNU3 license at https://github.com/TheJacksonLaboratory/isopretEM and https://zenodo.org/record/7594321.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alternative Splicing
  • Humans
  • Motivation*
  • Protein Isoforms / genetics
  • Sequence Analysis, RNA
  • Software*

Substances

  • Protein Isoforms