Translating a GO Term List to Human Readable Function Description Using GO2Sum

Methods Mol Biol. 2025:2941:85-99. doi: 10.1007/978-1-0716-4623-6_5.

Abstract

Understanding the functions of proteins is one of the most important challenges in modern biology. Typically, protein function prediction methods generate a list of Gene Ontology (GO) terms, sometimes consisting of 50-100 functional terms. While GO serves the purpose of standardizing terms, a long list of GO terms is difficult for biologists to interpret. To address this challenge, we developed Gene Ontology terms Summarizer (GO2Sum), a language model-based summarizer that takes a list of GO terms as input and converts it into a concise, free-text summary describing the protein's function, subunit structure, and pathway information. GO2Sum was fine-tuned on GO term assignments and free-text function descriptions from UniProt entries. We also built a web server for GO2Sum, which makes the tool easy for biologists to use.

Keywords: Function prediction; GO2Sum; Gene ontology; LLM; Large language model; Protein function; Summarization.

MeSH terms

  • Computational Biology* / methods
  • Databases, Protein
  • Gene Ontology*
  • Humans
  • Proteins* / chemistry
  • Proteins* / genetics
  • Proteins* / metabolism
  • Software*

Substances

  • Proteins