Understanding the functions of proteins is one of the most important challenges in modern biology. Typically, protein function prediction methods generate a list of Gene Ontology (GO) terms, sometimes consisting of 50-100 functional terms. While GO serves the purpose of standardizing terminology, a long list of GO terms is difficult for biologists to interpret. To address this challenge, we developed Gene Ontology terms Summarizer (GO2Sum), a language model-based summarizer that takes a list of GO terms as input and converts them into a concise, free-text summary describing the protein's function, subunit structure, and pathway information. GO2Sum was fine-tuned on GO term assignments and free-text function descriptions from UniProt entries. We also built a web server for GO2Sum, which makes the tool easy for biologists to use.
Keywords: Function prediction; GO2Sum; Gene ontology; LLM; Large language model; Protein function; Summarization.
© 2025. The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature.