Genetic Diagnosis and Discovery Enabled by Large Language Models

Adv Sci (Weinh). 2026 Feb 8:e18656. doi: 10.1002/advs.202518656. Online ahead of print.

Abstract

Artificial intelligence (AI) has been used in many areas of medicine, and large language models (LLMs) have shown potential utility for various clinical applications. To determine whether LLMs can accelerate the pace of genetic diagnosis and discovery, we examined whether recently developed LLMs (Med-PaLM 2 and Gemini) could assist in solving four types of genetic problems of sequentially increasing complexity. First, in response to free-text input, Med-PaLM 2 correctly identified murine genes with experimentally verified causative genetic factors for six previously studied murine models of biomedical traits. Second, Med-PaLM 2 identified a novel causative murine genetic factor for spontaneous hearing loss, which was validated using knock-in mice. Third, we developed a retrieval and grounding pipeline that enabled Gemini 2.5 Pro to analyze large lists of genes containing genetic variants identified in the genomic sequences of 20 human subjects with hearing loss, and demonstrated that it can assist in identifying causative genetic factors for hearing loss. Fourth, we modified the genetic analysis pipeline to enable Gemini 2.5 Pro, without any task-specific fine-tuning, to identify causative genetic factors for six subjects with rare genetic diseases, whose multi-faceted symptom complexes required 14 to 34 different terms to describe. These results demonstrate that an AI pipeline can facilitate genetic diagnosis and discovery in mice and humans.

Keywords: artificial intelligence; genetic discovery; large language model.