JoGo 1.0: the ACTG hierarchical nomenclature and database covering 4.7 million haplotypes across 19,194 human genes

Nucleic Acids Res. 2026 Jan 6;54(D1):D1159-D1173. doi: 10.1093/nar/gkaf1232.

Abstract

The Joint Open Genome and Omics Platform 1.0 (JoGo) is a global, long-read-based human haplotype database covering 19 194 MANE-standardized protein-coding genes. JoGo introduces a novel ACTG hierarchical nomenclature, with levels A (amino acid), C (coding), T (transcript), and G (gene body), that assigns numeric identifiers in descending order of global frequency. Using high-fidelity long-read sequencing, we assembled haplotype-resolved contigs for 258 globally sampled genomes, including 108 sequenced in-house. We cataloged 174 376 A-, 300 610 C-, 486 288 T-, and 3 695 204 G-level haplotypes (4 656 478 in total). Haplotype IDs are assigned once globally across all sequences, including those originating from GRCh38 and CHM13v2 reference assemblies, embedding reference haplotypes within the same frequency-ranked space and enabling direct cross-assembly comparison. JoGo maps functional variants from ClinVar, GWAS Catalog, and GTEx onto their corresponding ACTG-haplotypes and provides haplotype-expression QTL results from 1280 HapMap RNA-seq samples across three independent studies. The web portal provides flexible search by gene name, variant ID, or ACTG code. It offers both an interactive online viewer and a privacy-preserving local viewer for secure integration with user data. JoGo enables high-resolution exploration of haplotype diversity, facilitating the identification of functional variants relevant to gene regulation, disease associations, and precision medicine. JoGo 1.0 is freely accessible at https://jogo.csml.org.
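
The frequency-ranked numbering described in the abstract can be illustrated with a small sketch. This is not JoGo's implementation: the function names, the toy sequences, the 1-based numbering, the alphabetical tie-breaking rule, and the choice to rank each level independently within a single gene are all assumptions made for illustration.

```python
from collections import Counter

# The four levels named in the abstract: amino acid, coding, transcript, gene body.
LEVELS = ("A", "C", "T", "G")


def rank_ids(sequences):
    """Give each distinct sequence a numeric ID, with 1 for the most frequent.

    Assumption: ties are broken alphabetically so the result is deterministic.
    The abstract does not say how ties are resolved.
    """
    counts = Counter(sequences)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {seq: rank for rank, (seq, _count) in enumerate(ranked, start=1)}


def label_haplotypes(haplotypes):
    """Return one {level: id} mapping per haplotype, ranking each level separately."""
    ids_by_level = {
        level: rank_ids([hap[level] for hap in haplotypes]) for level in LEVELS
    }
    return [
        {level: ids_by_level[level][hap[level]] for level in LEVELS}
        for hap in haplotypes
    ]


# Hypothetical toy data for one gene: each record holds the sequence seen at each level.
toy = [
    {"A": "MK", "C": "ATGAAA", "T": "GATGAAAC", "G": "TTGATGAAACC"},
    {"A": "MK", "C": "ATGAAA", "T": "GATGAAAC", "G": "TAGATGAAACC"},
    {"A": "MK", "C": "ATGAAG", "T": "GATGAAGC", "G": "TTGATGAAGCC"},
    {"A": "MR", "C": "ATGAGA", "T": "GATGAGAC", "G": "TTGATGAGACC"},
]

labels = label_haplotypes(toy)
for hap_labels in labels:
    print(hap_labels)
```

In this toy data the first three haplotypes share one amino acid sequence, so they share the same A-level ID, while every gene-body sequence is distinct and receives its own G-level ID. That mirrors the pattern in the reported counts, where G-level haplotypes far outnumber A-level ones.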

Plain language summary

JoGo 1.0 presents a simple, hierarchical naming system (ACTG) and a database that together organize 4.7 million haplotypes spanning 19,194 human genes. A haplotype is a set of DNA variants that tend to be inherited together. Today, many genes have dozens to thousands of known haplotypes, but their labels are inconsistent, hard to compare, and often tied to a single study. Our ACTG scheme gives each haplotype an intuitive, stable identifier that reflects its relationships to others, while the JoGo database links names to sequence, frequency and functional annotations. This standard makes genetic results easier to share, reproduce and interpret across studies, populations and clinical settings, accelerating research and supporting future precision medicine. The database is available from https://jogo.csml.org/.

MeSH terms

  • Databases, Genetic*
  • Genome, Human*
  • Genomics / methods
  • Haplotypes*
  • Humans
  • Polymorphism, Single Nucleotide
  • Software*
  • Terminology as Topic*