Utility of long-read sequencing for All of Us

Nat Commun. 2024 Jan 29;15(1):837. doi: 10.1038/s41467-024-44804-3.

Abstract

The All of Us (AoU) initiative aims to sequence the genomes of over one million Americans from diverse ethnic backgrounds to improve personalized medical care. In a recent technical pilot, we compare the performance of traditional short-read sequencing with long-read sequencing in a small cohort of samples from the HapMap project and two AoU control samples representing eight datasets. Our analysis reveals substantial differences in the ability of these technologies to accurately sequence complex medically relevant genes, particularly in terms of gene coverage and pathogenic variant identification. We also consider the advantages and challenges of using low-coverage sequencing to increase sample numbers in large-cohort analyses. Our results show that HiFi reads produce the most accurate results for both small and large variants. Further, we present a cloud-based pipeline to optimize SNV, indel, and SV calling at scale for long-read analysis. These results lead to widespread improvements across AoU.

MeSH terms

  • Genome, Human
  • High-Throughput Nucleotide Sequencing* / methods
  • Humans
  • INDEL Mutation
  • Population Health*
  • Sequence Analysis, DNA / methods