StellarPGx: A Nextflow Pipeline for Calling Star Alleles in Cytochrome P450 Genes

Clin Pharmacol Ther. 2021 Sep;110(3):741-749. doi: 10.1002/cpt.2173. Epub 2021 Feb 28.

Abstract

Bioinformatics pipelines for calling star alleles (haplotypes) in cytochrome P450 (CYP) genes are important for the implementation of precision medicine. Genotyping CYP genes from high-throughput sequencing data is complicated by, for example, the genes' high degree of polymorphism and by structural variation, especially in CYP2D6, CYP2A6, and CYP2B6. Genome graph-based variant detection approaches have been shown to be reliable for genotyping HLA alleles. However, their application to enhancing star allele calling in CYP genes has not been extensively explored. We present StellarPGx, a Nextflow pipeline for accurately genotyping CYP genes by combining genome graph-based variant detection, read coverage information from the original reference-based alignments, and combinatorial diplotype assignments. The implementation of StellarPGx using Nextflow facilitates its portability, reproducibility, and scalability on various user platforms. StellarPGx is currently able to genotype 12 important pharmacogenes belonging to the CYP1, 2, and 3 families. For purposes of validation, we use CYP2D6 as a model gene owing to its high degree of polymorphism (over 130 star alleles defined to date, including complex structural variants) and clinical importance. We applied StellarPGx and three existing callers to 109 whole genome sequenced samples for which the Genetic Testing Reference Material Coordination Program (GeT-RM) has recently provided consensus truth CYP2D6 diplotypes. StellarPGx had the highest CYP2D6 diplotype concordance with GeT-RM (99%), compared with Cyrius (98%), Aldy (82%), and Stargazer (84%). This exemplifies the high accuracy of StellarPGx and highlights its importance for both research and clinical pharmacogenomics applications. The StellarPGx pipeline is open-source and available from https://github.com/SBIMB/StellarPGx.
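
The idea of combinatorial diplotype assignment mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the StellarPGx implementation. The allele table (`TOY_ALLELES`), the variant names (`v1`, `v2`, `v3`), and the function `candidate_diplotypes` are hypothetical and chosen only for illustration. The sketch assumes each star allele is defined by a set of core variants and that the observed genotype gives the number of alternate alleles (0, 1, or 2) at each variant site. It then enumerates every unordered pair of star alleles whose combined variants reproduce the observed counts.

```python
from collections import Counter
from itertools import combinations_with_replacement

# Hypothetical toy definitions: star allele -> set of core variants it carries.
# These are NOT real CYP2D6 allele definitions.
TOY_ALLELES = {
    "*1": frozenset(),            # reference allele, no variants
    "*A": frozenset({"v1"}),
    "*B": frozenset({"v1", "v2"}),
    "*C": frozenset({"v3"}),
    "*D": frozenset({"v2"}),
}


def candidate_diplotypes(observed, alleles=TOY_ALLELES):
    """Return all unordered allele pairs consistent with the observed genotype.

    `observed` maps a variant name to its ALT-allele count (0, 1, or 2).
    """
    # Ignore sites with zero ALT alleles so that both sides compare cleanly.
    target = Counter({v: n for v, n in observed.items() if n > 0})
    matches = []
    for a, b in combinations_with_replacement(sorted(alleles), 2):
        # Adding the two haplotypes gives the expected ALT count per variant.
        combined = Counter(alleles[a]) + Counter(alleles[b])
        if combined == target:
            matches.append(f"{a}/{b}")
    return matches


if __name__ == "__main__":
    print(candidate_diplotypes({"v1": 2, "v2": 1}))  # one consistent pair
    print(candidate_diplotypes({"v1": 1, "v2": 1}))  # two consistent pairs
```

In this toy table, a sample heterozygous for both `v1` and `v2` is consistent with two diplotypes, because unphased genotypes alone cannot tell whether the variants sit on the same haplotype. Ambiguity of this kind is one reason a caller needs information beyond genotype counts; the abstract names read coverage as an additional input, which the sketch does not model.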

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alleles
  • Computational Biology / methods
  • Cytochrome P-450 Enzyme System / genetics*
  • Genotype
  • Haplotypes / genetics*
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Pharmacogenetics / methods
  • Polymorphism, Genetic / genetics
  • Reproducibility of Results
  • Sequence Analysis, DNA / methods
  • Whole Genome Sequencing / methods

Substances

  • Cytochrome P-450 Enzyme System