Improving Single-Nucleotide Polymorphism-Based Fetal Fraction Estimation of Maternal Plasma Circulating Cell-Free DNA Using Bayesian Hierarchical Models

J Comput Biol. 2018 Sep;25(9):1040-1049. doi: 10.1089/cmb.2018.0056. Epub 2018 Jun 22.

Abstract

Recent advances in next-generation sequencing (NGS) technologies have enabled the development of effective high-throughput noninvasive prenatal screening (NIPS) assays for fetal genetic abnormalities using maternal circulating cell-free DNA (ccfDNA). An important NIPS quality-assurance step is quantifying the fetal proportion of the sampled ccfDNA. For methods using allelic read count ratios from targeted sequencing of single-nucleotide polymorphisms (SNPs), systematic biases and errors may reduce accuracy and diminish assay performance. We collected ccfDNA NIPS MiSeq sequencing data from an amplicon-based 92-SNP panel, along with complementary low-depth whole-genome sequencing (WGS), on 243 normal pregnancies carrying a male fetus, as well as an additional 144 nonpregnant female donor samples. Using fetal fraction estimates based on X and Y chromosome WGS coverage as the gold standard, we compared an existing SNP-based approach, FetalQuant, to a more flexible Bayesian hierarchical modeling strategy that borrows information across interrogated SNPs to characterize SNP-level error rates and biases and thereby improve fetal fraction estimates. Posterior distributions for SNP-level model parameters indicate that most SNPs exhibited modest to moderate extrabinomial variation and a consistent underrepresentation of fetal alleles, with some extreme outliers in both regards. Fetal fraction estimates from FetalQuant, which is naive to these SNP properties, were relatively poor (R² = 0.14, root mean squared error [RMSE] = 0.050), particularly when the true fetal fraction was low (<5%). In contrast, by quantifying SNP-level biases and error rates, our proposed approach improved performance by reducing the bias and variability in fetal fraction estimates (R² = 0.794, RMSE = 0.025). Using high-depth targeted SNP sequencing data, we identified a high degree of variability in distributional properties across SNP allelic read counts. These results highlight the benefits of leveraging hierarchical modeling for SNP-based fetal quantification assays (FQAs) and the need to properly calibrate FQAs that depend on NGS allelic ratio data.
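The contrast between a naive allelic-ratio estimate and one that accounts for SNP-level bias and extrabinomial variation can be illustrated with a short Python sketch. This is not FetalQuant and not the authors' hierarchical model. It assumes only informative SNPs (mother homozygous, fetus heterozygous), models fetal allele counts as beta-binomial with mean `b_j * f / 2`, treats each SNP's multiplicative bias `b_j` and concentration `phi_j` as already known (a hierarchical model would instead estimate them jointly, borrowing information across SNPs), and places a uniform prior on the fetal fraction `f` over (0, 0.5). All function names and input values are hypothetical.

```python
import math


def log_beta(a, b):
    # Log of the Beta function B(a, b), computed with log-gamma for stability.
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def betabinom_logpmf(y, n, mean, conc):
    # Beta-binomial log-probability of y successes in n trials, parameterized by
    # mean p and concentration phi (alpha = p * phi, beta = (1 - p) * phi).
    # Smaller phi means more extrabinomial variation.
    a = mean * conc
    b = (1.0 - mean) * conc
    log_choose = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return log_choose + log_beta(y + a, n - y + b) - log_beta(a, b)


def naive_fetal_fraction(fetal_counts, depths):
    # Pooled allelic ratio: at SNPs where the mother is homozygous and the fetus
    # heterozygous, the fetal allele is expected at f / 2, so f = 2 * ratio.
    return 2.0 * sum(fetal_counts) / sum(depths)


def bias_aware_fetal_fraction(fetal_counts, depths, bias, conc, grid_size=2000):
    # Posterior mean of f on a grid over (0, 0.5) under a uniform prior.
    # ASSUMPTION: per-SNP bias b_j (expected fetal allele share = b_j * f / 2)
    # and concentration phi_j are known here; a hierarchical model would
    # estimate them jointly, sharing information across SNPs.
    step = 0.5 / grid_size
    grid = [(i + 0.5) * step for i in range(grid_size)]
    log_post = []
    for f in grid:
        lp = 0.0
        for y, n, b_j, phi_j in zip(fetal_counts, depths, bias, conc):
            lp += betabinom_logpmf(y, n, b_j * f / 2.0, phi_j)
        log_post.append(lp)
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]
    return sum(f * w for f, w in zip(grid, weights)) / sum(weights)


if __name__ == "__main__":
    # Hypothetical informative SNPs constructed with f = 0.1 and b_j = 0.8,
    # so the expected fetal allele share is 0.8 * 0.1 / 2 = 0.04.
    counts = [40, 40, 40, 40]
    depths = [1000, 1000, 1000, 1000]
    bias = [0.8] * 4
    conc = [500.0] * 4
    print(naive_fetal_fraction(counts, depths))  # 0.08
    print(round(bias_aware_fetal_fraction(counts, depths, bias, conc), 3))
```

With every SNP underrepresenting the fetal allele by 20% (`b_j = 0.8`), the pooled ratio understates `f` at 0.08, whereas the bias-aware posterior mean lands near the value of 0.1 used to construct the counts. The inputs are illustrative only and are not data from the study.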
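The gold standard described above is derived from X and Y chromosome WGS coverage. One common way to turn chromosome read shares into a fetal fraction for a pregnancy with a male fetus is sketched below in Python. The abstract does not state the exact normalization used in the study, so the formulas, the function names, and the reference read shares are assumptions; the reference values would typically be estimated from samples of known sex.

```python
def fetal_fraction_chry(chry_frac, chry_frac_female, chry_frac_male):
    # ASSUMED formula: scale the sample's chrY read share between the
    # background share seen in female samples (f = 0) and the share seen
    # in adult male samples (f = 1).
    return (chry_frac - chry_frac_female) / (chry_frac_male - chry_frac_female)


def fetal_fraction_chrx(chrx_frac, chrx_frac_female):
    # ASSUMED formula: maternal DNA (1 - f) carries 2 X copies and the male
    # fetus (f) carries 1, so relative X dosage is 1 - f / 2 and
    # f = 2 * (1 - observed share / female reference share).
    return 2.0 * (1.0 - chrx_frac / chrx_frac_female)


if __name__ == "__main__":
    # Hypothetical read shares chosen so both estimates come out near 0.1.
    print(fetal_fraction_chry(0.0003, 0.0001, 0.0021))
    print(fetal_fraction_chrx(0.0475, 0.05))
```

Subtracting the female chrY background matters because some reads from female samples mis-map to chrY, which would otherwise inflate low fetal fraction estimates.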
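The comparison above reports R² and RMSE of SNP-based estimates against the WGS gold standard. A minimal Python sketch of both metrics follows. It assumes R² is the squared Pearson correlation between estimates and gold standard; the abstract does not say whether a coefficient of determination (1 - SS_res/SS_tot) was used instead, and the function names are hypothetical.

```python
import math


def rmse(estimates, truth):
    # Root mean squared error of the estimates against the gold-standard values.
    n = len(truth)
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truth)) / n)


def r_squared(estimates, truth):
    # ASSUMED definition: squared Pearson correlation between estimates and
    # gold standard. It is unaffected by a constant offset or rescaling.
    n = len(truth)
    me = sum(estimates) / n
    mt = sum(truth) / n
    cov = sum((e - me) * (t - mt) for e, t in zip(estimates, truth))
    var_e = sum((e - me) ** 2 for e in estimates)
    var_t = sum((t - mt) ** 2 for t in truth)
    return cov * cov / (var_e * var_t)


if __name__ == "__main__":
    # Hypothetical estimates that are all 0.01 above the gold standard.
    truth = [0.04, 0.08, 0.12, 0.16]
    estimates = [t + 0.01 for t in truth]
    print(round(rmse(estimates, truth), 6), round(r_squared(estimates, truth), 6))
```

Because a squared correlation ignores a constant offset, the example yields an R² of 1 despite a systematic bias of 0.01; reporting RMSE alongside it captures that bias, which is the kind of systematic error the abstract attributes to uncalibrated SNP-level effects.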

Keywords: Bayesian hierarchical models; cell-free DNA; next-generation sequencing; noninvasive prenatal screening.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adolescent
  • Adult
  • Bayes Theorem*
  • Cell-Free Nucleic Acids / blood*
  • Female
  • Fetus / metabolism*
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Male
  • Maternal Serum Screening Tests / methods*
  • Middle Aged
  • Models, Statistical*
  • Polymorphism, Single Nucleotide*
  • Pregnancy
  • Sequence Analysis, DNA / methods*
  • Young Adult

Substances

  • Cell-Free Nucleic Acids