Statistical modeling of STR capillary electrophoresis signal

BMC Bioinformatics. 2019 Dec 2;20(Suppl 16):584. doi: 10.1186/s12859-019-3074-0.


Background: To isolate an individual's genotype from a sample of biological material, most laboratories use PCR and Capillary Electrophoresis (CE) to construct a genetic profile based on polymorphic loci known as Short Tandem Repeats (STRs). The resulting profile consists of CE signal that contains information about the length and number of STR units amplified. For samples collected from the environment, interpreting the signal can be challenging because information about the quality and quantity of the DNA is often limited. Interpretation is further complicated by noise and by PCR artifacts such as stutter, which can mask or mimic biological alleles. Because manual interpretation methods cannot comprehensively account for such nuances, it would be valuable to develop a signal model that can effectively characterize the various components of STR signal independent of a priori knowledge of the quantity or quality of DNA.

Results: First, we seek to mathematically characterize the quality of the profile by measuring changes in the signal with respect to amplicon size. Next, we examine the noise, allele, and stutter components of the signal and develop distinct models for each. Using cross-validation and model selection, we identify a model that can be effectively utilized for downstream interpretation. Finally, we show an implementation of the model in NOCIt, a software system that calculates the a posteriori probability distribution on the number of contributors.
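As a rough illustration of what an a posteriori distribution on the number of contributors involves, the sketch below applies Bayes' rule to a set of per-hypothesis log-likelihoods. It is a minimal sketch under stated assumptions and not NOCIt's implementation: the function name `posterior_noc`, the uniform default prior, and the example log-likelihood values are all hypothetical, and in practice the likelihoods would come from the noise, allele, and stutter models described above.

```python
import math


def posterior_noc(log_likelihoods, priors=None):
    """Return P(n | signal) for each candidate number of contributors n.

    log_likelihoods maps n -> log P(signal | n). These values are assumed
    to be supplied by some signal model; they are hypothetical inputs here.
    priors maps n -> P(n); a uniform prior is assumed when omitted.
    """
    candidates = sorted(log_likelihoods)
    if priors is None:
        priors = {n: 1.0 / len(candidates) for n in candidates}

    # Work in log space, then subtract the maximum before exponentiating
    # so that very negative log-likelihoods do not underflow to zero.
    log_joint = {n: log_likelihoods[n] + math.log(priors[n]) for n in candidates}
    peak = max(log_joint.values())
    unnormalized = {n: math.exp(v - peak) for n, v in log_joint.items()}

    total = sum(unnormalized.values())
    return {n: u / total for n, u in unnormalized.items()}


# Hypothetical log-likelihoods for one, two, and three contributors.
example = posterior_noc({1: -10.0, 2: -8.0, 3: -9.0})
most_probable = max(example, key=example.get)
```

The result is a dictionary of probabilities that sum to one, so the full distribution, not only the most probable number of contributors, is available to downstream interpretation.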

Conclusion: The model was selected using a large, diverse set of DNA samples obtained from 144 different laboratory conditions, with DNA amounts ranging from a single copy of DNA to hundreds of copies and profile quality ranging from pristine to highly degraded. Implemented in NOCIt, the model enables a probabilistic approach to estimating the number of contributors to complex, environmental samples.

Keywords: Capillary electrophoresis; DNA degradation; STR genotyping; Stochastic analysis and modelling.

MeSH terms

  • Alleles
  • DNA / genetics
  • Electrophoresis, Capillary / methods*
  • Humans
  • Microsatellite Repeats / genetics*
  • Models, Statistical*
  • Probability
  • Software


  • DNA