A large-scale dataset of single and mixed-source short tandem repeat profiles to inform human identification strategies: PROVEDIt

Forensic Sci Int Genet. 2018 Jan;32:62-70. doi: 10.1016/j.fsigen.2017.10.006. Epub 2017 Oct 24.


DNA-based human identity testing compares PCR-amplified polymorphic Short Tandem Repeat (STR) motifs from a known source with the STR profiles obtained from uncertain sources. Samples such as those found at crime scenes often yield a signal that is a composite of incomplete STR profiles from an unknown number of unknown contributors, making interpretation an arduous task. To facilitate advances in STR interpretation, we provide over 25,000 multiplex STR profiles produced from one to five known individuals at target levels ranging from one to 160 copies of DNA. The data, generated under 144 laboratory conditions, are classified by total copy number and contributor proportions. For the 70% of samples that were synthetically compromised, we report the level of DNA damage using quantitative and end-point PCR. In addition, we characterize the complexity of the signal by exploring the number of detected alleles in each profile.
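
To illustrate what counting detected alleles in a profile can look like, the following is a minimal Python sketch and not the authors' pipeline. The profile layout (a dictionary mapping each locus to a list of allele and peak-height pairs), the function names, the example peak heights, and the 50 RFU analytical threshold are all assumptions made for illustration. The sketch also derives a lower bound on the number of contributors from the fact that a diploid individual carries at most two alleles per autosomal locus.

```python
from math import ceil


def detected_alleles(profile, threshold_rfu=50):
    """Return {locus: sorted allele labels whose peak height meets the threshold}.

    `profile` maps a locus name to a list of (allele_label, peak_height_rfu)
    tuples. This layout and the default threshold are hypothetical.
    """
    return {
        locus: sorted({allele for allele, height in peaks if height >= threshold_rfu})
        for locus, peaks in profile.items()
    }


def allele_counts(profile, threshold_rfu=50):
    """Return {locus: number of detected alleles} for one profile."""
    return {
        locus: len(alleles)
        for locus, alleles in detected_alleles(profile, threshold_rfu).items()
    }


def min_contributors(profile, threshold_rfu=50):
    """Lower bound on contributors: ceil(max alleles at any locus / 2).

    Allele sharing and drop-out mean the true number can be higher.
    """
    counts = allele_counts(profile, threshold_rfu).values()
    return ceil(max(counts, default=0) / 2)


# Hypothetical two-locus profile; peak heights are invented for illustration.
example_profile = {
    "D8S1179": [("12", 410), ("13", 388), ("15", 95)],
    "TH01": [("6", 520), ("9.3", 30)],
}

if __name__ == "__main__":
    print(allele_counts(example_profile))
    print(min_contributors(example_profile))
```

Because alleles can drop out at low copy numbers and contributors can share alleles, a count of this kind gives only a minimum. Datasets with known contributor numbers make it possible to check how far such a bound falls from the truth.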

Keywords: Forensic DNA; Human identification; PROVEDIt; STR database; STRs.

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Alleles
  • DNA Damage
  • DNA Fingerprinting*
  • Datasets as Topic*
  • Forensic Genetics
  • Genotype
  • Humans
  • Microsatellite Repeats*
  • Polymerase Chain Reaction