Partial matches in heterogeneous offender databases do not call into question the validity of random match probability calculations

Int J Legal Med. 2009 Jan;123(1):59-63. doi: 10.1007/s00414-008-0239-1. Epub 2008 May 6.

Abstract

Offender DNA databases have been highly successful tools for generating investigative leads. As a result of this success, the databases have grown to the point that some have suggested using the DNA profiles they contain for empirical pairwise studies, in order to draw inferences about the validity of current practices for generating random match probability estimates. These critics use observations interpreted under the assumption of independence to suggest that current forensic DNA statistical calculations are invalid. However, some of these databases, such as CODIS, are not appropriate for such studies because they contain duplicate profiles and profiles of close relatives and are highly heterogeneous (i.e., composed of individuals from many different population groups in unknown proportions). Observed departures from expectations will occur with these databases, but they have no relevance for questioning the reliability of statistical practices, because such heterogeneous data sets would be expected to violate the basic assumption of independence. In addition, profiles matching at 9, 10, 11, and 12 loci (out of 13) have been observed, are expected, and do not call into question the reliability of statistical practices. The phenomenon of matching profiles is analogous to the birthday scenario. Regardless, simple computations under the assumption of independence, intended for guideline purposes only, show that the partial matches observed in offender databases are not inconsistent with expectations. Indeed, computed random match probabilities that explain the observed matching profiles from pairwise comparisons are smaller than those observed based on routine casework calculations. Data analyses from offender databases based on assumptions of independence do not provide any basis for questioning the legitimacy of the random match probability values computed for any specific target profile with the modified product rule currently followed in the forensic DNA community. Defined population data, which are sufficiently abundant, have already demonstrated the validity of the basic assumptions underlying forensic DNA statistical calculations.
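
The birthday-scenario reasoning in the abstract can be illustrated with a minimal sketch. It is not the authors' computation. It assumes, purely for illustration, that every locus has the same per-locus match probability `q` between two unrelated profiles and that the pairwise comparisons can be treated as independent trials, which is only an approximation. The function names and any input values are hypothetical and are not taken from the paper or from any real database.

```python
from math import comb, expm1, log1p


def pair_count(n_profiles: int) -> int:
    """Number of distinct pairwise comparisons among n_profiles profiles."""
    return n_profiles * (n_profiles - 1) // 2


def prob_exact_locus_matches(k: int, loci: int, q: float) -> float:
    """Probability that two profiles match at exactly k of `loci` loci.

    Assumption (illustrative only): loci are independent and share one
    per-locus match probability q, giving a binomial probability.
    """
    return comb(loci, k) * q**k * (1.0 - q) ** (loci - k)


def expected_matching_pairs(n_profiles: int, p_pair: float) -> float:
    """Expected number of matching pairs when each pair matches with p_pair."""
    return pair_count(n_profiles) * p_pair


def prob_at_least_one_pair(n_profiles: int, p_pair: float) -> float:
    """Approximate chance of one or more matching pairs in the database.

    Treats the pairwise comparisons as independent trials, which is an
    approximation; log1p/expm1 keep the result accurate for tiny p_pair.
    """
    return -expm1(pair_count(n_profiles) * log1p(-p_pair))


if __name__ == "__main__":
    # Hypothetical inputs chosen only to show the scale of the effect.
    n = 10_000   # hypothetical database size
    q = 0.1      # hypothetical per-locus match probability
    p9 = prob_exact_locus_matches(9, 13, q)
    print("pairwise comparisons:", pair_count(n))
    print("P(exactly 9 of 13 loci match) per pair:", p9)
    print("expected 9-locus matching pairs:", expected_matching_pairs(n, p9))
    print("P(at least one such pair):", prob_at_least_one_pair(n, p9))
```

The point of the sketch is that the number of comparisons grows roughly with the square of the database size, so a match probability that is very small for one specified pair can still yield an appreciable expected number of matching pairs across the whole database. This is a different quantity from the random match probability of a specific target profile.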

MeSH terms

  • DNA / genetics
  • DNA Fingerprinting*
  • Databases, Factual*
  • Forensic Genetics
  • Gene Frequency
  • Genotype
  • Humans
  • Models, Genetic*
  • Models, Statistical*
  • Probability*
  • Tandem Repeat Sequences

Substances

  • DNA