Cleaning genotype data

Genet Epidemiol. 1999;17 Suppl 1:S79-83. doi: 10.1002/gepi.1370170714.


Identifying genes that contribute to variation in complex phenotypes requires high-fidelity genetic data. Detecting pedigree and genotyping errors is therefore a crucial prerequisite to analyzing data from a genome scan for disease genes. Most gene-hunting papers have given the problem little attention; the focus has often been on eliminating Mendelian inconsistencies so that the analysis can proceed, rather than on obtaining the best possible data. Although several computer programs are available to help identify genotyping and pedigree errors, the process is still not fully automated. The Collaborative Study on the Genetics of Alcoholism (COGA) data set for GAW11 is completely consistent with Mendel's rules, yet it still contains some errors. We inspected the COGA data for additional errors and identified five possible pedigree errors.
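The basic test behind "eliminating Mendelian inconsistencies" is whether a child's genotype at a marker can be formed from one allele of each parent. The sketch below is a minimal illustration of that check for a single father-mother-child trio, not the procedure used in the COGA analysis. It assumes genotypes are unordered pairs of integer allele codes with 0 meaning "untyped". The function names and the marker names `M1` to `M4` are hypothetical. Marker `M3` also shows why a Mendel-consistent data set can still contain errors: a heterozygous child miscalled as homozygous `(1, 1)` still passes when both parents are `(1, 2)`.

```python
from itertools import product
from typing import Dict, List, Tuple

# Assumption: a genotype is an unordered pair of integer allele codes,
# and allele code 0 means "not typed" (a common pedigree-file convention).
Genotype = Tuple[int, int]
MISSING = 0


def is_typed(g: Genotype) -> bool:
    """True if neither allele is recorded as missing."""
    return MISSING not in g


def is_mendelian_consistent(child: Genotype, father: Genotype, mother: Genotype) -> bool:
    """Check whether the child's genotype can be formed from one paternal
    allele and one maternal allele. Trios with any untyped member are
    treated as untestable and reported as consistent."""
    if not (is_typed(child) and is_typed(father) and is_typed(mother)):
        return True
    target = sorted(child)
    # Try every combination of one paternal and one maternal allele.
    return any(sorted((p, m)) == target for p, m in product(father, mother))


def inconsistent_markers(trio: Dict[str, Tuple[Genotype, Genotype, Genotype]]) -> List[str]:
    """Return the names of markers at which a (child, father, mother)
    trio violates Mendelian inheritance."""
    return [marker for marker, (c, f, m) in trio.items()
            if not is_mendelian_consistent(c, f, m)]


# Hypothetical example data: marker -> (child, father, mother).
trio = {
    "M1": ((1, 2), (1, 1), (2, 2)),  # consistent
    "M2": ((1, 1), (1, 1), (2, 2)),  # child lacks the maternal allele 2
    "M3": ((1, 1), (1, 2), (1, 2)),  # consistent, even if the true call were (1, 2)
    "M4": ((0, 0), (1, 1), (2, 2)),  # child untyped: not testable
}
print(inconsistent_markers(trio))  # ['M2']
```

A check like this flags isolated genotyping errors at single markers. A pedigree error such as a misattributed parent tends instead to produce inconsistencies at many markers for the same individual, and some such errors are only visible through broader comparisons of allele sharing between relatives.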

MeSH terms

  • Alcoholism / genetics
  • Databases, Factual
  • Female
  • Genetic Testing
  • Genome
  • Genotype*
  • Humans
  • Male
  • Nuclear Family
  • Pedigree*
  • Quality Control
  • Reproducibility of Results