Preliminary implementation of new data mining techniques for the analysis of simulation data from Genetic Analysis Workshop 12: problem 2

Genet Epidemiol. 2001;21 Suppl 1:S390-5. doi: 10.1002/gepi.2001.21.s1.s390.

Abstract

We introduce a new data mining method applicable to complex disease genetics. Our approach is suited to a broad spectrum of diseases and identifies noteworthy sharing of allele combinations among unrelated affected individuals. It may also be extended to cover the common types of genotype data, including single-nucleotide polymorphisms and candidate-gene sequences. Using a method derived from data-mining computer algorithms, we analyze a data set of unrelated affected individuals chosen from the simulated pedigrees of problem 2 of the Genetic Analysis Workshop 12. We observe that most marker subsets containing a flanking marker for each of six or seven of the disease-gene loci yield significant numbers of individuals with substantially similar genotypes. However, initial attempts (blind to the generating model) to identify the predisposing loci have not been successful. Work is underway to refine our methods so that such loci may routinely be found and validated.
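
The abstract does not specify the algorithm, so the following is only a minimal illustrative sketch of the general idea of counting genotype sharing across marker subsets, not the authors' method. It assumes a hypothetical input format (each individual mapped to a tuple of per-marker genotype strings), uses exact matching rather than the "substantially similar" criterion mentioned above, and makes no attempt to assess statistical significance. The function name `largest_sharing_groups` and the toy data are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def largest_sharing_groups(genotypes, subset_size):
    """For every marker subset of the given size, return the most common
    multi-marker genotype and the number of individuals who carry it.

    genotypes: dict mapping an individual ID to a tuple of genotype
               strings, one per marker (hypothetical format).
    """
    n_markers = len(next(iter(genotypes.values())))
    results = {}
    for subset in combinations(range(n_markers), subset_size):
        # Project each individual onto the chosen markers and count
        # how often each combined genotype occurs.
        counts = Counter(
            tuple(g[m] for m in subset) for g in genotypes.values()
        )
        pattern, count = counts.most_common(1)[0]
        results[subset] = (pattern, count)
    return results


# Toy data: four unrelated affected individuals typed at three markers.
toy = {
    "ind1": ("A/A", "C/T", "G/G"),
    "ind2": ("A/A", "C/T", "G/T"),
    "ind3": ("A/G", "C/T", "G/G"),
    "ind4": ("A/A", "C/C", "G/G"),
}

shared = largest_sharing_groups(toy, subset_size=2)
for subset, (pattern, count) in shared.items():
    print(subset, pattern, count)
```

Enumerating all subsets grows combinatorially with the number of markers, so a practical analysis would need pruning or a search heuristic, and the observed counts would have to be compared against what is expected by chance.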

MeSH terms

  • Algorithms
  • Alleles
  • Chromosome Mapping / statistics & numerical data
  • Data Collection / statistics & numerical data*
  • Genetic Markers / genetics
  • Genetic Predisposition to Disease / genetics*
  • Genotype
  • Humans
  • Mathematical Computing
  • Models, Statistical*
  • Polymorphism, Single Nucleotide / genetics
  • Software

Substances

  • Genetic Markers