A pilot study on the application of statistical classification procedures to molecular epidemiological data

Holger Schwender; Manuela Zucknick; Katja Ickstadt; Hermann M Bolt; GENICA network

doi:10.1016/j.toxlet.2004.02.021


Toxicol Lett. 2004 Jun 15;151(1):291-9. doi: 10.1016/j.toxlet.2004.02.021.

Authors

Holger Schwender¹, Manuela Zucknick, Katja Ickstadt, Hermann M Bolt; GENICA network

Affiliation

¹ Department of Statistics, Collaborative Research Centre 475, University of Dortmund, Dortmund, Germany. holgers@statistik.uni-dortmund.de

PMID: 15177665

Abstract

The development of new statistical methods for use in molecular epidemiology comprises the building and application of appropriate classification rules. The aim of this study was to assess various classification methods that can potentially handle genetic interactions. A data set comprising genotypes at 25 single nucleotide polymorphic (SNP) loci from 518 breast cancer cases and 586 age-matched population-based controls from the GENICA study was used to build a classification rule with the discrimination methods SVM (support vector machine), CART (classification and regression tree), Bagging, Random Forest, LogitBoost and k-nearest neighbours (kNN). A blind pilot analysis of the genotypic data set served as a first approach to gaining an impression of the statistical structure of the data. Furthermore, this analysis was performed to explore classification methods that may be applied to molecular-epidemiological evaluations. The results showed that all blindly applied classification methods had a slightly smaller misclassification rate than a random classification. The findings, nevertheless, suggest that SNP data might be useful for the classification of individuals into categories of high or low risk of diseases.
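The abstract compares classifiers by their misclassification rate relative to a random classification. The sketch below illustrates that kind of comparison for one of the listed methods, kNN, using only the Python standard library. It is not the authors' implementation. The 0/1/2 minor-allele coding of genotypes, the Manhattan distance, k = 5, 10-fold cross-validation, the majority-class baseline and all helper names (`knn_predict`, `cv_misclassification`, `majority_class_error`, `simulate_genotypes`) are assumptions made for illustration, and the genotypes are simulated rather than taken from GENICA.

```python
import random
from collections import Counter


def knn_predict(train_x, train_y, x, k):
    """Predict the label of genotype vector x by majority vote among its k
    nearest training individuals. Assumption: genotypes are coded 0/1/2
    (minor-allele counts) and compared with Manhattan distance."""
    # Sorting (distance, label) pairs breaks distance ties by the smaller label.
    dists = sorted(
        (sum(abs(a - b) for a, b in zip(row, x)), label)
        for row, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    # With binary labels and an odd k, a vote tie cannot occur.
    return votes.most_common(1)[0][0]


def cv_misclassification(x, y, k=5, folds=10, seed=0):
    """v-fold cross-validated misclassification rate of kNN
    (the fold count and k are illustrative assumptions)."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    errors = 0
    for f in range(folds):
        test = set(idx[f::folds])          # every folds-th shuffled index
        train = [i for i in idx if i not in test]
        train_x = [x[i] for i in train]
        train_y = [y[i] for i in train]
        errors += sum(knn_predict(train_x, train_y, x[i], k) != y[i] for i in test)
    return errors / len(y)


def majority_class_error(y):
    """Error of always predicting the larger class, one possible
    'random classification' baseline (assumption)."""
    counts = Counter(y)
    return 1 - max(counts.values()) / len(y)


def simulate_genotypes(n, n_snps=25, maf=0.3, seed=1):
    """Hypothetical synthetic genotypes: two independent allele draws per SNP,
    i.e. Hardy-Weinberg proportions with minor-allele frequency maf."""
    rng = random.Random(seed)
    return [[(rng.random() < maf) + (rng.random() < maf) for _ in range(n_snps)]
            for _ in range(n)]


if __name__ == "__main__":
    # Class sizes from the abstract: 518 cases (label 1), 586 controls (label 0).
    print(round(majority_class_error([1] * 518 + [0] * 586), 4))

    # Synthetic demo: labels are drawn independently of the genotypes,
    # so kNN has no real signal to exploit here.
    x = simulate_genotypes(200)
    rng = random.Random(2)
    labels = [rng.randint(0, 1) for _ in range(200)]
    print(cv_misclassification(x, labels, k=5, folds=10))
```

With the class sizes from the abstract, always predicting "control" misclassifies 518/1104 ≈ 0.469 of individuals, whereas guessing each label with probability 1/2 has an expected error of 0.5; the abstract does not say which baseline "random classification" refers to. Because the simulated labels carry no genetic signal, the cross-validated kNN error on them should land near 0.5.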

Publication types

Comparative Study
Research Support, Non-U.S. Gov't

MeSH terms

Breast Neoplasms / epidemiology
Breast Neoplasms / genetics
Case-Control Studies
Data Interpretation, Statistical*
Discriminant Analysis
Female
Humans
Molecular Epidemiology / methods*
Pilot Projects
Polymorphism, Genetic
Polymorphism, Single Nucleotide / genetics