A principal components-based clustering method to identify variants associated with complex traits

Hum Hered. 2011;71(1):50-8. doi: 10.1159/000323567. Epub 2011 Mar 10.


Background: Multivariate methods ranging from joint SNP analysis to principal components analysis (PCA) have been developed for testing multiple markers in a region for association with disease and disease-related traits. However, under various scenarios these methods suffer from low power, an inability to identify the subset of markers contributing to the evidence for association, or both.

Methods: We introduce orthoblique principal components-based clustering (OPCC) as an alternative approach to identify specific subsets of markers showing association with a quantitative outcome of interest. We demonstrate the utility of OPCC using simulation studies and an example from the literature on type 2 diabetes.
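The abstract does not spell out the OPCC algorithm, so the following is only a rough sketch of the general idea of principal components-based variable clustering followed by a per-cluster association test. It assumes a simplified divisive procedure: a cluster is split while its second eigenvalue exceeds 1, the two leading components are rotated with a raw quartimax rotation found by grid search (a stand-in for the orthoblique rotation), and each marker joins the rotated component with which it has the larger squared loading. All function names, the threshold, and the simulated data are hypothetical and are not taken from the paper.

```python
import numpy as np

def rotate_quartimax_2d(loadings, n_angles=180):
    """Rotate a (p x 2) loading matrix to maximize the raw quartimax criterion.
    Grid search over [0, pi/2) is an illustrative shortcut, not the paper's method."""
    best, best_val = loadings, -np.inf
    for theta in np.linspace(0.0, np.pi / 2, n_angles, endpoint=False):
        c, s = np.cos(theta), np.sin(theta)
        rotated = loadings @ np.array([[c, -s], [s, c]])
        val = np.sum(rotated ** 4)
        if val > best_val:
            best, best_val = rotated, val
    return best

def split_cluster(X):
    """Return the second eigenvalue of the correlation matrix and a boolean
    mask of the columns assigned to the second rotated component."""
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)            # eigh returns ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    loadings = eigvecs[:, :2] * np.sqrt(np.clip(eigvals[:2], 0.0, None))
    rotated = rotate_quartimax_2d(loadings)
    to_second = rotated[:, 1] ** 2 > rotated[:, 0] ** 2
    return eigvals[1], to_second

def cluster_markers(X, max_second_eig=1.0):
    """Divisively cluster the columns of X (assumed threshold: second eigenvalue <= 1)."""
    pending, done = [np.arange(X.shape[1])], []
    while pending:
        idx = pending.pop()
        if idx.size < 2:
            done.append(idx)
            continue
        second_eig, to_second = split_cluster(X[:, idx])
        if second_eig <= max_second_eig or to_second.all() or not to_second.any():
            done.append(idx)                        # no further split
        else:
            pending += [idx[~to_second], idx[to_second]]
    return done

def cluster_component(X):
    """Score of the first principal component of the standardized columns."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ vt[0]

# Hypothetical simulated data: two independent blocks of four correlated
# markers each (continuous stand-ins for genotypes), with marker 1 causal.
rng = np.random.default_rng(0)
n = 500
z1, z2 = rng.standard_normal(n), rng.standard_normal(n)
X = np.column_stack([z1[:, None] + 0.5 * rng.standard_normal((n, 4)),
                     z2[:, None] + 0.5 * rng.standard_normal((n, 4))])
y = 0.5 * X[:, 1] + rng.standard_normal(n)

clusters = cluster_markers(X)
# Absolute correlation between each cluster component and the trait
assoc = [abs(np.corrcoef(cluster_component(X[:, idx]), y)[0, 1]) for idx in clusters]
for idx, r in zip(clusters, assoc):
    print(sorted(idx.tolist()), round(float(r), 3))
```

In this sketch the cluster whose component correlates most strongly with the trait is the candidate subset to carry forward. A real analysis would use a formal regression test and coded genotypes rather than the simple correlation and continuous values used here.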

Results: Compared to traditional methods, OPCC has similar or improved power under various scenarios of linkage disequilibrium structure and genotype availability. Most importantly, our simulations show that OPCC accurately narrows large numbers of markers down to a subset containing the causal variant or its proxy.

Conclusion: OPCC is a powerful and efficient data reduction method for detecting associations between gene variants and disease-related traits. Unlike alternative methodologies, OPCC has the ability to isolate the effect of causal SNP(s) from among large sets of markers in a candidate region. Therefore, OPCC is an improvement over PCA for testing multiple SNP associations with phenotypes of interest.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Cluster Analysis
  • Computer Simulation
  • Diabetes Mellitus, Type 2 / genetics
  • Genetic Predisposition to Disease
  • Genome-Wide Association Study*
  • Genotype
  • Hepatocyte Nuclear Factor 4 / genetics
  • Humans
  • Models, Genetic
  • Multifactorial Inheritance / genetics*
  • Phenotype
  • Polymorphism, Single Nucleotide
  • Principal Component Analysis*
  • Transcription Factor 7-Like 2 Protein / genetics


Substances

  • HNF4A protein, human
  • Hepatocyte Nuclear Factor 4
  • TCF7L2 protein, human
  • Transcription Factor 7-Like 2 Protein