Genetic ancestry inference using support vector machines, and the active emergence of a unique American population

Eur J Hum Genet. 2013 May;21(5):554-62. doi: 10.1038/ejhg.2012.258. Epub 2012 Dec 5.

We use genotype data from the Marshfield Clinical Research Foundation Personalized Medicine Research Project to investigate genetic similarity and divergence between Europeans and the sampled population of European Americans in Central Wisconsin, USA. To infer the recent genetic ancestry of the sampled Wisconsinites, we train support vector machines (SVMs) on the positions of Europeans along the top principal components (PCs). Our SVM models partition continent-wide European genetic variance into eight regional classes, an improvement over the geographically broader categories of recent ancestry reported by personal genomics companies. After correcting for misclassification error associated with the SVMs (<10% in all cases), we observe a >14% discrepancy between insular ancestries reported by Wisconsinites and those inferred by SVM. FST values and Mantel tests for correlation between genetic and European geographic distances indicate minimal divergence between Europe and the local Wisconsin population. However, individuals from the Wisconsin sample show greater dispersion along higher-order PCs than individuals from Europe. Hypothesizing that this pattern is characteristic of nascent divergence, we run computer simulations that mimic the recent peopling of Wisconsin. The simulations corroborate the pattern in higher-order PCs, demonstrate its transient nature, and show that admixture accelerates the rate of divergence between the admixed population and its parental sources relative to drift alone. Together, the empirical and simulation results suggest that genetic divergence between European source populations and European Americans in Central Wisconsin is subtle but already under way.
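The abstract does not say how the SVM misclassification error was corrected. One standard approach, shown here only as a hedged sketch and not as the authors' procedure, is to invert a confusion matrix. Here confusion[i][j] is assumed to be the probability that the classifier assigns class j to an individual whose true class is i (for example, estimated by cross-validation on the European reference panel). The observed assignment proportions q then satisfy q = C^T p, which can be solved for the true proportions p. The function names `solve_linear` and `correct_proportions` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the caller's matrix is left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Choose the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("confusion matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Zero out this column in every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def correct_proportions(confusion: Matrix, observed: List[float]) -> List[float]:
    """Estimate true class proportions p from observed classifier output q.

    Assumes confusion[i][j] = P(assigned class j | true class i), so that
    q_j = sum_i p_i * confusion[i][j], i.e. q = C^T p.
    """
    k = len(confusion)
    transposed = [[confusion[i][j] for i in range(k)] for j in range(k)]
    p = solve_linear(transposed, observed)
    # Sampling noise can push estimates slightly below zero (a hypothetical
    # safeguard, not from the source): clip, then renormalise to sum to 1.
    p = [max(x, 0.0) for x in p]
    total = sum(p)
    return [x / total for x in p]


if __name__ == "__main__":
    # Two hypothetical regional classes with 10% and 20% misclassification.
    confusion = [[0.9, 0.1], [0.2, 0.8]]
    observed = [0.62, 0.38]
    print(correct_proportions(confusion, observed))
```

With these toy numbers the correction recovers true proportions of about 0.6 and 0.4 from raw assignments of 0.62 and 0.38. With eight regional classes the same code applies to an 8 × 8 confusion matrix.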

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Analysis of Variance
  • Computer Simulation
  • Databases, Genetic
  • Genetic Variation*
  • Genetics, Population
  • Genotype
  • Humans
  • Principal Component Analysis
  • Support Vector Machine*
  • White People / genetics*
  • Wisconsin