An ABC Method for Whole-Genome Sequence Data: Inferring Paleolithic and Neolithic Human Expansions

Mol Biol Evol. 2019 Jul 1;36(7):1565-1579. doi: 10.1093/molbev/msz038.


Species generally undergo a complex demographic history that includes, in particular, multiple changes in population size. Genome-wide sequencing data are potentially highly informative for reconstructing this history, but the key challenge is extracting the relevant information from these very large data sets. Here, we design an approach for inferring past demographic events from a moderate number of fully sequenced genomes. Our new approach uses Approximate Bayesian Computation (ABC), a simulation-based statistical framework that allows 1) identifying the best demographic scenario among several competing scenarios and 2) estimating the best-fitting parameters under the chosen scenario. ABC relies on the computation of summary statistics. Using a cross-validation approach, we show that statistics such as the lengths of haplotypes shared between individuals, or the decay of linkage disequilibrium with distance, can be combined with classical statistics (e.g., heterozygosity and Tajima's D) to accurately infer complex demographic scenarios including bottlenecks and expansion periods. We also demonstrate the importance of simultaneously estimating the genotyping error rate. Applying our method to genome-wide human sequence databases, we finally show that a model consisting of a bottleneck followed by a Paleolithic and a Neolithic expansion is the most relevant for Eurasian populations.
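The two ABC steps described above (scenario choice, then parameter estimation under the chosen scenario) can be illustrated with a minimal rejection-sampling sketch. This is not the authors' implementation: `toy_simulator`, the two scenario labels, the uniform prior, and the choice of mean and standard deviation as summary statistics are all hypothetical stand-ins. A real analysis would replace them with a genome simulator and statistics such as those listed in the abstract, and would typically rescale the statistics before computing distances.

```python
import math
import random
import statistics


def toy_simulator(model, theta, n_loci, rng):
    """Hypothetical stand-in for a genome simulator (NOT a coalescent model).

    Each "locus" yields one number; the two scenarios differ only in spread.
    """
    sd = 1.0 if model == "constant" else 3.0
    return [rng.gauss(theta, sd) for _ in range(n_loci)]


def summaries(data):
    """Reduce a data set to a small vector of summary statistics."""
    return (statistics.mean(data), statistics.stdev(data))


def abc_rejection(observed, models, prior, n_sims, n_keep, seed=0):
    """Basic ABC rejection: keep the n_keep simulations closest to the data."""
    rng = random.Random(seed)
    s_obs = summaries(observed)
    draws = []
    for _ in range(n_sims):
        model = rng.choice(models)      # uniform prior over scenarios
        theta = rng.uniform(*prior)     # uniform prior over the parameter
        s_sim = summaries(toy_simulator(model, theta, len(observed), rng))
        draws.append((math.dist(s_obs, s_sim), model, theta))
    draws.sort(key=lambda d: d[0])
    accepted = draws[:n_keep]
    # Step 1: scenario choice from the share of accepted simulations.
    post = {m: sum(1 for d in accepted if d[1] == m) / len(accepted)
            for m in models}
    best = max(post, key=post.get)
    # Step 2: parameter estimate under the chosen scenario.
    theta_hat = statistics.mean(d[2] for d in accepted if d[1] == best)
    return post, best, theta_hat


# Pseudo-observed data generated under a known scenario, so the result
# can be checked against the truth (a simple form of cross-validation).
observed = toy_simulator("expansion", 5.0, 200, random.Random(1))
posterior_model, best, theta_hat = abc_rejection(
    observed, ["constant", "expansion"], prior=(0.0, 10.0),
    n_sims=4000, n_keep=100)
print(posterior_model, best, round(theta_hat, 2))
```

Generating pseudo-observed data under known parameters and checking whether they are recovered mirrors, in a very reduced form, the cross-validation idea mentioned in the abstract.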

Keywords: Approximate Bayesian Computation; demographic inference; human expansions; population genetics; whole-genome sequence data.

Publication types

  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bayes Theorem
  • Genetics, Population / methods*
  • Genome, Human*
  • Human Migration*
  • Humans
  • Models, Genetic*
  • Whole Genome Sequencing