Inferring Demography From Runs of Homozygosity in Whole-Genome Sequence, With Correction for Sequence Errors

Mol Biol Evol. 2013 Sep;30(9):2209-23. doi: 10.1093/molbev/mst125. Epub 2013 Jul 10.

Abstract

Whole-genome sequence is potentially the richest source of genetic data for inferring ancestral demography. However, full sequence also presents significant challenges to fully utilize such large data sets and to ensure that sequencing errors do not introduce bias into the inferred demography. Using whole-genome sequence data from two Holstein cattle, we demonstrate a new method to correct for bias caused by hidden errors and then infer stepwise changes in ancestral demography up to present. There was a strong upward bias in estimates of recent effective population size (Ne) if the correction method was not applied to the data, both for our method and the Li and Durbin (Inference of human population history from individual whole-genome sequences. Nature 475:493-496) pairwise sequentially Markovian coalescent method. To infer demography, we use an analytical predictor of multiloci linkage disequilibrium (LD) based on a simple coalescent model that allows for changes in Ne. The LD statistic summarizes the distribution of runs of homozygosity for any given demography. We infer a best fit demography as one that predicts a match with the observed distribution of runs of homozygosity in the corrected sequence data. We use multiloci LD because it potentially holds more information about ancestral demography than pairwise LD. The inferred demography indicates a strong reduction in the Ne around 170,000 years ago, possibly related to the divergence of African and European Bos taurus cattle. This is followed by a further reduction coinciding with the period of cattle domestication, with Ne of between 3,500 and 6,000. The most recent reduction of Ne to approximately 100 in the Holstein breed agrees well with estimates from pedigrees. Our approach can be applied to whole-genome sequence from any diploid species and can be scaled up to use sequence from multiple individuals.

Keywords: PSMC; effective population size; haplotype homozygosity; linkage disequilibrium; next generation sequencing.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Cattle* / classification
  • Cattle* / genetics
  • Female
  • Genetic Loci
  • Genetics, Population*
  • Genome*
  • Haplotypes
  • Homozygote*
  • Linkage Disequilibrium
  • Male
  • Markov Chains
  • Models, Genetic
  • Phylogeny*
  • Polymorphism, Single Nucleotide
  • Population Density
  • Sequence Analysis, DNA
  • Time Factors