PLoS Genet. 2019 May 28;15(5):e1008175.
doi: 10.1371/journal.pgen.1008175. eCollection 2019 May.

A Statistical Model for Reference-Free Inference of Archaic Local Ancestry

Free PMC article

Arun Durvasula et al. PLoS Genet.


Statistical analyses of genomic data from diverse human populations have demonstrated that archaic hominins, such as Neanderthals and Denisovans, interbred or admixed with the ancestors of present-day humans. Central to these analyses are methods for inferring archaic ancestry along the genomes of present-day individuals (archaic local ancestry). Methods for archaic local ancestry inference rely on the availability of reference genomes from the ancestral archaic populations for accurate inference. However, several instances of archaic admixture lack reference archaic genomes, making it difficult to characterize these events. We present a statistical method that combines diverse population genetic summary statistics to infer archaic local ancestry without access to an archaic reference genome. We validate the accuracy and robustness of our method in simulations. When applied to genomes of European individuals, our method recovers segments that are substantially enriched for Neanderthal ancestry, even though our method did not have access to any Neanderthal reference genomes.

Conflict of interest statement

The authors have declared that no competing interests exist.


Fig 1. Outline of the demographic model used for training ArchIE.
We simulate a population starting at size N0 and splitting into archaic and modern human (MH) populations at time T0. The MH population splits into a reference and target population of size N1 and N2, respectively, at time Ts. Then, at time Ta, the archaic population admixes with the target population with an associated admixture proportion m. We use data simulated from this model to train a logistic regression classifier.
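The training step described in this caption can be illustrated with a minimal sketch of a logistic regression classifier fitted by gradient descent. The two features, their distributions, and the 50/50 class balance below are made-up stand-ins, not ArchIE's summary statistics or its demographic simulator; the function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    # Probability that a window with feature vector x is archaic.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.1, epochs=500):
    # Plain batch gradient descent on the mean log-loss.
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy "simulated" windows: label 1 = archaic, 0 = non-archaic.
# The feature distributions are assumptions chosen only for illustration.
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    label = 1 if rng.random() < 0.5 else 0
    f1 = rng.gauss(1.5 if label else 0.0, 1.0)
    f2 = rng.gauss(-1.0 if label else 0.0, 1.0)
    X.append([f1, f2])
    y.append(label)

w, b = train_logistic(X, y)
```

In practice the labels would come from the simulated ancestry of each window, and the fitted model would then be applied to windows from real genomes.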
Fig 2. ArchIE obtains improved accuracy over related methods.
(A) Precision-Recall (PR) and (B) Receiver Operating Characteristic (ROC) curves for ArchIE (black circles), S* (red crosses), and S’ (purple triangles) in a 2% admixture scenario with a Human-Neanderthal demography. The dashed line corresponds to a false discovery rate of 20%.
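As a rough illustration of how a point on the PR curve maps to a false discovery rate, the sketch below sweeps candidate thresholds on predicted probabilities and returns the lowest one whose FDR (1 minus precision) stays within a target. The scores and labels are invented, and the function names are hypothetical rather than part of ArchIE.

```python
def precision_recall_at(scores, labels, threshold):
    # Calls are made for every score at or above the threshold.
    tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
    fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 0)
    fn = sum(1 for s, l in zip(scores, labels) if s < threshold and l == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def threshold_for_fdr(scores, labels, max_fdr=0.2):
    # Recall can only drop as the threshold rises, so the lowest
    # qualifying threshold gives the highest recall at that FDR.
    for t in sorted(set(scores)):
        called = [l for s, l in zip(scores, labels) if s >= t]
        fdr = called.count(0) / len(called)
        if fdr <= max_fdr:
            precision, recall = precision_recall_at(scores, labels, t)
            return t, precision, recall
    return None


# Invented example: 8 windows with predicted P(archaic) and true labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
result = threshold_for_fdr(scores, labels, max_fdr=0.2)
```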
Fig 3. Relative importance of the features used as input to ArchIE.
We examined the log of the absolute value of the standardized weights associated with each of the features included in the logistic regression model underlying ArchIE. Negative values indicate standardized weights with absolute values less than 1. (A) The individual frequency spectrum mostly has small weights and lower frequency entries generally have larger weights associated with them. (B) The first three entries indicate the moments of the distance vector. The minimum distance to the reference population, skew, and variance of the distance vector have the largest weights associated with them.
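The quantity plotted in this figure can be sketched as follows. Two assumptions are made here that the caption does not state: that a standardized weight is the raw weight multiplied by the standard deviation of its feature, and that the natural log is used. Whatever the base, a negative value corresponds to an absolute standardized weight below 1.

```python
import math
import statistics


def log_abs_standardized_weights(weights, feature_columns):
    # weights[j] is the raw coefficient of feature j;
    # feature_columns[j] holds that feature's values across windows.
    out = []
    for w, col in zip(weights, feature_columns):
        standardized = w * statistics.pstdev(col)  # assumed definition
        if standardized == 0:
            out.append(float("-inf"))  # log of zero is undefined
        else:
            out.append(math.log(abs(standardized)))
    return out


# Invented example: both features have standard deviation 1.
importance = log_abs_standardized_weights(
    [2.0, -0.5],
    [[0.0, 2.0], [0.0, 2.0]],
)
```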
Fig 4. ArchIE is robust to misspecification in the demographic model.
We tested ArchIE on data simulated after perturbing single demographic parameters lower (left, orange) and higher (right, blue) relative to their values in the training data. Values are reported as log10 fold changes compared to the baseline model performance. We report (A, B) recall and (C, D) precision at the threshold that gives a precision of 0.8 on the unperturbed test data (P(archaic) = 0.62).
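For reference, a log10 fold change of a metric relative to baseline can be computed as in this small sketch. The numbers are invented; a value of 0 means no change, and a negative value means the perturbed model performs worse than baseline.

```python
import math


def log10_fold_change(perturbed, baseline):
    # Both values must be positive for the ratio and its log to be defined.
    if perturbed <= 0 or baseline <= 0:
        raise ValueError("metrics must be positive")
    return math.log10(perturbed / baseline)


# Invented example: recall falls from 0.8 (baseline) to 0.4 (perturbed).
change = log10_fold_change(0.4, 0.8)
```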
Fig 5. Application of ArchIE to 1000 Genomes European population (CEU).
(A) Percentage of genome called archaic as a function of the threshold on the probability of archaic ancestry estimated by ArchIE. The dashed line refers to the threshold that yields a 20% FDR in simulations. (B) Mean Neanderthal match statistic (higher implies more similar to the sequenced Altai Neanderthal genome) for haplotypes inferred as archaic vs non-archaic as a function of the probability threshold. (C) Frequency of haplotypes confidently labeled as archaic near the BNC2 gene and (D) the OAS gene cluster. (E) Mean frequency of confidently archaic segments increases with B-statistic (a measure of selective constraint). Low B-statistic denotes more selectively constrained regions (standard error estimates are obtained using a 1 Mb block jackknife).
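A block jackknife standard error of a mean, as mentioned for panel (E), might be sketched like this. This is the unweighted delete-one-block form and is an assumption; the paper may use a weighted variant. Positions are taken to be base-pair coordinates on a single chromosome, and the function name is hypothetical.

```python
import math
from collections import defaultdict


def block_jackknife_se(positions, values, block_size=1_000_000):
    # Group per-site values into contiguous blocks (1 Mb by default).
    blocks = defaultdict(list)
    for pos, val in zip(positions, values):
        blocks[pos // block_size].append(val)
    if len(blocks) < 2:
        raise ValueError("need at least two blocks")

    total = sum(values)
    count = len(values)

    # Recompute the mean with each block left out in turn.
    estimates = []
    for vals in blocks.values():
        estimates.append((total - sum(vals)) / (count - len(vals)))

    n = len(estimates)
    mean_est = sum(estimates) / n
    variance = (n - 1) / n * sum((e - mean_est) ** 2 for e in estimates)
    return math.sqrt(variance)


# Invented example: one site in each of five 1 Mb blocks.
positions = [100, 1_000_100, 2_000_100, 3_000_100, 4_000_100]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
se = block_jackknife_se(positions, values)
```

With one value per block, this reduces to the usual standard error of the mean, which gives a quick sanity check.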



