fastNGSadmix: admixture proportions and principal component analysis of a single NGS sample

Emil Jørsboe; Kristian Hanghøj; Anders Albrechtsen

doi:10.1093/bioinformatics/btx474

Bioinformatics. 2017 Oct 1;33(19):3148-3150. doi: 10.1093/bioinformatics/btx474.

Authors

Emil Jørsboe¹, Kristian Hanghøj²,³, Anders Albrechtsen¹

Affiliations

¹ Department of Biology, The Bioinformatics Centre, University of Copenhagen, 2200 Copenhagen N, Denmark.
² Center for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, 1350 Copenhagen K, Denmark.
³ Université de Toulouse, University Paul Sabatier (UPS), Laboratoire AMIS, CNRS UMR 5288, Toulouse, France.

PMID: 28957500
DOI: 10.1093/bioinformatics/btx474

Abstract

Motivation: Estimation of admixture proportions and principal component analysis (PCA) are fundamental tools in population genetics. However, applying these methods to low- or mid-depth sequencing data without taking genotype uncertainty into account can introduce biases.

Results: Here we present fastNGSadmix, a tool for fast and reliable estimation of admixture proportions and PCA from next-generation sequencing data of a single individual. The analyses are based on genotype likelihoods of the input sample and a set of predefined reference populations. The method has high accuracy, even at low sequencing depth, and corrects for the biases introduced by small reference populations.
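The abstract does not spell out the estimator, so the following is only a rough illustration of how admixture proportions can be estimated from genotype likelihoods when reference allele frequencies are held fixed. It is a minimal Python sketch of a generic expectation-maximization update, assuming Hardy-Weinberg genotype priors. The function names and input layout are hypothetical and are not taken from the fastNGSadmix code. The sketch also omits the correction for small reference populations that the tool provides.

```python
def hwe_prior(h):
    """Probabilities of carrying 0, 1 or 2 copies of an allele with frequency h."""
    return ((1 - h) ** 2, 2 * h * (1 - h), h ** 2)


def estimate_admixture(gl, freqs, n_iter=200, eps=1e-6):
    """Generic EM estimate of admixture proportions for one sample.

    gl    : per-site genotype likelihoods (L0, L1, L2), i.e. P(reads | genotype)
    freqs : per-site allele frequencies, one value per reference population
    Returns a list of proportions that sums to 1.
    """
    k = len(freqs[0])
    q = [1.0 / k] * k  # start from equal ancestry in every population
    for _ in range(n_iter):
        acc = [0.0] * k
        for like, f in zip(gl, freqs):
            # Clamp frequencies so that h stays strictly inside (0, 1).
            f = [min(max(x, eps), 1 - eps) for x in f]
            # Allele frequency of the sample implied by the current proportions.
            h = sum(qk * fk for qk, fk in zip(q, f))
            post = [l * p for l, p in zip(like, hwe_prior(h))]
            total = sum(post)
            if total <= 0:
                continue  # site carries no usable information
            # E-step: expected number of allele copies given the reads.
            eg = (post[1] + 2 * post[2]) / total
            # Split both allele copies across populations by their contribution.
            for i in range(k):
                acc[i] += (eg * q[i] * f[i] / h
                           + (2 - eg) * q[i] * (1 - f[i]) / (1 - h))
        norm = sum(acc)
        if norm == 0:
            break  # no informative sites at all
        # M-step: renormalise the expected allele counts per population.
        q = [a / norm for a in acc]
    return q
```

Calling `estimate_admixture(gl, freqs)` with one likelihood triple and one frequency vector per site returns one proportion per reference population. Because the genotype enters only through its likelihoods, low-depth sites contribute less certain information instead of a hard genotype call.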

Availability and implementation: The admixture estimation method is implemented in C++ and the PCA method is implemented in R. The code is freely available at http://www.popgen.dk/software/index.php/FastNGSadmix.

Contact: emil.jorsboe@bio.ku.dk.

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

Genetics, Population / methods
Genotype
High-Throughput Nucleotide Sequencing / methods*
Humans
Principal Component Analysis*
Probability
Software*