2018 Dec 15;34(24):4165-4171.
doi: 10.1093/bioinformatics/bty507.

snpAD: an ancient DNA genotype caller

Kay Prüfer. Bioinformatics.

Abstract

Motivation: The study of ancient genomes can elucidate the evolutionary past. However, analyses are complicated by base-modifications in ancient DNA molecules that result in errors in DNA sequences. These errors are particularly common near the ends of sequences and pose a challenge for genotype calling.

Results: I describe an iterative method that estimates genotype frequencies and errors along sequences to allow for accurate genotype calling from ancient sequences. The implementation of this method, called snpAD, performs well on high-coverage ancient data, as shown by simulations and by subsampling the data of a high-coverage Neandertal genome. Although estimates for low-coverage genomes are less accurate, I am able to derive approximate estimates of heterozygosity from several low-coverage Neandertals. These estimates show that low heterozygosity, compared to modern humans, was common among Neandertals.
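As a rough illustration of what iteratively estimating genotype frequencies can look like, the sketch below runs a simple expectation-maximization loop over the ten unordered diploid genotypes. It is not the snpAD implementation. It assumes a single, fixed, symmetric error rate (`err`), whereas the abstract describes errors that are estimated along sequences, and all function names are hypothetical.

```python
from itertools import combinations_with_replacement
from math import prod

BASES = "ACGT"
# The 10 unordered diploid genotypes: AA, AC, AG, AT, CC, ...
GENOTYPES = list(combinations_with_replacement(BASES, 2))


def base_likelihood(obs, genotype, err):
    """P(observed base | genotype), assuming each allele is sampled with
    probability 1/2 and a symmetric error rate (a simplifying assumption)."""
    return sum(0.5 * ((1 - err) if obs == allele else err / 3)
               for allele in genotype)


def site_likelihoods(bases, err):
    """Likelihood of all bases observed at one site, for every genotype."""
    return [prod(base_likelihood(b, g, err) for b in bases) for g in GENOTYPES]


def estimate_genotype_freqs(sites, err=0.01, iterations=50):
    """EM estimate of genome-wide genotype frequencies with a fixed error rate."""
    freqs = [1 / len(GENOTYPES)] * len(GENOTYPES)
    liks = [site_likelihoods(s, err) for s in sites]
    for _ in range(iterations):
        counts = [0.0] * len(GENOTYPES)
        for lik in liks:
            # E-step: posterior genotype probabilities at this site.
            weighted = [f * l for f, l in zip(freqs, lik)]
            total = sum(weighted)
            for i, w in enumerate(weighted):
                counts[i] += w / total
        # M-step: new frequencies are the average posteriors.
        freqs = [c / len(liks) for c in counts]
    return dict(zip(GENOTYPES, freqs))
```

For example, `estimate_genotype_freqs(["AAAAAA"] * 8 + ["ACACAC"] * 2)` assigns most of the probability mass to the homozygous `("A", "A")` genotype and the remainder largely to `("A", "C")`. Summing the frequencies of the six heterozygous genotypes gives a heterozygosity estimate under these simplified assumptions.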

Availability and implementation: The C++ code of snpAD is freely available at http://bioinf.eva.mpg.de/snpAD/.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1. Schematic overview of the method implemented in snpAD.

Fig. 2. Accuracy of parameter estimation for simulated datasets. Left: Deviation from simulated genotype probabilities for the six heterozygous genotypes. Each simulation is indicated by a vertical blue dotted line and the estimates are shown as blue points. Estimates for 3-fold coverage deviated by more than 0.5 and are not visible at the depicted range. Right: Average deviation from simulated error probabilities.

Fig. 3. Parameter estimates for subsampled Vindija 33.19 data compared to full data. Left: Estimated genotype frequencies (points) compared to full data (shown as horizontal lines). Right: Average deviation from error rates in the full dataset. Estimates for 1-fold coverage fall outside of the plotted ranges.

Fig. 4. Genotype frequencies for autosomal sites with at least 4-fold coverage. Left: Vindija 33.19 full data and data from a single subsampled library. Violin plots show the distribution over Vindija 33.19 chromosomes. Right: Low-coverage Neandertals. Horizontal lines show genome-wide Vindija 33.19 estimate.
