Removing reference bias and improving indel calling in ancient DNA data analysis by mapping to a sequence variation graph
- PMID: 32943086
- PMCID: PMC7499850
- DOI: 10.1186/s13059-020-02160-7
Abstract
Background: During the last decade, the analysis of ancient DNA (aDNA) sequences has become a powerful tool for the study of past human populations. However, because aDNA is degraded, aDNA molecules are short and frequently carry post-mortem chemical modifications. These features decrease read mapping accuracy and increase reference bias, in which reads containing non-reference alleles are less likely to be mapped than those containing reference alleles. Alternative approaches have been developed that replace the linear reference with a variation graph, which includes known alternative variants at each genetic locus. Here, we evaluate the use of the variation graph software vg to avoid reference bias in aDNA and compare it with existing methods.
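The mechanism of reference bias can be made concrete with a toy model. This is a minimal sketch under loudly stated assumptions, not how vg or bwa align reads. It assumes ungapped alignment at a known position, a single made-up SNP, and a hypothetical mapper that rejects reads with more than one mismatch. The "graph" is reduced to the two paths spelled by one SNP bubble. A read carrying the alternative allele plus one damage-like substitution exceeds the mismatch threshold against the linear reference but not against the graph.

```python
# Toy model of reference bias (hypothetical; not how vg or bwa align reads).
REF = "ACGTACGTTAGC"   # made-up linear reference
SNP_POS, ALT = 5, "T"  # made-up SNP: REF has 'C' at index 5, ALT allele is 'T'
MAX_MISMATCHES = 1     # hypothetical mapper: reject reads with >1 mismatch


def mismatches(read: str, target: str, offset: int) -> int:
    """Count mismatches of an ungapped read placed at `offset` in `target`."""
    window = target[offset:offset + len(read)]
    return sum(r != t for r, t in zip(read, window))


def graph_paths() -> list[str]:
    """A one-SNP variation graph spells two paths: REF and ALT at SNP_POS."""
    alt_path = REF[:SNP_POS] + ALT + REF[SNP_POS + 1:]
    return [REF, alt_path]


def maps(read: str, offset: int, use_graph: bool) -> bool:
    """Accept the read if its best alignment has few enough mismatches."""
    targets = graph_paths() if use_graph else [REF]
    best = min(mismatches(read, t, offset) for t in targets)
    return best <= MAX_MISMATCHES


# Both reads start at offset 2 and carry a G->A substitution at their first
# base, standing in for post-mortem damage; they differ only at the SNP.
reads = {"REF allele": "ATACGTTA", "ALT allele": "ATATGTTA"}
for label, read in reads.items():
    print(label, "linear:", maps(read, 2, False), "graph:", maps(read, 2, True))
# REF allele linear: True graph: True
# ALT allele linear: False graph: True
```

Against the linear reference, the damaged ALT-allele read is lost while the equally damaged REF-allele read is kept; against the graph both are kept.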
Results: We use vg to align simulated and real aDNA samples to a variation graph containing 1000 Genomes Project variants and compare the results with the same data aligned with bwa to the human linear reference genome. Using vg leads to balanced allelic representation at polymorphic sites, effectively removing reference bias, and to more sensitive variant detection than bwa, especially for insertions and deletions (indels). Alternative approaches that use relaxed bwa parameter settings or filter bwa alignments can also reduce bias but can have lower sensitivity than vg, particularly for indels.
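"Balanced allelic representation" can be checked with a simple statistic. One option (a sketch, not necessarily the exact measure used in the study) is the mean fraction of reads carrying the reference allele at heterozygous sites: an unbiased mapper should give a value near 0.5, and reference bias pushes it above 0.5. The per-site counts below are invented for illustration; in practice they would come from a pileup of the aligned reads.

```python
from statistics import mean


def ref_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at one site that carry the reference allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("site has no covering reads")
    return ref_reads / total


def mean_ref_fraction(het_sites: list[tuple[int, int]]) -> float:
    """Mean REF fraction over heterozygous sites; about 0.5 means balanced."""
    return mean(ref_fraction(r, a) for r, a in het_sites)


# Made-up (REF reads, ALT reads) counts at heterozygous sites.
balanced = [(5, 5), (4, 4), (6, 6)]
skewed = [(6, 4), (5, 3), (7, 5)]
print(mean_ref_fraction(balanced))          # 0.5
print(round(mean_ref_fraction(skewed), 3))  # 0.603
```

Low-coverage sites make the per-site fraction noisy, so a real analysis would likely filter sites by depth before averaging.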
Conclusions: Our findings demonstrate that aligning aDNA sequences to variation graphs effectively mitigates the impact of reference bias when analyzing aDNA, while retaining mapping sensitivity and allowing detection of variation, in particular indel variation, that was previously missed.
Keywords: Ancient DNA; Reference bias; Sequence alignment; Variation graph.
Conflict of interest statement
The authors declare that they have no competing interests.
Similar articles
- Systematic benchmark of ancient DNA read mapping. Brief Bioinform. 2021 Sep 2;22(5):bbab076. doi: 10.1093/bib/bbab076. PMID: 33834210
- Improving ancient DNA read mapping against modern reference genomes. BMC Genomics. 2012 May 10;13:178. doi: 10.1186/1471-2164-13-178. PMID: 22574660
- Analysis of optimal alignments unfolds aligners' bias in existing variant profiles. BMC Bioinformatics. 2016 Oct 6;17(Suppl 13):349. doi: 10.1186/s12859-016-1216-1. PMID: 27766935
- Calling known variants and identifying new variants while rapidly aligning sequence data. J Dairy Sci. 2019 Apr;102(4):3216-3229. doi: 10.3168/jds.2018-15172. Epub 2019 Feb 14. PMID: 30772032
- Toward high-resolution population genomics using archaeological samples. DNA Res. 2016 Aug;23(4):295-310. doi: 10.1093/dnares/dsw029. Epub 2016 Jul 19. PMID: 27436340. Review.
Cited by
- Assessing the impact of post-mortem damage and contamination on imputation performance in ancient DNA. Sci Rep. 2024 Mar 14;14(1):6227. doi: 10.1038/s41598-024-56584-3. PMID: 38486065
- Introgressions lead to reference bias in wheat RNA-seq analysis. BMC Biol. 2024 Mar 7;22(1):56. doi: 10.1186/s12915-024-01853-w. PMID: 38454464
- Ancient genomes illuminate Eastern Arabian population history and adaptation against malaria. Cell Genom. 2024 Mar 13;4(3):100507. doi: 10.1016/j.xgen.2024.100507. Epub 2024 Feb 27. PMID: 38417441
- Minimizing Reference Bias with an Impute-First Approach. bioRxiv [Preprint]. 2023 Dec 2:2023.11.30.568362. doi: 10.1101/2023.11.30.568362. PMID: 38076784
- Pan-genome de Bruijn graph using the bidirectional FM-index. BMC Bioinformatics. 2023 Oct 26;24(1):400. doi: 10.1186/s12859-023-05531-6. PMID: 37884897
