Parasite infection of public databases: a data mining approach to identify apicomplexan contaminations in animal genome and transcriptome assemblies
- PMID: 28103801
- PMCID: PMC5244568
- DOI: 10.1186/s12864-017-3504-1
Abstract
Background: Contamination from various exogenous sources is a common problem in next-generation sequencing. Endogenous parasites are another possible source of contaminating DNA. On the one hand, undiscovered contaminations of animal sequence assemblies may lead to erroneous interpretation of data; on the other hand, once identified, parasite-derived sequences may provide a valuable source of information.
Results: Here we show that sequences derived from apicomplexan parasites can be found in many animal genome and transcriptome projects and that, in most cases, these sequences originated from an infection of the sequenced host specimen. The apicomplexan sequences were extracted from the sequence assemblies using a newly developed bioinformatic pipeline (ContamFinder) and tentatively assigned to distinct taxa using phylogenetic methods. We analysed 920 assemblies and found 20,907 contigs of apicomplexan origin in 51 of the datasets. The contaminating species were identified as members of the apicomplexan taxa Gregarinasina, Coccidia, Piroplasmida, and Haemosporida. For example, in the platypus genome assembly, we found a high number of contigs derived from a piroplasmid parasite (presumably Theileria ornithorhynchi). For most of the infecting parasite species, no molecular data had been available previously, and some of the datasets contain sequences representing a large part of the parasite's gene repertoire.
Conclusion: Our study suggests that parasite-derived contaminations represent a valuable source of information that can help to discover and identify new parasites, and provide information on previously unknown host-parasite interactions. We, therefore, argue that uncurated assembly data should routinely be made available in addition to the final assemblies.
Keywords: Apicomplexa; Coccidia; Contamination; Database analysis; Gregarinasina; Haemosporida; Malaria; Parasites; Phylogeny; Piroplasmida.
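The abstract does not describe how ContamFinder decides that a contig is apicomplexan. As a rough illustration of similarity-based contamination screening in general, and explicitly not ContamFinder's actual algorithm, the Python sketch below keeps the best-scoring BLAST+ hit per contig (tabular `-outfmt 6` input, bitscore in column 12) and flags contigs whose best hit falls in a target clade. The lineage lookup `lineage_of`, the function names, and the toy sequence IDs are hypothetical assumptions. The study additionally assigned taxa with phylogenetic methods, which this screening step does not cover.

```python
from typing import Dict, Iterable, List, Mapping, Sequence, Tuple

# Default column order of BLAST+ tabular output (-outfmt 6); bitscore is column 12.
QSEQID, SSEQID, BITSCORE = 0, 1, 11


def best_hits(lines: Iterable[str]) -> Dict[str, Tuple[str, float]]:
    """Keep the highest-bitscore subject for each query contig."""
    best: Dict[str, Tuple[str, float]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g. -outfmt 7 headers)
        cols = line.rstrip("\n").split("\t")
        query, subject, score = cols[QSEQID], cols[SSEQID], float(cols[BITSCORE])
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return best


def flag_contaminants(
    lines: Iterable[str],
    lineage_of: Mapping[str, Sequence[str]],
    target_clade: str = "Apicomplexa",
) -> List[str]:
    """Return contigs whose best hit lies in target_clade.

    lineage_of is a hypothetical lookup from subject ID to taxonomic lineage;
    subjects missing from it are treated as unclassified and never flagged.
    """
    return sorted(
        contig
        for contig, (subject, _score) in best_hits(lines).items()
        if target_clade in lineage_of.get(subject, ())
    )


if __name__ == "__main__":
    # Toy hits: contig_1's best hit is apicomplexan, contig_2's is a vertebrate.
    hits = [
        "contig_1\tsubjA\t80.0\t300\t60\t0\t1\t300\t1\t300\t1e-50\t250",
        "contig_1\tsubjB\t95.0\t300\t15\t0\t1\t300\t1\t300\t1e-90\t400",
        "contig_2\tsubjB\t70.0\t100\t30\t0\t1\t100\t1\t100\t1e-10\t80",
        "contig_2\tsubjC\t99.0\t500\t5\t0\t1\t500\t1\t500\t0.0\t900",
    ]
    lineages = {
        "subjA": ("Eukaryota", "Metazoa"),
        "subjB": ("Eukaryota", "Alveolata", "Apicomplexa"),
        "subjC": ("Eukaryota", "Metazoa", "Chordata"),
    }
    print(flag_contaminants(hits, lineages))  # ['contig_1']
```

A best-hit rule like this is fast but coarse: contigs from poorly sampled parasite lineages may have their closest database hit outside the true clade, which is one reason a phylogenetic assignment step, like the one the authors describe, is useful afterwards.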
Similar articles
- Wider than Thought Phylogenetic Occurrence of Apicortin, A Characteristic Protein of Apicomplexan Parasites. J Mol Evol. 2016 Jun;82(6):303-14. doi: 10.1007/s00239-016-9749-5. Epub 2016 Jun 9. PMID: 27282556
- In silico hybridization enables transcriptomic illumination of the nature and evolution of Myxozoa. BMC Genomics. 2015 Oct 23;16:840. doi: 10.1186/s12864-015-2039-6. PMID: 26494377
- Next generation sequencing from Hepatozoon canis (Apicomplexa: Coccidia: Adeleorina): Complete apicoplast genome and multiple mitochondrion-associated sequences. Int J Parasitol. 2019 Apr;49(5):375-387. doi: 10.1016/j.ijpara.2018.12.001. Epub 2019 Feb 19. PMID: 30790556
- Why the -omic future of Apicomplexa should include gregarines. Biol Cell. 2020 Jun;112(6):173-185. doi: 10.1111/boc.202000006. Epub 2020 Apr 7. PMID: 32176937. Review.
- Phylogeny and evolution of apicoplasts and apicomplexan parasites. Parasitol Int. 2015 Jun;64(3):254-9. doi: 10.1016/j.parint.2014.10.005. Epub 2014 Oct 14. PMID: 25451217. Review.
Cited by
- Where Have All the Diagnostic Morphological Parasitologists Gone? J Clin Microbiol. 2022 Nov 16;60(11):e0098622. doi: 10.1128/jcm.00986-22. Epub 2022 Oct 31. PMID: 36314793
- Phylogenomics from transcriptomic "bycatch" clarify the origins and diversity of avian trypanosomes in North America. PLoS One. 2020 Oct 8;15(10):e0240062. doi: 10.1371/journal.pone.0240062. eCollection 2020. PMID: 33031471
- A Bioinformatics Guide to Plant Microbiome Analysis. Front Plant Sci. 2019 Oct 23;10:1313. doi: 10.3389/fpls.2019.01313. eCollection 2019. PMID: 31708944. Review.
- Humic-acid-driven escape from eye parasites revealed by RNA-seq and target-specific metabarcoding. Parasit Vectors. 2020 Aug 28;13(1):433. doi: 10.1186/s13071-020-04306-9. PMID: 32859251
- Apicortin, a Constituent of Apicomplexan Conoid/Apical Complex and Its Tentative Role in Pathogen-Host Interaction. Trop Med Infect Dis. 2021 Jun 30;6(3):118. doi: 10.3390/tropicalmed6030118. PMID: 34209186. Review.
