Background: Recently, alignment-free sequence analysis methods have gained popularity in the field of personal genomics. These methods are based on counting frequencies of short k-mer sequences, thus allowing faster and more robust analysis compared to traditional alignment-based methods.
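To make the idea of k-mer frequency counting concrete, the following is a minimal sketch, not the implementation used by any particular tool. The function name `count_kmers` and the choice to skip k-mers containing `N` are assumptions made for illustration.

```python
from collections import Counter


def count_kmers(reads, k=32):
    """Count every k-mer of length k across an iterable of read strings.

    Hypothetical helper for illustration; k-mers containing the ambiguous
    base 'N' are skipped (an assumption, not a documented rule).
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        # Slide a window of width k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts
```

Because the counts depend only on the raw reads, no alignment to a reference genome is needed at this step.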
Results: We have created a fast alignment-free method, AluMine, to analyze polymorphic insertions of Alu elements in the human genome. We tested the method on 2,241 individuals from the Estonian Genome Project and identified 28,962 potential polymorphic Alu element insertions. Each tested individual had on average 1,574 Alu element insertions that were different from those in the reference genome. In addition, we propose an alignment-free genotyping method that uses the frequency of insertion/deletion-specific 32-mer pairs to call the genotype directly from raw sequencing reads. Using this method, the concordance between the predicted and experimentally observed genotypes was 98.7%. The running time of the discovery pipeline is approximately 2 h per individual. The genotyping of potential polymorphic insertions takes between 0.4 and 4 h per individual, depending on the hardware configuration.
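As a rough illustration of calling a genotype from the frequencies of an insertion-specific and a deletion-specific (reference) 32-mer, the sketch below uses a simple allele-fraction rule. The function name `call_genotype` and the thresholds `min_total`, `het_low`, and `het_high` are hypothetical values chosen for the example; the abstract does not describe the actual statistical model, so this should not be read as AluMine's method.

```python
def call_genotype(ins_count, ref_count, min_total=4, het_low=0.2, het_high=0.8):
    """Call a diploid genotype from counts of a 32-mer pair.

    ins_count: occurrences of the insertion-specific k-mer in the reads.
    ref_count: occurrences of the reference (no insertion) k-mer.
    All thresholds are illustrative assumptions, not published values.
    """
    total = ins_count + ref_count
    if total < min_total:
        return "./."  # too little evidence to make a call
    frac = ins_count / total  # fraction of evidence supporting the insertion
    if frac < het_low:
        return "0/0"  # homozygous for the reference allele
    if frac > het_high:
        return "1/1"  # homozygous for the insertion
    return "0/1"      # heterozygous
```

For example, a site with 10 insertion-specific and 10 reference-specific k-mer occurrences would be called heterozygous under these assumed thresholds.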
Conclusions: AluMine provides tools that allow the discovery of novel Alu element insertions and/or the genotyping of known Alu element insertions from personal genomes within a few hours.
Keywords: Alignment-free sequence analysis; Alu repeat element; Mobile element insertions.
Conflict of interest statement
Competing interests: The authors declare that they have no competing interests.