Indexing Graphs for Path Queries with Applications in Genome Research
- PMID: 26355784
- DOI: 10.1109/TCBB.2013.2297101
Abstract
We propose a generic approach to replace the canonical sequence representation of genomes with graph representations, and study several applications of such extensions. We extend the Burrows-Wheeler transform (BWT) of strings to acyclic directed labeled graphs, to support path queries as an extension to substring searching. We develop, apply, and tailor this technique to a) read alignment on an extended BWT index of a graph representing a pan-genome, i.e., a reference genome and its known variants; and b) split-read alignment on an extended BWT index of a splicing graph. Other possible applications include probe/primer design, alignments to assembly graphs, and alignments to a phylogenetic tree of partial-order graphs. We report several experiments on the feasibility and applicability of the approach. On highly polymorphic genome regions in particular, our pan-genome index significantly improves alignment accuracy.
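For orientation, the string case that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration of the classical BWT and backward search on a plain string, not the graph extension described in the article; the function names `bwt` and `count_occurrences` and the `$` end marker are assumptions chosen for this example, and the rotation sort and linear-time rank are deliberately naive.

```python
from collections import Counter


def bwt(text):
    """Burrows-Wheeler transform via sorted rotations.

    Assumes '$' does not occur in text and sorts before every other symbol.
    """
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_str, pattern):
    """Count occurrences of pattern in the original text by backward search."""
    counts = Counter(bwt_str)

    # first_row[c]: number of symbols in the text that are strictly smaller than c
    first_row, total = {}, 0
    for c in sorted(counts):
        first_row[c] = total
        total += counts[c]

    def rank(c, i):
        # Occurrences of c in bwt_str[:i]; a real index would precompute this.
        return bwt_str[:i].count(c)

    lo, hi = 0, len(bwt_str)  # half-open range of matching sorted rotations
    for c in reversed(pattern):
        if c not in first_row:
            return 0
        lo = first_row[c] + rank(c, lo)
        hi = first_row[c] + rank(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    index = bwt("GATTACATTA")
    print(count_occurrences(index, "TTA"))
```

Each step of the search narrows a range of sorted rotations by prepending one pattern symbol. The abstract describes generalizing this kind of search from substrings of a string to paths in an acyclic directed labeled graph.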
