J Comput Biol. Jul-Aug 2005;12(6):762-76.
doi: 10.1089/cmb.2005.12.762.

Finding Anchors for Genomic Sequence Comparison

Ross A Lippert et al. J Comput Biol. 2005 Jul-Aug.

Abstract

The recent sequencing of the human and other mammalian genomes has made it necessary to align them in order to identify and characterize their commonalities and differences. Programs that align whole genomes generally use a seed-and-extend technique: they start from exact or near-exact matches, select a reliable subset of these, called anchors, and then fill in the remaining portions between the anchors using a combination of local and global alignment algorithms. Their parameter choices, however, have so far been primarily heuristic. We present a statistical framework and practical methods for selecting a set of matches that is both sensitive and specific and can constitute a reliable set of anchors for a one-to-one mapping of two genomes, from which a whole-genome alignment can be built. Starting from exact matches, we introduce a novel per-base repeat annotation, the Z-score, and use it to explore noise and repeat filtering conditions. Dynamic programming-based chaining algorithms are also evaluated as context-based filters. We apply the methods described here to the comparison of two progressive assemblies of the human genome, NCBI build 28 and build 34 (www.genome.ucsc.edu), and show that a significant portion of the two genomes can be found in selected exact matches, with a very limited amount of sequence duplication.
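As a rough illustration of the pipeline the abstract outlines (exact matches, a repeat filter, then chaining by dynamic programming), here is a minimal Python sketch. It is not the authors' implementation: the function names, the seed length k, and the quadratic-time chaining are assumptions made for illustration, and the uniqueness filter is only a crude stand-in for the Z-score annotation, which the abstract does not define.

```python
from collections import Counter, defaultdict


def exact_kmer_matches(a, b, k):
    """Return sorted (i, j) pairs with a[i:i+k] == b[j:j+k] (exact seeds)."""
    index = defaultdict(list)
    for i in range(len(a) - k + 1):
        index[a[i:i + k]].append(i)
    matches = []
    for j in range(len(b) - k + 1):
        for i in index.get(b[j:j + k], ()):
            matches.append((i, j))
    return sorted(matches)


def drop_repeated(matches):
    """Hypothetical repeat filter: keep seeds whose start position is matched
    exactly once in each sequence. This is NOT the paper's Z-score."""
    seen_i = Counter(i for i, _ in matches)
    seen_j = Counter(j for _, j in matches)
    return [(i, j) for i, j in matches if seen_i[i] == 1 and seen_j[j] == 1]


def chain(matches, k):
    """Pick a highest-scoring collinear, non-overlapping chain of seeds.

    Simple O(n^2) dynamic program; each seed contributes k to the score.
    """
    matches = sorted(matches)
    n = len(matches)
    if n == 0:
        return []
    score = [k] * n   # best chain score ending at each seed
    prev = [-1] * n   # back-pointer to the previous seed in that chain
    for t in range(n):
        it, jt = matches[t]
        for s in range(t):
            i_s, j_s = matches[s]
            # Seed s may precede seed t only if it ends before t starts
            # in both sequences (same order, no overlap).
            if i_s + k <= it and j_s + k <= jt and score[s] + k > score[t]:
                score[t] = score[s] + k
                prev[t] = s
    best = max(range(n), key=score.__getitem__)
    out = []
    while best != -1:
        out.append(matches[best])
        best = prev[best]
    return out[::-1]


if __name__ == "__main__":
    a = "GATTACAGGCTTCA"
    b = "GATTACATTTGGCTTCA"   # same as a, with an insertion in the middle
    seeds = drop_repeated(exact_kmer_matches(a, b, 4))
    print(chain(seeds, 4))
```

The chain acts as a context-based filter in the sense that a seed survives only if it is consistent in order with the other selected seeds. For genome-scale input the quadratic loop would need to be replaced by a sparse chaining method, and k would be far larger than the toy value used here.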

Cited by 5 articles

  • STAR3D: a stack-based RNA 3D structural alignment tool.
    Ge P, Zhang S. Nucleic Acids Res. 2015 Nov 16;43(20):e137. doi: 10.1093/nar/gkv697. Epub 2015 Jul 15. PMID: 26184875. Free PMC article.
  • Local similarity search to find gene indicators in mitochondrial genomes.
    Moritz RL, Bernt M, Middendorf M. Biology (Basel). 2014 Mar 11;3(1):220-42. doi: 10.3390/biology3010220. PMID: 24833343. Free PMC article.
  • Separating significant matches from spurious matches in DNA sequences.
    Devillers H, Schbath S. J Comput Biol. 2012 Jan;19(1):1-12. doi: 10.1089/cmb.2011.0070. Epub 2011 Dec 9. PMID: 22149632. Free PMC article.
  • The HuRef Browser: a web resource for individual human genomics.
    Axelrod N, Lin Y, Ng PC, Stockwell TB, Crabtree J, Huang J, Kirkness E, Strausberg RL, Frazier ME, Venter JC, Kravitz S, Levy S. Nucleic Acids Res. 2009 Jan;37(Database issue):D1018-24. doi: 10.1093/nar/gkn939. Epub 2008 Nov 26. PMID: 19036787. Free PMC article.
  • The diploid genome sequence of an individual human.
    Levy S, Sutton G, Ng PC, Feuk L, Halpern AL, Walenz BP, Axelrod N, Huang J, Kirkness EF, Denisov G, Lin Y, MacDonald JR, Pang AW, Shago M, Stockwell TB, Tsiamouri A, Bafna V, Bansal V, Kravitz SA, Busam DA, Beeson KY, McIntosh TC, Remington KA, Abril JF, Gill J, Borman J, Rogers YH, Frazier ME, Scherer SW, Strausberg RL, Venter JC. Version 2. PLoS Biol. 2007 Sep 4;5(10):e254. doi: 10.1371/journal.pbio.0050254. PMID: 17803354. Free PMC article.