An algorithm for automated closure during assembly
- PMID: 20831800
- PMCID: PMC2945939
- DOI: 10.1186/1471-2105-11-457
Abstract
Background: Finishing is the process of improving the quality and utility of draft genome sequences generated by shotgun sequencing and computational assembly. Finishing can involve targeted sequencing. Finishing reads may be incorporated by manual or automated means. One automated method uses targeted addition by local re-assembly of gap regions. An obvious alternative uses de novo assembly of all the reads.
Results: A procedure called the bounding read algorithm was developed for assembly of shotgun reads plus finishing reads and their constraints, targeting repeat regions. The algorithm was implemented within the Celera Assembler software and its pyrosequencing-specific variant, CABOG. The implementation was tested on Sanger and pyrosequencing data from six genomes. The bounding read assemblies were compared to assemblies from two other methods on the same data. The algorithm generates improved assemblies of repeat regions, closing and tiling some gaps while degrading none.
Conclusions: The algorithm is useful for small-genome automated finishing projects. Our implementation is available as open-source from http://wgs-assembler.sourceforge.net under the GNU Public License.
Similar articles
- Aggressive assembly of pyrosequencing reads with mates. Bioinformatics. 2008 Dec 15;24(24):2818-24. doi: 10.1093/bioinformatics/btn548. Epub 2008 Oct 24. PMID: 18952627 Free PMC article.
- Mind the gap: upgrading genomes with Pacific Biosciences RS long-read sequencing technology. PLoS One. 2012;7(11):e47768. doi: 10.1371/journal.pone.0047768. Epub 2012 Nov 21. PMID: 23185243 Free PMC article.
- Consensus generation and variant detection by Celera Assembler. Bioinformatics. 2008 Apr 15;24(8):1035-40. doi: 10.1093/bioinformatics/btn074. Epub 2008 Mar 4. PMID: 18321888
- De novo assembly of short sequence reads. Brief Bioinform. 2010 Sep;11(5):457-72. doi: 10.1093/bib/bbq020. Epub 2010 Aug 19. PMID: 20724458 Review.
- Algorisms used for in silico finishing of bacterial genomes based on short-read assemblage implemented in GenoFinisher, AceFileViewer, and ShortReadManager. Biosci Biotechnol Biochem. 2022 May 24;86(6):693-703. doi: 10.1093/bbb/zbac032. PMID: 35425950 Review.
Cited by
- WiseScaffolder: an algorithm for the semi-automatic scaffolding of Next Generation Sequencing data. BMC Bioinformatics. 2015 Sep 3;16:281. doi: 10.1186/s12859-015-0705-y. PMID: 26335184 Free PMC article.
- MetAMOS: a modular and open source metagenomic assembly and analysis pipeline. Genome Biol. 2013 Jan 15;14(1):R2. doi: 10.1186/gb-2013-14-1-r2. PMID: 23320958 Free PMC article.
- A pipeline for automated annotation of yeast genome sequences by a conserved-synteny approach. BMC Bioinformatics. 2012 Sep 17;13:237. doi: 10.1186/1471-2105-13-237. PMID: 22984983 Free PMC article.
- Sequencing intractable DNA to close microbial genomes. PLoS One. 2012;7(7):e41295. doi: 10.1371/journal.pone.0041295. Epub 2012 Jul 31. PMID: 22859974 Free PMC article.
- Review of general algorithmic features for genome assemblers for next generation sequencers. Genomics Proteomics Bioinformatics. 2012 Apr;10(2):58-73. doi: 10.1016/j.gpb.2012.05.006. Epub 2012 Jun 9. PMID: 22768980 Free PMC article. Review.
