Low-cost short-read sequencing technology has revolutionized genomics, though it is only just becoming practical for the high-quality de novo assembly of a novel large genome. We describe the Assemblathon 1 competition, which aimed to comprehensively assess the state of the art in de novo assembly methods when applied to current sequencing technologies. In a collaborative effort, teams were asked to assemble a simulated Illumina HiSeq data set of an unknown, simulated diploid genome. A total of 41 assemblies from 17 different groups were received. Novel haplotype-aware assessments of coverage, contiguity, structure, base calling, and copy number were made. We establish that within this benchmark (1) it is possible to assemble the genome to a high level of coverage and accuracy, and (2) large differences exist between the assemblies, suggesting room for further improvements in current methods. The simulated benchmark, including the correct answer, the assemblies, and the code that was used to evaluate the assemblies, is now public and freely available from http://www.assemblathon.org/.
- F31 HG000064/HG/NHGRI NIH HHS/United States
- R01 HG003474/HG/NHGRI NIH HHS/United States
- P41 HG002371/HG/NHGRI NIH HHS/United States
- HG00064/HG/NHGRI NIH HHS/United States
- U41HG004568/HG/NHGRI NIH HHS/United States
- U01HG004695/HG/NHGRI NIH HHS/United States
- 1U24CA143858-01/CA/NCI NIH HHS/United States
- U41 HG004568/HG/NHGRI NIH HHS/United States
- U24 CA143858/CA/NCI NIH HHS/United States
- Howard Hughes Medical Institute/United States
- K22 HG000064/HG/NHGRI NIH HHS/United States
- P41HG002371/HG/NHGRI NIH HHS/United States
- R21 AA022707/AA/NIAAA NIH HHS/United States
- U01 HG004695/HG/NHGRI NIH HHS/United States
- U54HG004555/HG/NHGRI NIH HHS/United States
- U54 HG004555/HG/NHGRI NIH HHS/United States