Genome assembly from synthetic long read clouds
- PMID: 27307620
- PMCID: PMC4908351
- DOI: 10.1093/bioinformatics/btw267
Abstract
Motivation: Despite rapid progress in sequencing technology, de novo assembly of the genomes of new species and reconstruction of complex metagenomes remain major technological challenges. New synthetic long read (SLR) technologies promise significant advances towards these goals; however, their applicability is limited by high sequencing requirements and by the inability of current assembly paradigms to cope with combinations of short and long reads.
Results: Here, we introduce Architect, a new de novo scaffolder aimed at SLR technologies. Unlike previous assembly strategies, Architect does not require a costly subassembly step; instead, it assembles genomes directly from the SLR's underlying short reads, which we refer to as read clouds. This enables a 4- to 20-fold reduction in sequencing requirements and a 5-fold increase in assembly contiguity on both genomic and metagenomic datasets relative to state-of-the-art assembly strategies aimed directly at fully subassembled long reads.
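The read-cloud idea can be illustrated with a minimal sketch. This is not Architect's algorithm. It assumes that the short reads have already been aligned to assembled contigs and that each read carries a barcode identifying the SLR partition (well) it came from; the names `build_read_clouds`, `contig_links`, and `min_shared` are hypothetical. Because the reads sharing a barcode derive from a small number of long DNA fragments, contigs that repeatedly co-occur in the same clouds are plausible neighbours in a scaffold.

```python
from collections import defaultdict
from itertools import combinations


def build_read_clouds(reads):
    """Group aligned short reads into read clouds keyed by barcode.

    Hypothetical input: an iterable of (barcode, contig_id) pairs, one per
    aligned read. Returns {barcode: set of contig_ids hit by that cloud}.
    """
    clouds = defaultdict(set)
    for barcode, contig in reads:
        clouds[barcode].add(contig)
    return clouds


def contig_links(clouds, min_shared=2):
    """Count how many clouds each contig pair shares.

    Pairs are stored in sorted order; only pairs supported by at least
    `min_shared` distinct barcodes are kept as candidate scaffold links.
    """
    shared = defaultdict(int)
    for contigs in clouds.values():
        for a, b in combinations(sorted(contigs), 2):
            shared[(a, b)] += 1
    return {pair: n for pair, n in shared.items() if n >= min_shared}


if __name__ == "__main__":
    # Toy example: ctgA and ctgB co-occur in two clouds, ctgB and ctgC in one.
    reads = [
        ("bc1", "ctgA"), ("bc1", "ctgB"),
        ("bc2", "ctgA"), ("bc2", "ctgB"),
        ("bc3", "ctgB"), ("bc3", "ctgC"),
    ]
    print(contig_links(build_read_clouds(reads)))  # {('ctgA', 'ctgB'): 2}
```

The `min_shared` threshold is an assumed design choice: requiring several independent barcodes filters out links produced by chance co-occurrence, for example when one partition happens to contain fragments from distant loci.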
Availability and implementation: Our source code is freely available at https://github.com/kuleshov/architect
Contact: kuleshov@stanford.edu.
© The Author 2016. Published by Oxford University Press.
Similar articles
- SLR-superscaffolder: a de novo scaffolding tool for synthetic long reads using a top-to-bottom scheme. BMC Bioinformatics. 2021 Mar 25;22(1):158. doi: 10.1186/s12859-021-04081-z. PMID: 33765921. Free PMC article.
- LongStitch: high-quality genome assembly correction and scaffolding using long reads. BMC Bioinformatics. 2021 Oct 30;22(1):534. doi: 10.1186/s12859-021-04451-7. PMID: 34717540. Free PMC article.
- RepLong: de novo repeat identification using long read sequencing data. Bioinformatics. 2018 Apr 1;34(7):1099-1107. doi: 10.1093/bioinformatics/btx717. PMID: 29126180.
- Assessment of metagenomic assemblers based on hybrid reads of real and simulated metagenomic sequences. Brief Bioinform. 2020 May 21;21(3):777-790. doi: 10.1093/bib/bbz025. PMID: 30860572. Free PMC article. Review.
- Tools and Strategies for Long-Read Sequencing and De Novo Assembly of Plant Genomes. Trends Plant Sci. 2019 Aug;24(8):700-724. doi: 10.1016/j.tplants.2019.05.003. Epub 2019 Jun 14. PMID: 31208890. Review.
Cited by
- Scalable Microbial Strain Inference in Metagenomic Data Using StrainFacts. Front Bioinform. 2022 May 16;2:867386. doi: 10.3389/fbinf.2022.867386. eCollection 2022. PMID: 36304283. Free PMC article.
- Longitudinal linked-read sequencing reveals ecological and evolutionary responses of a human gut microbiome during antibiotic treatment. Genome Res. 2021 Aug;31(8):1433-1446. doi: 10.1101/gr.265058.120. Epub 2021 Jul 22. PMID: 34301627. Free PMC article.
- SLR-superscaffolder: a de novo scaffolding tool for synthetic long reads using a top-to-bottom scheme. BMC Bioinformatics. 2021 Mar 25;22(1):158. doi: 10.1186/s12859-021-04081-z. PMID: 33765921. Free PMC article.
- Accurate haplotype-resolved assembly reveals the origin of structural variants for human trios. Bioinformatics. 2021 Aug 9;37(15):2095-2102. doi: 10.1093/bioinformatics/btab068. PMID: 33538292. Free PMC article.
- IterCluster: a barcode clustering algorithm for long fragment read analysis. PeerJ. 2020 Mar 24;8:e8431. doi: 10.7717/peerj.8431. eCollection 2020. PMID: 32231869. Free PMC article.
