Bioinformatics. 2016 Jun 15;32(12):i216-i224.
doi: 10.1093/bioinformatics/btw267.

Genome assembly from synthetic long read clouds

Volodymyr Kuleshov et al. Bioinformatics.

Abstract

Motivation: Despite rapid progress in sequencing technology, assembling the genomes of new species de novo and reconstructing complex metagenomes remain major technological challenges. New synthetic long read (SLR) technologies promise significant advances towards these goals; however, their applicability is limited by high sequencing requirements and the inability of current assembly paradigms to cope with combinations of short and long reads.

Results: Here, we introduce Architect, a new de novo scaffolder aimed at SLR technologies. Unlike previous assembly strategies, Architect does not require a costly subassembly step; instead, it assembles genomes directly from the SLR's underlying short reads, which we refer to as read clouds. This enables a 4- to 20-fold reduction in sequencing requirements and a 5-fold increase in assembly contiguity on both genomic and metagenomic datasets relative to state-of-the-art assembly strategies aimed directly at fully subassembled long reads.

Availability and implementation: Our source code is freely available at https://github.com/kuleshov/architect

Contact: kuleshov@stanford.edu.

Figures

Fig. 1.
High-level overview of SLR and read cloud technologies. DNA (1) is sheared into kilobase-long fragments (2), which are then diluted and placed into multiple containers, typically with 0.1–2% of the genome per container (3). Within each container, fragments may be amplified before being cut into short fragments and barcoded (4). The barcoded fragments are finally pooled together and sequenced (5); the reads can then be demultiplexed computationally, using the barcodes to assign each read to its original container, in order to form read clouds or SLRs.
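
The demultiplexing step (5) can be illustrated with a minimal Python sketch. It is not taken from the Architect source code: the function name `demultiplex`, the `(barcode, sequence)` input format, and the toy barcodes are assumptions made purely for illustration.

```python
from collections import defaultdict


def demultiplex(reads):
    """Group (barcode, sequence) pairs into read clouds keyed by barcode.

    Hypothetical helper: real pipelines parse barcodes from FASTQ records.
    """
    clouds = defaultdict(list)
    for barcode, sequence in reads:
        # Reads sharing a barcode came from the same container.
        clouds[barcode].append(sequence)
    return dict(clouds)


# Toy input: three short reads carrying two distinct barcodes.
reads = [
    ("BC01", "ACGT"),
    ("BC02", "TTGA"),
    ("BC01", "GGCA"),
]
clouds = demultiplex(reads)
```

Each value in `clouds` is one read cloud: the set of short reads that originated from the fragments placed in a single container.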
Fig. 2.
Scaffolding using read clouds. A genome contains a repeat R flanked by unique sequences (A, B) and (C, D) (top). With short reads, the correct assembly is ambiguous (middle). If two read clouds (marked as red and orange) map, respectively, to ARB and CRD, this provides signal that may be used to correctly resolve the repeat structure (bottom).
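
The intuition in Fig. 2 can be sketched as a toy Python example. This is a simplified illustration under stated assumptions, not Architect's actual algorithm: the function `resolve_repeat`, the `hits` mapping from contig name to the set of read clouds that map to it, and the rule of pairing flanks by the number of shared clouds are all hypothetical.

```python
def resolve_repeat(hits, lefts, rights):
    """Pair each left flank with the right flank sharing the most read clouds.

    hits: dict mapping a contig name to the set of read cloud labels
    that have reads mapping to that contig (hypothetical input format).
    """
    return {
        left: max(rights, key=lambda right: len(hits[left] & hits[right]))
        for left in lefts
    }


# Toy data mirroring Fig. 2: the red cloud maps to A, R and B;
# the orange cloud maps to C, R and D.
hits = {
    "A": {"red"},
    "B": {"red"},
    "C": {"orange"},
    "D": {"orange"},
    "R": {"red", "orange"},
}
pairing = resolve_repeat(hits, lefts=["A", "C"], rights=["B", "D"])
```

Here `pairing` links A with B and C with D, which corresponds to the paths ARB and CRD. The repeat R itself is uninformative because both clouds map to it, so the signal comes from the unique flanking sequences.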
