PLoS One. 2011;6(8):e23455.
doi: 10.1371/journal.pone.0023455. Epub 2011 Aug 15.

LOCAS--a Low Coverage Assembly Tool for Resequencing Projects

Free PMC article

Juliane D Klein et al. PLoS One. 2011.

Abstract

Motivation: Next Generation Sequencing (NGS) is frequently applied to detect sequence variations between highly related genomes. Recent large-scale re-sequencing studies such as the Human 1000 Genomes Project use low coverage NGS data to make sequencing of hundreds of individuals affordable. In such data, SNPs and micro-indels can be detected by applying an alignment-consensus approach. However, computational methods capable of discovering other variations, such as novel insertions or highly diverged sequence, from low coverage NGS data are still lacking.

Results: We present LOCAS, a new NGS assembler designed specifically for low coverage assembly of eukaryotic genomes using a mismatch-sensitive overlap-layout-consensus approach. LOCAS assembles homologous regions in a homology-guided manner, and performs de novo assemblies of insertions and highly polymorphic target regions subsequent to an alignment-consensus approach. LOCAS has been evaluated in homology-guided assembly scenarios with low sequence coverage of Arabidopsis thaliana strains sequenced as part of the Arabidopsis 1001 Genomes Project. While assembling the same number of long insertions as state-of-the-art NGS assemblers, LOCAS showed the best results regarding contig size, error rate and runtime.
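
To make the idea of a mismatch-sensitive overlap concrete, the following is a minimal sketch of a suffix-prefix overlap test that tolerates a bounded fraction of mismatches. It is not taken from LOCAS: the function name `best_overlap` and the parameters `min_len` and `max_mismatch_rate` are hypothetical, and the abstract does not state how LOCAS actually computes or scores overlaps.

```python
from typing import Optional, Tuple


def best_overlap(a: str, b: str, min_len: int = 5,
                 max_mismatch_rate: float = 0.1) -> Optional[Tuple[int, int]]:
    """Find the longest suffix of `a` that matches a prefix of `b`.

    Returns (overlap_length, mismatches), or None if no overlap of at
    least `min_len` bases stays within the tolerated mismatch rate.
    Illustrative only: ungapped comparison, quadratic time.
    """
    # Try the longest candidate overlap first, then shrink.
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        suffix = a[len(a) - length:]
        prefix = b[:length]
        mismatches = sum(1 for x, y in zip(suffix, prefix) if x != y)
        if mismatches <= max_mismatch_rate * length:
            return length, mismatches
    return None


# Exact overlap of six bases ("ACGTAA").
exact = best_overlap("ACGTACGTAA", "ACGTAATTGG", min_len=5, max_mismatch_rate=0.0)
# Overlap of six bases with one mismatch ("ACGTAC" vs "ACGAAC").
tolerant = best_overlap("TTTTACGTAC", "ACGAACGGGG", min_len=5, max_mismatch_rate=0.2)
```

Setting `max_mismatch_rate` to zero reduces this to exact overlap detection; raising it lets reads that differ by sequencing errors or polymorphisms still be connected, at the cost of more spurious overlaps.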

Conclusion: LOCAS produces excellent results for homology-guided assembly of eukaryotic genomes with short reads and low sequencing depth, and therefore appears to be the assembly tool of choice for the detection of novel sequence variations in this scenario.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Transformation of the overlap alignment graph into the path graph.
In (A), an overlap graph is shown, and, for some read sequences, the underlying alignment information is displayed. In (B), the corresponding path graph is displayed. For each unique path in the overlap graph an inner edge is introduced in the path graph, represented by a solid line. For example, the unique path between b and d in the overlap graph is represented by the inner edge (b′″, d′) in the path graph. If two vertices in the path graph represent the two ends of the same read, then a real edge is inserted between them in the path graph. These real edges are shown as dashed lines. An example is the real edge (d′, d″). The vertices d′ and d″ represent the same read, since they both correspond to vertex d in the overlap graph, but they represent the two different ends of that read: the read of d overlaps with the reads of b and e at different ends, as displayed in (A).
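
As a rough illustration of what "unique path" means here, the sketch below extracts maximal non-branching paths from a directed overlap graph. This is a standard simplification rather than the path graph construction of the paper: it does not model the two ends of a read or the real edges between them, and the function name `unique_paths` and the edge-list input format are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def unique_paths(edges: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Return maximal non-branching paths of a directed graph.

    A vertex is "inner" if it has exactly one predecessor and one
    successor; paths start and end at non-inner vertices. Isolated
    cycles made only of inner vertices are ignored in this sketch.
    """
    succ: Dict[str, List[str]] = defaultdict(list)
    pred: Dict[str, List[str]] = defaultdict(list)
    nodes = set()
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
        nodes.update((u, v))

    def is_inner(n: str) -> bool:
        return len(pred[n]) == 1 and len(succ[n]) == 1

    paths = []
    for n in sorted(nodes):
        if is_inner(n):
            continue
        # Start one path per outgoing edge and extend through inner vertices.
        for nxt in list(succ[n]):
            path = [n, nxt]
            while is_inner(path[-1]):
                path.append(succ[path[-1]][0])
            paths.append(path)
    return paths


# Hypothetical overlap graph: a -> b -> c -> d, then d branches to e and f.
paths = unique_paths([("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("d", "f")])
# paths == [["a", "b", "c", "d"], ["d", "e"], ["d", "f"]]
```

Each returned path would correspond to one edge in a collapsed graph, which is the role the inner edges play in panel (B).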
Figure 2. Workflow of SUPERLOCAS.
The figure shows the workflow of the algorithm of SUPERLOCAS. The initial steps are illustrated: the left-over reads with the constructed left-over overlap graph, and the reads that are aligned against the reference sequence and partitioned into blocks. Next, the steps that are executed consecutively for each block are shown: the construction of the overlap graph, the insertion of edges between both graphs and the procedure until contigs are reported for the merged graph.
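
The initial partitioning step can be sketched as follows, assuming each read carries either an alignment start position on the reference or no position at all. This is not SHORE's or SUPERLOCAS's implementation; the name `partition_reads`, the `(name, start)` read representation, and the rule of assigning a read to a block by its start coordinate alone are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# (read name, 0-based alignment start on the reference, or None if unaligned)
Read = Tuple[str, Optional[int]]


def partition_reads(reads: Iterable[Read],
                    block_size: int = 25_000) -> Tuple[Dict[int, List[str]], List[str]]:
    """Group aligned reads into fixed-size reference blocks.

    Returns (blocks, left_over), where `blocks` maps a block index to the
    names of reads starting in that block and `left_over` lists reads
    without an alignment.
    """
    blocks: Dict[int, List[str]] = defaultdict(list)
    left_over: List[str] = []
    for name, start in reads:
        if start is None:
            left_over.append(name)
        else:
            blocks[start // block_size].append(name)
    return dict(blocks), left_over


blocks, left_over = partition_reads(
    [("r1", 10), ("r2", 24_999), ("r3", 25_000), ("r4", None)]
)
# blocks == {0: ["r1", "r2"], 1: ["r3"]}; left_over == ["r4"]
```

In the workflow described above, each block would then be assembled in turn, with the left-over reads kept in a separate graph that is connected to the block's graph.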
Figure 3. Performance comparison of low sequencing depth assembly.
Illumina GAIIx reads were simulated at a sequencing depth of 7.5× for the first chromosome of A. thaliana Col-0. The reads were assigned to the reference sequence corresponding to their origin positions and partitioned into blocks of a length of 10 kb. The avgN50 (average N50) is plotted against the avgERR (average error rate) for the assembly tools LOCAS, EULER-SR, ABySS, VELVET and soapDeNovo. For each assembler, several runs are displayed corresponding to the different parameter settings. The data points of ABySS are drawn in orange, EULER-SR in green, LOCAS in red, VELVET in blue and soapDeNovo in turquoise. Each point represents one run.
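
For readers unfamiliar with the metric, the sketch below computes N50 for one set of contig lengths and averages it over several blocks. The N50 definition is the standard one; treating avgN50 as the plain mean of per-block N50 values, and skipping blocks without contigs, are assumptions, since the caption does not spell out the averaging. The names `n50` and `avg_n50` are hypothetical.

```python
from typing import Iterable, List


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L cover at least half of
    the total assembled bases. Returns 0 for an empty list."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def avg_n50(blocks: Iterable[List[int]]) -> float:
    """Mean N50 over blocks, ignoring blocks that produced no contigs."""
    values = [n50(b) for b in blocks if b]
    return sum(values) / len(values) if values else 0.0


# Total length 54; contigs 10 + 9 + 8 = 27 reach half, so N50 is 8.
single = n50([2, 3, 4, 5, 6, 7, 8, 9, 10])
average = avg_n50([[2, 3, 4, 5, 6, 7, 8, 9, 10], [100]])
```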
Figure 4. Performance comparison of homology-guided assembly on simulated data.
We simulated a resequencing study of an artificial A. thaliana strain using a sequencing depth of 7.5×. The simulated Illumina reads were aligned to the reference genome Col-0 and partitioned into blocks of 25 kb using SHORE. The assembly tools SUPERLOCAS and VELVET were applied to assemble the mapped reads of the first chromosome and the left-over reads. The avgN50 (average N50) is plotted against the avgERR (average error rate) for the assembly tools SUPERLOCAS and VELVET (in left-over incorporation mode). SUPERLOCAS is displayed in red and VELVET in blue.
Figure 5. Number of detected insertion regions in homology-guided assembly on simulated data.
For the artificial A. thaliana strain in the simulated resequencing study, the total number of insertion regions in the target genome is plotted for different region lengths. In addition, the numbers of error-free regions assembled by VELVET and by SUPERLOCAS are shown.
Figure 6. Performance comparison of homology-guided assembly on real world data without utilizing left-over reads.
Paired-end reads with a length of 80 bp were produced by Illumina GAIIx to a depth of ∼7× for the first chromosome of A. thaliana strain Ler-1. Reads were aligned against the complete reference sequence (Col-0) and partitioned into blocks of 25 kb with SHORE. LOCAS and VELVET were applied in paired-end mode to all blocks, each of which contains reads aligned to the same region of the reference sequence. The x-axis shows the avgN50 (average N50) and the y-axis the avgDISS (average dissimilarity). The runs of LOCAS produced with different parameter settings are drawn in red and those of VELVET in blue.
Figure 7. Performance comparison of homology-guided assembly on real world data utilizing left-over reads.
Illumina reads of the first chromosome of A. thaliana strain Ler-1 were aligned against the reference sequence (Col-0) and partitioned into blocks of 25 kb with SHORE. Local assemblies of reads were performed with SUPERLOCAS and VELVET in order to incorporate left-over reads. While SUPERLOCAS provides algorithms specifically adjusted to this task, VELVET had to assemble each block together with the complete set of left-over reads. A barplot shows the avgN50 (average N50) size for both assemblers.
