Bioinformatics. 2012 Jan 1;28(1):13-6.
doi: 10.1093/bioinformatics/btr588. Epub 2011 Oct 23.

Graph accordance of next-generation sequence assemblies

Guohui Yao et al. Bioinformatics.

Abstract

Motivation: No individual assembly algorithm addresses all the known limitations of assembling short-length sequences. Reduced contig length is the major problem limiting the use of these assemblies. We describe an algorithm that takes advantage of different assembly algorithms or sequencing platforms to improve the quality of next-generation sequence (NGS) assemblies.

Results: The algorithm is implemented as a graph accordance assembly (GAA) program. The algorithm constructs an accordance graph to capture the mapping information between the target and query assemblies. Based on the accordance graph, the contigs or scaffolds of the target assembly can be extended, merged or bridged together. Extra constraints, including gap sizes, mate pairs, scaffold order and orientation, are explored to enforce those accordance operations in the correct context. We applied GAA to various chicken NGS assemblies and the results demonstrate improved contiguity statistics and higher genome and gene coverage.
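The idea of capturing mapping information in a graph can be illustrated with a minimal Python sketch. This is not the GAA implementation (which is written in Perl); it is a simplified illustration under stated assumptions. The input format (`target_id`, `query_id`, `query_start` tuples) and the function names `build_accordance_graph` and `maximal_paths` are hypothetical, and the sketch ignores orientation, gap sizes, mate pairs and scaffold order, all of which the abstract lists as extra constraints.

```python
from collections import defaultdict


def build_accordance_graph(alignments):
    """Link target contigs that sit next to each other on the same query contig.

    alignments: iterable of (target_id, query_id, query_start) tuples
    (a hypothetical, simplified input format).
    Returns a dict mapping each target contig to the set of its successors.
    """
    by_query = defaultdict(list)
    for target_id, query_id, query_start in alignments:
        by_query[query_id].append((query_start, target_id))

    edges = defaultdict(set)
    for hits in by_query.values():
        hits.sort()  # order target contigs by position on the query contig
        for (_, left), (_, right) in zip(hits, hits[1:]):
            if left != right:
                edges[left].add(right)
    return dict(edges)


def maximal_paths(edges):
    """Return unbranched chains of target contigs (pure cycles are skipped)."""
    preds = defaultdict(set)
    for src, dsts in edges.items():
        for dst in dsts:
            preds[dst].add(src)
    nodes = set(edges) | set(preds)

    def unique_next(node):
        succ = edges.get(node, set())
        if len(succ) == 1:
            (nxt,) = succ
            if len(preds.get(nxt, set())) == 1:
                return nxt
        return None

    def unique_prev(node):
        pred = preds.get(node, set())
        if len(pred) == 1:
            (prv,) = pred
            if len(edges.get(prv, set())) == 1:
                return prv
        return None

    paths = []
    for node in sorted(nodes):
        if unique_prev(node) is not None:
            continue  # this node continues a chain, so it is not a start
        path, seen = [node], {node}
        cur = unique_next(node)
        while cur is not None and cur not in seen:
            path.append(cur)
            seen.add(cur)
            cur = unique_next(cur)
        paths.append(path)
    return paths


# Three target contigs placed on one query contig, one on another.
example = [
    ("T1.1", "Q1.1", 0),
    ("T1.2", "Q1.1", 500),
    ("T2.1", "Q1.1", 1200),
    ("T3.1", "Q2.1", 0),
]
graph = build_accordance_graph(example)
paths = maximal_paths(graph)
```

In this toy input, T1.1, T1.2 and T2.1 form a single chain because they are adjacent on Q1.1, while T3.1 has no neighbour and therefore does not appear in the graph.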

Availability: GAA is implemented in object-oriented Perl and is available at http://sourceforge.net/projects/gaa-wugi/.

Contact: lye@genome.wustl.edu

Figures

Fig. 1.
Graph accordance of assemblies. (A) Alignments between the target and query assemblies. (B) An accordance graph capturing the alignments with four maximal sub-paths included in dashed circles. T and Q represent contigs in the target and query assemblies, respectively. For example, T1.2 is the second contig of the first scaffold in the target assembly, and Q1.2 is the second contig of the first scaffold in the query assembly. M represents contigs in the merged assembly. Green edges represent consistent links, and red edges represent inconsistent links. (C) Target contigs are merged or bridged along each maximal sub-path with gap filling using query bases, and the merged contigs are renamed accordingly.
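Panel (C) describes merging target contigs along a path and filling gaps with query bases. The following Python sketch illustrates that step under strong simplifying assumptions that are not taken from the paper: every target contig is assumed to align end to end, without indels, on the forward strand of a single query contig. The function name `merge_along_query` and its input format are hypothetical.

```python
def merge_along_query(query_seq, placements):
    """Merge target contigs placed on one query contig.

    placements: list of (target_seq, query_start) pairs. Each target contig
    is assumed (for illustration only) to align end to end, ungapped, on
    the forward strand of query_seq.
    """
    merged = []
    cursor = None  # query coordinate just past the bases emitted so far
    for target_seq, q_start in sorted(placements, key=lambda p: p[1]):
        q_end = q_start + len(target_seq)
        if cursor is None:
            merged.append(target_seq)
        elif q_start >= cursor:
            # Gap between two target contigs: fill it with query bases.
            merged.append(query_seq[cursor:q_start])
            merged.append(target_seq)
        else:
            # Overlapping contigs: keep only the part not yet covered.
            merged.append(target_seq[cursor - q_start:])
        cursor = max(cursor or 0, q_end)
    return "".join(merged)


query = "AAACCCGGGTTT"
bridged = merge_along_query(query, [("AAAC", 0), ("GGTTT", 7)])
overlapped = merge_along_query(query, [("AAACCC", 0), ("CCGGG", 4)])
```

Target bases are preferred wherever a target contig covers the region, and query bases are used only inside gaps, which mirrors the caption's description of gap filling.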
Fig. 2.
Gene recovery in merged assemblies. Blue represents target contigs, purple query contigs and red genes. ‘−’ represents the reverse complement of a contig.
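For readers unfamiliar with the notation, the reverse complement of a contig can be computed as in this short Python sketch. The helper name `reverse_complement` is illustrative and is not taken from GAA.

```python
# Translation table for DNA bases; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


rc = reverse_complement("AACG")
```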
