Integrative Meta-Assembly Pipeline (IMAP): Chromosome-level genome assembler combining multiple de novo assemblies

PLoS One. 2019 Aug 27;14(8):e0221858. doi: 10.1371/journal.pone.0221858. eCollection 2019.


Background: Genomic data have become major resources to understand complex mechanisms at fine-scale temporal and spatial resolution in functional and evolutionary genetic studies, including human diseases, such as cancers. Recently, a large number of whole genomes of evolving populations of yeast (Saccharomyces cerevisiae W303 strain) were sequenced in a time-dependent manner to identify temporal evolutionary patterns. For this type of study, a chromosome-level sequence assembly of the strain or population at time zero is required to compare with the genomes derived later. However, there is no fully automated computational approach in experimental evolution studies to establish the chromosome-level genome assembly using unique features of sequencing data.

Methods and results: In this study, we developed a new software pipeline, the integrative meta-assembly pipeline (IMAP), to build chromosome-level genome sequence assemblies by generating and combining multiple initial assemblies using three de novo assemblers from short-read sequencing data. We significantly improved the continuity and accuracy of the genome assembly using a large collection of sequencing data and hybrid assembly approaches. We validated our pipeline by generating chromosome-level assemblies of yeast strains W303 and SK1, and compared our results with assemblies built using long-read sequencing and various assembly evaluation metrics. We also constructed chromosome-level sequence assemblies of S. cerevisiae strain Sigma1278b, and three commonly used fungal strains: Aspergillus nidulans A713, Neurospora crassa 73, and Thielavia terrestris CBS 492.74, for which long-read sequencing data are not yet available. Finally, we examined the effect of IMAP parameters, such as reference and resolution, on the quality of the final assembly of the yeast strains W303 and SK1.
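The abstract does not say which evaluation metrics were used to assess continuity. As an illustration only, the sketch below computes N50, a widely used continuity statistic: the contig length at which the running sum of lengths, sorted from longest to shortest, first reaches half of the total assembly size. The function name `n50` and the example lengths are hypothetical and are not taken from IMAP.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig (or scaffold) lengths.

    Illustrative helper, not part of IMAP: N50 is the length L such that
    contigs of length >= L together cover at least half of the assembly.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical lengths (in bp): a fragmented assembly versus a more
# contiguous one covering the same total size.
fragmented = [2, 2, 2, 3, 3, 4, 8, 8]
contiguous = [16, 16]
print(n50(fragmented), n50(contiguous))
```

A higher N50 at a comparable total size suggests a more contiguous assembly, but it says nothing about correctness, so it is normally read alongside accuracy checks against a reference or long-read assembly.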

Conclusions: We developed a cost-effective pipeline to generate chromosome-level sequence assemblies using only short-read sequencing data. Our pipeline combines the strengths of reference-guided and meta-assembly approaches. It is available online, with a Docker image and a Perl script that help users install the IMAP package and its prerequisite programs. Users can use IMAP to build a chromosome-level assembly for their genome of interest.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Chromosomes, Fungal
  • Genome, Fungal
  • Molecular Sequence Annotation
  • Sequence Analysis, DNA*
  • Software*
  • Synteny / genetics

Grants and funding

This work was supported by Pusan National University Research Grant, 2016, and National Research Foundation of Korea Grant funded by the Korean Government (NRF-2017074529 and NRF-2018R1A5A2023879) to GS, and Ministry of Science and ICT of Korea Grant 2014M3C9A3063544 and Ministry of Education of Korea Grant 2016R1D1A1B03930209 and 2019R1F1A1042018 to JK.