Background: Leishmania parasites cause severe human diseases known as leishmaniasis. These eukaryotic microorganisms possess an atypical chromosomal architecture and the regulation of gene expression occurs almost exclusively at post-transcriptional levels. Accordingly, sequencing of the genome of Leishmania major, and subsequently the genome of other related species, was paramount for highlighting these peculiar molecular aspects. Recently, we carried out an analysis of gene expression by massive sequencing of RNA in the L. major promastigote, and data derived from that analysis were suggestive of possible errors in the current genome assembly for this Leishmania species.
Results: During the analysis by RNA-Seq of the transcriptome for L. major Friedlin strain, 163,714 reads could not be aligned with the reference genome. Thus, de novo assembly with these reads was carried out and the resulting contigs were further analyzed. After detailed homology searches using available databases, it was postulated that 15 contigs might correspond to genomic sequences lost during the initial genome assembly of the L. major Friedlin strain. This was experimentally confirmed by PCR amplification, cloning and sequencing of the new genomic regions. As a result, we have identified seven regions of the L. major (Friedlin) genome that were lost during the sequence assembly. This led to the uncovering of six new genes (LmjF.15.1475, LmjF.15.0285, LmjF.24.0765, LmjF.14.0860, LmjF.19.0305, and LmjF.27.2035), and correction of the annotation for two others (LmjF.15.1480 and LmjF.27.2030). Our data suggest that these genomic regions probably collapsed during the genome assembly due to the existence of gene duplications and/or repeated regions surrounding the missed genes.
Conclusion: RNA-seq data helped to reconstruct some genomic regions misassembled during the L. major Friedlin genome assembly, which is otherwise quite robust. On the other hand, this study shows that data derived from massive sequencing approaches, including RNA-Seq, should be carefully inspected to improve current genome definition and gene annotations.