New gene models and alternative splicing in the maize pathogen Colletotrichum graminicola revealed by RNA-Seq analysis

BMC Genomics. 2014 Oct 2;15(1):842. doi: 10.1186/1471-2164-15-842.

Abstract

Background: An annotated genomic sequence of the corn anthracnose fungus Colletotrichum graminicola has been published previously, but correct identification of gene models by automated gene annotation remains a challenge. RNA-Seq offers the potential to substantially improve gene annotations and to identify post-transcriptional RNA processing events, such as alternative splicing and RNA editing.
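To illustrate how spliced RNA-Seq reads can expose alternative splicing, here is a minimal sketch. It is not the authors' pipeline. It assumes reads already aligned to the genome, each given as a 0-based reference start plus a SAM-style CIGAR string in which `N` marks a skipped region (intron). The function names and the `min_reads` threshold are hypothetical. The sketch ignores strand, so "shared intron start" corresponds to a shared 5' splice site only on the forward strand.

```python
import re
from collections import Counter

# SAM CIGAR operations; M, D, N, = and X consume reference bases.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")


def junctions_from_cigar(ref_start, cigar):
    """Return introns as 0-based half-open (start, end) pairs, one per N operation."""
    pos = ref_start
    introns = []
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op == "N":
            introns.append((pos, pos + n))
        if op in REF_CONSUMING:
            pos += n
    return introns


def count_junctions(alignments):
    """Count how many reads support each intron across (ref_start, cigar) alignments."""
    counts = Counter()
    for ref_start, cigar in alignments:
        counts.update(junctions_from_cigar(ref_start, cigar))
    return counts


def alternative_ends(junction_counts, min_reads=2):
    """Group well-supported introns by start; starts with >1 distinct end hint at alternative splicing."""
    by_start = {}
    for (start, end), n in junction_counts.items():
        if n >= min_reads:
            by_start.setdefault(start, set()).add(end)
    return {s: sorted(ends) for s, ends in by_start.items() if len(ends) > 1}


reads = [
    (100, "50M200N50M"),
    (100, "50M200N50M"),
    (120, "30M260N70M"),
    (110, "40M260N60M"),
]
print(alternative_ends(count_junctions(reads)))
```

With the toy reads above, two introns start at position 150 but end at 350 and 410 respectively, each supported by two reads, so position 150 is reported as a candidate alternative-splicing site. A real analysis would also need strand information, splice-site motifs and annotation context.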

Results: Based on the nucleotide sequence information of transcripts, we identified 819 novel transcriptionally active regions (nTARs) and revised 906 incorrectly predicted gene models, including revisions of exon-intron structure, gene orientation and sequencing errors. Among the nTARs, 146 share significant similarity with proteins identified in other species, suggesting that they are hitherto unidentified genes in C. graminicola. Moreover, 5'- and 3'-UTR sequences were retrieved for 4378 genes, and alternatively spliced variants were identified for 69 genes. Comparative analysis of RNA-Seq data and the genome sequence did not provide evidence for RNA editing in C. graminicola.
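The idea behind a novel transcriptionally active region can be sketched as a stretch of genome with sustained RNA-Seq read coverage that does not overlap any annotated gene. The sketch below is an illustration under stated assumptions, not the procedure used in the study. It assumes a per-base coverage list for one contig and annotated genes as 0-based half-open intervals. The thresholds `min_depth` and `min_len` and all function names are hypothetical.

```python
def covered_runs(coverage, min_depth):
    """Yield half-open (start, end) runs where per-base coverage >= min_depth."""
    start = None
    for i, depth in enumerate(coverage):
        if depth >= min_depth and start is None:
            start = i
        elif depth < min_depth and start is not None:
            yield (start, i)
            start = None
    if start is not None:
        yield (start, len(coverage))


def overlaps(a, b):
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def novel_tars(coverage, genes, min_depth=5, min_len=100):
    """Covered runs of sufficient length that overlap no annotated gene interval."""
    return [
        run
        for run in covered_runs(coverage, min_depth)
        if run[1] - run[0] >= min_len and not any(overlaps(run, g) for g in genes)
    ]


# Toy contig: two expressed blocks, the first inside an annotated gene.
coverage = [0] * 50 + [10] * 200 + [0] * 50 + [8] * 150 + [0] * 50
genes = [(40, 260)]
print(novel_tars(coverage, genes))
```

Here only the second block, spanning positions 300 to 450, is reported, because the first lies within the annotated gene. In practice, candidate regions such as these would then be compared against protein databases to decide whether they are likely unannotated genes.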

Conclusions: We successfully combined deep-sequencing RNA-Seq data with an elaborate bioinformatics strategy to identify novel genes, incorrect gene models and mechanisms of transcript processing in the corn anthracnose fungus C. graminicola. Sequence data of the revised genome annotation, including several hundred novel transcripts, improved gene models and candidate genes for alternative splicing, have been made accessible in a comprehensive database. Our results contribute substantially to both routine laboratory experiments and large-scale genomic or transcriptomic studies in C. graminicola.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • 3' Untranslated Regions
  • Alternative Splicing / genetics
  • Chromosome Mapping
  • Colletotrichum / genetics*
  • Computational Biology
  • Databases, Genetic
  • High-Throughput Nucleotide Sequencing
  • Models, Genetic*
  • Molecular Sequence Annotation
  • RNA Editing / genetics
  • RNA, Untranslated / genetics
  • RNA, Untranslated / metabolism
  • Sequence Analysis, RNA
  • Transcriptome
  • Zea mays / microbiology

Substances

  • 3' Untranslated Regions
  • RNA, Untranslated