Gigascience. 2018 Sep 1;7(9):giy096.
doi: 10.1093/gigascience/giy096.

ANNOgesic: A Swiss Army Knife for the RNA-seq Based Annotation of Bacterial/Archaeal Genomes

Free PMC article

Sung-Huan Yu et al. Gigascience. 2018.

Abstract

To understand the gene regulation of an organism of interest, a comprehensive genome annotation is essential. While some features, such as coding sequences, can be computationally predicted with high accuracy based purely on the genomic sequence, others, such as promoter elements or noncoding RNAs, are harder to detect. RNA sequencing (RNA-seq) has proven to be an efficient method to identify these genomic features and to improve genome annotations. However, processing and integrating RNA-seq data to generate high-resolution annotations is challenging and time-consuming and requires numerous steps. We have constructed a powerful and modular tool called ANNOgesic that provides the required analyses and simplifies RNA-seq-based bacterial and archaeal genome annotation. It integrates data from conventional RNA-seq and differential RNA-seq (dRNA-seq), and it predicts and annotates numerous features, including small noncoding RNAs, with high precision. The software is available under an open source license (ISCL) at https://pypi.org/project/ANNOgesic/.

Figures

Figure 1:
Schema of the genetic algorithm for optimizing the parameters of TSSpredator. The algorithm starts from the default parameters. Each parameter set then goes through three steps: global change (every parameter is changed randomly), large change (two of the parameters are changed randomly), and small change (a small fraction is added to or subtracted from one of the parameters). After each step, the best parameter set is selected for reproduction. ANNOgesic usually reaches optimized parameters within 4,000 runs.
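The three mutation steps in this caption can be illustrated with a minimal Python sketch. It is not ANNOgesic's implementation: the function names, the parameter bounds, the 10% step size, and the `score` callback (which in practice would compare predictions against a reference TSS set) are all assumptions made for illustration, and the sketch applies one step per run and keeps a candidate only if it scores better.

```python
import random

def global_change(params, bounds, rng):
    # Global change: re-draw every parameter from its allowed range.
    return {k: rng.uniform(*bounds[k]) for k in params}

def large_change(params, bounds, rng):
    # Large change: re-draw two randomly chosen parameters.
    new = dict(params)
    for k in rng.sample(sorted(params), 2):
        new[k] = rng.uniform(*bounds[k])
    return new

def small_change(params, bounds, rng, fraction=0.1):
    # Small change: add or subtract a small fraction of the range
    # to one parameter, then clamp to the allowed range.
    new = dict(params)
    k = rng.choice(sorted(params))
    lo, hi = bounds[k]
    delta = rng.choice((-1, 1)) * fraction * (hi - lo)
    new[k] = min(hi, max(lo, new[k] + delta))
    return new

def optimize(default, bounds, score, runs=4000, seed=0):
    # Start from the defaults; cycle through the three steps and keep
    # the best-scoring parameter set for the next round.
    rng = random.Random(seed)
    best, best_score = dict(default), score(default)
    steps = (global_change, large_change, small_change)
    for i in range(runs):
        candidate = steps[i % 3](best, bounds, rng)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
    return best, best_score
```

A caller would pass its own scoring function, for example one that rewards agreement with a manually curated TSS set; the sketch only assumes that higher scores are better.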
Figure 2:
Transcript boundary detection. (A) Schema: ANNOgesic can predict TSSs, terminators, transcripts, genes, and UTRs and integrate them into a comprehensive annotation. (B) Gene HP1342 of H. pylori 26695 as an example. The pink coverage plot represents RNA-seq data of libraries after fragmentation, the blue coverage plots TEX+ libraries of dRNA-seq, and the green coverage plots TEX- libraries of dRNA-seq. Transcript, TSS, terminator, and CDS are presented as red, blue, orange, and green bars, respectively. The figure shows how the transcript covers the whole gene location and how UTRs (presented by purple bars) can be detected based on the TSS, transcript, terminator, and gene annotations.
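As a rough illustration of how UTRs could follow from the other annotations, the Python sketch below derives them for a single forward-strand gene. The function name, the 1-based inclusive coordinates, and the choice to end the 3' UTR at the terminator when one is given (otherwise at the transcript end) are assumptions for this example, not a description of ANNOgesic's internals.

```python
def derive_utrs(tss, cds, transcript_end, terminator_end=None):
    """Derive (5' UTR, 3' UTR) for a forward-strand gene.

    Coordinates are 1-based and inclusive. Each UTR is returned as a
    (start, end) tuple, or None when there is no room for it.
    """
    cds_start, cds_end = cds
    # Assumption: prefer the terminator as the 3' boundary if known.
    three_prime_end = terminator_end if terminator_end is not None else transcript_end
    utr5 = (tss, cds_start - 1) if tss < cds_start else None
    utr3 = (cds_end + 1, three_prime_end) if three_prime_end > cds_end else None
    return utr5, utr3

# Hypothetical coordinates, chosen only to show the arithmetic.
print(derive_utrs(100, (150, 900), 980, terminator_end=960))
# ((100, 149), (901, 960))
```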
Figure 3:
Coverage-based transcript detection. If the coverage (blue curve-blocks) is higher than a given coverage cutoff value (dashed line), a transcript will be called. The user can set a tolerance value (i.e., a number of nucleotides with a coverage below the cutoff) that determines whether gapped transcripts are merged or kept separate. Information on gene positions can also be used to merge transcripts in case two of them overlap with the same gene.
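The cutoff, tolerance, and gene-based merging described in this caption can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: coverage is a plain list of per-base read depths, coordinates are 0-based and inclusive, and the function names are hypothetical rather than taken from ANNOgesic.

```python
def call_transcripts(coverage, cutoff, tolerance=0):
    # Step 1: collect runs of positions whose coverage exceeds the cutoff.
    runs, start = [], None
    for pos, depth in enumerate(coverage):
        if depth > cutoff:
            if start is None:
                start = pos
        elif start is not None:
            runs.append((start, pos - 1))
            start = None
    if start is not None:
        runs.append((start, len(coverage) - 1))

    # Step 2: merge neighbouring runs if the gap between them
    # (positions at or below the cutoff) is within the tolerance.
    merged = []
    for run in runs:
        if merged and run[0] - merged[-1][1] - 1 <= tolerance:
            merged[-1] = (merged[-1][0], run[1])
        else:
            merged.append(run)
    return merged

def overlaps(a, b):
    return a[0] <= b[1] and b[0] <= a[1]

def merge_by_gene(transcripts, genes):
    # Merge consecutive transcripts (sorted by start) that overlap the same gene.
    merged = []
    for tr in transcripts:
        if merged and any(overlaps(merged[-1], g) and overlaps(tr, g) for g in genes):
            merged[-1] = (merged[-1][0], tr[1])
        else:
            merged.append(tr)
    return merged

coverage = [0, 5, 6, 1, 1, 7, 8, 0, 0, 0, 9]
print(call_transcripts(coverage, cutoff=2, tolerance=0))  # [(1, 2), (5, 6), (10, 10)]
print(call_transcripts(coverage, cutoff=2, tolerance=2))  # [(1, 6), (10, 10)]
```

With a tolerance of 2, the two-nucleotide gap between the first two runs is bridged, while the three-nucleotide gap before the last run is not.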
Figure 4:
Operon and sub-operon detection. (A) If more than one TSS that does not overlap with a gene is located within one operon, the operon can be divided into several sub-operons based on these TSSs. (B) An example from H. pylori 26695. The coverage of RNA-seq with fragmentation and of the TEX+ and TEX- libraries of dRNA-seq is shown in pink, blue, and green, respectively. TSSs, transcripts/operons, and genes are presented as blue, red, and green bars, respectively. The two genes are located in the same operon but in different sub-operons (two empty red squares).
Figure 5:
Detection of intergenic, antisense, and UTR-derived sRNAs. The length of a potential sRNA should be within a given range, and its coverage should exceed a given minimum coverage. (A) Detection of intergenic and antisense sRNAs. Three potential cases are shown. In the upper panel, the transcript starts with a TSS, and the length of the transcript is within the expected range. In the middle panel, the transcript starts with a TSS but is longer than an average sRNA. In that case, ANNOgesic searches the region of high coverage (blue region) for a point at which the coverage decreases rapidly. The bottom panel is identical to the middle one, but the sRNA ends with a processing site (PS) instead. (B) Detection of UTR-derived sRNAs. A transcript in a 3’ UTR is tagged as a 3’ UTR-derived sRNA if it starts with a TSS or PS. A transcript in a 5’ UTR is tagged as a 5’ UTR-derived sRNA if it starts with a TSS and ends with a PS or at the point where the coverage drops significantly. (C) Detection of interCDS-derived sRNAs; this is similar to the 5’ UTR-derived approach, but the transcript starts with a PS.
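The upper and middle cases of panel (A) can be sketched in Python as follows. Everything specific here is an assumption for illustration: the function names, the use of mean coverage for the minimum-coverage check, the definition of a "rapid decrease" as a drop below half of the previous position's depth, and the threshold values in the example.

```python
def find_coverage_drop(coverage, tss, max_len, drop_ratio=0.5):
    """Return the last position before the first sharp coverage drop
    within max_len nucleotides of the TSS, or None if there is none."""
    stop = min(len(coverage) - 1, tss + max_len)
    for pos in range(tss, stop):
        if coverage[pos] > 0 and coverage[pos + 1] < drop_ratio * coverage[pos]:
            return pos
    return None

def detect_srna(coverage, tss, transcript_end, min_len, max_len, min_cov, drop_ratio=0.5):
    """Return (start, end) of an sRNA candidate (0-based, inclusive) or None."""
    end = transcript_end
    if end - tss + 1 > max_len:
        # Transcript is too long: look for a rapid coverage decrease instead.
        end = find_coverage_drop(coverage, tss, max_len, drop_ratio)
        if end is None:
            return None
    length = end - tss + 1
    mean_cov = sum(coverage[tss:end + 1]) / length
    if min_len <= length <= max_len and mean_cov >= min_cov:
        return (tss, end)
    return None

# Toy coverage: 40 nt of high coverage followed by a long low-coverage tail.
coverage = [0] * 5 + [50] * 40 + [5] * 100
print(detect_srna(coverage, tss=5, transcript_end=144, min_len=30, max_len=100, min_cov=10))
# (5, 44)
```

The PS-terminated case in the bottom panel would replace the coverage-drop search with a lookup of annotated processing sites, which this sketch leaves out.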
Figure 6:
sORF detection. (A) An sORF must contain a start codon and a stop codon within a transcript, and its length should be within a given range (default 30-150 nt). Additionally, a ribosomal binding site must be detected between the TSS and the start codon. (B) An example from H. pylori 26695. The coverage of RNA-seq (fragmented libraries), TEX+, and TEX- (dRNA-seq) is shown as pink, blue, and green coverage plots, respectively. The TSS, transcript, and sORF are presented as blue, red, and green bars, respectively.
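The criteria in panel (A) can be written as a short Python sketch. It makes several simplifying assumptions that are not taken from ANNOgesic: only the forward strand is scanned, the transcript string begins at the TSS, the ribosomal binding site is an exact `AGGAGG` match anywhere upstream of the start codon, the start and stop codon sets are the common bacterial ones, and the length range includes both start and stop codons.

```python
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_sorfs(transcript, min_len=30, max_len=150, rbs="AGGAGG"):
    """Return sORFs as 0-based half-open (start, end) offsets in the transcript."""
    seq = transcript.upper()
    sorfs = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] not in START_CODONS:
            continue
        # Require the RBS motif between the TSS (offset 0) and the start codon.
        if rbs not in seq[:start]:
            continue
        # Walk in frame until the first stop codon.
        for pos in range(start + 3, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                length = pos + 3 - start
                if min_len <= length <= max_len:
                    sorfs.append((start, pos + 3))
                break
    return sorfs

# Toy transcript: RBS, short spacer, then a 36-nt ORF.
transcript = "CC" + "AGGAGG" + "TCATC" + "ATG" + "GCT" * 10 + "TAA" + "CC"
print(find_sorfs(transcript))  # [(13, 49)]
```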


