Background: Despite the importance of osmoprotectants, no previous in silico evaluation of high-throughput data is available for higher plants. The present work aimed to identify and annotate osmoprotectant-related sequences among short transcripts from a soybean HT-SuperSAGE (High Throughput Super Serial Analysis of Gene Expression; 26-bp tags) database, and to compare them with transcriptomic and genomic data available from other sources.
Methods: A curated set of osmoprotectant-related sequences was generated using text mining and selected seed sequences, for the identification of the respective transcripts and proteins in higher plants. To test the efficiency of the seed sequences, they were aligned against four contrasting HT-SuperSAGE libraries generated by our group from soybean plants tolerant and sensitive to water deficit, considering only differentially expressed transcripts (p ≤ 0.05). The identified soybean transcripts and their respective tags were aligned and anchored against the soybean virtual genome.
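The tag-to-seed screening step described in the Methods can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the names `Tag`, `reverse_complement` and `match_tags_to_seeds` are hypothetical, and exact substring matching on both strands is assumed here as a simplified stand-in for a real alignment tool. Only the 26-bp tag length and the p ≤ 0.05 threshold come from the text.

```python
from typing import Dict, List, NamedTuple


class Tag(NamedTuple):
    """Hypothetical record for one SuperSAGE tag (26 bp in the study)."""
    sequence: str
    p_value: float  # significance of differential expression


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def match_tags_to_seeds(
    tags: List[Tag], seeds: Dict[str, str], alpha: float = 0.05
) -> Dict[str, List[str]]:
    """Map each seed ID to the significant tags found inside its sequence.

    Assumption: a tag "matches" a seed when it occurs as an exact
    substring on either strand; a real workflow would use an aligner
    and might tolerate mismatches.
    """
    hits: Dict[str, List[str]] = {}
    for tag in tags:
        if tag.p_value > alpha:
            continue  # keep only differentially expressed tags (p <= alpha)
        query = tag.sequence.upper()
        query_rc = reverse_complement(query)
        for seed_id, seed_seq in seeds.items():
            target = seed_seq.upper()
            if query in target or query_rc in target:
                hits.setdefault(seed_id, []).append(tag.sequence)
    return hits


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    example_tag = "CATGAAACCCGGGTTTAAACCCGGGT"
    example_seeds = {"seed_example": "GG" + example_tag + "AA"}
    print(match_tags_to_seeds([Tag(example_tag, 0.01)], example_seeds))
```

Searching both strands is a design choice made for the sketch; whether the original analysis considered strand orientation is not stated in the text.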
Results: The workflow yielded a set of 1,996 seed sequences that allowed the identification of 36 differentially expressed genes related to the biosynthesis of osmoprotectants [Proline (P5CS: 4, P5CR: 2), Trehalose (TPS1: 9, TPPB: 1), Glycine betaine (BADH: 4) and Myo-inositol (MIPS: 7, INPS1: 8)], which were also mapped in silico in the soybean genome (25 loci). A second approach used Arabidopsis full-length sequences as seed sequences and allowed the identification of 124 osmoprotectant-related sequences, matching ~10,500 tags anchored in the soybean virtual chromosomes. Osmoprotectant-related genes appeared clustered in all soybean chromosomes, with higher density in some subterminal regions and synteny among some chromosome pairs.
Conclusions: Soybean presents all searched osmoprotectant categories, with some important members differentially expressed among the comparisons considered (drought tolerant or sensitive vs. control; tolerant vs. sensitive), allowing the identification of interesting candidates for biotechnological inferences. The identified tags aligned to corresponding genes that matched 19 soybean chromosomes. Osmoprotectant-related genes are not regularly distributed in the soybean genome, but are clustered in some regions near the chromosome termini, with some redundant clusters in different chromosomes indicating their involvement in previous duplication and rearrangement events. The seed sequences, transcripts and map represent the first transversal evaluation of osmoprotectant-related genes and may be easily applied to other plants of interest.