Comparative Study
BMC Genet, 18 (1), 119

An Integrated and Comparative Approach Towards Identification, Characterization and Functional Annotation of Candidate Genes for Drought Tolerance in Sorghum (Sorghum bicolor (L.) Moench)

Adugna Abdi Woldesemayat et al. BMC Genet.

Abstract

Background: Drought is the most disastrous abiotic stress, severely affecting agricultural productivity worldwide. Understanding the biological basis of drought-regulated traits requires identification and in-depth characterization of genetic determinants using model organisms and high-throughput technologies. However, studies on drought tolerance have generally been limited to the traditional candidate gene approach, which targets only a single gene in a pathway related to a trait. In this study, we used sorghum, a model crop that is well adapted to arid regions, to mine genes and define determinants for drought tolerance using drought expression libraries and RNA-seq data.

Results: We provide an integrated and comparative in silico approach to candidate gene identification, characterization and annotation, with an emphasis on genes that play a prominent role in conferring drought tolerance in sorghum. A total of 470 non-redundant, functionally annotated drought responsive genes (DRGs) were identified from experimental drought-response data by employing pairwise sequence similarity searches, pathway and InterPro domain analysis, expression profiling and orthology relationships. Comparison of the genomic locations of these genes with sorghum quantitative trait loci (QTLs) showed that 40% of the genes co-localized with QTLs known for drought tolerance. Genome reannotation using the Program to Assemble Spliced Alignments (PASA) resulted in 9.6% of existing single gene models being updated. In addition, 210 putative novel genes were identified using AUGUSTUS- and PASA-based analysis of the expression dataset. Among these, 50% were single-exon genes, 69.5% were drought responsive and 5.7% were complete gene structure models. Analysis of biochemical metabolism revealed 14 metabolic pathways that are related to drought tolerance and that showed a strong biological network among the categories of genes involved. Identification of these pathways signifies the interplay of biochemical reactions that make up the metabolic network, which constitutes a fundamental interface for the sorghum defence mechanism against drought stress.

Conclusions: This study suggests that sorghum harbours untapped natural variability that could be used for developing drought tolerance. The data presented here may be regarded as an initial reference point for functional and comparative genomics in the Gramineae family.

Keywords: Candidate gene identification; Drought tolerance; Functional genomics; Genome annotation; Integrated in silico approach; Sorghum bicolor (L.) Moench.
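
The QTL co-localization comparison reported in the Results can be illustrated with a simple interval-overlap check. The following is a minimal sketch, not the authors' procedure: the abstract does not state how co-localization was computed, so the overlap rule (same chromosome, at least one shared base), the names `Interval`, `overlaps` and `colocalized_fraction`, and all coordinates are assumptions made for illustration.

```python
from typing import Dict, List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True when both intervals lie on the same chromosome and share a base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def colocalized_fraction(genes: Dict[str, Interval], qtls: List[Interval]) -> float:
    """Fraction of genes that overlap at least one QTL interval."""
    if not genes:
        return 0.0
    hits = sum(1 for g in genes.values() if any(overlaps(g, q) for q in qtls))
    return hits / len(genes)


# Toy coordinates invented for illustration only; they are not study data.
toy_genes = {
    "gene_1": Interval("chr1", 100, 500),
    "gene_2": Interval("chr1", 5000, 5400),
    "gene_3": Interval("chr2", 100, 500),
    "gene_4": Interval("chr2", 9000, 9800),
}
toy_qtls = [Interval("chr1", 400, 1000), Interval("chr2", 9500, 12000)]

fraction = colocalized_fraction(toy_genes, toy_qtls)
print(f"{fraction:.0%} of toy genes overlap a toy QTL")
```

A stricter rule, such as requiring the whole gene to fall inside the QTL interval, would only change the body of `overlaps`.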

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
Schematic gene structure model for annotation comparison. The figure shows three representations of gene structure models. (a) A hypothetical map of transcripts to the existing gene model (EGM): ‘Gene A’ denotes a hypothetical EGM overlapped by all transcripts, each showing a specific type of updated gene model. Transcripts A, B and C each represent an extended overlapping gene: extended at both the 5′ and 3′ edges; extended only at the 5′ edge while sharing the stop position at the 3′ edge; and extended only at the 3′ edge while sharing the start position at the 5′ edge, respectively. Transcript D represents a perfectly overlapping gene that shares the start position at the 5′ edge and the stop position at the 3′ edge. Transcripts E and F represent partial overlap at one edge and extension at the other: the former partially overlaps at the 3′ edge and is extended at the 5′ edge, and the latter shows the exact opposite pattern. Transcripts G and H each denote a partially overlapping gene that shares the position at the 5′ edge and at the 3′ edge, respectively. Transcript I represents an inner overlapping gene. The values given for each overlapping transcript in (a) are the actual numbers of modified genes in our findings based on the TIGR DRESTs and UniGene datasets. (b) Cross-genic overlapping (merged gene structure model), where two separate EGMs, ‘Gene B’ and ‘Gene C’, were assumed to be merged into a single gene model, ‘Gene D’. (c) An illustration of an NGSM, ‘Gene F’, that mapped to the intergenic region between two EGMs, ‘Gene E’ and ‘Gene G’, which represent the left and right nearest neighbouring genes, respectively. The gene names are arbitrary examples. Each bar represents an exon, and the inverted ‘v’ shaped structure between any two adjacent bars represents intron splicing. Gene models with red bars denote EGMs, and those with blue bars represent the currently identified genes that mapped to EGMs (transcripts A-I), the merged gene (‘Gene D’) and the NGSM (‘Gene F’). This schematic gene structure model assumes both strand orientations, based on the pattern of loci overlapping observed in our results.
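
The nine overlap types in panel (a) of Fig. 1 can be expressed as a comparison of the two transcript edges against the EGM edges. The sketch below is one possible reading of the caption, not the authors' code: the function name `classify_overlap` is hypothetical, coordinates are assumed to be inclusive and on the plus strand (so the smaller coordinate is the 5′ edge), and "shared" is taken to mean an identical coordinate.

```python
def classify_overlap(gene_start, gene_end, tx_start, tx_end):
    """Return the Fig. 1(a) label (A-I) for a transcript against an EGM.

    Assumes plus-strand, inclusive coordinates; a minus-strand feature
    would need its edges swapped before calling this function.
    """
    # Transcripts that do not touch the gene fall outside panel (a).
    if tx_end < gene_start or tx_start > gene_end:
        return "no overlap"

    # Describe each transcript edge relative to the matching gene edge.
    if tx_start < gene_start:
        five = "extended"
    elif tx_start == gene_start:
        five = "shared"
    else:
        five = "inside"

    if tx_end > gene_end:
        three = "extended"
    elif tx_end == gene_end:
        three = "shared"
    else:
        three = "inside"

    labels = {
        ("extended", "extended"): "A",  # extended at both edges
        ("extended", "shared"): "B",    # extended at 5', shared 3'
        ("shared", "extended"): "C",    # shared 5', extended at 3'
        ("shared", "shared"): "D",      # perfect overlap
        ("extended", "inside"): "E",    # extended at 5', partial at 3'
        ("inside", "extended"): "F",    # partial at 5', extended at 3'
        ("shared", "inside"): "G",      # shares 5' edge only
        ("inside", "shared"): "H",      # shares 3' edge only
        ("inside", "inside"): "I",      # inner overlap
    }
    return labels[(five, three)]


# Hypothetical EGM spanning 100-200, with a transcript spanning 50-150.
print(classify_overlap(100, 200, 50, 150))
```

Merged models (panel b) and intergenic models (panel c) involve more than one EGM and are deliberately left out of this sketch.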
Fig. 2
Pipeline for mapping experimental data to the reference genome and for annotation comparison. This pipeline represents a workflow for identifying known and novel candidate drought responsive genes (CDRGs) and for finding annotation updates. Known genes identified as putatively uncharacterised were functionally annotated. The UniGenes that mapped to intergenic regions were used by BLAT to generate HINTs and then by AUGUSTUS to identify novel genes, which were further optimized by PASA. The PASA pipeline was initiated afresh by cleaning up any existing output in the MySQL database using utility codes. Annotation comparison was then started by running alignment assembly, employing the minimum criteria for overlapping transcript alignments and for subclustering into gene structures (Table 4). Mapping valid alignment assemblies to the genome established the ICGBs. The gene builds from the TIGR transcripts that mapped to intergenic regions were used by BLAT to generate additional HINTs, whereas those that mapped to genic regions were used for further annotation comparison. PASA implemented a two-round approach to complete the annotation comparison: the first round compared existing gene structure annotations with the alignment assemblies, and the second round re-ran the comparison using the output of the first round, either to capture a few more updates or, if no further updates emerged, to verify the initial ones. Analysis of alternatively spliced alignments and identification of BCORFs were also included in the process. The BCORFs originating from TIGR ESTs were another input for generating HINTs.
Fig. 3
Oxidative phosphorylation metabolic pathway. This is one of the 14 metabolic pathways identified in this study and is associated with the production of respiratory energy in mitochondria, the powerhouse of the cell. Cytochrome c oxidase subunit 1 (EC: 1.9.3.1; Additional file 6), the enzyme encoded by the sorghum gene cox1, was identified as being involved in the catalytic reaction of the final protein complex (complex IV) in the electron transport chain. In addition, inorganic diphosphatase (EC: 3.6.1.1; Additional file 6) was identified as being involved in the electron transport system by catalysing the conversion of diphosphate into monophosphate. This enzyme controls the amount of inorganic phosphate (Pi) that is coupled with adenosine diphosphate (ADP) in the last step of oxidative phosphorylation, a phenomenon thought to be involved in counteracting the imbalance of reactive oxygen species caused by drought stress.
Fig. 4
Representation of the GO classification. (a) Gene Ontology terms assigned to the drought responsive sorghum UniGene clusters that encode genes involved in the drought related pathways, based on the BLAST hits obtained against the non-redundant database, are classified into three main categories, namely BP, MF and CC, and 31 subcategories. (b) Likewise, the enriched GO terms from the differentially expressed (up- and down-regulated; p-value < 0.05) sorghum genes and orthologs, queried on the basis of high-scoring BLAST hits against the non-redundant database, are classified into the same three main categories and 33 subcategories. The left y-axis represents the number of genes associated with each subcategory, and the x-axis indicates the specific subcategory within each main category.
Fig. 5
Heat map showing differential gene expression based on the sorghum RNA-seq dataset. The hierarchical clustering of gene expression profiles in this figure draws on information derived from the sorghum drought related ontology terms and the Gene Expression Omnibus (GEO) database. The heat map depicts up- and down-regulated genes under drought conditions, based on sorghum RNA-seq data in response to osmotic and abscisic acid stresses. The rows represent genes, and the columns represent biological samples. Red denotes up-regulation and green denotes down-regulation of the genes.
Fig. 6
Heat map showing up- and down-regulated sorghum orthologs in maize from RNA-seq data. The comparison of gene expression patterns based on a parametric test (unpaired t-test, or between-subject comparison, p < 0.01), a non-parametric test (Rank Product (RP), p < 0.01) and Fisher’s exact test (p < 0.05) shows the up- and down-regulated genes across treatment-based and tissue-based groupings. Treatment-based grouping was used to detect significant differences in gene expression due to the differing conditions under which the samples were tested, while tissue-based grouping was used to detect the effect of tissue differences on gene expression. All data showing significant expression, either up- or down-regulation of genes, in both groupings represent results obtained under drought conditions for ovary and leaf meristem tissues.
Fig. 7
Description of sorghum orthologs across species and drought related GO terms. Key to legend: RWD = response to water deprivation; RH = response to heat; RABAS = response to ABA stimulus. The Venn diagrams show patterns of shared sorghum orthologous gene clusters among related species and among GO terms related to drought stress. (a) The distribution of shared sorghum orthologs among species, giving some indication of evolutionary implications, functional crosstalk of genes and the extent of conserved synteny shared among species related to sorghum. Closely related species (e.g. maize and rice) share more conserved sorghum orthologs (2549 genes) than species more distantly related to sorghum; for example, maize and Arabidopsis share only 367 sorghum orthologous genes, and rice and Arabidopsis share 194 sorghum orthologs. Surprisingly, 2098 sorghum orthologs are shared among all the species and seemingly represent ancestral gene families. All the genes in the diagram represent sorghum orthologs in the respective species. The non-shared ones are the unique sorghum orthologs found only in the corresponding species. (b) The distribution of genes involved in key selected drought related GO terms. Functional overlap is taken as an indication of a gene network among the categories involved in complex stress responses, with some genes playing a rate-limiting role. For example, two genes, ‘Sb09g026860.1’ and ‘Sb07g014940.1’, are shared and act in all the pathways. The pathway controlling response to water deprivation shares 40 overlapping genes with the one controlling response to ABA stimulus and six genes with the pathway regulating response to heat (Additional file 2: Table S12). Similarly, the pathways controlling response to ABA stimulus and response to heat share six genes. On the other hand, 265 unique sorghum orthologs were identified in total for drought related responses, with an almost equal proportion of unique genes associated with each of the three GO terms.
Fig. 8
A summarized description of the outputs of the various analytical approaches. The Venn diagram shows the number of identified genes and the corresponding percentage for each approach used in this study. The numbers in the peripheral, non-overlapping regions show the unique findings of a particular method, whereas the numbers in the overlapping regions of the circles show the values shared among methods. This description does not include the results based on genome annotation. Seq_homology denotes sequence homology.
Fig. 9
Pipeline for building gene structure models. Drought responsive genes were mapped to the sorghum genome using UniGene clusters and TIGR transcripts. Sequences were downloaded as described in the methods and were screened for quality using RepeatMasker and SeqClean. These were mapped to the genome using an e-value cutoff of 1e-10. The raw output was parsed and HSPs were extracted using an in-house Perl script. HSPs with ≥80% identity were selected and further consolidated along a genomic length of 2000 bp, as described in the methods. These were converted into GFF3 format to extract the associated genomic regions, which were aligned to the corresponding transcripts using EXONERATE and BLAT to generate gene builds. Known and novel gene builds were classified by intersecting and subtracting the data sets, respectively, using the Galaxy genomic interval tool. Gene models were identified by AUGUSTUS and optimized by PASA (Additional file 2: Table S14). Finally, genes were visualized by loading the GFF3-formatted files into the MySQL DB.
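
The HSP filtering and consolidation steps in the Fig. 9 caption can be sketched in a few lines of Python. This is a hedged illustration rather than the authors' in-house Perl script, which is not shown on this page. Three things are assumptions: the input is taken to be 12-column tab-separated rows in the BLAST tabular layout; "consolidated along a genomic length of 2000 bp" is read as merging HSPs of the same transcript and chromosome that lie within 2000 bp of each other; and the function names, the `sketch` source tag and the toy rows are invented.

```python
from collections import defaultdict

MIN_IDENTITY = 80.0  # percent identity threshold stated in the caption
MAX_EVALUE = 1e-10   # e-value cutoff stated in the caption
MAX_GAP = 2000       # bp; ASSUMED reading of the 2000 bp consolidation


def parse_hsps(lines):
    """Yield (query, subject, start, end) for HSPs passing both cutoffs."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        pident, evalue = float(f[2]), float(f[10])
        s1, s2 = int(f[8]), int(f[9])
        if pident >= MIN_IDENTITY and evalue <= MAX_EVALUE:
            # Minus-strand hits report start > end, so normalise the order.
            yield f[0], f[1], min(s1, s2), max(s1, s2)


def consolidate(hsps, max_gap=MAX_GAP):
    """Merge HSPs of one query/subject pair separated by at most max_gap bp."""
    groups = defaultdict(list)
    for query, subject, start, end in hsps:
        groups[(query, subject)].append((start, end))
    merged = []
    for (query, subject), spans in groups.items():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start - cur_end <= max_gap:
                cur_end = max(cur_end, end)
            else:
                merged.append((query, subject, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((query, subject, cur_start, cur_end))
    return merged


def to_gff3(regions, source="sketch"):
    """Format consolidated regions as nine-column GFF3 'match' lines."""
    return [
        f"{subject}\t{source}\tmatch\t{start}\t{end}\t.\t.\t.\tID={query}_{i}"
        for i, (query, subject, start, end) in enumerate(regions, 1)
    ]


# Toy rows invented for illustration; they are not study data.
toy_rows = [
    "tx1\tchr1\t95.0\t200\t10\t0\t1\t200\t1000\t1199\t1e-50\t300",
    "tx1\tchr1\t90.0\t150\t15\t0\t201\t350\t2500\t2649\t1e-30\t200",
    "tx1\tchr1\t99.0\t100\t1\t0\t351\t450\t9000\t9099\t1e-40\t180",
    "tx1\tchr1\t70.0\t100\t30\t0\t1\t100\t3000\t3099\t1e-20\t90",
    "tx2\tchr2\t85.0\t100\t15\t0\t1\t100\t500\t599\t1e-5\t60",
]
regions = consolidate(parse_hsps(toy_rows))
gff_lines = to_gff3(regions)
print("\n".join(gff_lines))
```

In the toy input, the fourth row fails the identity threshold and the fifth fails the e-value cutoff; the first two surviving HSPs are merged because they are less than 2000 bp apart, and the third remains a separate region.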
