BMC Genomics. 2017 Jun 26;18(1):480.
doi: 10.1186/s12864-017-3853-9.

Systematic discovery of novel eukaryotic transcriptional regulators using sequence homology independent prediction

Flavia Bossi et al. BMC Genomics. 2017.

Abstract

Background: The molecular function of a gene is most commonly inferred by sequence similarity. Therefore, genes that lack sufficient sequence similarity to characterized genes (such as certain classes of transcriptional regulators) are difficult to classify using most function prediction algorithms and have remained uncharacterized.

Results: To identify novel transcriptional regulators systematically, we used a feature-based pipeline to screen protein families of unknown function. This method predicted 43 transcriptional regulator families in Arabidopsis thaliana, 7 families in Drosophila melanogaster, and 9 families in Homo sapiens. Literature curation validated 12 of the predicted families to be involved in transcriptional regulation. We tested 33 out of the 195 Arabidopsis putative transcriptional regulators for their ability to activate transcription of a reporter gene in planta and found twelve coactivators, five of which had no prior literature support. To investigate mechanisms of action in which the predicted regulators might work, we looked for interactors of an Arabidopsis candidate that did not show transactivation activity in planta and found that it might work with other members of its own family and a subunit of the Polycomb Repressive Complex 2 to regulate transcription.

Conclusions: Our results demonstrate the feasibility of assigning molecular function to proteins of unknown function without depending on sequence similarity. In particular, we identified novel transcriptional regulators using biological features enriched in transcription factors. The predictions reported here should accelerate the characterization of novel regulators.

Keywords: Coactivators; Genes with unknown function; Polycomb repressive complex 2; Transcriptional regulators.

Figures

Fig. 1
Feature-based prediction pipeline to identify novel transcriptional regulator families. a Pipeline work flow: First, Arabidopsis protein families were filtered based on their size and the GO annotations of their members. Then, uncharacterized families with more than 2 members were filtered based on subcellular localization patterns using Yloc [98], percentage of disordered residues using Predisorder [99], and the ability of at least one member to activate transcription of a reporter gene in yeast (autoactivation) [, –107]. Numbers in the Venn diagram represent the number of families with most members being nuclear localized (blue), high percentage of disordered residues (green) and autoactivation in yeast (red). Families that met all criteria (intersection of the Venn diagram) were considered as candidate regulator families. b-j Proportion of proteins predicted to contain nuclear localization signal (NLS) (b-d), distribution of the percentage of disordered amino acid residues (e-g), and proportion of proteins with autoactivation activity (h-j) in the background (white), TFs (dark gray), and predicted regulators (light gray). The background corresponds to all proteins in Arabidopsis (b, e, h), fruit fly (c, f, i), or human (d, g, j) genomes or the set of proteins that were tested for autoactivation in yeast (h, i, j). * = p-value <0.0001, chi-square test with Yates correction (b-d and h-j) or t-test (e-g)
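The filtering logic described in this caption can be sketched roughly as follows. This is only an illustration, not the authors' code: the `Family` record, its field names, and the `min_disorder_pct` cutoff are hypothetical, since the caption does not state the disorder threshold. "Most members being nuclear localized" is assumed here to mean more than half of the members.

```python
from dataclasses import dataclass


@dataclass
class Family:
    """Hypothetical per-family summary; field names are illustrative."""
    name: str
    n_members: int
    characterized: bool        # assumed flag: members carry informative GO annotations
    frac_nuclear: float        # fraction of members predicted nuclear
    mean_disorder_pct: float   # mean percentage of disordered residues
    any_autoactivation: bool   # at least one member autoactivates in yeast


def candidate_families(families, min_disorder_pct=30.0):
    """Return names of families that pass all three feature filters.

    min_disorder_pct is an assumed cutoff, not a value from the paper.
    """
    # Step 1: keep uncharacterized families with more than 2 members.
    pool = [f for f in families if not f.characterized and f.n_members > 2]

    # Step 2: build one set per feature (the three circles of the Venn diagram).
    nuclear = {f.name for f in pool if f.frac_nuclear > 0.5}
    disordered = {f.name for f in pool if f.mean_disorder_pct >= min_disorder_pct}
    autoactive = {f.name for f in pool if f.any_autoactivation}

    # Step 3: candidates are the intersection of the three sets.
    return nuclear & disordered & autoactive
```

Expressing each feature as a set keeps the per-feature counts available, which is what the Venn diagram reports, while the intersection yields the candidate regulator families.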
Fig. 2
Ortholog distribution of the predicted regulator families. a Proportion of families that contain proteins from one, two, three or four species in all the families and in the predicted transcriptional regulator families generated by OrthoMCL [109]. b-d Ortholog distribution of the predicted regulator families in Arabidopsis (b), fruit fly (c), and human (d) using data from Ensembl Genomes [50] to classify taxon specificity of the candidate families within each taxonomic domain
Fig. 3
Experimental analysis of the predictions. a Steps of the in planta transactivation assay procedure from bacterial growth to quantification of the normalized transactivation activity. b Constructs used in the transactivation assay. c Average relative transactivation activity calculated as the GUS activity (nmol of 4MU/min/mg total protein) divided by the concentration of the effector protein (ng/ml). Error bars represent standard error from 3 independent experiments. The asterisk (*) indicates that the relative activity is statistically different from the YFP control (p-value <0.002, t-test). A line under the gene names indicates that they belong to the same family
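The normalization in panel c can be illustrated with a short calculation. This is a sketch under stated assumptions: the function names are hypothetical, and it is assumed that the standard error is the sample standard deviation divided by the square root of the number of experiments.

```python
import math
from statistics import mean, stdev


def relative_activity(gus_activity, effector_ng_per_ml):
    """GUS activity (nmol 4MU/min/mg total protein) divided by
    the effector protein concentration (ng/ml)."""
    return gus_activity / effector_ng_per_ml


def mean_and_sem(values):
    """Mean and standard error across independent experiments
    (assumed: sample standard deviation / sqrt(n))."""
    return mean(values), stdev(values) / math.sqrt(len(values))


# Hypothetical numbers for three independent experiments (not data from the paper).
per_experiment = [
    relative_activity(gus, conc)
    for gus, conc in [(10.0, 2.0), (12.0, 2.0), (8.0, 2.0)]
]
avg, sem = mean_and_sem(per_experiment)
```

Normalizing before averaging means each experiment contributes its own ratio, so differences in effector expression between experiments do not inflate the error bars.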
Fig. 4
Mutants lacking CHIQ1 have smaller organs. a-b, Whole plants (a) or rosette leaves (b), of wild type (Col-0, left or top), chiq1–1 (middle), and chiq1–1 complemented with CHIQ1 (B12, right or bottom) grown in soil for 7 weeks. Leaves are ordered from the oldest (left) to the youngest (right). c Height of the primary inflorescence stem in wild type (black), chiq1–1 (white), and complemented (gray) plants grown in soil for 11 weeks. Stature of chiq1–1 plants is reduced by 53% compared to the wild type and 42% compared to the complemented line (* = p-value: 2E-34 against wild type and 2E-25 against complemented line, t-test). n = 30 per genotype from 8 independent experiments. d Measurements of leaf area from wild type (black), chiq1–1 (white), and complemented (gray) plants grown in soil for 7 weeks. n = 8 per genotype from 3 independent experiments. c-d Error bars represent standard error from 3 independent experiments. e-g Expression of the CHIQ1-GUS transgene driven by CHIQ1 promoter in the root apical meristem (e), shoot apical meristem and leaf primordia of 2 day-old seedlings (f) and rosette of 14 day-old plants (g) grown on MS media. Each image is a representative of at least three independent experiments with n = 10 plants. At least three independent transgenic lines were analyzed
Fig. 5
CHIQ1 family interacts with EMF2. a Phylogenetic tree of Arabidopsis CHIQ1 family (left) made using Phylogeny.fr [121] and motif conservation in CHIQ1 protein family (right) predicted by MEME [122]. Motifs 1 and 4 correspond to the DUF641 domain. Height of the domains indicates the degree of conservation, where taller domains are more conserved than shorter ones. CHIQ1 is in blue and CHIQ1’s interactors in red. CHIQL6 (TAIR: AT1G29300), CHIQL7 (TAIR: AT2G32130), CHIQL4 (TAIR: AT3G14870), CHIQL5 (TAIR: AT1G53380), CHIQL8 (TAIR: AT2G30380), CHIQL3 (TAIR: AT4G36100), CHIQL2 (TAIR: AT4G33320), CHIQ1 (TAIR: AT2G45260), CHIQL1 (TAIR: AT4G34080), CHIQL9 (TAIR: AT3G60680), CHIQL10 (TAIR: AT5G58960). b-d Physical interaction between CHIQ1, CHIQL6, CHIQL5, and EMF2 based on yeast two-hybrid assays (b), pull-down assays in tobacco (c), and bimolecular fluorescence complementation assays in Arabidopsis protoplasts (d). Pull-down assays were performed with anti-FLAG antibody and we used anti-GFP antibody to detect CHIQ1, anti-GST antibody to detect EMF2, and anti-FLAG antibody to detect CHIQL6, CHIQL5, and EMF2 in the eluted immuno-precipitate. The input corresponds to the total protein extract and IP is the eluted immuno-precipitate. Error bars in (b) represent standard error. * = p-value <0.001, t-test. In (d), green indicates fluorescence from reconstituted Venus fluorescent protein. Red indicates autofluorescence from the chloroplast. The percentage corresponds to the fraction of cells expressing Venus in each sample. Representative images from three independent experiments are shown (n = 258–321 cells per pair per experiment)

References

    1. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al. Gene Ontology: tool for the unification of biology. Nat Genet. 2000;25(1):25–29. doi: 10.1038/75556.
    2. Anton BP, Chang YC, Brown P, Choi HP, Faller LL, Guleria J, Hu Z, Klitgord N, Levy-Moonshine A, Maksad A, et al. The COMBREX project: design, methodology, and initial results. PLoS Biol. 2013;11(8):e1001638. doi: 10.1371/journal.pbio.1001638.
    3. Pandey AK, Lu L, Wang X, Homayouni R, Williams RW. Functionally enigmatic genes: a case study of the brain ignorome. PLoS One. 2014;9(2):e88889. doi: 10.1371/journal.pone.0088889.
    4. Rhee SY, Mutwil M. Towards revealing the functions of all genes in plants. Trends Plant Sci. 2014;19(4):212–221. doi: 10.1016/j.tplants.2013.10.006.
    5. Pena-Castillo L, Hughes TR. Why are there still over 1000 uncharacterized yeast genes? Genetics. 2007;176(1):7–14. doi: 10.1534/genetics.107.074468.
