Identification of genes associated with specific biological phenotypes is a fundamental step toward understanding the molecular basis of development and pathogenesis. Although RNAi-based high-throughput screens are routinely used for this task, false discoveries and limited sensitivity remain a challenge. Here we describe a computational framework for systematic integration of published gene expression data to identify genes defining a phenotype of interest. We applied our approach to rank-order all genes based on their likelihood of determining embryonic stem cell (ESC) identity. RNAi-mediated loss-of-function experiments on top-ranked genes uncovered many novel determinants of ESC identity, validating the derived gene ranks as a resource for those working to uncover novel ESC regulators. Underscoring the value of our gene ranks, functional studies of our top hit, Nucleolin (Ncl), which is abundant in stem and cancer cells, revealed that Ncl is essential for maintaining ESC homeostasis: it shields cells against oxidative stress that arises from redox imbalance and induces differentiation. Notably, we report a conceptually novel mechanism involving a Nucleolin-dependent Nanog-p53 bistable switch that regulates the homeostatic balance between self-renewal and differentiation in ESCs. Our findings reveal a previously unknown regulatory circuitry involving genes associated with traits in both ESCs and cancer and may have important implications for understanding cell fate decisions in cancer stem cells. By helping to prioritize and preselect candidate genes for testing in complex and expensive genetic screens, the proposed computational framework provides a powerful yet inexpensive means of identifying key cell identity genes.
Keywords: RNA-binding protein; ROS; computational biology; pluripotency; transcription.