Integrating sequence and gene expression information predicts genome-wide DNA-binding proteins and suggests a cooperative mechanism

Nucleic Acids Res. 2018 Jan 9;46(1):54-70. doi: 10.1093/nar/gkx1166.

Abstract

DNA-binding proteins (DBPs) perform diverse biological functions ranging from transcription to pathogen sensing. Machine learning methods can not only identify DBPs de novo but also provide insights into their DNA-recognition dynamics. However, it remains unclear whether available methods that can accurately predict DNA-binding sites in known DBPs can also identify novel DBPs. Moreover, sequence information is blind to the cellular- and disease-specific contexts of DBP activities, whereas the under-utilized knowledge from public gene expression data offers great promise. To address these issues, we have developed novel methods for predicting DBPs by integrating sequence and gene expression-derived features and applied them to explore human, mouse and Arabidopsis proteomes. While our sequence-based models outperformed the gene expression-based ones, some proteins with weaker DBP-like sequence features were correctly predicted by gene expression-based features, suggesting that these proteins acquire a tangible DBP functionality in a conducive gene expression environment. Analysis of motif enrichment among the co-expressed genes of the top 100 candidate DBPs from hitherto unannotated genes provides further avenues to explore their functional associations.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Arabidopsis / genetics
  • Arabidopsis / metabolism
  • Binding Sites / genetics
  • DNA / genetics
  • DNA / metabolism
  • DNA-Binding Proteins / genetics*
  • DNA-Binding Proteins / metabolism
  • Gene Expression Profiling*
  • Gene Ontology
  • Genome / genetics*
  • Genomics / methods*
  • Humans
  • Mice
  • Protein Binding
  • Proteome / genetics
  • Proteome / metabolism
  • Transcription Factors / genetics
  • Transcription Factors / metabolism

Substances

  • DNA-Binding Proteins
  • Proteome
  • Transcription Factors
  • DNA