Collaborative Completion of Transcription Factor Binding Profiles via Local Sensitive Unified Embedding

IEEE Trans Nanobioscience. 2016 Dec;15(8):946-958. doi: 10.1109/TNB.2016.2625823. Epub 2016 Nov 7.


Although newly available ChIP-seq data provides immense opportunities for the comparative study of regulatory activities across different biological conditions, cost, time, and sample material availability mean that researchers cannot always obtain binding profiles for every protein in every sample of interest, which considerably limits the power of integrative studies. Recently, by leveraging related information from measured data, Ernst et al. proposed ChromImpute for predicting additional ChIP-seq and other types of datasets, and demonstrated that the imputed signal tracks accurately approximate the experimentally measured signals and could thereby enhance the power of integrative analysis. Despite the success of ChromImpute, in this paper we reexamine its learning process and show that its performance may degrade substantially, and that it may sometimes even fail to output a prediction, when the available data is scarce. This limitation could hurt its applicability to important predictive tasks, such as the imputation of TF binding data. To alleviate this problem, we propose a novel method called Local Sensitive Unified Embedding (LSUE) for imputing new ChIP-seq datasets. In LSUE, the ChIP-seq data compendium is fused by mapping proteins, samples, and genomic positions simultaneously into Euclidean space, thereby making their underlying associations directly evaluable using simple calculations. In contrast to ChromImpute, which mainly makes use of the local correlations between available datasets, LSUE can better estimate the overall data structure by formulating the representation learning of all involved entities as a single unified optimization problem. In addition, a novel form of local sensitive low rank regularization is proposed to further improve the performance of LSUE. Experimental evaluations on the ENCODE TF ChIP-seq data illustrate the performance of the proposed model. The code of LSUE is available at
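The abstract does not state LSUE's objective function, so the sketch below is only an illustrative stand-in for the general idea of embedding proteins, samples, and genomic positions in one Euclidean space and fitting them in a single optimization. It uses a generic masked CP (rank-R) tensor factorization fitted by ridge-regularized alternating least squares. This is not the authors' LSUE, and the local sensitive low rank regularizer is not reproduced. All names (`fit_embeddings`, `impute_track`, `rank`, `lam`, `sweeps`) and the synthetic data are hypothetical.

```python
import numpy as np


def objective(X, M, P, S, G, lam):
    """Regularized squared error over observed entries only (M is a boolean mask)."""
    pred = np.einsum("pk,sk,gk->psg", P, S, G)
    fit = 0.5 * np.sum((pred - X)[M] ** 2)
    reg = 0.5 * lam * (np.sum(P ** 2) + np.sum(S ** 2) + np.sum(G ** 2))
    return fit + reg


def _update_rows(factor, A, B, X, M, lam):
    """Exact ridge update of each row of `factor`, holding A and B fixed.

    X and M must be arranged so that axis 0 indexes rows of `factor`
    and axes 1 and 2 index rows of A and B respectively.
    """
    rank = factor.shape[1]
    # Row (a, b) of Z is the elementwise product A[a] * B[b].
    Z = np.einsum("ak,bk->abk", A, B).reshape(-1, rank)
    for i in range(factor.shape[0]):
        obs = M[i].reshape(-1)
        Zi = Z[obs]
        yi = X[i].reshape(-1)[obs]
        factor[i] = np.linalg.solve(Zi.T @ Zi + lam * np.eye(rank), Zi.T @ yi)


def fit_embeddings(X, M, rank=2, lam=1e-3, sweeps=20, seed=0):
    """Fit X[p, s, g] ~ sum_k P[p, k] * S[s, k] * G[g, k] on observed entries."""
    rng = np.random.default_rng(seed)
    n_p, n_s, n_g = X.shape
    P = rng.standard_normal((n_p, rank))  # protein embeddings
    S = rng.standard_normal((n_s, rank))  # sample embeddings
    G = rng.standard_normal((n_g, rank))  # genomic position embeddings
    history = [objective(X, M, P, S, G, lam)]
    for _ in range(sweeps):
        _update_rows(P, S, G, X, M, lam)
        _update_rows(S, P, G, X.transpose(1, 0, 2), M.transpose(1, 0, 2), lam)
        _update_rows(G, P, S, X.transpose(2, 0, 1), M.transpose(2, 0, 1), lam)
        history.append(objective(X, M, P, S, G, lam))
    return P, S, G, history


def impute_track(P, S, G, p, s):
    """Predicted signal along all genomic positions for protein p in sample s."""
    return G @ (P[p] * S[s])


# Synthetic demonstration: 4 proteins x 3 samples x 30 positions,
# with the (protein 0, sample 1) experiment treated as unmeasured.
rng = np.random.default_rng(1)
truth = np.einsum(
    "pk,sk,gk->psg",
    rng.standard_normal((4, 2)),
    rng.standard_normal((3, 2)),
    rng.standard_normal((30, 2)),
)
data = truth.copy()
mask = np.ones(data.shape, dtype=bool)
mask[0, 1, :] = False
data[0, 1, :] = np.nan  # unmeasured values are never read by the fit

P, S, G, history = fit_embeddings(data, mask)
error = np.mean((impute_track(P, S, G, 0, 1) - truth[0, 1]) ** 2)
print("held-out mean squared error:", error)
```

In this toy setting, an unmeasured protein/sample track is predicted from three embedding vectors with a single matrix-vector product, which is one way to read the abstract's "simple calculations". Alternating least squares is used here only because each block update is an exact ridge solve, so the regularized objective cannot increase between sweeps.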

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Binding Sites / genetics*
  • Cell Line, Tumor
  • Chromatin Immunoprecipitation / methods*
  • Computational Biology / methods*
  • Databases, Genetic
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Oligonucleotide Array Sequence Analysis / methods*
  • Sequence Analysis, DNA / methods*
  • Transcription Factors / chemistry
  • Transcription Factors / genetics
  • Transcription Factors / metabolism*


Substances

  • Transcription Factors