Higher-order partial least squares for predicting gene expression levels from chromatin states

BMC Bioinformatics. 2018 Apr 11;19(Suppl 5):113. doi: 10.1186/s12859-018-2100-y.

Abstract

Background: Extensive studies have shown that gene expression levels are strongly affected by combinations of chromatin marks via at least two mechanisms, activation and repression, but the combinatorial patterns of these marks remain unclear. To better understand the relationship between histone modifications and gene expression levels, we introduce a purely geometric higher-order representation, the tensor (also called a multidimensional array), which may capture additional, as yet uncharacterized interactions among chromatin states when predicting gene expression levels.
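As a rough illustration of what a tensor representation means in this setting, the sketch below stores synthetic signal as a three-way array (genes × positional bins × marks) and contrasts it with the flattened matrix that a conventional matrix-based regression would see. The array sizes, the bin count, and the random values are assumptions chosen for illustration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 100 genes, 20 positional bins around each TSS, 5 marks.
n_genes, n_bins, n_marks = 100, 20, 5

# Synthetic non-negative values standing in for real chromatin signal.
tensor = rng.random((n_genes, n_bins, n_marks))

# Unfolding along the gene mode: each gene becomes one long feature vector,
# so the bins-by-marks layout is no longer explicit in the data structure.
flattened = tensor.reshape(n_genes, n_bins * n_marks)

# The tensor keeps each gene as a bins x marks matrix.
gene0_slice = tensor[0]

print(tensor.shape, flattened.shape, gene0_slice.shape)
```

The values are identical in both layouts; only the tensor keeps position and mark as separate modes that a higher-order model can treat distinctly.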

Results: The prediction models were learned from regions spanning 10 kb upstream and 10 kb downstream of the transcription start sites (TSSs) in three species (human, rhesus macaque, and chimpanzee) with five histone modifications (i.e., H3K4me1, H3K4me3, H3K27ac, H3K27me3, and Pol II). Experimental results demonstrate that the proposed method predicts gene expression levels more accurately than several other popular methods. Specifically, our method improves performance on two commonly used criteria, R and RMSE, by as much as 1.7% and 11%, respectively.
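The two evaluation criteria can be computed as in the minimal sketch below, assuming that R denotes the Pearson correlation coefficient between observed and predicted expression and that RMSE is the root-mean-square error. The function names and the toy values are hypothetical and are not data from the study.

```python
import numpy as np

def pearson_r(y_true, y_pred):
    """Pearson correlation between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    yt = y_true - y_true.mean()
    yp = y_pred - y_pred.mean()
    return float((yt @ yp) / np.sqrt((yt @ yt) * (yp @ yp)))

def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Toy example (made-up numbers): predictions offset from the truth by 0.5.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.5, 3.5, 4.5])

r_value = pearson_r(y_true, y_pred)
rmse_value = rmse(y_true, y_pred)
print(r_value, rmse_value)
```

A higher R and a lower RMSE both indicate better predictions. The toy example shows why the two are reported together: a constant offset leaves R unchanged but still raises RMSE.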

Conclusions: The overall aim of this work is to show that a higher-order representation can capture additional, previously uncharacterized interactions among histone modifications across different species.

Keywords: Chromatin states; Gene expression levels; Higher-order partial least squares; Histone modification; Tensor decomposition.

MeSH terms

  • Algorithms
  • Animals
  • Cell Line
  • Chromatin / metabolism*
  • Computer Simulation
  • Gene Expression Regulation*
  • Humans
  • Least-Squares Analysis
  • Macaca mulatta / genetics
  • Pan troglodytes / genetics
  • Transcription Initiation Site

Substances

  • Chromatin