Continuous-Trait Probabilistic Model for Comparing Multi-species Functional Genomic Data

Cell Syst. 2018 Aug 22;7(2):208-218.e11. doi: 10.1016/j.cels.2018.05.022. Epub 2018 Jun 20.


A large amount of multi-species functional genomic data from high-throughput assays is becoming available to help understand the molecular mechanisms underlying phenotypic diversity across species. However, continuous-trait probabilistic models, which are key to such comparative analysis, remain under-explored. Here we develop a new model, called phylogenetic hidden Markov Gaussian processes (Phylo-HMGP), to simultaneously infer heterogeneous evolutionary states of functional genomic features in a genome-wide manner. Both simulation studies and real-data applications demonstrate the effectiveness of Phylo-HMGP. Importantly, we applied Phylo-HMGP to analyze a new cross-species DNA replication timing (RT) dataset from the same cell type in five primate species (human, chimpanzee, orangutan, gibbon, and green monkey). We demonstrate that Phylo-HMGP enables the discovery of genomic regions with distinct evolutionary patterns of RT. Our method provides a generic framework for the comparative analysis of multi-species continuous functional genomic signals, helping to reveal regions with conserved or lineage-specific regulatory roles.
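The abstract does not give the model's equations, so the following is only a rough, hypothetical illustration of the general idea of a phylogenetic hidden Markov model over continuous signals. It assumes that each hidden evolutionary state emits a vector of per-species values (for example, RT at one genomic bin) from a multivariate Gaussian whose covariance comes from a stationary Ornstein-Uhlenbeck (OU) process on the species tree, that states differ only in their mean vectors, and that the most likely state sequence along the genome is decoded with the Viterbi algorithm. The three-species tree, distances, states, parameters, and function names (`ou_tip_covariance`, `mvn_logpdf`, `viterbi`) are all assumptions made for this sketch. It is not the authors' Phylo-HMGP implementation, which also estimates its parameters from data rather than fixing them by hand.

```python
import math

def ou_tip_covariance(dist, sigma2, alpha):
    """Assumed stationary OU model: cov(i, j) = sigma2 / (2 * alpha) * exp(-alpha * d_ij),
    where d_ij is the total branch length separating species i and j."""
    scale = sigma2 / (2.0 * alpha)
    return [[scale * math.exp(-alpha * d) for d in row] for row in dist]

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def mvn_logpdf(x, mean, cov):
    """Log-density of a multivariate Gaussian, computed through its Cholesky factor."""
    L = cholesky(cov)
    n = len(x)
    # Forward substitution solves L z = (x - mean), so |z|^2 is the Mahalanobis term
    z = []
    for i in range(n):
        r = x[i] - mean[i] - sum(L[i][k] * z[k] for k in range(i))
        z.append(r / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z))

def viterbi(obs, means, cov, trans, init):
    """Most likely hidden-state path for a sequence of per-species observation vectors."""
    n_states = len(means)
    log_t = [[math.log(p) for p in row] for row in trans]
    # score[k] = best log-probability of any path ending in state k at the current bin
    score = [math.log(init[k]) + mvn_logpdf(obs[0], means[k], cov) for k in range(n_states)]
    back = []
    for x in obs[1:]:
        ptr, new = [], []
        for k in range(n_states):
            best = max(range(n_states), key=lambda j: score[j] + log_t[j][k])
            ptr.append(best)
            new.append(score[best] + log_t[best][k] + mvn_logpdf(x, means[k], cov))
        back.append(ptr)
        score = new
    # Trace back from the best final state
    state = max(range(n_states), key=lambda k: score[k])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Hypothetical tree ((A:0.5,B:0.5):0.5,C:1.0) written as pairwise path lengths
dist = [[0.0, 1.0, 2.0],
        [1.0, 0.0, 2.0],
        [2.0, 2.0, 0.0]]
cov = ou_tip_covariance(dist, sigma2=1.0, alpha=0.5)
means = [[0.0, 0.0, 0.0],  # hypothetical state 0: signal conserved across species
         [3.0, 0.0, 0.0]]  # hypothetical state 1: shift specific to species A
trans = [[0.9, 0.1], [0.1, 0.9]]
# Made-up observations: three bins near state 0, then three near state 1
obs = [[0.1, -0.1, 0.0], [0.0, 0.2, 0.1], [-0.2, 0.0, -0.1],
       [3.1, 0.1, 0.0], [2.9, -0.1, 0.1], [3.2, 0.0, -0.2]]
path = viterbi(obs, means, cov, trans, init=[0.5, 0.5])
print(path)  # [0, 0, 0, 1, 1, 1]
```

With a "sticky" transition matrix, each state switch costs log(0.9/0.1), about 2.2 nats, so the decoder changes state only where the emission evidence outweighs that penalty. This is what makes HMM-based segmentations form contiguous regions along the genome rather than labeling each bin independently.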

Keywords: comparative genomics; continuous-trait probabilistic model; phylogenetic hidden Markov Gaussian processes; replication timing.

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Animals
  • DNA Replication*
  • Evolution, Molecular*
  • Genomics / methods*
  • Humans
  • Markov Chains
  • Models, Genetic*
  • Models, Statistical
  • Phenotype
  • Software
  • Species Specificity