Literature-based priors for gene regulatory networks

E Steele; A Tucker; P A C 't Hoen; M J Schuemie

doi:10.1093/bioinformatics/btp277

Bioinformatics. 2009 Jul 15;25(14):1768-74. doi: 10.1093/bioinformatics/btp277. Epub 2009 Apr 23.

Authors

E Steele¹, A Tucker, P A C 't Hoen, M J Schuemie

Affiliation

¹ Centre for Intelligent Data Analysis, School of Information Systems, Computing and Mathematics, Brunel University, Uxbridge UB8 3PH, UK. emma.steele@brunel.ac.uk

PMID: 19389730

Abstract

Motivation: The use of prior knowledge to improve gene regulatory network modelling has often been proposed. In this article we present the first research on the large-scale incorporation of prior knowledge from the literature into Bayesian network learning of gene networks. Past research has proposed online databases as a source of prior knowledge, but as the publication rate of scientific papers grows, keeping such databases up to date becomes increasingly challenging. The novelty of our approach lies in the use of gene-pair association scores, generated from a large database of scientific literature, that describe the overlap in the contexts in which the two genes are mentioned. These scores condense the information contained in a huge number of documents into a simple, clear format.
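The abstract does not say how the context-overlap score is computed. As a minimal sketch, assuming each gene is summarised by a hypothetical "context profile" (a mapping from literature concepts to weights), overlap can be measured with cosine similarity. This is a swapped-in illustration of the idea, not necessarily the scoring used in the article.

```python
import math
from typing import Dict

# Hypothetical context profile: concept -> weight, built from documents mentioning a gene.
Profile = Dict[str, float]


def context_overlap(a: Profile, b: Profile) -> float:
    """Cosine similarity between two context profiles (0.0 = no shared context)."""
    shared = set(a) & set(b)
    dot = sum(a[c] * b[c] for c in shared)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty profile shares no context with anything
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Toy profiles for two yeast genes; concept names and weights are illustrative only.
    gal4 = {"galactose": 0.9, "transcription": 0.6, "glucose": 0.3}
    gal80 = {"galactose": 0.8, "transcription": 0.5, "repression": 0.4}
    print(round(context_overlap(gal4, gal80), 3))
```

With non-negative weights the score lies in [0, 1] and is symmetric, which matches the idea of a single association value per gene pair.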

Results: We present a method to transform such literature-based gene association scores into network prior probabilities, and apply it to learn gene sub-networks for yeast, Escherichia coli and human. We also investigate the effect of weighting the influence of the prior knowledge. Our findings show that, compared with a network learnt solely from expression data, literature-based priors can improve both the number of true regulatory interactions present in the network and the accuracy of gene expression value prediction. Networks learnt with priors also show an improved biological interpretation, with identified sub-networks that coincide with known biological pathways.
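The transformation from association scores to prior probabilities is not detailed in the abstract. The following is a minimal sketch under stated assumptions, not the authors' method: scores are min-max rescaled into edge probabilities within a hypothetical range [lo, hi]; each ordered gene pair contributes log p if the edge is present and log(1 - p) if it is absent; and a hypothetical `weight` multiplies the log prior to strengthen or weaken its influence.

```python
import math
from itertools import permutations
from typing import Dict, FrozenSet, Sequence, Set, Tuple

Pair = FrozenSet[str]   # unordered gene pair carrying a literature score
Edge = Tuple[str, str]  # directed edge (parent, child)


def scores_to_priors(scores: Dict[Pair, float],
                     lo: float = 0.1, hi: float = 0.9) -> Dict[Pair, float]:
    """Min-max rescale association scores into edge probabilities in [lo, hi].

    The range [lo, hi] is a hypothetical choice that keeps every probability
    away from 0 and 1, so the prior never forbids or forces an edge outright.
    """
    s_min, s_max = min(scores.values()), max(scores.values())
    span = s_max - s_min
    return {
        pair: lo + (hi - lo) * ((s - s_min) / span if span > 0 else 0.5)
        for pair, s in scores.items()
    }


def log_structure_prior(genes: Sequence[str], edges: Set[Edge],
                        priors: Dict[Pair, float],
                        weight: float = 1.0, default: float = 0.5) -> float:
    """Weighted log prior of a network under independent per-edge priors.

    Pairs without a literature score fall back to `default` (an assumption).
    The value is unnormalised over DAGs, which is fine for comparing candidates.
    """
    total = 0.0
    for parent, child in permutations(genes, 2):
        p = priors.get(frozenset((parent, child)), default)
        total += math.log(p) if (parent, child) in edges else math.log(1.0 - p)
    return weight * total


if __name__ == "__main__":
    genes = ["A", "B", "C"]  # placeholder gene names
    scores = {frozenset(("A", "B")): 12.0,  # illustrative scores only
              frozenset(("A", "C")): 2.0,
              frozenset(("B", "C")): 5.0}
    priors = scores_to_priors(scores)
    # The literature-supported edge A -> B receives a higher prior than A -> C.
    print(log_structure_prior(genes, {("A", "B")}, priors),
          log_structure_prior(genes, {("A", "C")}, priors))
```

In a score-based structure search, a candidate network would then be scored as its data log-likelihood plus this weighted log prior; in this sketch a weight of 0 reduces to learning from expression data alone.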

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Computer Simulation
Databases, Genetic*
Gene Expression Profiling / methods
Gene Regulatory Networks*
Humans
Proteome

Substances

Proteome