Background: The biomedical literature is a rich source of associative information but is too vast for complete manual review. We have developed an automated method of literature interrogation called "Literature Lab" that identifies and ranks associations existing in the literature between gene sets, such as those derived from microarray experiments, and curated sets of key terms (e.g., pathway names and Medical Subject Headings (MeSH) terms).
Results: Literature Lab was developed using differentially expressed gene sets from three previously published cancer experiments and tested on a fourth, novel gene set. When applied to the gene sets from the published data (an in vitro experiment, an in vivo mouse experiment, and an experiment with human tumor samples), Literature Lab correctly identified known biological processes occurring within each experiment. When applied to a novel set of genes differentially expressed between locally invasive and metastatic prostate cancer, Literature Lab identified a strong association between the pathway term "FOSB" and genes with increased expression in metastatic prostate cancer. Immunohistochemistry subsequently confirmed increased nuclear FOSB staining in metastatic compared with locally invasive prostate cancers.
Conclusion: This work demonstrates that Literature Lab can discover key biological processes by identifying meritorious associations between experimentally derived gene sets and key terms within the biomedical literature.