Increasing the power to detect causal associations by combining genotypic and expression data in segregating populations

Jun Zhu; Matthew C Wiener; Chunsheng Zhang; Arthur Fridman; Eric Minch; Pek Y Lum; Jeffrey R Sachs; Eric E Schadt

doi:10.1371/journal.pcbi.0030069

PLoS Comput Biol. 2007 Apr 13;3(4):e69. doi: 10.1371/journal.pcbi.0030069. Epub 2007 Feb 27.

Authors

Jun Zhu¹, Matthew C Wiener, Chunsheng Zhang, Arthur Fridman, Eric Minch, Pek Y Lum, Jeffrey R Sachs, Eric E Schadt

Affiliation

¹ Rosetta Inpharmatics, Seattle, Washington, United States of America.

Abstract

To dissect common human diseases such as obesity and diabetes, a systematic approach is needed to study how genes interact with one another, and with genetic and environmental factors, to determine clinical end points or disease phenotypes. Bayesian networks provide a convenient framework for extracting relationships from noisy data and are frequently applied to large-scale data to derive causal relationships among variables of interest. Given the complexity of molecular networks underlying common human disease traits, and the fact that biological networks can change depending on environmental conditions and genetic factors, large datasets, generally involving multiple perturbations (experiments), are required to reconstruct and reliably extract information from these networks. With limited resources, the balance of coverage of multiple perturbations and multiple subjects in a single perturbation needs to be considered in the experimental design. Increasing the number of experiments, or the number of subjects in an experiment, is an expensive and time-consuming way to improve network reconstruction. Integrating multiple types of data from existing subjects might be more efficient. For example, it has recently been demonstrated that combining genotypic and gene expression data in a segregating population leads to improved network reconstruction, which in turn may lead to better predictions of the effects of experimental perturbations on any given gene. Here we simulate data based on networks reconstructed from biological data collected in a segregating mouse population and quantify the improvement in network reconstruction achieved using genotypic and gene expression data, compared with reconstruction using gene expression data alone. 
We demonstrate that networks reconstructed using the combined genotypic and gene expression data achieve a level of reconstruction accuracy that exceeds that of networks reconstructed from expression data alone, and that fewer subjects may be required to achieve this superior reconstruction accuracy. We conclude that this integrative genomics approach to reconstructing networks not only leads to more predictive network models, but may also save time and money by decreasing the amount of data that must be generated under any given condition of interest to construct predictive network models.
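
As a rough illustration of why genotypic data can help orient relationships that expression data alone leave ambiguous, the following toy simulation may be useful. It is not the Bayesian network procedure used in the study. It is a minimal sketch that assumes a hypothetical three-variable chain (a genotype `L` affecting expression trait `A`, which in turn affects expression trait `B`), linear effects with arbitrary effect sizes, and a 0/1/2 genotype coding, all chosen only for illustration. The correlation between `A` and `B` is symmetric and says nothing about direction, whereas the partial correlations involving `L` differ depending on which trait sits between the genotype and the other trait.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after linearly adjusting both for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def simulate(n, seed=0):
    """Toy causal chain (an assumption for illustration): L -> A -> B."""
    rng = random.Random(seed)
    # Hypothetical genotype coding 0/1/2 at a single locus.
    L = [rng.choice([0, 1, 2]) for _ in range(n)]
    # Effect sizes 1.0 and 0.8 are arbitrary illustrative values.
    A = [1.0 * g + rng.gauss(0, 1) for g in L]
    B = [0.8 * a + rng.gauss(0, 1) for a in A]
    return L, A, B


L, A, B = simulate(5000)

# Expression alone: a symmetric association with no direction.
r_ab = pearson(A, B)

# Adding genotype: under L -> A -> B, L is independent of B given A,
# but L remains associated with A given B.
r_lb_given_a = partial_corr(L, B, A)
r_la_given_b = partial_corr(L, A, B)

print(f"corr(A, B)       = {r_ab:.3f}")
print(f"pcorr(L, B | A)  = {r_lb_given_a:.3f}")
print(f"pcorr(L, A | B)  = {r_la_given_b:.3f}")
```

In this sketch the near-zero partial correlation of `L` and `B` given `A`, together with the clearly nonzero partial correlation of `L` and `A` given `B`, favors the orientation `A` to `B` over the reverse. Expression data alone could not make that distinction here.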

MeSH terms

Animals
Computer Simulation
DNA Mutational Analysis / methods*
Gene Expression Profiling / methods*
Genetic Variation / genetics
Genotype
Mice
Models, Biological*
Multigene Family / physiology
Proteome / classification
Proteome / genetics*
Proteome / metabolism*
Signal Transduction / physiology*

Substances

Proteome