Genetical genomics: use all data

BMC Genomics. 2007 Mar 12;8:69. doi: 10.1186/1471-2164-8-69.


Background: Genetical genomics is a very powerful tool to elucidate the basis of complex traits and disease susceptibility. Despite its relevance, however, statistical modeling of expression quantitative trait loci (eQTL) has not received the attention it deserves. Based on two reasonable assertions (i) a good model should consider all available variables as potential effects, and (ii) gene expressions are highly interconnected, we suggest that an eQTL model should consider the rest of expression levels as potential regressors, in addition to the markers.

Results: It is shown that power can be increased with this strategy. We also show, using classical statistical and support vector machines techniques in a reanalysis of public data, that the external transcripts, i.e., transcripts other than the one being analysed, explain on average much more variability than the markers themselves. The presence of eQTL hotspots is reassessed in the light of these results.

Conclusion: Model choice is a critical yet neglected issue in genetical genomics studies. Although we are far from having a general strategy for model choice in this area, we can at least propose that any transcript level is scanned not only for the markers genotyped but also for the rest of gene expression levels. Some sort of stepwise regression strategy can be used to select the final model.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Computational Biology
  • Gene Expression*
  • Genomics / methods*
  • Likelihood Functions
  • Mice
  • Models, Genetic*
  • Oligonucleotide Array Sequence Analysis
  • Quantitative Trait Loci*