Background: The regulatory processes that govern cell proliferation and differentiation are central to developmental biology. Particularly well studied in this respect is the lymphoid system due to its importance for basic biology and for clinical applications. Gene expression measured in lymphoid cells in several distinguishable developmental stages helps in the elucidation of underlying molecular processes, which change gradually over time and lock cells in either the B cell, T cell or Natural Killer cell lineages. Large-scale analysis of these gene expression trees requires computational support for tasks ranging from visualization, querying, and finding clusters of similar genes, to answering detailed questions about the functional roles of individual genes.
Results: We present the first statistical framework designed to analyze gene expression data as it is collected in the course of lymphoid development through clusters of co-expressed genes and additional heterogeneous data. We introduce dependence trees for continuous variates, which model the inherent dependencies during the differentiation process naturally as gene expression trees. Several trees are combined in a mixture model to allow inference of potentially overlapping clusters of co-expressed genes. Additionally, we predict microRNA targets.
Conclusion: Computational results for several data sets from the lymphoid system demonstrate the relevance of our framework. We recover well-known biological facts and identify promising novel regulatory elements of genes and their functional assignments. The implementation of our method (licensed under the GPL) is available at http://algorithmics.molgen.mpg.de/Supplements/ExpLym/.