Bioinformatics, 26 (12), 1520-7

FABIA: Factor Analysis for Bicluster Acquisition

Sepp Hochreiter et al. Bioinformatics.

Abstract

Motivation: Biclustering of transcriptomic data groups genes and samples simultaneously. It is emerging as a standard tool for extracting knowledge from gene expression measurements. We propose a novel generative approach for biclustering called 'FABIA: Factor Analysis for Bicluster Acquisition'. FABIA is based on a multiplicative model, which accounts for linear dependencies between gene expression and conditions, and also captures heavy-tailed distributions as observed in real-world transcriptomic data. The generative framework allows the use of well-founded model selection methods and the application of Bayesian techniques.

Results: On 100 simulated datasets with known true, artificially implanted biclusters, FABIA clearly outperformed all 11 competitors. On these datasets, FABIA was able to separate spurious biclusters from true biclusters by ranking biclusters according to their information content. FABIA was also tested on three microarray datasets with known subclusters, where it was the best method on two datasets and the second best on the third among the compared biclustering approaches.

Availability: FABIA is available as an R package on Bioconductor (http://www.bioconductor.org). All datasets, results and software are available at http://www.bioinf.jku.at/software/fabia/fabia.html.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
The outer product λ z^T of two sparse vectors results in a matrix with a bicluster. Note that the non-zero entries in the vectors are adjacent to each other for visualization purposes only.
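The caption above can be illustrated with a minimal sketch in plain Python. The vectors `lam` and `z` and their values are hypothetical and chosen only for illustration; they are not taken from the paper or from the FABIA package. The non-zero entries are deliberately non-adjacent to show that a bicluster need not be a contiguous block.

```python
def outer(lam, z):
    """Outer product of a gene vector lam and a sample vector z."""
    return [[l * zj for zj in z] for l in lam]

# Hypothetical sparse vectors: most entries are exactly zero.
lam = [0.0, 2.0, 0.0, -1.0, 0.0]  # one value per gene (5 genes)
z = [0.0, 1.5, 0.0, 3.0]          # one value per sample (4 samples)

X = outer(lam, z)

# Genes (rows) and samples (columns) with any non-zero entry form the bicluster.
bicluster_rows = [i for i, row in enumerate(X) if any(v != 0 for v in row)]
bicluster_cols = [j for j in range(len(z)) if any(row[j] != 0 for row in X)]

for row in X:
    print(row)
print("genes:", bicluster_rows, "samples:", bicluster_cols)
```

Only the rows and columns where both vectors are non-zero carry signal, so the sparsity of the two vectors directly determines the bicluster's membership.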
Fig. 2.
An example of FABIA model selection. The data have 10 true biclusters, and the model was trained with 13 biclusters. The biclusters are generated as contiguous blocks for visualization purposes only. Top: data (left) and noise-free data (right). Middle: factors Z. Bottom: data reconstructed by the FABIA model as Λ Z (left) and loadings Λ (right). The lines indicate three biclusters and connect each bicluster in the reconstructed data with its corresponding factors (middle) and loadings (bottom right).
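The reconstruction Λ Z mentioned in the caption can be read as a sum of one outer product per bicluster. The sketch below checks that identity in plain Python. The matrices `Lambda` and `Z` hold hypothetical values, the noise term is omitted, and nothing here reproduces the fitting procedure of the FABIA package.

```python
def matmul(A, B):
    """Plain matrix product of A (n x p) and B (p x m)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def outer(u, v):
    """Outer product of vectors u and v."""
    return [[ui * vj for vj in v] for ui in u]

# Hypothetical sparse loadings (4 genes x 2 biclusters).
Lambda = [[1.0, 0.0],
          [2.0, 0.0],
          [0.0, 0.0],
          [0.0, 3.0]]
# Hypothetical sparse factors (2 biclusters x 3 samples).
Z = [[1.0, 1.0, 0.0],
     [0.0, 0.0, 2.0]]

reconstruction = matmul(Lambda, Z)

# Same matrix built bicluster by bicluster: column k of Lambda times row k of Z.
summed = [[0.0] * len(Z[0]) for _ in Lambda]
for k in range(len(Z)):
    column_k = [row[k] for row in Lambda]
    layer = outer(column_k, Z[k])
    for i in range(len(summed)):
        for j in range(len(summed[0])):
            summed[i][j] += layer[i][j]

print(reconstruction == summed)
```

Each column of Λ paired with the matching row of Z contributes one bicluster, which is why the figure can link a block in the reconstructed data to a single factor and a single loading vector.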
