Bioinformatics, 33 (14), i59-i66

Rectified Factor Networks for Biclustering of Omics Data



Djork-Arné Clevert et al. Bioinformatics.

Abstract

Motivation: Biclustering has become a major tool for analyzing large datasets given as a matrix of samples times features and has been successfully applied in life sciences and e-commerce for drug design and recommender systems, respectively. Factor Analysis for Bicluster Acquisition (FABIA), one of the most successful biclustering methods, is a generative model that represents each bicluster by two sparse membership vectors: one for the samples and one for the features. However, FABIA is restricted to about 20 code units because of the high computational complexity of computing the posterior. Furthermore, code units are sometimes insufficiently decorrelated, and sample membership is difficult to determine. We propose to use the recently introduced unsupervised Deep Learning approach Rectified Factor Networks (RFNs) to overcome the drawbacks of existing biclustering methods. RFNs efficiently construct very sparse, non-linear, high-dimensional representations of the input via their posterior means. RFN learning is a generalized alternating minimization algorithm based on the posterior regularization method, which enforces non-negative and normalized posterior means. Each code unit represents a bicluster: the samples for which the code unit is active and the features that have activating weights to the code unit together form the bicluster.
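The constraint on the posterior means and the read-out of one bicluster per code unit, as described above, can be sketched in plain Python. This is a minimal illustration, not librfn's implementation: it assumes the unconstrained posterior means have already been computed, takes "normalized" to mean that each code unit is rescaled to unit mean square over samples, and uses a hypothetical cutoff `w_thresh` to decide which weights count as "activating". The function names `project_posterior_means` and `extract_biclusters` are likewise hypothetical.

```python
def project_posterior_means(mu):
    """Make posterior means non-negative and normalized.

    mu: list of rows, one row per code unit, one column per sample.
    Assumption: "normalized" means each code unit is rescaled to unit
    mean square across samples; a row that is all zero after
    rectification is left as zeros.
    """
    projected = []
    for row in mu:
        rect = [max(0.0, x) for x in row]  # rectification: clip negatives to 0
        mean_sq = sum(x * x for x in rect) / len(rect)
        scale = mean_sq ** 0.5
        projected.append([x / scale for x in rect] if scale > 0 else rect)
    return projected


def extract_biclusters(W, H, w_thresh=0.5):
    """Return one (features, samples) pair per code unit.

    W: features x code units weight matrix (list of rows).
    H: code units x samples matrix of projected posterior means.
    w_thresh: hypothetical cutoff above which a weight counts as "activating".
    """
    biclusters = []
    for k, code in enumerate(H):
        samples = [j for j, h in enumerate(code) if h > 0]  # code unit k is active
        features = [i for i, row in enumerate(W) if row[k] > w_thresh]
        biclusters.append((features, samples))
    return biclusters


# Toy example: 4 features, 2 code units, 4 samples.
mu = [[1.2, -0.3, 0.9, -1.0],
      [-0.2, 0.8, -0.5, 0.4]]
W = [[0.9, 0.0],
     [0.7, 0.1],
     [0.0, 0.8],
     [0.2, 0.6]]
H = project_posterior_means(mu)
print(extract_biclusters(W, H))  # [([0, 1], [0, 2]), ([2, 3], [1, 3])]
```

The weight cutoff is the main design choice in this sketch; the paper and librfn may use a different membership rule, so `w_thresh` should be treated as a tunable assumption rather than a prescribed value.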

Results: On 400 benchmark datasets and on three gene expression datasets with known clusters, RFN outperformed 13 other biclustering methods, including FABIA. On data of the 1000 Genomes Project, RFN identified DNA segments which indicate that interbreeding with other hominins had already started before the ancestors of modern humans left Africa.

Availability and implementation: https://github.com/bioinf-jku/librfn.

Contact: djork-arne.clevert@bayer.com or hochreit@bioinf.jku.at.

Figures

Fig. 1
Left: Factor analysis model: hidden units (factors) h, visible units v, weight matrix W, noise ε. Right: The outer product w hᵀ of two sparse vectors results in a matrix with a bicluster. Note that the non-zero entries in the vectors are adjacent to each other for visualization purposes only.
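The right panel of Fig. 1 can be checked numerically with a short plain-Python sketch. The helper names `outer` and `support` are hypothetical, chosen for illustration. The code forms w hᵀ from a sparse feature vector w and a sparse sample vector h and confirms that the non-zero entries of the product occupy exactly the rows where w is non-zero and the columns where h is non-zero, i.e. a bicluster. With samples stacked as columns, the noise-free part of the factor analysis model on the left (W times the matrix of hidden-unit values) is a sum of such outer products, one per hidden unit.

```python
def outer(w, h):
    """Outer product w h^T as a list of rows: len(w) features x len(h) samples."""
    return [[wi * hj for hj in h] for wi in w]


def support(M):
    """Row and column indices that contain at least one non-zero entry."""
    rows = [i for i, r in enumerate(M) if any(x != 0 for x in r)]
    cols = [j for j in range(len(M[0])) if any(r[j] != 0 for r in M)]
    return rows, cols


# Sparse feature vector w and sparse sample vector h; the non-zero entries
# need not be adjacent (Fig. 1 places them together only for display).
w = [0.0, 2.0, 0.0, 1.5, 0.0]
h = [1.0, 0.0, 0.0, 3.0]
M = outer(w, h)
print(support(M))  # ([1, 3], [0, 3])
```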
Fig. 2
Runtime comparison of FABIA and RFN for 10, 30, 100, 300 and 500 biclusters on synthetic inputs of n = 500 features and l = 1000 samples, for 100 iterations each. Shown data are the medians of five measurements; error bars are standard errors of the mean.
Fig. 3
Example of an identity-by-descent (IBD) segment matching the Neanderthal genome that is shared among Africans and Admixed Americans. The rows represent all individuals that have the IBD segment, and columns represent consecutive SNVs. Major alleles are shown in yellow, minor alleles of tagSNVs in violet, and minor alleles of other SNVs in cyan. The row labeled model L indicates tagSNVs identified by RFN in violet. The rows Ancestor, Neanderthal and Denisova show bases of the respective genomes in violet if they match the minor allele of the tagSNVs (in yellow otherwise). For the Ancestor genome we used the reconstructed common ancestor sequence that was provided as part of the 1000 Genomes Project data.


