PLoS One. 2011;6(12):e28072.
doi: 10.1371/journal.pone.0028072. Epub 2011 Dec 22.

A Higher-Order Generalized Singular Value Decomposition for Comparison of Global mRNA Expression From Multiple Organisms

Free PMC article

Sri Priya Ponnapalli et al. PLoS One. 2011.

Abstract

The number of high-dimensional datasets recording multiple aspects of a single phenomenon is increasing in many areas of science, accompanied by a need for mathematical frameworks that can compare multiple large-scale matrices with different row dimensions. The only such framework to date, the generalized singular value decomposition (GSVD), is limited to two matrices. We mathematically define a higher-order GSVD (HO GSVD) for $N \geq 2$ matrices $D_i \in \mathbb{R}^{m_i \times n}$, each with full column rank. Each matrix is exactly factored as $D_i = U_i \Sigma_i V^T$, where $V$, identical in all factorizations, is obtained from the eigensystem $SV = V\Lambda$ of the arithmetic mean $S$ of all pairwise quotients $A_i A_j^{-1}$ of the matrices $A_i = D_i^T D_i$, $i \neq j$. We prove that this decomposition extends to higher orders almost all of the mathematical properties of the GSVD. The matrix $S$ is nondefective with $V$ and $\Lambda$ real. Its eigenvalues satisfy $\lambda_k \geq 1$. Equality holds if and only if the corresponding eigenvector $v_k$ is a right basis vector of equal significance in all matrices $D_i$ and $D_j$, that is, $\sigma_{i,k}/\sigma_{j,k} = 1$ for all $i$ and $j$, and the corresponding left basis vector $u_{i,k}$ is orthogonal to all other vectors in $U_i$ for all $i$. The eigenvalues $\lambda_k = 1$, therefore, define the "common HO GSVD subspace." We illustrate the HO GSVD with a comparison of genome-scale cell-cycle mRNA expression from S. pombe, S. cerevisiae and human. Unlike existing algorithms, a mapping among the genes of these disparate organisms is not required. We find that the approximately common HO GSVD subspace represents the cell-cycle mRNA expression oscillations, which are similar among the datasets. Simultaneous reconstruction in the common subspace, therefore, removes the experimental artifacts, which are dissimilar, from the datasets. In the simultaneous sequence-independent classification of the genes of the three organisms in this common subspace, genes of highly conserved sequences but significantly different cell-cycle peak times are correctly classified.
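
The construction described in the abstract can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `ho_gsvd` and `reconstruct`, the descending ordering of the eigenvalues, the unit-norm scaling of the columns of $V$, and the random example matrices are all choices made here for illustration. The sketch forms $A_i = D_i^T D_i$, averages the pairwise quotients $A_i A_j^{-1}$ into $S$, takes $V$ from the eigensystem of $S$, and recovers $U_i$ and $\Sigma_i$ from $D_i V^{-T} = U_i \Sigma_i$.

```python
import numpy as np

def ho_gsvd(datasets):
    """Sketch of the HO GSVD for a list of full-column-rank matrices D_i (m_i x n).

    Returns (Us, sigmas, V, eigvals) with D_i = Us[i] @ diag(sigmas[i]) @ V.T.
    """
    N = len(datasets)
    n = datasets[0].shape[1]
    A = [D.T @ D for D in datasets]            # A_i = D_i^T D_i
    A_inv = [np.linalg.inv(a) for a in A]

    # Arithmetic mean S of all pairwise quotients A_i A_j^{-1}, i != j
    S = np.zeros((n, n))
    for i in range(N):
        for j in range(N):
            if i != j:
                S += A[i] @ A_inv[j]
    S /= N * (N - 1)

    # Eigensystem S V = V Lambda; the abstract states V and Lambda are real,
    # so any imaginary parts here are treated as round-off and dropped.
    eigvals, V = np.linalg.eig(S)
    eigvals, V = eigvals.real, V.real
    order = np.argsort(eigvals)[::-1]          # assumption: sort descending
    eigvals, V = eigvals[order], V[:, order]
    V = V / np.linalg.norm(V, axis=0)          # assumption: unit-norm columns

    Us, sigmas = [], []
    for D in datasets:
        B = np.linalg.solve(V, D.T).T          # B = D V^{-T} = U Sigma
        sigma = np.linalg.norm(B, axis=0)      # column norms give sigma_{i,k}
        Us.append(B / sigma)
        sigmas.append(sigma)
    return Us, sigmas, V, eigvals

def reconstruct(U, sigma, V, keep):
    """Rebuild one dataset from the basis vectors indexed by `keep` only."""
    return (U[:, keep] * sigma[keep]) @ V[:, keep].T

# Hypothetical example: three random matrices sharing n = 4 columns
rng = np.random.default_rng(0)
datasets = [rng.standard_normal((m, 4)) for m in (6, 8, 10)]
Us, sigmas, V, lam = ho_gsvd(datasets)
common = np.arange(2, 4)                       # the two eigenvalues closest to 1
D1_common = reconstruct(Us[0], sigmas[0], V, common)
```

With the eigenvalues sorted in descending order, the basis vectors whose $\lambda_k$ lie closest to 1 come last, so a reconstruction restricted to the trailing indices approximates a projection onto the (approximately) common subspace. How many trailing indices to keep is a judgment call that depends on the data.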

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Higher-order generalized singular value decomposition (HO GSVD).
In this raster display of Equation (1) with overexpression (red), no change in expression (black), and underexpression (green) centered at gene- and array-invariant expression, the S. pombe, S. cerevisiae and human global mRNA expression datasets are tabulated as organism-specific genes × 17-arrays matrices $D_1$, $D_2$ and $D_3$. The underlying assumption is that there exists a one-to-one mapping among the 17 columns of the three matrices but not necessarily among their rows. These matrices are transformed to the reduced diagonalized matrices $\Sigma_1$, $\Sigma_2$ and $\Sigma_3$, each of 17-“arraylets,” i.e., left basis vectors × 17-“genelets,” i.e., right basis vectors, by using the organism-specific genes × 17-arraylets transformation matrices $U_1$, $U_2$ and $U_3$ and the shared 17-genelets × 17-arrays transformation matrix $V^T$. We prove that with our particular $V$ of Equations (2)–(4), this decomposition extends to higher orders all of the mathematical properties of the GSVD except for complete column-wise orthogonality of the arraylets, i.e., left basis vectors that form the matrices $U_1$, $U_2$ and $U_3$. We therefore mathematically define, in analogy with the GSVD, the “common HO GSVD subspace” of the $N = 3$ matrices to be the subspace spanned by the genelets, i.e., right basis vectors $v_k$ that correspond to higher-order generalized singular values that are equal, $\sigma_{1,k} = \sigma_{2,k} = \sigma_{3,k}$, where, as we prove, the corresponding arraylets, i.e., the left basis vectors $u_{1,k}$, $u_{2,k}$ and $u_{3,k}$, are orthonormal to all other arraylets in $U_1$, $U_2$ and $U_3$. We show that, like the GSVD for two organisms, the HO GSVD provides a sequence-independent comparative mathematical framework for datasets from more than two organisms, where the mathematical variables and operations represent biological reality: Genelets of common significance in the multiple datasets, and the corresponding arraylets, represent cell-cycle checkpoints or transitions from one phase to the next, common to S. pombe, S. cerevisiae and human. Simultaneous reconstruction and classification of the three datasets in the common subspace that these patterns span outline the biological similarity in the regulation of their cell-cycle programs. Notably, genes of significantly different cell-cycle peak times but highly conserved sequences are correctly classified.
Figure 2. Genelets or right basis vectors.
(a) Raster display of the expression of the 17 genelets, i.e., HO GSVD patterns of expression variation across time, with overexpression (red), no change in expression (black) and underexpression (green) around the array-, i.e., time-invariant expression. (b) Bar chart of the corresponding inverse eigenvalues $\lambda_k^{-1}$, showing that the 13th through the 17th genelets correspond to $1 \lesssim \lambda_k \lesssim 2$. (c) Line-joined graphs of the 13th (red), 14th (blue) and 15th (green) genelets in the two-dimensional subspace that approximates the five-dimensional HO GSVD subspace (Figure S4 and Section 2.4), normalized to zero average and unit variance. (d) Line-joined graphs of the projected 16th (orange) and 17th (violet) genelets in the two-dimensional subspace. The five genelets describe expression oscillations of two periods in the three time courses.
Figure 3. Common HO GSVD subspace represents similar cell-cycle oscillations.
(a–c) S. pombe, S. cerevisiae and human array expression, projected from the five-dimensional common HO GSVD subspace onto the two-dimensional subspace that approximates it (Sections 2.3 and 2.4 in Appendix S1). The arrays are color-coded according to their previous cell-cycle classification. The arrows describe the projections of the $k = 13, \ldots, 17$ arraylets of each dataset. The dashed unit and half-unit circles outline 100% and 50% of added-up (rather than canceled-out) contributions of these five arraylets to the overall projected expression. (d–f) Expression of 380, 641 and 787 cell cycle-regulated genes of S. pombe, S. cerevisiae and human, respectively, color-coded according to previous classifications. (g–i) The HO GSVD pictures of the S. pombe, S. cerevisiae and human cell-cycle programs. The arrows describe the projections of the $k = 13, \ldots, 17$ shared genelets and organism-specific arraylets that span the common HO GSVD subspace and represent cell-cycle checkpoints or transitions from one phase to the next.
Figure 4. Simultaneous HO GSVD classification of homologous genes of different cell-cycle peak times.
(a) The S. pombe gene BFR1, and (b) its closest S. cerevisiae homologs. (c) The S. pombe and (d) S. cerevisiae closest homologs of the S. cerevisiae gene PLB1. (e) The S. pombe cyclin-encoding gene CIG2 and its closest S. pombe, (f) S. cerevisiae and (g) human homologs.
