Cross-platform comparison and visualisation of gene expression data using co-inertia analysis

BMC Bioinformatics. 2003 Nov 21:4:59. doi: 10.1186/1471-2105-4-59.

Abstract

Background: Rapid development of DNA microarray technology has resulted in different laboratories adopting numerous protocols and technological platforms, which has severely impaired the comparability of array data. Current cross-platform comparisons of microarray gene expression data are usually based on cross-referencing the annotation of each gene transcript represented on the arrays, extracting a list of genes common to all arrays and comparing expression data of this gene subset. Unfortunately, filtering of genes to a subset represented across all arrays often excludes many thousands of genes, because different subsets of genes from the genome are represented on different arrays. We describe the application of a powerful yet simple method for cross-platform comparison of gene expression data. Co-inertia analysis (CIA) is a multivariate method that identifies trends or co-relationships in multiple datasets which contain the same samples. CIA simultaneously finds ordinations (dimension reduction diagrams) from the datasets that are most similar. It does this by finding successive axes from the two datasets with maximum covariance. CIA can be applied to datasets where the number of variables (genes) far exceeds the number of samples (arrays), as is the case with microarray analyses.
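The idea of finding successive axes with maximum covariance can be illustrated with a minimal sketch. It assumes a simplified, PCA-style variant: both tables are column-centred, samples receive uniform weights, and the axes come from a singular value decomposition of the cross-covariance matrix. The function name `co_inertia`, its parameters, and the synthetic data are hypothetical and for illustration only; this is not the authors' implementation, and it is not tuned for tables with many thousands of genes.

```python
import numpy as np


def co_inertia(X, Y, n_axes=2):
    """Simplified co-inertia sketch (assumption: centred tables, uniform sample weights).

    X: samples x genes table from platform A
    Y: samples x genes table from platform B (same samples, same row order)
    The two tables may contain different genes and different numbers of genes.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if X.shape[0] != Y.shape[0]:
        raise ValueError("both tables must contain the same samples (rows)")

    n = X.shape[0]
    Xc = X - X.mean(axis=0)  # centre each gene in table X
    Yc = Y - Y.mean(axis=0)  # centre each gene in table Y

    # Cross-covariance between the genes of X and the genes of Y.
    C = Xc.T @ Yc / n

    # Singular vectors give paired axes whose sample scores have maximal covariance;
    # the singular values are those covariances, in decreasing order.
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    axes_x = U[:, :n_axes]
    axes_y = Vt[:n_axes].T

    return Xc @ axes_x, Yc @ axes_y, s[:n_axes], axes_x, axes_y


# Synthetic stand-in data (not real microarray measurements): 12 samples,
# 50 genes on one platform and 30 on the other, sharing two latent trends.
rng = np.random.default_rng(0)
latent = rng.normal(size=(12, 2))
X = latent @ rng.normal(size=(2, 50)) + 0.1 * rng.normal(size=(12, 50))
Y = latent @ rng.normal(size=(2, 30)) + 0.1 * rng.normal(size=(12, 30))

scores_x, scores_y, covariances, axes_x, axes_y = co_inertia(X, Y)
```

Note that no gene has to appear in both tables: only the samples are shared, which is what allows the comparison to avoid filtering down to a common gene subset.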

Results: We illustrate the power of CIA for cross-platform analysis of gene expression data by using it to identify the main common relationships in expression profiles on a panel of 60 tumour cell lines from the National Cancer Institute (NCI) which have been subjected to microarray studies using both Affymetrix and spotted cDNA array technology. The co-ordinates of the CIA projections of the cell lines from each dataset are graphed in a bi-plot and are connected by a line, the length of which indicates the divergence between the two datasets. Thus, CIA provides a graphical representation of consensus and divergence between the gene expression profiles from different microarray platforms. In addition, the genes that define the main trends in the analysis can be easily identified.
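The connecting-line idea can be sketched as follows. The example assumes each platform's sample scores are standardised per axis (zero mean, unit variance) before being overlaid, so that both sets share a scale; that scaling choice, the function name `connecting_line_lengths`, and the toy score values are assumptions for illustration, not values or procedures taken from the study.

```python
import numpy as np


def connecting_line_lengths(scores_a, scores_b):
    """Length of the line joining each sample's two projections.

    scores_a, scores_b: samples x axes score tables, one per platform,
    with samples in the same row order.
    Assumption: each axis is standardised so the two platforms are comparable.
    """
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    a = (a - a.mean(axis=0)) / a.std(axis=0)
    b = (b - b.mean(axis=0)) / b.std(axis=0)
    # Euclidean distance between the paired points of each sample.
    return np.linalg.norm(a - b, axis=1)


# Toy scores for three samples on two axes (illustrative values only).
platform_a = np.array([[1.0, 0.0], [-1.0, 0.5], [0.0, -0.5]])

# Same configuration at a different scale: the platforms agree.
lengths_same = connecting_line_lengths(platform_a, 2.0 * platform_a)

# A different configuration on the second axis: the platforms diverge.
platform_b = np.array([[2.0, 0.0], [-2.0, 1.0], [0.0, -3.0]])
lengths_shifted = connecting_line_lengths(platform_a, platform_b)
```

Short lines indicate samples whose expression profiles are placed consistently by both platforms; long lines flag samples where the two datasets disagree.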

Conclusions: CIA is a robust, efficient approach to coupling gene expression datasets. CIA provides simple graphical representations of the results, making it a particularly attractive method for the identification of relationships between large datasets.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Breast Neoplasms / genetics
  • Breast Neoplasms / pathology
  • Carcinoma, Non-Small-Cell Lung / genetics
  • Carcinoma, Non-Small-Cell Lung / pathology
  • Cell Line
  • Cell Line, Tumor
  • Central Nervous System Neoplasms / genetics
  • Central Nervous System Neoplasms / pathology
  • Computational Biology / methods*
  • Computer Graphics / statistics & numerical data
  • Databases, Genetic / statistics & numerical data
  • Epithelial Cells / cytology
  • Epithelial Cells / pathology
  • Gene Expression Profiling / statistics & numerical data*
  • Gene Expression Regulation, Neoplastic / genetics
  • Genes, Neoplasm / genetics
  • Glioblastoma / genetics
  • Glioblastoma / pathology
  • HT29 Cells / chemistry
  • HT29 Cells / metabolism
  • Humans
  • Kidney Neoplasms / genetics
  • Kidney Neoplasms / pathology
  • Leukemia / genetics
  • Leukemia / pathology
  • Lung Neoplasms / genetics
  • Lung Neoplasms / pathology
  • Melanoma / genetics
  • Melanoma / pathology
  • Melanoma / secondary
  • Mesoderm / cytology
  • Mesoderm / pathology
  • Models, Statistical