A multivariate approach applied to microarray data for identification of genes with cell cycle-coupled transcription

Bioinformatics. 2003 Mar 1;19(4):467-73. doi: 10.1093/bioinformatics/btg017.


We have analyzed microarray data using a modeling approach based on the multivariate statistical method partial least squares (PLS) regression to identify genes with periodic fluctuations in expression levels coupled to the cell cycle in the budding yeast, Saccharomyces cerevisiae. PLS is well suited to microarray data because it can model data sets with many variables and few observations. A response model was derived describing the expression profile over time expected for periodically transcribed genes, and was used to identify budding yeast transcripts with similar profiles. PLS was then used to assess the importance of each variable (gene) to the model, yielding a list of genes ranked by how well they fit the generated model. Applying an appropriate cutoff value, calculated from randomized data, allows the identification of genes whose expression appears to be synchronized with cell cycling. The approach also indicates the cell-cycle stage at which each gene's transcription peaks. Three synchronized yeast cell microarray data sets were analyzed, both separately and combined. Cell cycle-coupled periodicity was suggested for 455 of the 6,178 transcripts monitored in the combined data set, at a significance level of 0.5%. These candidates included 85% of the known periodic transcripts. Analyzing the three data sets separately yielded similar ranked lists, showing that the method is robust.
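The abstract describes the pipeline only at a high level (response model, PLS fit, gene ranking, cutoff from randomized data). The Python sketch that follows illustrates one way these steps could fit together; it is not the authors' implementation. Assumptions, none of which come from the abstract: the data matrix is arranged as time points x genes; the response model is a cosine/sine pair with a known cycle period; PLS is fitted as a two-component PLS2 model with NIPALS; each gene is scored by the fraction of its variance captured by the PLS X-scores (a stand-in, since the abstract does not name the importance measure); "randomized data" means shuffling each gene's time order independently; and the peak-phase estimate is a simple proxy. The function names `pls_scores`, `gene_scores` and `random_cutoff` and all simulated values are hypothetical.

```python
import numpy as np

# Hypothetical helpers illustrating the idea; not the authors' implementation.

def pls_scores(X, Y, n_comp=2, tol=1e-10, max_iter=1000):
    """X-scores T (time points x n_comp) of a PLS2 model fitted with NIPALS.
    X: time points x genes; Y: time points x response-model columns."""
    X = X - X.mean(axis=0)            # mean-centre every gene
    Y = Y - Y.mean(axis=0)            # mean-centre every response column
    T = np.zeros((X.shape[0], n_comp))
    for a in range(n_comp):
        u = Y[:, 0].copy()
        for _ in range(max_iter):
            w = X.T @ u
            w /= np.linalg.norm(w)    # unit-length gene weights for this component
            t = X @ w                 # X-scores (one value per time point)
            q = Y.T @ t / (t @ t)     # Y-loadings
            u_new = Y @ q / (q @ q)
            done = np.linalg.norm(u_new - u) <= tol * np.linalg.norm(u_new)
            u = u_new
            if done:
                break
        p = X.T @ t / (t @ t)         # X-loadings
        X = X - np.outer(t, p)        # deflate X, so later scores are orthogonal to t
        Y = Y - np.outer(t, q)        # deflate Y
        T[:, a] = t
    return T

def gene_scores(X, T):
    """Fraction (0..1) of each gene's variance captured by the orthogonal PLS X-scores."""
    Xc = X - X.mean(axis=0)
    Tn = T / np.linalg.norm(T, axis=0)
    return ((Tn.T @ Xc) ** 2).sum(axis=0) / (Xc ** 2).sum(axis=0)

def random_cutoff(X, Y, level=0.005, n_perm=20, n_comp=2, seed=0):
    """Upper score quantile from data whose time order is shuffled independently per gene."""
    rng = np.random.default_rng(seed)
    null = []
    for _ in range(n_perm):
        Xp = rng.permuted(X, axis=0)  # each column (gene) shuffled on its own
        null.append(gene_scores(Xp, pls_scores(Xp, Y, n_comp)))
    return np.quantile(np.concatenate(null), 1 - level)

# Simulated data (all values hypothetical): 18 time points spanning two cycles,
# 200 genes, of which the first 20 oscillate with evenly spaced peak phases.
rng = np.random.default_rng(1)
n_time, n_genes, n_periodic, period = 18, 200, 20, 9.0
time = np.arange(n_time)
omega = 2 * np.pi / period
Y = np.column_stack([np.cos(omega * time), np.sin(omega * time)])  # assumed response model
true_phase = np.linspace(0, 2 * np.pi, n_periodic, endpoint=False)
X = 0.5 * rng.standard_normal((n_time, n_genes))
X[:, :n_periodic] = (np.cos(omega * time[:, None] - true_phase)
                     + 0.2 * rng.standard_normal((n_time, n_periodic)))

score = gene_scores(X, pls_scores(X, Y))
cutoff = random_cutoff(X, Y)          # 0.5% level, matching the abstract
hits = np.flatnonzero(score > cutoff)
ranking = np.argsort(score)[::-1]     # best-fitting genes first

# Rough peak-phase proxy: angle of each gene's covariance with the cos/sin columns.
Xc = X - X.mean(axis=0)
peak_phase = np.mod(np.arctan2(Xc.T @ Y[:, 1], Xc.T @ Y[:, 0]), 2 * np.pi)
print(f"cutoff={cutoff:.2f}; {hits.size} genes above it")
```

Because the simulated series covers two full cycles, the cosine and sine columns are orthogonal, so for a profile cos(ωt − φ) the `np.arctan2` angle recovers φ, i.e. expression peaks where ωt = φ. Scoring by explained variance, rather than by a measure normalized within each fitted model such as VIP, keeps the real and randomized scores on the same absolute 0 to 1 scale, which a cutoff taken from randomized data relies on.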

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Cell Cycle / genetics*
  • Cell Cycle / physiology
  • Gene Expression Profiling / methods
  • Gene Expression Regulation / physiology*
  • Least-Squares Analysis
  • Models, Genetic
  • Models, Statistical
  • Multivariate Analysis
  • Oligonucleotide Array Sequence Analysis / methods*
  • Periodicity
  • Regression Analysis
  • Reproducibility of Results
  • Sensitivity and Specificity
  • Sequence Alignment / methods*
  • Sequence Analysis, DNA / methods*
  • Transcription, Genetic / genetics*
  • Transcription, Genetic / physiology
  • Yeasts / genetics
  • Yeasts / physiology