Regression as a method to predict copy numbers in comparative genomic hybridization studies on bacteria

Biom J. 2006 Apr;48(2):255-70. doi: 10.1002/bimj.200510208.

Abstract

Comparative genomic hybridizations (CGH) using microarrays are performed with bacteria in order to determine the level of genomic similarity between various strains. The microarrays applied in CGH experiments are constructed on the basis of the genome sequence of one strain, which is used as a control, or reference, in each experiment. A strain being compared with the known strain is called the unknown strain. The ratios of fluorescent intensities obtained from the spots on the microarrays can be used to determine which genes are divergent in the unknown strain, as well as to predict the copy number of genes in the unknown strain. In this paper, we focus on the prediction of gene copy number based on data from CGH experiments. We assumed a linear relationship between the log2 of the copy number and the observed log2-ratios, and proposed predictors based on the factor analysis model and the linear random model in an attempt to identify the copy numbers. These predictors were compared with using the ratio of the intensities directly. Simulations indicated that the proposed predictors improved the prediction of the copy number in most situations. The predictors were applied to CGH data obtained from experiments with Enterococcus faecalis strains in order to determine the copy number of relevant genes in five different strains.
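
The linear relationship between the log2 copy number and the observed log2-ratio can be illustrated with a minimal sketch. It is not the factor analysis or linear random model predictor of the paper: it assumes a plain least-squares calibration line fitted on genes with known copy number, which is then inverted to predict the copy number of other genes. The function names and the numeric values are hypothetical and chosen only for illustration.

```python
def fit_line(log2_copy, log2_ratio):
    """Least-squares fit of log2_ratio = a + b * log2_copy; returns (a, b)."""
    n = len(log2_copy)
    mean_x = sum(log2_copy) / n
    mean_y = sum(log2_ratio) / n
    sxx = sum((x - mean_x) ** 2 for x in log2_copy)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log2_copy, log2_ratio))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_copy_number(log2_ratio, a, b):
    """Invert the fitted line to get a copy number from one log2-ratio."""
    return 2 ** ((log2_ratio - a) / b)


def naive_copy_number(log2_ratio):
    """Baseline: use the intensity ratio directly as the copy number."""
    return 2 ** log2_ratio


if __name__ == "__main__":
    # Hypothetical calibration genes with copy numbers 1, 2 and 4
    # (log2 = 0, 1, 2) and made-up, compressed log2-ratios.
    a, b = fit_line([0.0, 1.0, 2.0], [0.1, 0.8, 1.5])
    observed = 1.5  # log2-ratio of a gene with unknown copy number
    print("calibrated:", predict_copy_number(observed, a, b))
    print("naive:", naive_copy_number(observed))
```

When the observed log2-ratios are compressed relative to the true log2 copy numbers (a slope below 1 in this made-up example), the direct ratio underestimates higher copy numbers, while the inverted calibration line corrects for the compression.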

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Chromosome Mapping / methods*
  • Computer Simulation
  • Data Interpretation, Statistical*
  • Gene Dosage / genetics*
  • Genome, Bacterial / genetics*
  • In Situ Hybridization, Fluorescence / methods*
  • Models, Genetic*
  • Models, Statistical
  • Regression Analysis