Background: Microarray comparative genomic hybridization (aCGH) evaluates the distribution of genes of sequenced bacterial strains among unsequenced strains of the same or related species. As genomic sequences from multiple strains of the same species become available, multistrain microarrays are designed that contain spots for every unique gene in all sequenced strains. In two-color aCGH experiments with multistrain microarrays, the control sample can be either the genomic DNA of a single strain or a mixture of all the strains used in the array design. There is no universally accepted choice between these two options.
Results: We performed a comparative study of the two control sample options with a Streptococcus pneumoniae microarray designed from three fully sequenced strains. We separately hybridized two of these strains (R6 and G54) as test samples, using either the third strain alone (TIGR4) or a mixture of the three strains as the control. We show that, for both types of control, it is advantageous to analyze spots in separate sets according to their expected control channel signal (5-15% AUC increase). With this analysis, the mix control leads to higher accuracy (5% increase). This enhanced performance is due to gains in sensitivity (21% increase, p = 0.001) that compensate for minor losses in specificity (5% decrease, p = 0.014).
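As an illustration of analyzing spots in separate sets, the following is a minimal sketch rather than the pipeline used in the study. It assumes each spot carries a log ratio, a label for its expected control channel signal, and a known presence/absence status. The function names, group labels, cutoffs, and toy values are all hypothetical and chosen only to show how sensitivity, specificity, and accuracy can be compared between a pooled cutoff and per-set cutoffs.

```python
def call_presence(log_ratios, groups, cutoffs):
    """Call a gene present when its log ratio reaches the cutoff of its spot set."""
    return [lr >= cutoffs[g] for lr, g in zip(log_ratios, groups)]


def metrics(truth, calls):
    """Return (sensitivity, specificity, accuracy) for boolean truth/call lists."""
    tp = sum(1 for t, c in zip(truth, calls) if t and c)
    tn = sum(1 for t, c in zip(truth, calls) if not t and not c)
    fp = sum(1 for t, c in zip(truth, calls) if not t and c)
    fn = sum(1 for t, c in zip(truth, calls) if t and not c)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    accuracy = (tp + tn) / len(truth) if truth else float("nan")
    return sensitivity, specificity, accuracy


# Toy data, invented for illustration only (not measurements from the study).
# "full": gene expected in every control strain; "third": expected in one of three.
log_ratios = [0.1, -2.0, 1.4, -0.3, 0.0, -1.8]
groups = ["full", "full", "third", "third", "full", "third"]
truth = [True, False, True, False, True, False]

# Hypothetical cutoffs: one shared by all spots versus one per spot set.
pooled_cutoffs = {"full": -1.0, "third": -1.0}
grouped_cutoffs = {"full": -1.0, "third": 0.5}

pooled = metrics(truth, call_presence(log_ratios, groups, pooled_cutoffs))
grouped = metrics(truth, call_presence(log_ratios, groups, grouped_cutoffs))

print("pooled  (sens, spec, acc):", pooled)
print("grouped (sens, spec, acc):", grouped)
```

In this toy case the per-set cutoff avoids a false positive in the set with low expected control signal, which is the kind of effect that separate analysis is meant to capture. In practice the cutoffs would be estimated from the data rather than fixed by hand.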
Conclusion: The use of a single-strain control increases the error rate for genes that are part of the accessory genome, where more variation across unsequenced strains is expected. This further justifies the use of the mix control.