Motivation: The Affymetrix GeneChip microarray provides a high-density, economical platform for discovering genetic polymorphisms. In recombinant inbred lines (RILs), microarray-based detection of single feature polymorphisms (SFPs) can capitalize on the high level of replication available at each locus across the RIL population. It has been suggested that, for an SFP, the binding affinities from all of the RILs form a multimodal distribution. This motivated us to estimate the binding affinities with the robust multi-array analysis (RMA) method and to formulate SFP detection as a hypothesis testing problem: testing whether the underlying distribution of a probe's estimated binding affinity (EBA) values is unimodal or multimodal.
Results: We developed a bootstrap-based hypothesis testing procedure using the 'dip' statistic. Our simulation studies show that the proposed procedure achieves satisfactory detection power with the false discovery rate controlled at a desired level, and that it is robust to the unimodal distribution assumption, which facilitates its wide application. In our analysis of real data, the procedure identified more than four times as many SFPs as previous studies while covering 96% of their findings. A genetic map constructed from the SFP markers predicted by our procedure shows over 99% concordance between the genetic order of these markers and their known physical locations on the genome sequence.
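The overall shape of such a procedure (a per-probe multimodality statistic, a resampled null distribution, and false discovery rate control across probes) can be illustrated with a minimal Python sketch. It is not the dipSFP implementation, and it rests on several assumptions that are not taken from the abstract: a simple two-group separation score stands in for the dip statistic, the unimodal reference distribution is taken to be Gaussian, and the Benjamini-Hochberg step-up rule is used for FDR control. All function names are hypothetical.

```python
import random


def separation_stat(values):
    """Stand-in for the dip statistic (assumption, NOT Hartigan's dip).

    Returns 1 - WSS/TSS for the best split of the sorted values into two
    contiguous groups; values near 1 suggest two well-separated modes.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    tss = total_sq - total * total / n
    if tss <= 0.0:
        return 0.0  # all values identical: no evidence of two modes
    best_wss = tss
    left_sum = left_sq = 0.0
    for k in range(1, n):
        left_sum += xs[k - 1]
        left_sq += xs[k - 1] ** 2
        right_sum = total - left_sum
        right_sq = total_sq - left_sq
        wss = (left_sq - left_sum ** 2 / k) + (right_sq - right_sum ** 2 / (n - k))
        best_wss = min(best_wss, wss)
    return 1.0 - best_wss / tss


def null_stats(n, n_boot, rng):
    """Simulate the statistic under a Gaussian unimodal reference (assumption).

    The statistic is location- and scale-invariant, so N(0, 1) suffices.
    """
    return [separation_stat([rng.gauss(0.0, 1.0) for _ in range(n)])
            for _ in range(n_boot)]


def p_value(stat, null):
    """Resampling p-value with the usual +1 correction."""
    exceed = sum(1 for s in null if s >= stat)
    return (exceed + 1) / (len(null) + 1)


def benjamini_hochberg(pvals, alpha):
    """Benjamini-Hochberg step-up rule; returns one reject flag per p-value."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            cutoff = rank
    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected


def detect_sfps(eba_by_probe, alpha=0.05, n_boot=200, seed=0):
    """Flag probes whose EBA values across RILs look multimodal."""
    rng = random.Random(seed)
    null_cache = {}  # one simulated null per sample size
    pvals = []
    for values in eba_by_probe:
        n = len(values)
        if n not in null_cache:
            null_cache[n] = null_stats(n, n_boot, rng)
        pvals.append(p_value(separation_stat(values), null_cache[n]))
    return benjamini_hochberg(pvals, alpha), pvals


if __name__ == "__main__":
    rng = random.Random(1)
    # Simulated EBA values: 20 unimodal probes, 5 probes with two allele groups.
    unimodal = [[rng.gauss(8.0, 1.0) for _ in range(100)] for _ in range(20)]
    bimodal = [[rng.gauss(5.0, 1.0) for _ in range(50)] +
               [rng.gauss(11.0, 1.0) for _ in range(50)] for _ in range(5)]
    flags, _ = detect_sfps(unimodal + bimodal)
    print("flagged probe indices:", [i for i, f in enumerate(flags) if f])
```

With `n_boot` resamples the smallest attainable p-value is 1/(`n_boot` + 1), so testing many probes at a strict FDR level requires a correspondingly larger number of resamples than the 200 used here.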
Availability: The R package 'dipSFP' can be downloaded from http://sites.google.com/a/bioinformatics.ucr.edu/xinping-cui/home/software.