Determination of sequence variation within a genetic locus to develop clinically relevant databases is critical for molecular assay design and clinical test interpretation, so multisample pooling for Illumina genome analyzer (GA) sequencing was investigated using the RET proto-oncogene as a model. Samples were Sanger-sequenced for RET exons 10, 11, and 13-16. Ten samples with 13 known unique variants ("singleton variants" within the pool) and seven common changes were amplified and then equimolar-pooled before sequencing on a single flow cell lane, generating 36 base reads. For comparison, a single "control" sample was run in a different lane. After alignment, a 24-base quality score-screening threshold and 3; read end trimming of three bases yielded low background error rates with a 27% decrease in aligned read coverage. Sequencing data were evaluated using an established variant detection method (percent variant reads), by the presented subtractive correction method, and with SNPSeeker software. In total, 41 variants (of which 23 were singleton variants) were detected in the 10 pool data, which included all Sanger-identified variants. The 23 singleton variants were detected near the expected 5% allele frequency (average 5.17%+/-0.90% variant reads), well above the highest background error (1.25%). Based on background error rates, read coverage, simulated 30, 40, and 50 sample pool data, expected singleton allele frequencies within pools, and variant detection methods; >or=30 samples (which demonstrated a minimum 1% variant reads for singletons) could be pooled to reliably detect singleton variants by GA sequencing.
Keywords: Illumina; RET; massively parallel sequencing; next generation sequencing.