Background: Atlantic cod (Gadus morhua) is a species of increasing economic significance for the aquaculture industry. The genetic improvement of cod will play a critical role in achieving successful large-scale aquaculture. While many microsatellite markers have been developed in cod, the number of available single nucleotide polymorphism (SNP) markers is currently limited. Here we report the identification of SNPs from sequence data generated by a large-scale expressed sequence tag (EST) program, focusing on fish originating from Canadian waters.
Results: A total of 97976 ESTs were assembled to generate 13448 contigs. We detected 4753 SNPs that met our selection criteria (depth of coverage ≥ 4 reads; minor allele frequency > 25%). Of these, 3072 SNPs were selected for testing. The percentage of successful assays was 75%, with 2291 SNPs amplifying correctly. Of these, 607 SNPs (26%) were monomorphic in all populations tested. In total, 64 SNPs (4%) are likely to represent duplicated genes or highly similar members of gene families, rather than alternative alleles of the same gene, since they showed a high frequency of heterozygosity. The remaining 1620 polymorphic SNPs were categorised as validated SNPs. The mean minor allele frequency of the validated loci was 0.258 (± 0.141). Of the 1514 contigs from which validated SNPs were selected, 31% have a significant BLAST hit. Of the 141 SNPs predicted to occur in coding regions, 51 (36%) are non-synonymous. Many loci (1033 SNPs; 64%) are polymorphic in all populations tested. However, a small number of SNPs (184) that are polymorphic in the Western Atlantic were monomorphic in fish tested from three European populations. A preliminary linkage map has been constructed with 23 major linkage groups and 924 mapped SNPs.
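The selection criteria stated above (depth of coverage of at least 4 reads and minor allele frequency above 25%) can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function names, the example positions, and the assumption that minor allele frequency is computed from the read bases aligned at a single contig position are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Thresholds taken from the stated selection criteria.
MIN_DEPTH = 4      # depth of coverage >= 4 reads
MIN_MAF = 0.25     # minor allele frequency > 25%


def minor_allele_frequency(bases):
    """Frequency of the second most common base among the aligned reads.

    Assumption: MAF is estimated from read counts at one contig position.
    """
    counts = Counter(bases).most_common()
    if len(counts) < 2:
        return 0.0  # only one allele observed, so not a candidate SNP
    return counts[1][1] / len(bases)


def passes_selection(bases):
    """Apply both criteria: sufficient depth and MAF strictly above 25%."""
    return len(bases) >= MIN_DEPTH and minor_allele_frequency(bases) > MIN_MAF


# Hypothetical alignment columns: position label -> bases seen in the reads.
columns = {
    "contig1:120": "AAGG",    # depth 4, MAF 0.50: passes
    "contig2:45": "CCCT",     # depth 4, MAF 0.25: fails (not strictly > 25%)
    "contig3:300": "AGG",     # depth 3: fails the depth criterion
    "contig4:88": "TTTTCC",   # depth 6, MAF 0.33: passes
}

candidates = [pos for pos, bases in columns.items() if passes_selection(bases)]
print(candidates)
```

Treating the frequency threshold as a strict inequality follows the wording of the criteria; a position with exactly 25% minor allele reads is therefore excluded in this sketch.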
Conclusions: These SNPs represent powerful tools to accelerate the genetic improvement of cod aquaculture. They have been used to build a genetic linkage map that can be applied to quantitative trait locus (QTL) discovery. Since these SNPs were generated from ESTs, they are linked to specific genes. Genes that map within QTL intervals can be prioritized for testing to determine whether they contribute to observed phenotypes.