Background: Worldwide, the genus Haliotis is represented by 56 extant species, several of which are commercially cultured. Among the six abalone species found in South Africa, Haliotis midae is the only aquacultured species. Despite its economic importance, genomic sequence resources for H. midae, and for abalone in general, are still scarce. Next-generation sequencing technologies provide a fast and efficient means of generating large sequence collections that can be used to characterize the transcriptome and identify expressed genes associated with economically important traits such as growth and disease resistance.
Results: More than 25 million short reads generated by the Illumina Genome Analyzer were assembled de novo into 22,761 contigs with an average size of 260 bp. With a stringent E-value threshold of 10⁻¹⁰, 3,841 contigs (16.8%) had a homologous BLAST match in the GenBank non-redundant (NR) protein database. Most of these sequences were annotated using the gene ontology (GO) and eukaryotic orthologous groups of proteins (KOG) databases and assigned to functional categories. The annotation identified many gene families involved in the immune response. Thousands of simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs) were detected. Using stringent parameters to ensure a high probability of amplification, 420 primer pairs were designed for SSR loci in 181 contigs.
Conclusion: These data represent the most comprehensive genomic resource for the South African abalone H. midae to date. The number of assembled sequences demonstrates the utility of Illumina sequencing technology for transcriptome characterization in a non-model species. The data set enabled the development of several markers and the identification of promising candidate genes for future studies on population and functional genomics in H. midae and in other abalone species.
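
As an illustration of the SSR detection step mentioned in the Results, the following is a minimal sketch of how perfect di-, tri- and tetranucleotide repeats could be located in a contig. It is not the tool or the parameter set used in this study: the function name `find_ssrs` and the minimum repeat counts (6, 5 and 4) are hypothetical values chosen only for the example.

```python
import re

def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    min_repeats maps motif length -> minimum number of tandem copies.
    The defaults are hypothetical, not taken from the study.
    """
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5, 4: 4}
    seq = seq.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # A motif of motif_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT").
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // motif_len))
    return sorted(hits)

# Toy contig: a (AT)7 repeat and a (TGA)5 repeat separated by spacer bases.
contig = "GGC" + "AT" * 7 + "CCG" + "TGA" * 5 + "C"
for start, motif, count in find_ssrs(contig):
    print(start, motif, count)
```

Start positions are 0-based. In practice, contigs containing such hits would then be passed to a primer design program, with flanking sequence length and repeat thresholds tuned to the data.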