Combining high-throughput sequencing with targeted sequence capture has become an attractive tool to study specific genomic regions of interest. Most studies have so far focused on the exome using short-read technology. These approaches are not designed to capture intergenic regions needed to reconstruct genomic organization, including regulatory regions and gene synteny. Here, we demonstrate the power of combining targeted sequence capture with long-read sequencing technology for comparative genomic analyses of the haemoglobin (Hb) gene clusters across eight species separated by up to 70 million years. Guided by the reference genome assembly of the Atlantic cod (Gadus morhua) together with genome information from draft assemblies of selected codfishes, we designed probes covering the two Hb gene clusters. Use of custom-made barcodes combined with PacBio RSII sequencing led to highly continuous assemblies of the LA (~100 kb) and MN (~200 kb) clusters, which include syntenic regions of coding and intergenic sequences. Our results revealed an overall conserved genomic organization of the Hb genes within this lineage, yet with several, lineage-specific gene duplications. Moreover, for some of the species examined, we identified amino acid substitutions at two sites in the Hbb1 gene as well as length polymorphisms in its regulatory region, which has previously been linked to temperature adaptation in Atlantic cod populations. This study highlights the use of targeted long-read capture as a versatile approach for comparative genomic studies by generation of a cross-species genomic resource elucidating the evolutionary history of the Hb gene family across the highly divergent group of codfishes.
Keywords: Gadiformes; PacBio sequencing; codfishes; comparative genomics; targeted sequence capture; teleosts.
© 2018 The Authors. Molecular Ecology Resources Published by John Wiley & Sons Ltd.