Background: Fleas (Insecta: Siphonaptera) are small flightless parasites of birds and mammals; their blood-feeding can transmit many serious pathogens (e.g., the etiological agents of bubonic plague, endemic and murine typhus). The lack of flea genome assemblies has hindered research, especially comparisons to other disease vectors. Accordingly, we sequenced the genome of the cat flea, Ctenocephalides felis, an insect with substantial human health and veterinary importance across the globe.
Results: By combining Illumina and PacBio sequencing of DNA derived from multiple inbred female fleas with Hi-C scaffolding techniques, we generated a chromosome-level genome assembly for C. felis. Unexpectedly, our assembly revealed extensive gene duplication across the entire genome, exemplified by ~38% of protein-coding genes with two or more copies and over 4000 tRNA genes. A broad range of genome size determinations (433–551 Mb) for individual fleas sampled across different populations supports the widespread presence of fluctuating copy number variation (CNV) in C. felis. A similarly broad range of genome sizes was calculated for individuals of Xenopsylla cheopis (Oriental rat flea), indicating that this remarkable "genome-in-flux" phenomenon could be a siphonapteran-wide trait. Finally, from the C. felis sequence reads, we also generated closed genomes for two novel strains of Wolbachia, one parasitic and one symbiotic, found to co-infect individual fleas.
Conclusion: Rampant CNV in C. felis has dire implications for gene-targeting pest control measures and stands to complicate the standard normalization procedures used in comparative transcriptomics analysis. Coupled with co-infection by novel Wolbachia endosymbionts, which are potential tools for blocking pathogen transmission, these oddities highlight a unique and underappreciated disease vector.
Keywords: Cat flea; Copy number variation; Ctenocephalides felis; Gene duplication; Genome; Hi-C assembly; PacBio sequencing; Parasitism; Wolbachia.