Arthropod silk is vital to the evolutionary success of hundreds of thousands of species. The primary proteins in silks are often encoded by long, repetitive gene sequences. Until recently, sequencing and assembling these complex gene sequences has proven intractable given their repetitive structure. Here, using high-quality long-read sequencing, we show that there is extensive variation-both in terms of length and repeat motif order-between alleles of silk genes within individual arthropods. Further, this variation exists across two deep, independent origins of silk which diverged more than 500 Mya: the insect clade containing caddisflies and butterflies and spiders. This remarkable convergence in previously overlooked patterns of allelic variation across multiple origins of silk suggests common mechanisms for the generation and maintenance of structural protein-coding genes. Future genomic efforts to connect genotypes to phenotypes should account for such allelic variation.
Keywords: alleles; genomics; insects; long-read sequencing; silk.