Population and subspecies diversity at mouse centromere satellites

BMC Genomics. 2021 Apr 17;22(1):279. doi: 10.1186/s12864-021-07591-5.

Abstract

Background: Mammalian centromeres are satellite-rich chromatin domains that execute conserved roles in kinetochore assembly and chromosome segregation. Centromere satellites evolve rapidly between species, but little is known about population-level diversity across these loci.

Results: We developed a k-mer-based method to quantify centromere copy number and sequence variation from whole genome sequencing data. We applied this method to diverse inbred and wild house mouse (Mus musculus) genomes to profile diversity across the core centromere (minor) satellite and the pericentromeric (major) satellite repeat. We show that minor satellite copy number varies more than 10-fold among inbred mouse strains, whereas major satellite copy numbers span a 3-fold range. In contrast to widely held assumptions about the homogeneity of mouse centromere repeats, we uncover marked satellite sequence heterogeneity within single genomes, with diversity levels across the minor satellite exceeding those at the major satellite. Analyses in wild-caught mice implicate subspecies and population origin as significant determinants of variation in satellite copy number and satellite heterogeneity. Intriguingly, we also find that wild-caught mice harbor dramatically reduced minor satellite copy number and elevated satellite sequence heterogeneity compared to inbred strains, suggesting that inbreeding may reshape centromere architecture in pronounced ways.
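The abstract does not spell out the k-mer procedure, so the following is only a minimal sketch of how such an analysis could work, not the authors' pipeline. It assumes a known satellite consensus sequence, counts k-mers in sequencing reads, takes the median depth-normalized count of consensus k-mers as a relative copy number, and takes the share of one-mismatch k-mers as a crude heterogeneity score. The function names, the toy consensus, the `haploid_depth` parameter, and the decision to ignore reverse-complement strands are all illustrative assumptions.

```python
from collections import Counter
from statistics import median

BASES = "ACGT"


def kmers(seq, k):
    """Return every overlapping k-mer of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_kmers(reads, k):
    """Count k-mer occurrences across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read.upper(), k))
    return counts


def one_mismatch_neighbors(kmer_set):
    """All k-mers at Hamming distance 1 from the set, excluding the set itself."""
    neighbors = set()
    for kmer in kmer_set:
        for i, base in enumerate(kmer):
            for alt in BASES:
                if alt != base:
                    neighbors.add(kmer[:i] + alt + kmer[i + 1:])
    return neighbors - kmer_set


def satellite_summary(reads, consensus, k, haploid_depth):
    """Return (relative copy number, heterogeneity) for one satellite.

    haploid_depth is an assumed, externally estimated sequencing depth
    used to normalize raw k-mer counts.
    """
    read_counts = count_kmers(reads, k)
    exact = set(kmers(consensus.upper(), k))
    variants = one_mismatch_neighbors(exact)

    # Median count of consensus k-mers, scaled by depth.
    copy_number = median(read_counts[km] for km in exact) / haploid_depth

    # Fraction of satellite-like k-mer occurrences carrying one mismatch.
    exact_total = sum(read_counts[km] for km in exact)
    variant_total = sum(read_counts[km] for km in variants)
    total = exact_total + variant_total
    heterogeneity = variant_total / total if total else 0.0
    return copy_number, heterogeneity


# Toy data (hypothetical, not a real mouse satellite sequence).
toy_consensus = "ACGTTGCA"
toy_reads = ["ACGTTGCA", "ACGTTGCA", "ACGTAGCA"]
copy_number, heterogeneity = satellite_summary(
    toy_reads, toy_consensus, k=4, haploid_depth=1.0
)
print(copy_number, heterogeneity)
```

A real analysis would need much longer k-mers, both strands, and a depth estimate from single-copy regions; the sketch only shows the shape of the calculation.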

Conclusion: Taken together, our results highlight the power of k-mer-based approaches for probing variation across repetitive regions, provide an initial portrait of centromere variation across Mus musculus, and lay the groundwork for future functional studies on the consequences of natural genetic variation at these essential chromatin domains.

Keywords: Bioinformatics; CENP-A; Centromere; Evolution; Genetic diversity; Inbred mice; Mammalian genomics; Satellite DNA; Wild mice; k-mer.

MeSH terms

  • Animals
  • Centromere* / genetics
  • DNA, Satellite* / genetics
  • Mice
  • Mice, Inbred Strains
  • Repetitive Sequences, Nucleic Acid

Substances

  • DNA, Satellite