Identification and characterization of short tandem repeats in the Tibetan macaque genome based on resequencing data

Zool Res. 2018 Jul 18;39(4):291-300. doi: 10.24272/j.issn.2095-8137.2018.047. Epub 2018 Apr 11.

Abstract

The Tibetan macaque, which is endemic to China, is currently listed as a Near Threatened primate species by the International Union for Conservation of Nature (IUCN). Short tandem repeats (STRs) are repetitive genomic sequence elements whose repeat units range from 1 to 6 bp in length. They are found in many organisms and are widely applied in population genetic studies. To clarify the distribution characteristics of genome-wide STRs and understand their variation among Tibetan macaques, we conducted a genome-wide survey of STRs using next-generation sequencing data from five macaque samples. A total of 1 077 790 perfect STRs were mined from our assembly, which had an N50 of 4 966 bp. Mono-nucleotide repeats were the most abundant, followed by tetra- and di-nucleotide repeats. Analyses of GC content and repeats gave results consistent with those reported for other macaques. Furthermore, using STR analysis software (lobSTR), we found that the proportion of base pair deletions in the STRs was greater than that of insertions in the five Tibetan macaque individuals (P<0.05, t-test). We also found a greater number of homozygous STRs than heterozygous STRs (P<0.05, t-test), with the Emei and Jianyang Tibetan macaques showing more heterozygous loci than the Huangshan Tibetan macaques. The proportion of insertions and the mean variation of alleles in the Emei and Jianyang individuals were slightly higher than those in the Huangshan individuals, revealing differences in STR allele size between the two populations. The polymorphic STR loci identified based on the reference genome showed good amplification efficiency and could be used to study population genetics in Tibetan macaques. The neighbor-joining tree classified the five macaques into two different branches according to their geographical origin, indicating high genetic differentiation between the Huangshan and Sichuan populations.
We elucidated the distribution characteristics of STRs in the Tibetan macaque genome and provided an effective method for screening polymorphic STRs. Our results also lay a foundation for future genetic variation studies of macaques.
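
To make the terms "perfect STR" and "N50" concrete, the following is a minimal Python sketch, not the pipeline used in the study. It scans a sequence for uninterrupted repeats of 1-6 bp motifs and computes N50 from a list of contig lengths. The function names and the minimum repeat counts in MIN_REPEATS are illustrative assumptions; the abstract does not state the thresholds or the mining software that were actually used.

```python
# Hypothetical minimum copy numbers per motif length (assumed values,
# not taken from the study).
MIN_REPEATS = {1: 12, 2: 6, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """Return True if the motif is not itself a repeat of a shorter unit."""
    n = len(motif)
    for d in range(1, n):
        if n % d == 0 and motif == motif[:d] * (n // d):
            return False
    return True


def find_perfect_strs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for perfect STRs in seq.

    A perfect STR here means an uninterrupted run of one motif with no
    mismatches or indels. Start positions are 0-based.
    """
    seq = seq.upper()
    hits = []
    for k, min_rep in min_repeats.items():
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            # Skip motifs with ambiguous bases and motifs such as "AA" or
            # "ACAC", which belong to a shorter motif class.
            if set(motif) - set("ACGT") or not is_primitive(motif):
                i += 1
                continue
            # Extend the run one whole motif copy at a time.
            j = i + k
            while seq[j:j + k] == motif:
                j += k
            copies = (j - i) // k
            if copies >= min_rep:
                hits.append((i, motif, copies))
                i = j  # continue after the reported repeat
            else:
                i += 1
    return sorted(hits)


def n50(lengths):
    """Return the contig length L such that contigs of length >= L
    together cover at least half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# Toy sequence: a mono-, a di- and a tetra-nucleotide repeat with spacers.
toy = "GG" + "A" * 12 + "CT" + "AC" * 6 + "GC" + "AGAT" * 3 + "C"
strs = find_perfect_strs(toy)
```

On the toy sequence, the scan reports one repeat per motif class. Applied to a whole assembly, the same counts could be tallied by motif length to compare the abundance of mono- to hexa-nucleotide repeats.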

Keywords: Next-generation sequencing; Polymorphism; Short tandem repeats; Tibetan macaque (Macaca thibetana) genome; Variation analysis.

MeSH terms

  • Animals
  • Genetics, Population
  • Genome / genetics
  • High-Throughput Nucleotide Sequencing
  • Macaca / genetics*
  • Microsatellite Repeats / genetics*

Grants and funding

This research was supported by the State Key Program of the National Natural Science Foundation of China (31530068), the National Natural Science Foundation of China (31770415), and the Sichuan Application Foundation Project (2015JY0268).