Comprehensive annotation of the Chinese tree shrew genome by large-scale RNA sequencing and long-read isoform sequencing

Zool Res. 2021 Nov 18;42(6):692-709. doi: 10.24272/j.issn.2095-8137.2021.272.


The Chinese tree shrew (Tupaia belangeri chinensis) is emerging as an important experimental animal in multiple fields of biomedical research. Comprehensive reference genome annotation for both mRNA and long non-coding RNA (lncRNA) is crucial for developing animal models using this species. In the current study, we collected a total of 234 high-quality RNA sequencing (RNA-seq) datasets and two long-read isoform sequencing (ISO-seq) datasets and used them to improve the annotation of our previously assembled high-quality chromosome-level tree shrew genome. We obtained a total of 3 514 newly annotated coding genes and 50 576 lncRNA genes. We also characterized the tissue-specific expression patterns and alternative splicing patterns of mRNAs and lncRNAs and mapped the orthologous relationships among 11 mammalian species using the current annotated genome. We identified 144 tree shrew-specific gene families, including interleukin 6 (IL6) and STT3 oligosaccharyltransferase complex catalytic subunit B (STT3B), which underwent significant changes in size. Comparison of the overall expression patterns in tissues and pathways across four species (human, rhesus monkey, tree shrew, and mouse) indicated that tree shrews are more similar to primates than to mice at the tissue-transcriptome level. Notably, the newly annotated purine rich element binding protein A (PURA) gene and the STT3B gene family showed dysregulation upon viral infection. The updated version of the tree shrew genome annotation (KIZ version 3: TS_3.0) is available and provides an essential reference for basic and biomedical studies using tree shrew animal models.

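The Chinese tree shrew (Tupaia belangeri chinensis) is increasingly becoming an important experimental animal in many areas of biomedical research. A more complete annotation of the tree shrew reference genome, covering both mRNA and long non-coding RNA (lncRNA), is essential for establishing tree shrew animal models. In this study, we collected 234 high-quality second-generation transcriptome (RNA sequencing, RNA-seq) datasets and two third-generation transcriptome (long-read isoform sequencing, ISO-seq) datasets to improve the annotation quality of the previously reported chromosome-level tree shrew reference genome. In total, we obtained 3 514 newly annotated coding genes and 50 576 newly annotated lncRNA genes. Based on the new annotation, we characterized the tissue-specific expression patterns and tissue-specific alternative splicing patterns of mRNAs and lncRNAs, and analyzed gene orthology across 11 mammalian species. We identified 144 gene families specifically expanded in the tree shrew, including interleukin 6 (IL6) and STT3 oligosaccharyltransferase complex catalytic subunit B (STT3B), both of which expanded in the tree shrew genome. We also compared tissue expression patterns and pathway-related gene expression patterns across four species (human, rhesus monkey, tree shrew, and mouse) and found that tree shrew tissue expression patterns are closer to those of primates than to those of mice. Notably, the purine rich element binding protein A (PURA) gene, newly annotated in this study, and the STT3B gene family showed significant differential expression during viral infection. The updated tree shrew genome annotation (KIZ version 3: TS_3.0) has been released. This annotation is expected to provide a key reference for research on tree shrew basic biology and biomedical models.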

Keywords: Gene family; Genome annotation; Transcriptome; Tree shrew; Virus infection.

MeSH terms

  • Animals
  • Base Sequence
  • Genome*
  • Protein Isoforms
  • RNA, Long Noncoding / genetics
  • Sequence Analysis, RNA / methods
  • Sequence Analysis, RNA / veterinary*
  • Species Specificity
  • Tupaiidae / genetics*

