Optimized splitting of mixed-species RNA sequencing data

J Bioinform Comput Biol. 2022 Apr;20(2):2250001. doi: 10.1142/S0219720022500019. Epub 2022 Jan 6.


Gene expression studies using xenograft transplants or co-culture systems, usually with mixed human and mouse cells, have proven valuable for uncovering cellular dynamics during development or in disease models. However, the mRNA sequence similarities among species present a challenge for accurate transcript quantification. To identify optimal strategies for analyzing mixed-species RNA sequencing data, we evaluate both alignment-dependent and alignment-independent methods. Alignment of reads to a pooled reference index is effective, particularly if optimal alignments are used to classify sequencing reads by species; the classified reads are then re-aligned to the individual genomes, generating [Formula: see text] accuracy across a range of species ratios. Alignment-independent methods, such as convolutional neural networks, which extract the conserved sequence patterns of the two species, classify RNA sequencing reads with over 85% accuracy. Importantly, both methods perform well with different ratios of human and mouse reads. While non-alignment strategies successfully partitioned reads by species, the more traditional approach of mixed-genome alignment followed by optimized separation of reads proved more successful, with lower error rates.
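
The classification step of the pooled-reference approach can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that alignments to the pooled index are already available as (read ID, reference name, alignment score) tuples, that reference names carry a hypothetical species prefix such as "human_" or "mouse_", and that a higher score means a better alignment. The function name classify_reads and the "ambiguous" label for ties are likewise illustrative choices.

```python
from collections import defaultdict


def classify_reads(alignments, species=("human", "mouse")):
    """Assign each read to the species holding its best-scoring alignment.

    alignments: iterable of (read_id, reference_name, score) tuples.
    Reference names are assumed (hypothetically) to look like "human_chr1".
    """
    # Best score seen so far for each read, per species.
    best = defaultdict(dict)
    for read_id, ref_name, score in alignments:
        sp = ref_name.split("_", 1)[0]
        if sp not in species:
            continue  # skip references outside the pooled species set
        if score > best[read_id].get(sp, float("-inf")):
            best[read_id][sp] = score

    labels = {}
    for read_id, scores in best.items():
        top = max(scores.values())
        winners = [sp for sp, s in scores.items() if s == top]
        # A tie between species cannot be resolved by score alone.
        labels[read_id] = winners[0] if len(winners) == 1 else "ambiguous"
    return labels


# Toy input: read1 aligns better to human, read2 only to mouse,
# read3 aligns equally well to both species.
example = [
    ("read1", "human_chr1", 98),
    ("read1", "mouse_chr4", 90),
    ("read2", "mouse_chr2", 100),
    ("read3", "human_chrX", 95),
    ("read3", "mouse_chrX", 95),
]
labels = classify_reads(example)
```

In a workflow of the kind described above, the reads labeled for each species would then be re-aligned to that species' individual genome; how tied reads are handled is a design choice that this sketch leaves open.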

Keywords: RNA sequencing; alignment; convolutional neural networks; xenograft.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, N.I.H., Extramural

MeSH terms

  • Animals
  • Base Sequence
  • High-Throughput Nucleotide Sequencing* / methods
  • Humans
  • Mice
  • RNA*
  • Sequence Alignment
  • Sequence Analysis, DNA / methods
  • Sequence Analysis, RNA / methods

Substances

  • RNA