A linked-read approach to museomics: Higher quality de novo genome assemblies from degraded tissues

Mol Ecol Resour. 2020 Jul;20(4):856-870. doi: 10.1111/1755-0998.13155. Epub 2020 May 11.

Abstract

High-throughput sequencing technologies are a proposed solution for accessing the molecular data in historical specimens. However, degraded DNA combined with the computational demands of short-read assemblies has posed significant laboratory and bioinformatics challenges for de novo genome assembly. Linked-read or "synthetic long-read" sequencing technologies, such as 10× Genomics, may provide a cost-effective alternative solution to assemble higher quality de novo genomes from degraded tissue samples. Here, we compare assembly quality (e.g., genome contiguity and completeness, presence of orthogroups) between four new deer mouse (Peromyscus spp.) genomes assembled using linked-read technology and four published genomes assembled from a single shotgun library. At a similar price-point, these approaches produce vastly different assemblies, with linked-read assemblies having overall higher contiguity and completeness, measured by larger N50 values and a greater number of genes assembled, respectively. As a proof-of-concept, we used annotated genes from the four Peromyscus linked-read assemblies and eight additional rodent taxa to generate a phylogeny, which reconstructed the expected relationships among species with 100% support. Although not without caveats, our results suggest that linked-read sequencing approaches are a viable option to build de novo genomes from degraded tissues, which may prove particularly valuable for taxa that are extinct, rare or difficult to collect.
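
For readers unfamiliar with the contiguity metric mentioned above, N50 is conventionally the contig (or scaffold) length at which contigs of that length or longer account for at least half of the total assembly length. The following is a minimal illustrative sketch of that standard definition, not the authors' pipeline; the function name `n50` and the example lengths are hypothetical and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig/scaffold lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (in bp), for illustration only.
fragmented = [2, 3, 4, 5, 6, 7, 8, 9, 10]
contiguous = [100, 50, 25, 25]
print(n50(fragmented), n50(contiguous))
```

A larger N50 for a similar total assembly size indicates that more of the genome sits in long contiguous pieces, which is the sense in which the abstract uses it as a contiguity measure.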

Keywords: Peromyscus; 10× genomics; assembly quality; natural history collections.

MeSH terms

  • Animals
  • Computational Biology / methods
  • Gene Library
  • Genome / genetics*
  • Genomics / methods*
  • High-Throughput Nucleotide Sequencing / methods*
  • Molecular Sequence Annotation / methods
  • Peromyscus / genetics*
  • Phylogeny
  • Sequence Analysis, DNA / methods