Three invariant Hi-C interaction patterns: Applications to genome assembly

Methods. 2018 Jun 1;142:89-99. doi: 10.1016/j.ymeth.2018.04.013. Epub 2018 Apr 22.

Abstract

Assembly of reference-quality genomes from next-generation sequencing data is a key challenge in genomics. Recently, we and others have shown that Hi-C data can be used to address several outstanding challenges in genome assembly. This principle has since been developed in academia and industry, and has been used in the assembly of several major genomes. In this paper, we explore the central principles underlying Hi-C-based assembly approaches by quantitatively defining and characterizing three invariant Hi-C interaction patterns on which these approaches can build: intrachromosomal interaction enrichment, distance-dependent interaction decay and local interaction smoothness. Specifically, we evaluate to what degree each invariant pattern holds at the single-locus level across species, cell types and Hi-C map resolutions. We find that these patterns are generally consistent across species and cell types but are affected by sequencing depth, and that matrix balancing improves the consistency of loci with all three invariant patterns. Finally, we review current Hi-C-based assembly approaches in light of these invariant patterns and demonstrate how local interaction smoothness can be used to detect scaffolding errors even in extremely sparse Hi-C maps. We suggest that simultaneously considering all three invariant patterns may lead to better Hi-C-based genome assembly methods.
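
As a rough illustration of the three patterns named in the abstract, the following Python sketch checks each one per locus on a synthetic contact map. The abstract does not give the paper's quantitative definitions, so everything here is an assumption made for illustration: the toy map (contacts decaying as 1/(1+d) within a chromosome, a constant low value between chromosomes), the function names, the `near` cutoff, and the use of Pearson correlation between adjacent rows as a smoothness score.

```python
import numpy as np


def toy_map(n_per_chrom=20, n_chrom=2, inter=0.01):
    """Synthetic, noise-free contact map (an assumption, not real Hi-C data)."""
    n = n_per_chrom * n_chrom
    chrom = np.repeat(np.arange(n_chrom), n_per_chrom)
    pos = np.arange(n)
    dist = np.abs(pos[:, None] - pos[None, :])
    same = chrom[:, None] == chrom[None, :]
    contacts = np.where(same, 1.0 / (1.0 + dist), inter)
    return contacts, chrom


def intra_enrichment(contacts, chrom):
    """Per locus: is the mean intrachromosomal contact above the mean interchromosomal one?"""
    n = len(chrom)
    same = chrom[:, None] == chrom[None, :]
    off_diag = ~np.eye(n, dtype=bool)
    intra = np.where(same & off_diag, contacts, np.nan)
    inter = np.where(~same, contacts, np.nan)
    return np.nanmean(intra, axis=1) > np.nanmean(inter, axis=1)


def distance_decay(contacts, chrom, near=3):
    """Per locus: do nearby same-chromosome bins have more contacts than distant ones?"""
    n = len(chrom)
    pos = np.arange(n)
    dist = np.abs(pos[:, None] - pos[None, :])
    same = chrom[:, None] == chrom[None, :]
    near_vals = np.where(same & (dist > 0) & (dist <= near), contacts, np.nan)
    far_vals = np.where(same & (dist > near), contacts, np.nan)
    return np.nanmean(near_vals, axis=1) > np.nanmean(far_vals, axis=1)


def local_smoothness(contacts):
    """Pearson correlation between the contact profiles of each pair of adjacent bins."""
    n = contacts.shape[0]
    return np.array(
        [np.corrcoef(contacts[i], contacts[i + 1])[0, 1] for i in range(n - 1)]
    )


contacts, chrom = toy_map()
print("intrachromosomal enrichment holds at all loci:",
      intra_enrichment(contacts, chrom).all())
print("distance decay holds at all loci:",
      distance_decay(contacts, chrom).all())

smooth = local_smoothness(contacts)
# The least smooth adjacent pair marks where two chromosomes were placed side by side,
# which is how a scaffolding misjoin would appear in this toy setting.
print("least smooth adjacent pair starts at bin:", int(np.argmin(smooth)))
```

In this toy map the smoothness score drops sharply only at the junction between the two chromosomes, which mirrors the idea of using local interaction smoothness to flag scaffolding errors. Real maps are sparse and noisy, so thresholds and binning would need to be chosen with care.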

Keywords: 3D genome; Computational biology; Genome assembly; Genome scaffolding; Genomics; Hi-C.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Chromosome Mapping / instrumentation
  • Chromosome Mapping / methods*
  • DNA / chemistry
  • DNA / genetics
  • Genome / genetics
  • High-Throughput Nucleotide Sequencing / instrumentation
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Imaging, Three-Dimensional / instrumentation
  • Imaging, Three-Dimensional / methods
  • Metagenomics / instrumentation
  • Metagenomics / methods*
  • Models, Genetic*
  • Models, Statistical
  • Molecular Imaging / instrumentation
  • Molecular Imaging / methods
  • Molecular Sequence Annotation / methods*
  • Nucleic Acid Conformation
  • Sequence Analysis, DNA / instrumentation
  • Sequence Analysis, DNA / methods

Substances

  • DNA