Wavelet-Based Genomic Signal Processing for Centromere Identification and Hypothesis Generation

Front Genet. 2019 May 31;10:487. doi: 10.3389/fgene.2019.00487. eCollection 2019.

Abstract

Various 'omics data types have been generated for Populus trichocarpa, each providing a layer of information that can be represented as a density signal across a chromosome. We combine genome sequence data, variant data from a population, and methylation data from 10 different tissues with wavelet-based signal processing to comprehensively analyze the signature of the centromere in these signals, and we successfully identify putative centromeric regions in P. trichocarpa. Furthermore, using SNP (single nucleotide polymorphism) correlations across a natural population of P. trichocarpa, we find evidence for the co-evolution of the centromeric histone CENH3 with the sequence of the newly identified centromeric regions, and we identify a new CENH3 candidate in P. trichocarpa.

Keywords: CENH3; DNA methylation; Populus trichocarpa centromeres; SNP density; co-evolution; data integration; wavelet transform.
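
The idea of treating a data layer as a density signal along a chromosome and examining it with a wavelet transform can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not taken from the paper: the Mexican-hat (Ricker) wavelet, the 10 kb window, the scales, the function names, and the synthetic feature positions with one artificially enriched region. It shows only the general technique, not the authors' pipeline or parameters.

```python
import numpy as np

def density_signal(positions, chrom_length, window):
    """Count features (e.g. SNPs) per fixed-size window along a chromosome."""
    n_bins = int(np.ceil(chrom_length / window))
    edges = np.arange(n_bins + 1) * window
    counts, _ = np.histogram(positions, bins=edges)
    return counts.astype(float)

def ricker(points, scale):
    """Mexican-hat (Ricker) wavelet sampled at `points` positions."""
    t = np.arange(points) - (points - 1) / 2.0
    x = t / scale
    norm = 2.0 / (np.sqrt(3.0 * scale) * np.pi ** 0.25)
    return norm * (1.0 - x ** 2) * np.exp(-0.5 * x ** 2)

def cwt(signal, scales):
    """Continuous wavelet transform by direct convolution, one row per scale.

    Scales are expressed in windows and are assumed to be >= 1.
    """
    centered = signal - signal.mean()
    out = np.empty((len(scales), len(signal)))
    for i, s in enumerate(scales):
        n = min(10 * int(s), len(signal))  # truncate the wavelet support
        out[i] = np.convolve(centered, ricker(n, s), mode="same")
    return out

# Synthetic (hypothetical) data: uniform background plus one enriched region.
rng = np.random.default_rng(0)
chrom_length, window = 1_000_000, 10_000
background = rng.uniform(0, chrom_length, size=2000)
enriched = rng.uniform(400_000, 500_000, size=1500)
positions = np.concatenate([background, enriched])

signal = density_signal(positions, chrom_length, window)
scales = [2, 5, 10]                      # in units of windows
coeffs = cwt(signal, scales)

# Window with the strongest response at the middle scale (5 windows = 50 kb).
peak_bin = int(np.argmax(coeffs[1]))
print("strongest response starts at bp", peak_bin * window)
```

Coarse scales respond to broad features spanning many windows, while fine scales respond to narrow ones, which is why a multi-scale transform is a natural fit for locating large chromosomal regions in noisy density data.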