Scanning window analysis of non-coding regions within normal-tumor whole-genome sequence samples

Brief Bioinform. 2021 May 20;22(3):bbaa203. doi: 10.1093/bib/bbaa203.

Abstract

Genomics has benefited from an explosion in affordable high-throughput technology for whole-genome sequencing. The regulatory and functional aspects of non-coding regions may be important contributors to oncogenesis. Whole-genome tumor-normal paired alignments were used to examine the non-coding regions in five cancer types and two races. Both a sliding window and a binning strategy were introduced to uncover areas of higher-than-expected variation for additional study. We show that the majority of cancer-associated mutations in 154 whole-genome sequences, covering breast invasive carcinoma, colon adenocarcinoma, kidney renal papillary cell carcinoma, lung adenocarcinoma and uterine corpus endometrial carcinoma across two races, are found outside of the coding region (4 432 885 in non-gene regions versus 1 412 731 in gene regions). A pan-cancer analysis found significantly mutated windows (292 to 3881 in count), demonstrating that there are significant numbers of large mutated regions in the non-coding genome. A total of 59 significantly mutated windows were found in all studied races and cancers. These offer 16 regions ripe for additional study within 12 different chromosomes: 2, 4, 5, 7, 10, 11, 16, 18, 20, 21 and X. Many of these regions were found in centromeric locations. The X chromosome had the largest set of universal windows, which cluster almost exclusively in Xq11.1, an area linked to chromosomal instability and oncogenesis. Large consecutive clusters (super windows) were found (19 to 114 in count), providing further evidence that large mutated regions in the genome are influencing cancer development. We show remarkable similarity in highly mutated non-coding regions across both cancer and race.
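
The abstract does not give the window size, step, or statistical test, so the following is only a minimal Python sketch of how a binning pass, a sliding-window pass, and the grouping of consecutive flagged windows into "super windows" might look. The function names, the Poisson background model, the threshold `alpha`, and the toy coordinates are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter
from math import exp


def bin_counts(positions, bin_size):
    """Binning: count mutations in fixed, non-overlapping bins."""
    return Counter(pos // bin_size for pos in positions)


def sliding_window_counts(positions, chrom_length, window, step):
    """Sliding window: count mutations in each window [start, start + window)."""
    positions = sorted(positions)
    counts, lo, hi = [], 0, 0
    for start in range(0, max(chrom_length - window, 0) + 1, step):
        end = start + window
        while lo < len(positions) and positions[lo] < start:
            lo += 1
        while hi < len(positions) and positions[hi] < end:
            hi += 1
        counts.append((start, hi - lo))
    return counts


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam); an assumed background model."""
    term, cdf = exp(-lam), 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)
    return max(0.0, 1.0 - cdf)


def flag_windows(positions, chrom_length, window, step, alpha):
    """Flag windows whose count is unlikely under a uniform mutation rate."""
    lam = len(positions) * window / chrom_length  # expected count per window
    flagged = []
    for start, count in sliding_window_counts(positions, chrom_length, window, step):
        p = poisson_upper_tail(count, lam)
        if p < alpha:
            flagged.append((start, count, p))
    return flagged


def merge_consecutive(starts, step):
    """Group flagged windows whose starts are exactly one step apart."""
    clusters = []
    for s in sorted(starts):
        if clusters and s - clusters[-1][1] == step:
            clusters[-1][1] = s
        else:
            clusters.append([s, s])
    return [tuple(c) for c in clusters]


# Toy data (hypothetical coordinates on a 2000 bp "chromosome").
positions = [5, 15, 25, 1000, 1001, 1002, 1003, 1004, 1005, 1900]
flagged = flag_windows(positions, 2000, window=100, step=50, alpha=0.001)
super_windows = merge_consecutive([s for s, _, _ in flagged], step=50)
```

In this toy run the six mutations near position 1000 push two overlapping windows past the threshold, and those two windows merge into a single cluster. A real analysis would need a background model that accounts for regional mutation rates and for testing many windows at once.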

Keywords: cancer; cancer hotspots; non-coding region; pan-cancer analysis; whole-genome sequencing.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Breast Neoplasms / genetics
  • Centromere / genetics
  • Chromosome Mapping / methods
  • Colonic Neoplasms / genetics
  • Endometrial Neoplasms / genetics
  • Female
  • Genome, Human / genetics*
  • Genomics / methods*
  • Humans
  • Kidney Neoplasms / genetics
  • Lung Neoplasms / genetics
  • Mutation*
  • Neoplasms / classification
  • Neoplasms / genetics*
  • Open Reading Frames / genetics
  • Polymorphism, Single Nucleotide / genetics
  • Reproducibility of Results
  • Whole Genome Sequencing / methods*