Background: Combined whole-genome sequencing (WGS) and RNA sequencing of cancers offer the opportunity to identify genes with altered expression due to genomic rearrangements. Somatic structural variants (SVs), as identified by WGS, can involve altered gene cis-regulation, gene fusions, copy number alterations, or gene disruption. The absence of computational tools to streamline integrative analysis steps may represent a barrier in identifying genes recurrently altered by genomic rearrangement.
Results: Here, we introduce SVExpress, a set of tools for carrying out integrative analysis of SV and gene expression data. SVExpress enables systematic cataloging of genes that consistently show increased or decreased expression in conjunction with the presence of nearby SV breakpoints. SVExpress can evaluate breakpoints in proximity to genes for potential enhancer translocation events or disruption of topologically associated domains, two mechanisms by which SVs may deregulate genes. The output from any commonly used SV calling algorithm may be easily adapted for use with SVExpress. SVExpress can readily analyze genomic datasets involving hundreds of cancer sample profiles. Here, we used SVExpress to analyze SV and expression data across 327 cancer cell lines with combined SV and expression data in the Cancer Cell Line Encyclopedia (CCLE). In the CCLE dataset, hundreds of genes showed altered gene expression in relation to nearby SV breakpoints. Altered genes involved TAD disruption, enhancer hijacking, and gene fusions. When comparing the top set of SV-altered genes from cancer cell lines with the top SV-altered genes previously reported for human tumors from The Cancer Genome Atlas and the Pan-Cancer Analysis of Whole Genomes datasets, a significant number of genes overlapped in the same direction for both cell lines and tumors, while some genes were significant for cell lines but not for human tumors and vice versa.
Conclusion: Our SVExpress tools allow computational biologists with a working knowledge of R to integrate gene expression with SV breakpoint data to identify recurrently altered genes. SVExpress is freely available for academic or commercial use at https://github.com/chadcreighton/SVExpress . SVExpress is implemented as a set of Excel macros and R code. All source code (R and Visual Basic for Applications) is available.
Keywords: CCLE; Cancer; Data integration; Genomic rearrangement; Multiplatform analysis; Structural variation; Whole genome sequencing.