TAaCGH Suite for Detecting Cancer-Specific Copy Number Changes Using Topological Signatures

Entropy (Basel). 2022 Jun 29;24(7):896. doi: 10.3390/e24070896.


Copy number changes play an important role in the development of cancer and are commonly associated with changes in gene expression. Persistence curves, such as Betti curves, have been used to detect copy number changes; however, it is known these curves are unstable with respect to small perturbations in the data. We address the stability of lifespan and Betti curves by providing bounds on the distance between persistence curves of Vietoris-Rips filtrations built on data and slightly perturbed data in terms of the bottleneck distance. Next, we perform simulations to compare the predictive ability of Betti curves, lifespan curves (conditionally stable) and stable persistent landscapes to detect copy number aberrations. We use these methods to identify significant chromosome regions associated with the four major molecular subtypes of breast cancer: Luminal A, Luminal B, Basal and HER2 positive. Identified segments are then used as predictor variables to build machine learning models which classify patients as one of the four subtypes. We find that no single persistence curve outperforms the others and instead suggest a complementary approach using a suite of persistence curves. In this study, we identified new cytobands associated with three of the subtypes: 1q21.1-q25.2, 2p23.2-p16.3, 23q26.2-q28 with the Basal subtype, 8p22-p11.1 with Luminal B and 2q12.1-q21.1 and 5p14.3-p12 with Luminal A. These segments are validated by the TCGA BRCA cohort dataset except for those found for Luminal A.

Keywords: Betti curves; CNA copy number aberrations; breast cancer molecular subtypes; genomics; lifespan curves; persistence landscapes; topological data analysis.