Certain cell types function as factories, secreting large quantities of one or more proteins that are central to the physiology of the respective organ. Examples include surfactant proteins in lung alveoli, albumin in liver parenchyma, and lipase in the stomach lining. Whole-genome sequencing analysis of lung adenocarcinomas revealed noncoding somatic mutational hotspots near VMP1/MIR21 and indel hotspots in surfactant protein genes (SFTPA1, SFTPB, and SFTPC). Extrapolation to other solid cancers demonstrated highly recurrent and tumor-type-specific indel hotspots targeting the noncoding regions of highly expressed genes defining certain secretory cellular lineages: albumin (ALB) in liver carcinoma, gastric lipase (LIPF) in stomach carcinoma, and thyroglobulin (TG) in thyroid carcinoma. The sequence contexts of indels targeting lineage-defining genes were significantly enriched in the AATAATD DNA motif and specific chromatin contexts, including H3K27ac and H3K36me3. Our findings illuminate a prevalent and hitherto unrecognized mutational process linking cellular lineage and cancer.
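As an illustration of what the AATAATD motif denotes, the following minimal sketch scans a sequence for it on both strands. In standard IUPAC nucleotide notation, D stands for A, G, or T (any base except C). The function names (`find_motif`, `motif_regex`, `reverse_complement`) and the toy sequence are hypothetical and are not taken from the study, whose motif-enrichment method is not described here; the sketch only finds occurrences and does not test for enrichment.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement table that also covers the ambiguity codes (e.g. D <-> H).
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(motif):
    """Return the reverse complement of an IUPAC motif string."""
    return motif.translate(COMPLEMENT)[::-1]


def motif_regex(motif):
    """Compile an IUPAC motif into a regex that reports overlapping matches."""
    pattern = "".join(IUPAC[code] for code in motif)
    # The lookahead consumes no characters, so overlapping hits are all found.
    return re.compile("(?=(" + pattern + "))")


def find_motif(sequence, motif="AATAATD"):
    """Return sorted (start, strand, matched_text) tuples, 0-based starts.

    Hypothetical helper for illustration: '+' hits match the motif as written,
    '-' hits match its reverse complement on the given strand.
    """
    sequence = sequence.upper()
    hits = []
    for strand, query in (("+", motif), ("-", reverse_complement(motif))):
        for match in motif_regex(query).finditer(sequence):
            hits.append((match.start(), strand, match.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence, not real data.
    print(find_motif("GGAATAATGCC"))
```

A real analysis would compare motif frequency around indel sites against a suitable background, which this sketch does not attempt.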
Keywords: cancer cell of origin; cancer genomics; noncoding genetic variation; somatic mutational processes; statistical driver discovery; variant topography; whole-genome sequencing.
Copyright © 2017 Elsevier Inc. All rights reserved.