Determining the familial risk distribution of colorectal cancer: a data mining approach

Rowena Chau; Mark A Jenkins; Daniel D Buchanan; Driss Ait Ouakrim; Graham G Giles; Graham Casey; Steven Gallinger; Robert W Haile; Loic Le Marchand; Polly A Newcomb; Noralane M Lindor; John L Hopper; Aung Ko Win

doi:10.1007/s10689-015-9860-6

Fam Cancer. 2016 Apr;15(2):241-51. doi: 10.1007/s10689-015-9860-6.

Authors

Rowena Chau¹, Mark A Jenkins¹, Daniel D Buchanan¹,², Driss Ait Ouakrim¹, Graham G Giles¹,³, Graham Casey⁴, Steven Gallinger⁵,⁶, Robert W Haile⁷, Loic Le Marchand⁸, Polly A Newcomb⁹, Noralane M Lindor¹⁰, John L Hopper¹, Aung Ko Win¹¹

Affiliations

¹ Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Level 3, 207 Bouverie Street, Parkville, VIC, 3010, Australia.
² Colorectal Oncogenomics Group, Genetic Epidemiology Laboratory, Department of Pathology, The University of Melbourne, Parkville, VIC, Australia.
³ Cancer Epidemiology Centre, The Cancer Council Victoria, Melbourne, Australia.
⁴ Department of Preventive Medicine, Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.
⁵ Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, ON, Canada.
⁶ Cancer Care Ontario, Toronto, ON, Canada.
⁷ Division of Oncology, Department of Medicine, Stanford University, Stanford, CA, USA.
⁸ University of Hawaii Cancer Center, Honolulu, HI, USA.
⁹ Cancer Prevention Program, Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
¹⁰ Department of Health Science Research, Mayo Clinic Arizona, Scottsdale, AZ, USA.
¹¹ Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Level 3, 207 Bouverie Street, Parkville, VIC, 3010, Australia. awin@unimelb.edu.au.

Abstract

This study aimed to characterize the distribution of colorectal cancer risk using family history of cancers by data mining. Family histories for 10,066 colorectal cancer cases recruited to population cancer registries of the Colon Cancer Family Registry were analyzed using a data mining framework. A novel index was developed to quantify familial cancer aggregation. An artificial neural network was used to identify distinct categories of familial risk. Standardized incidence ratios (SIRs) and corresponding 95% confidence intervals (CIs) of colorectal cancer were calculated for each category. We identified five major and 66 minor categories of familial risk for developing colorectal cancer. The distribution of the major risk categories was: (1) 7% of families (SIR = 7.11; 95% CI 6.65-7.59) had a strong family history of colorectal cancer; (2) 13% of families (SIR = 2.94; 95% CI 2.78-3.10) had a moderate family history of colorectal cancer; (3) 11% of families (SIR = 1.23; 95% CI 1.12-1.36) had a strong family history of breast cancer and a weak family history of colorectal cancer; (4) 9% of families (SIR = 1.06; 95% CI 0.96-1.18) had a strong family history of prostate cancer and a weak family history of colorectal cancer; and (5) 60% of families (SIR = 0.61; 95% CI 0.57-0.65) had a weak family history of all cancers. There is wide variation in colorectal cancer risk that can be categorized by family history of cancer, with a strong gradient of colorectal cancer risk between the highest and lowest risk categories. The risk of colorectal cancer for people in the highest risk category of family history (7% of the population) was 12 times that for people in the lowest risk category (60% of the population). Data mining proved to be an effective approach for gaining insight into the underlying cancer aggregation patterns and for categorizing familial risk of colorectal cancer.

Keywords: Colorectal cancer; Data mining; Familial aggregation; Familial risk.
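
The abstract reports an SIR with a 95% CI for each category and a 12-fold risk gradient. The sketch below illustrates the arithmetic only. It assumes the textbook definition SIR = observed / expected and a log-scale normal approximation for the interval. The abstract does not say how the authors computed their CIs, and the observed and expected counts are hypothetical. Only the two SIR values in the final ratio come from the abstract.

```python
import math


def sir_with_ci(observed, expected, z=1.96):
    """Return (SIR, lower, upper) for a standardized incidence ratio.

    Assumption: the observed count is Poisson-distributed, so the standard
    error of log(SIR) is approximately 1 / sqrt(observed). This is a common
    textbook approximation, not necessarily the method used in the paper.
    """
    if observed <= 0 or expected <= 0:
        raise ValueError("observed and expected counts must be positive")
    sir = observed / expected
    se_log = 1.0 / math.sqrt(observed)
    lower = sir * math.exp(-z * se_log)
    upper = sir * math.exp(z * se_log)
    return sir, lower, upper


# Hypothetical counts, chosen only to illustrate the calculation.
example_sir, example_lower, example_upper = sir_with_ci(observed=120, expected=40.0)

# Ratio of the reported SIRs for the highest and lowest risk categories.
highest_sir = 7.11  # strong family history of colorectal cancer (from the abstract)
lowest_sir = 0.61   # weak family history of all cancers (from the abstract)
risk_gradient = highest_sir / lowest_sir

print(f"SIR = {example_sir:.2f} (95% CI {example_lower:.2f}-{example_upper:.2f})")
print(f"Highest vs lowest category: {risk_gradient:.1f}-fold")
```

Dividing the two reported SIRs gives a ratio of roughly 11.7, which is consistent with the "12 times" figure quoted in the abstract.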

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.
Research Support, U.S. Gov't, P.H.S.

MeSH terms

Aged
Breast Neoplasms / epidemiology
Colorectal Neoplasms / epidemiology*
Colorectal Neoplasms / genetics*
DNA-Binding Proteins / genetics
Data Mining / methods*
Female
Humans
Male
Middle Aged
Mismatch Repair Endonuclease PMS2 / genetics
MutL Protein Homolog 1 / genetics
MutS Homolog 2 Protein / genetics
Ontario / epidemiology
Pedigree
Prostatic Neoplasms / epidemiology
Registries
Risk Assessment
United States / epidemiology
Victoria / epidemiology

Substances

DNA-Binding Proteins
G-T mismatch-binding protein
MLH1 protein, human
PMS2 protein, human
MSH2 protein, human
Mismatch Repair Endonuclease PMS2
MutL Protein Homolog 1
MutS Homolog 2 Protein
