Predicting breast cancer risk using interacting genetic and demographic factors and machine learning

Hamid Behravan; Jaana M Hartikainen; Maria Tengström; Veli-Matti Kosma; Arto Mannermaa

doi:10.1038/s41598-020-66907-9

Sci Rep. 2020 Jul 6;10(1):11044. doi: 10.1038/s41598-020-66907-9.

Authors

Hamid Behravan¹, Jaana M Hartikainen², Maria Tengström³ ⁴, Veli-Matti Kosma^# ² ⁵, Arto Mannermaa^# ² ⁵

Affiliations

¹ Institute of Clinical Medicine, Pathology and Forensic Medicine, and Translational Cancer Research Area, University of Eastern Finland, P.O. Box 1627, FI-70211, Kuopio, Finland. hamid.behravan@uef.fi.
² Institute of Clinical Medicine, Pathology and Forensic Medicine, and Translational Cancer Research Area, University of Eastern Finland, P.O. Box 1627, FI-70211, Kuopio, Finland.
³ Institute of Clinical Medicine, Oncology, University of Eastern Finland, P.O. Box 1627, FI-70211, Kuopio, Finland.
⁴ Cancer Center, Kuopio University Hospital, Kuopio, P.O. Box 100, FI-70029, Kuopio, Finland.
⁵ Biobank of Eastern Finland, Kuopio University Hospital, Kuopio, Finland.

^# Contributed equally.

Abstract

Breast cancer (BC) is a multifactorial disease and the most common cancer in women worldwide. We describe a machine learning approach to identify a combination of interacting genetic variants (SNPs) and demographic risk factors for BC, especially factors related to both familial history (Group 1) and oestrogen metabolism (Group 2), for predicting BC risk. This approach identifies the best combinations of interacting genetic and demographic risk factors that yield the highest BC risk prediction accuracy. In tests on the Kuopio Breast Cancer Project (KBCP) dataset, our approach achieves a mean average precision (mAP) of 77.78 in predicting BC risk by using interacting genetic and Group 1 features, which is better than the mAPs of 74.19 and 73.65 achieved using only Group 1 features and interacting SNPs, respectively. Similarly, using interacting genetic and Group 2 features yields a mAP of 78.00, which outperforms the system based on only Group 2 features, which has a mAP of 72.57. Furthermore, the gene interaction maps built from genes associated with SNPs that interact with demographic risk factors indicate important BC-related biological entities, such as angiogenesis, apoptosis and oestrogen-related networks. The results also show that demographic risk factors are individually more important than genetic variants in predicting BC risk.
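As an illustration of the reported metric, the following is a minimal sketch of how a mean average precision could be computed from ranked risk scores. It assumes that mAP is the mean of per-fold average precision, which the abstract does not state, and it returns a fraction rather than a percentage. The function names and the toy labels and scores are hypothetical and are not taken from the KBCP dataset or the authors' code.

```python
from typing import Sequence, Tuple


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision for one ranked list.

    Samples are sorted by descending score; the result is the mean of
    precision@k taken at every rank k that holds a positive (label == 1).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / n_pos


def mean_average_precision(
    folds: Sequence[Tuple[Sequence[int], Sequence[float]]]
) -> float:
    """Mean of average precision over folds (an assumed definition of mAP)."""
    aps = [average_precision(labels, scores) for labels, scores in folds]
    return sum(aps) / len(aps)


# Toy example only: 1 = case, 0 = control; scores are made-up risk estimates.
toy_folds = [
    ([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]),
    ([0, 1], [0.6, 0.4]),
]
toy_map = mean_average_precision(toy_folds)
```

Under this definition, a model that ranks every case above every control scores 1.0, so comparing feature sets by mAP amounts to comparing how well each one ranks cases ahead of controls.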

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Breast Neoplasms / etiology*
Breast Neoplasms / genetics*
Databases, Factual
Databases, Genetic
Demography
Epistasis, Genetic
Female
Genetic Predisposition to Disease
Humans
Machine Learning*
Polymorphism, Single Nucleotide
Risk Factors