Breast cancer (BC) is the most common malignancy worldwide, and neoadjuvant therapy (NAT) plays an important role in the treatment of patients with early BC. However, only a subset of BC patients achieve pathological complete response (pCR) and benefit from NAT. It is therefore necessary to predict the response to NAT. Although many models that predict the response to NAT from microarray-based gene expression have been proposed, their application in clinical practice is limited by the data normalization methods used during model building and by the disadvantages of the microarray platform compared with the RNA-seq platform. In this study, we first reconfirmed the correlation between immune profiles and pCR in an RNA-seq dataset. Then, we employed multiple machine learning algorithms and a model stacking strategy to build an immunological gene-based model (Ipredictor model) and an immunological gene and receptor status-based model (ICpredictor model) in the RNA-seq dataset. The areas under the receiver operating characteristic curves for the Ipredictor and ICpredictor models were 0.745 and 0.769, respectively, in an independent external test set based on the RNA-seq platform, and 0.716 and 0.752, respectively, in another independent external test set based on the microarray platform. Furthermore, we found that the predictive score of the Ipredictor model was correlated with the immune microenvironment and with genomic aberration markers. These results demonstrate that the models can accurately predict the response to NAT for BC patients and will contribute to individualized therapy.
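For readers unfamiliar with the metric reported above, the area under the receiver operating characteristic curve can be read as the probability that a randomly chosen pCR patient receives a higher predictive score than a randomly chosen non-pCR patient. The following is a minimal sketch of that calculation, not the authors' evaluation code; the function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the ROC AUC.

    labels: 1 for pCR, 0 for non-pCR (hypothetical encoding).
    scores: predictive scores, higher meaning more likely pCR.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data for illustration only (not from the paper).
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.6, 0.7, 0.2, 0.8]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

Under this reading, an AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of pCR from non-pCR cases.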
Keywords: breast cancer; immunological gene; machine learning; neoadjuvant therapy; pathological complete response.
Copyright © 2022 Chen, Hao, Qian, Lin, Pan and Han.