Analysis of Population Differences in Digital Conversations About Cancer Clinical Trials: Advanced Data Mining and Extraction Study

JMIR Cancer. 2021 Sep 23;7(3):e25621. doi: 10.2196/25621.

Abstract

Background: Racial and ethnic diversity in clinical trials for cancer treatment is essential for developing treatments that are effective for all patients and for identifying potential differences in toxicity between demographic groups. Mining of social media discussions about clinical trials has been used previously to identify patient barriers to enrollment in clinical trials; however, a comprehensive breakdown of sentiments and barriers by racial and ethnic group is lacking.

Objective: The aim of this study is to use an innovative methodology to analyze web-based conversations about cancer clinical trials and to identify and compare conversation topics, barriers, and sentiments between different racial and ethnic populations.

Methods: We analyzed 372,283 web-based conversations about cancer clinical trials, of which 179,339 (48.17%) had identifiable race information about the individual posting the conversation. Using machine learning software and analyses, we identified key sentiments and feelings, topics of interest, and barriers to clinical trials across racial groups. The stage of treatment could also be identified in many of the discussions, providing insight into how the sentiments and challenges of patients change throughout the treatment process for each racial group.

Results: We observed that only 4.01% (372,283/9,284,284) of cancer-related discussions referenced clinical trials. Within these discussions, the topics of interest and clinical trial barriers discussed by all racial and ethnic groups throughout the treatment process included health care professional interactions, cost of care, fear, anxiety and lack of awareness, risks, treatment experiences, and the clinical trial enrollment process. Health care professional interactions, cost of care, and enrollment processes were discussed notably more frequently by minority populations. Other minor variations in the frequency of discussion topics between ethnic and racial groups throughout the treatment process were also identified.

Conclusions: This study demonstrates the power of digital search technology in health care research. The results are also valuable for identifying the ideal content and timing for the delivery of clinical trial information and resources for different racial and ethnic groups.

Keywords: cancer; clinical trials; data mining; health care disparities; health communication; natural language processing; race and ethnicity; social media; text extraction.