Quantifying Sex Bias in Clinical Studies at Scale With Automated Data Extraction

JAMA Netw Open. 2019 Jul 3;2(7):e196700. doi: 10.1001/jamanetworkopen.2019.6700.


Importance: Analyses of female representation in clinical studies have been limited in scope and scale.

Objective: To perform a large-scale analysis of global enrollment sex bias in clinical studies.

Design, setting, and participants: In this cross-sectional study, clinical studies from published articles from PubMed from 1966 to 2018 and records from Aggregate Analysis of ClinicalTrials.gov from 1999 to 2018 were identified. Global disease prevalence was determined for male and female patients in 11 disease categories from the Global Burden of Disease database: cardiovascular, diabetes, digestive, hepatitis (types A, B, C, and E), HIV/AIDS, kidney (chronic), mental, musculoskeletal, neoplasms, neurological, and respiratory (chronic). Machine reading algorithms were developed that extracted sex data from tables in articles and records on December 31, 2018, at an artificial intelligence research institute. Male and female participants in 43 135 articles (792 004 915 participants) and 13 165 records (12 977 103 participants) were included.

Main outcomes and measures: Sex bias was defined as the fraction of female participants among study participants minus the prevalence fraction of female patients in the corresponding disease category, so that negative values indicate female underrepresentation. A total of 1000 bootstrap estimates of sex bias were computed by resampling individual studies with replacement. Sex bias was reported as mean and 95% bootstrap confidence intervals from articles and records in each disease category over time (before or during 1993 to 2018), with studies or participants as the measurement unit.
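
The definition and resampling procedure above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the toy study counts, and the prevalence value of 0.5 are hypothetical. Two details are assumptions the abstract does not specify: the participant-level measure is computed by pooling participants across studies, and the 95% interval is taken as the 2.5th to 97.5th percentile of the bootstrap estimates.

```python
import random
from statistics import mean

# Each study is a (female_participants, total_participants) pair.
# These counts are invented for illustration only.
STUDIES = [(20, 100), (30, 100), (600, 1000)]
PREVALENCE_FEMALE = 0.5  # hypothetical female fraction of disease prevalence


def sex_bias_by_study(studies, prevalence_female):
    """Studies as the unit: average the per-study female fraction minus prevalence."""
    return mean(f / n - prevalence_female for f, n in studies)


def sex_bias_by_participant(studies, prevalence_female):
    """Participants as the unit (assumed pooling): pooled female fraction minus prevalence."""
    total_female = sum(f for f, _ in studies)
    total = sum(n for _, n in studies)
    return total_female / total - prevalence_female


def bootstrap_sex_bias(studies, prevalence_female, estimator, n_boot=1000, seed=0):
    """Resample studies with replacement; return the mean and a percentile 95% interval."""
    rng = random.Random(seed)
    estimates = sorted(
        estimator(rng.choices(studies, k=len(studies)), prevalence_female)
        for _ in range(n_boot)
    )
    lower = estimates[int(0.025 * n_boot)]
    upper = estimates[int(0.975 * n_boot) - 1]
    return mean(estimates), (lower, upper)


if __name__ == "__main__":
    for name, estimator in [
        ("studies", sex_bias_by_study),
        ("participants", sex_bias_by_participant),
    ]:
        point = estimator(STUDIES, PREVALENCE_FEMALE)
        boot_mean, (lo, hi) = bootstrap_sex_bias(STUDIES, PREVALENCE_FEMALE, estimator)
        print(f"{name}: point={point:.3f} mean={boot_mean:.3f} CI=({lo:.3f}, {hi:.3f})")
```

In this toy data set the two small studies enroll few women and the one large study enrolls many, so the study-level estimate is negative while the pooled participant-level estimate is positive. This shows how the two measurement units can diverge when study size is related to female representation.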

Results: There were 792 004 915 participants, including 390 470 834 female participants (49%), in articles and 12 977 103 participants, including 6 351 619 female participants (49%), in records. With studies as the measurement unit, substantial female underrepresentation (sex bias ≤ -0.05) was observed in 7 of 11 disease categories, especially HIV/AIDS (mean for articles, -0.17 [95% CI, -0.18 to -0.16]), chronic kidney diseases (mean, -0.17 [95% CI, -0.17 to -0.16]), and cardiovascular diseases (mean, -0.14 [95% CI, -0.14 to -0.13]). Sex bias in articles for all categories combined was unchanged over time with studies as the measurement unit (range, -0.15 [95% CI, -0.16 to -0.13] to -0.10 [95% CI, -0.14 to -0.06]), but improved from before or during 1993 (mean, -0.11 [95% CI, -0.16 to -0.05]) to 2014 to 2018 (mean, -0.05 [95% CI, -0.09 to -0.02]) with participants as the measurement unit. Larger study size was associated with greater female representation.

Conclusions and relevance: Automated extraction of the number of participants in clinical reports provides an effective alternative to manual analysis of demographic bias. Despite legal and policy initiatives to increase female representation, sex bias against female participants in clinical studies persists. Studies with more participants have greater female representation. Differences between sex bias estimates with studies vs participants as the measurement unit, and between articles vs records, suggest that sex bias should be reported with both measures and from both data sources.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Clinical Decision Rules*
  • Clinical Studies as Topic* / standards
  • Clinical Studies as Topic* / statistics & numerical data
  • Cross-Sectional Studies
  • Dimensional Measurement Accuracy
  • Electronic Data Processing
  • Female
  • Humans
  • Information Storage and Retrieval / methods*
  • Male
  • Patient Selection*
  • PubMed / statistics & numerical data*
  • Sexism* / prevention & control
  • Sexism* / statistics & numerical data