Characterizing the Clinical and Genetic Spectrum of Polycystic Ovary Syndrome in Electronic Health Records

J Clin Endocrinol Metab. 2021 Jan 1;106(1):153-167. doi: 10.1210/clinem/dgaa675.


Context: Polycystic ovary syndrome (PCOS) is one of the leading causes of infertility, yet current diagnostic criteria are ineffective at identifying patients whose symptoms fall outside their strict definitions. As a result, PCOS is underdiagnosed and its etiology is poorly understood.

Objective: We aimed to characterize the phenotypic spectrum of PCOS clinical features within and across racial and ethnic groups.

Methods: We developed a strictly defined PCOS algorithm (PCOSkeyword-strict) using International Classification of Diseases, ninth and tenth revision (ICD-9 and ICD-10) codes and keywords mined from clinical notes in electronic health record (EHR) data. We then systematically relaxed the inclusion criteria to evaluate the change in epidemiological and genetic associations, resulting in 3 subsequent algorithms (PCOScoded-broad, PCOScoded-strict, and PCOSkeyword-broad). We evaluated the performance of each phenotyping approach and characterized prominent clinical features observed in racially and ethnically diverse PCOS patients.
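The idea of starting from a strict case definition and relaxing it can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Patient` record, the keyword list, the code-count thresholds, and the two rule settings are hypothetical, and the abstract does not state how its four algorithms map onto rules like these. The only outside facts used are the standard codes 256.4 (ICD-9-CM, polycystic ovaries) and E28.2 (ICD-10, polycystic ovarian syndrome).

```python
from dataclasses import dataclass, field
from typing import List

# Assumed inputs for illustration; the abstract does not list its codes or keywords.
PCOS_ICD_CODES = {"256.4", "E28.2"}          # ICD-9-CM and ICD-10 codes for PCOS
PCOS_KEYWORDS = ("polycystic ovar", "pcos")  # hypothetical note keywords


@dataclass
class Patient:
    """Hypothetical, simplified view of one EHR record."""
    icd_codes: List[str] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)


def count_pcos_codes(patient: Patient) -> int:
    """Number of billing codes on the record that match the PCOS code set."""
    return sum(code in PCOS_ICD_CODES for code in patient.icd_codes)


def mentions_pcos(patient: Patient) -> bool:
    """True if any clinical note contains a PCOS keyword (case-insensitive)."""
    return any(kw in note.lower() for note in patient.notes for kw in PCOS_KEYWORDS)


def is_case(patient: Patient, min_codes: int, require_keyword: bool) -> bool:
    """Apply one inclusion rule; raising min_codes or requiring a keyword makes it stricter."""
    if count_pcos_codes(patient) < min_codes:
        return False
    return mentions_pcos(patient) or not require_keyword


# Two hypothetical rule settings, from strict to relaxed.
STRICT = {"min_codes": 2, "require_keyword": True}
RELAXED = {"min_codes": 1, "require_keyword": False}

coded_twice_with_note = Patient(["E28.2", "E28.2"], ["Assessment: PCOS, irregular menses."])
coded_once_no_note = Patient(["256.4"])

print(is_case(coded_twice_with_note, **STRICT))  # True
print(is_case(coded_once_no_note, **STRICT))     # False
print(is_case(coded_once_no_note, **RELAXED))    # True
```

Plain substring matching ignores negation ("no evidence of PCOS") and family-history mentions, so a real keyword step would need more careful text processing than this sketch shows.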

Results: The best performance came from the PCOScoded-strict algorithm, with a positive predictive value of 98%. Individuals classified as cases by this algorithm had significantly higher body mass index (BMI), insulin levels, free testosterone values, and genetic risk scores for PCOS, compared to controls. Median BMI was higher in African American females with PCOS compared to White and Hispanic females with PCOS.
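Positive predictive value is the fraction of algorithm-flagged cases that are confirmed as true cases on review. The short sketch below shows the calculation; the function name and the counts in the usage line are hypothetical and are not the study's validation numbers.

```python
def positive_predictive_value(true_positives: int, false_positives: int) -> float:
    """PPV = confirmed cases / all records the algorithm flagged as cases."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("PPV is undefined when no records were flagged as cases")
    return true_positives / flagged


# Hypothetical review: 50 flagged records, 45 confirmed as true cases.
print(positive_predictive_value(45, 5))  # 0.9
```

PPV depends on how common the condition is in the population being screened, so a value measured in one health system may not transfer directly to another.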

Conclusions: PCOS symptoms are observed across a severity spectrum that parallels the continuous genetic liability to PCOS in the general population. Racial and ethnic group differences exist in PCOS symptomatology and metabolic health across different phenotyping strategies.

Keywords: electronic health record; hormones; phenotyping; polycystic ovary syndrome; polygenic risk scores.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Adolescent
  • Adult
  • Algorithms*
  • Case-Control Studies
  • Data Interpretation, Statistical
  • Data Mining / methods
  • Electronic Health Records* / statistics & numerical data
  • Ethnicity / genetics
  • Ethnicity / statistics & numerical data
  • Female
  • Genetic Predisposition to Disease / ethnology
  • Humans
  • Multifactorial Inheritance
  • Phenotype
  • Polycystic Ovary Syndrome* / diagnosis
  • Polycystic Ovary Syndrome* / ethnology
  • Polycystic Ovary Syndrome* / genetics
  • Predictive Value of Tests
  • Racial Groups / genetics
  • Racial Groups / statistics & numerical data
  • Risk Factors
  • Tennessee / epidemiology
  • Young Adult