Utility of the 5-Minute Apgar Score as a Research Endpoint

Am J Epidemiol. 2019 Sep 1;188(9):1695-1704. doi: 10.1093/aje/kwz132.


Although Apgar scores are commonly used as proxy outcomes, little evidence exists in support of the most common cutpoints (<7, <4). We used 2 data sets to explore this issue: one contained planned community births from across the United States (n = 52,877; 2012-2016), and the other contained hospital births from California (n = 428,877; 2010). We treated 5-minute Apgars as clinical "tests," compared against 18 known outcomes; we calculated sensitivity, specificity, positive and negative predictive values, and the area under the receiver operating characteristic curve for each. We used 3 different criteria to determine optimal cutpoints. Results were very consistent across data sets, outcomes, and all subgroups: The cutpoint that maximizes the trade-off between sensitivity and specificity is universally <9. However, extremely low positive predictive values for all outcomes at <9 indicate more misclassification than is acceptable for research. The areas under the receiver operating characteristic curves (which treat Apgars as quasicontinuous) were generally indicative of adequate discrimination between infants destined to experience poor outcomes and those not; comparing median Apgars between groups might be an analytical alternative to dichotomizing. Nonetheless, because Apgar scores are not clearly on any causal pathway of interest, we discourage researchers from using them unless the motivation for doing so is clear.
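The "Apgar as a clinical test" framing above can be illustrated with a minimal sketch. It treats "Apgar below cutpoint c" as a positive test, computes the four accuracy measures at each cutpoint, and estimates the area under the ROC curve as the probability that a randomly chosen affected infant has a lower score than a randomly chosen unaffected one. The abstract does not name its 3 cutpoint criteria, so the use of Youden's J (sensitivity + specificity - 1) here is an assumption chosen as one common option, not necessarily the authors' method. All function names are hypothetical and the toy data are invented solely to exercise the code; they are not drawn from either data set.

```python
from dataclasses import dataclass


@dataclass
class CutpointStats:
    cutpoint: int
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    youden_j: float


def _ratio(numerator, denominator):
    # Return NaN instead of raising when a cell of the 2x2 table is empty.
    return numerator / denominator if denominator else float("nan")


def cutpoint_stats(scores, outcomes, cutpoint):
    """Accuracy of the rule 'Apgar < cutpoint' for a binary adverse outcome."""
    tp = fp = fn = tn = 0
    for score, outcome in zip(scores, outcomes):
        test_positive = score < cutpoint
        if test_positive and outcome:
            tp += 1
        elif test_positive:
            fp += 1
        elif outcome:
            fn += 1
        else:
            tn += 1
    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    return CutpointStats(
        cutpoint=cutpoint,
        sensitivity=sensitivity,
        specificity=specificity,
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
        youden_j=sensitivity + specificity - 1,  # assumed criterion
    )


def auc(scores, outcomes):
    """AUC as P(case score < non-case score), counting ties as one half."""
    cases = [s for s, y in zip(scores, outcomes) if y]
    noncases = [s for s, y in zip(scores, outcomes) if not y]
    wins = sum(
        1.0 if c < n else 0.5 if c == n else 0.0
        for c in cases
        for n in noncases
    )
    return wins / (len(cases) * len(noncases))


# Invented toy data: 5-minute Apgar scores and a binary adverse outcome.
toy_scores = [3, 6, 8, 9, 9, 10, 8, 10, 9, 10]
toy_outcomes = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

results = [cutpoint_stats(toy_scores, toy_outcomes, c) for c in range(1, 11)]
best = max(results, key=lambda r: r.youden_j)
toy_auc = auc(toy_scores, toy_outcomes)

for r in results:
    print(r)
print("Cutpoint maximizing Youden's J in the toy data:", best.cutpoint)
print("Toy AUC:", round(toy_auc, 3))
```

Sensitivity and specificity do not depend on how common the outcome is, whereas the positive predictive value does. With rare outcomes, most infants flagged by a given cutpoint will not be affected, so the positive predictive value can be very low even when sensitivity and specificity are both high.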

Keywords: Apgar score; ROC curve; infant health.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Apgar Score*
  • Area Under Curve
  • Biomedical Research*
  • Datasets as Topic
  • Epidemiologic Methods
  • Humans
  • Infant, Newborn
  • Infant, Newborn, Diseases / diagnosis*
  • Predictive Value of Tests
  • ROC Curve
  • Risk Factors
  • Sensitivity and Specificity