Objective: To examine the extent to which performance assessment methods affect the percentage of neonatal intensive care units (NICUs) and very low-birth-weight (VLBW) infants included in performance assessments, the distribution of NICU performance ratings, and the level of agreement in those ratings.
Design: Cross-sectional study based on risk-adjusted nosocomial infection rates.
Setting: NICUs belonging to the California Perinatal Quality Care Collaborative, 2007-2008.
Participants: One hundred twenty-six California NICUs and 10 487 VLBW infants.
Main exposures: Three performance assessment choices: (1) excluding "low-volume" NICUs (those caring for <30 VLBW infants per year) vs a criterion based on confidence intervals, (2) using Bayesian vs frequentist hierarchical models, and (3) pooling data across 1 vs 2 years.
Main outcome measures: Proportion of NICUs and patients included in quality assessment, distribution of ratings for NICUs, and agreement between methods using the κ statistic.
Results: Depending on the methods applied, 51% to 85% of NICUs and 72% to 96% of VLBW infants were included in performance assessments, 76% to 87% of NICUs were considered "average," and agreement between NICU ratings (κ) ranged from 0.23 to 0.89.
Conclusions: The percentage of NICUs included in performance assessments and their ratings can shift dramatically depending on the performance measurement method used. Physicians, payers, and policymakers should continue to closely examine which existing performance assessment methods are most appropriate for evaluating pediatric care quality.
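Illustration: The agreement measure named under Main outcome measures can be sketched as follows. This is a minimal example of an unweighted Cohen κ for two sets of categorical ratings, not the study's actual analysis. The function name `cohen_kappa`, the rating labels, and the 10 example ratings are hypothetical and chosen only for illustration; the abstract does not state which variant of κ was used.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen kappa for two equal-length lists of category labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)

    # Observed agreement: share of units given the same rating by both methods.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: product of the marginal proportions, summed over categories.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    categories = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both methods put every unit in one shared category; kappa is
        # undefined here, and this sketch reports full agreement by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of the same 10 units under two assessment methods.
method_a = ["average"] * 7 + ["better", "better", "worse"]
method_b = ["average"] * 6 + ["better", "better", "average", "worse"]

kappa = cohen_kappa(method_a, method_b)
print(round(kappa, 2))
```

In this made-up example the two methods agree on 8 of 10 units, but because most units are rated "average" under both methods, much of that agreement is expected by chance, so κ is well below 0.8.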