Background: Various statistical methods are commonly used to assess the accuracy of near-continuous glucose sensors. The performance and reliability of these methods have not been well described.
Methods: We used computer simulation to describe the behavior, under varying conditions, of several statistical measures, including error grid analysis, receiver operating characteristic (ROC) analysis, correlation, and repeated-measures analysis. Actual data from an inpatient accuracy study conducted by the Diabetes Research in Children Network (DirecNet) were also used to demonstrate the limitations of these methods.
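As a rough illustration of the shuffling experiment described in the Results, the sketch below pairs simulated sensor readings with randomly reassigned reference values and tallies Clarke error grid zones. All data here are synthetic (not the DirecNet data), and `clarke_zone` uses one common formulation of the published zone boundaries; it is a minimal sketch, not the analysis code used in the study.

```python
import random

def clarke_zone(ref, pred):
    """Classify a (reference, sensor) pair in mg/dL using one common
    formulation of the Clarke error grid zone boundaries."""
    if abs(pred - ref) <= 0.2 * ref or (ref < 70 and pred < 70):
        return "A"          # clinically accurate
    if (ref >= 180 and pred <= 70) or (ref <= 70 and pred >= 180):
        return "E"          # confuses hypo- and hyperglycemia
    if (70 <= ref <= 290 and pred >= ref + 110) or \
       (130 <= ref <= 180 and pred <= 7.0 / 5.0 * ref - 182):
        return "C"          # would prompt unnecessary treatment
    if (ref >= 240 and 70 <= pred <= 180) or \
       (ref <= 175.0 / 3.0 and 70 <= pred <= 180) or \
       (175.0 / 3.0 <= ref <= 70 and pred >= 6.0 / 5.0 * ref):
        return "D"          # would miss needed treatment
    return "B"              # benign error

def zone_ab_fraction(refs, preds):
    """Fraction of pairs in the 'clinically acceptable' zones A or B."""
    return sum(clarke_zone(r, p) in "AB" for r, p in zip(refs, preds)) / len(refs)

random.seed(1)
refs = [random.uniform(40, 400) for _ in range(200)]          # synthetic references
sensor = [r * (1 + random.gauss(0, 0.03)) for r in refs]      # fairly accurate sensor
shuffled = sensor[:]
random.shuffle(shuffled)                                      # destroy the pairing

matched_ab = zone_ab_fraction(refs, sensor)
shuffled_ab = zone_ab_fraction(refs, shuffled)
print(f"Zone A+B, matched pairs:  {matched_ab:.2f}")
print(f"Zone A+B, shuffled pairs: {shuffled_ab:.2f}")
```

Even with the pairing destroyed, many shuffled pairs land in zones A or B simply because glucose values cluster in a limited range, which is the weakness the abstract highlights.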
Results: Sensors that were made artificially inaccurate by randomly shuffling their pairings with reference values still fell in Zone A or B 78% of the time on the Clarke grid and 79% of the time on the modified grid. Area under the curve values for these shuffled pairs averaged 64% for hypoglycemia and 68% for hyperglycemia. Continuous error grid analysis classified 75% of shuffled pairs as "Accurate Readings" or "Benign Errors." Correlation analysis gave inconsistent results for sensors simulated to have identical accuracies, with values ranging from 0.50 to 0.96. Simplistic repeated-measures analyses that accounted for subject effects but ignored temporal correlation patterns substantially inflated the probability of falsely obtaining a statistically significant result: in simulations where the null hypothesis was true, 23% of observed P values were <0.05 and 12% were <0.01.
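The type I error inflation from ignoring temporal correlation can be reproduced in miniature with a self-contained simulation (again synthetic, not the study's analysis): autocorrelated AR(1) noise with a true mean of zero is tested with a naive z-test that wrongly treats every time point as independent, and the rejection rate climbs far above the nominal 5%.

```python
import math
import random

def ar1_series(n, phi, sigma=1.0, rng=random):
    """Simulate n points of a mean-zero AR(1) process: x_t = phi*x_{t-1} + noise."""
    x, out = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, sigma)
        out.append(x)
    return out

def naive_pvalue(xs):
    """Two-sided p-value for mean == 0, wrongly treating points as independent."""
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((v - m) ** 2 for v in xs) / (n - 1)
    z = m / math.sqrt(s2 / n)
    # standard normal CDF via math.erf
    return 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))

random.seed(0)
n_sims, n_points, phi = 500, 200, 0.8
false_positives = sum(
    naive_pvalue(ar1_series(n_points, phi)) < 0.05 for _ in range(n_sims)
)
rate = false_positives / n_sims
print(f"Nominal alpha 0.05, observed rejection rate: {rate:.2f}")
```

With strong autocorrelation (phi = 0.8), the variance of the sample mean is far larger than the naive formula assumes, so null-true series are declared "significant" much more than 5% of the time, mirroring the inflation the abstract reports.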
Conclusion: Commonly used statistical methods can give overly optimistic or inconsistent impressions of sensor accuracy if results are not placed in proper context. Novel techniques are needed to assess the accuracy of near-continuous glucose sensors.