We systematically reviewed the literature on the performance of osteoporosis absolute fracture risk assessment instruments. Relatively few studies have evaluated the calibration of instruments in populations separate from their development cohorts, and findings are mixed. Many studies had methodological limitations making susceptibility to bias a concern.
Introduction: The aim of this study was to systematically review the literature on the calibration of osteoporosis clinical fracture risk assessment instruments, that is, their performance in predicting absolute fracture risk, in populations other than their derivation cohorts.
Methods: We performed a systematic review, searching MEDLINE, Embase, the Cochrane Library, and multiple other literature sources. Inclusion and exclusion criteria were applied, and data were extracted, including information about study participants, study design, potential sources of bias, and predicted and observed fracture probabilities.
Results: A total of 19,949 unique records were identified for review. Fourteen studies met inclusion criteria. There was substantial heterogeneity among included studies. Six studies assessed the WHO's Fracture Risk Assessment (FRAX) instrument in five separate cohorts, and a variety of risk assessment instruments were evaluated in the remainder of the studies. Approximately half of the studies found good instrument calibration, with observed fracture probabilities being close to predicted probabilities for different risk categories. Studies that assessed the calibration of FRAX found mixed performance in different populations. The proportion of studies that found good calibration was similar for simple risk assessment instruments (≤5 variables) and complex instruments (>5 variables). Many studies had methodological features making them susceptible to bias.
Conclusions: Few studies have evaluated the performance or calibration of osteoporosis fracture risk assessment instruments in populations separate from their development cohorts. Findings are mixed, and many studies had methodological limitations that made them susceptible to bias, raising concerns about the use of these tools outside of their original derivation cohorts. Further studies are needed to assess the calibration of instruments in different populations prior to widespread use.
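Illustrative note on calibration: the comparison of observed and predicted fracture probabilities by risk category, as described in the Results, can be sketched as follows. This is a minimal sketch and not the method of any included study. The `RiskGroup` structure, the `calibration_table` function, and all numbers are hypothetical and chosen only for illustration. An observed-to-predicted ratio near 1 indicates good calibration in that category.

```python
from dataclasses import dataclass


@dataclass
class RiskGroup:
    """One risk category in a hypothetical validation cohort."""
    label: str
    n: int                    # participants in the category
    mean_predicted: float     # mean predicted fracture probability
    observed_fractures: int   # fractures observed during follow-up


def calibration_table(groups):
    """Return (label, predicted, observed, observed/predicted) per category."""
    rows = []
    for g in groups:
        if g.n <= 0 or g.mean_predicted <= 0:
            raise ValueError(f"group {g.label!r} needs n > 0 and predicted > 0")
        observed = g.observed_fractures / g.n
        rows.append((g.label, g.mean_predicted, observed, observed / g.mean_predicted))
    return rows


# Made-up illustrative values, not data from any study.
groups = [
    RiskGroup("low", 1000, 0.05, 50),
    RiskGroup("medium", 500, 0.10, 60),
    RiskGroup("high", 200, 0.20, 30),
]

table = calibration_table(groups)
for label, predicted, observed, ratio in table:
    print(f"{label}: predicted={predicted:.2f} observed={observed:.2f} O/P={ratio:.2f}")
```

In this made-up example the instrument is well calibrated in the low-risk category, underestimates risk in the medium category (ratio above 1), and overestimates it in the high category (ratio below 1).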