Calibration of the Finnish FRAX model was evaluated in a locally derived, population-based cohort of postmenopausal women (n = 13,917). Hip fractures were identified from national registers and verified from radiological records. For a subpopulation of 11,182 women, sufficient data were available to calculate fracture probabilities with the Finnish FRAX tool (without bone mineral density). The 10-year period prevalence of hip fractures in this subpopulation was 0.66 %. The expected number of hip fractures was significantly higher than the self-reported number (O/E ratio 0.46; 95 % CI 0.33-0.63) and tended to exceed the observed, register-based number (O/E ratio 0.83; 95 % CI 0.65-1.04). Calibration, assessed as goodness-of-fit of the absolute probabilities, was questionable (P = 0.015). Strikingly, the 10-year period prevalence of hip fractures was higher in the whole cohort (0.84 %) than in the women with FRAX estimates (0.66 %). This was mainly due to the difference between women who had and had not responded to the postal enquiries (0.71 vs. 1.77 %, P < 0.0001). Self-reports failed to capture 38 % of all hip fractures among responders and about 45 % of hip fractures among women with a FRAX estimate. The Finnish FRAX tool seems to provide appropriate discrimination of hip fracture risk, but caution is required when interpreting absolute risk, especially when the tool is applied to a population that may not be representative of the general population. Our study also showed that non-responders had a significantly higher hip fracture risk and that relying solely on self-reported hip fractures yields biased incidence and period prevalence estimates. Such important biases may go unnoticed if no data from other sources are available.
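The observed-to-expected (O/E) ratios above compare a count of fractures with the number predicted by the model. A minimal sketch of one common way to compute such a ratio is shown below. It assumes that the expected count is the sum of the individual 10-year FRAX probabilities, that the observed count is Poisson-distributed with the expected count treated as fixed, and that the 95 % CI uses a normal approximation on the log scale (SE of ln(O/E) taken as 1/sqrt(O)). The function names `expected_count` and `oe_ratio_ci` and the example inputs are hypothetical, and this is not necessarily the method used in the study.

```python
import math
from typing import Iterable, Tuple


def expected_count(probabilities: Iterable[float]) -> float:
    """Expected number of events: sum of individual 10-year probabilities.

    Assumption: probabilities are given as fractions (e.g. 0.02 for 2 %).
    """
    return sum(probabilities)


def oe_ratio_ci(observed: int, expected: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Return (O/E ratio, lower CI bound, upper CI bound).

    Assumption: the observed count is Poisson and the expected count is fixed,
    so SE(ln(O/E)) is approximated by 1 / sqrt(observed).
    """
    if observed <= 0 or expected <= 0:
        raise ValueError("observed and expected counts must be positive")
    ratio = observed / expected
    se_log = 1.0 / math.sqrt(observed)
    # Symmetric interval on the log scale, back-transformed to the ratio scale.
    lower = ratio * math.exp(-z * se_log)
    upper = ratio * math.exp(z * se_log)
    return ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical inputs for illustration only, not data from the study.
    e = expected_count([0.01, 0.02, 0.005])
    print(f"expected count from three probabilities: {e:.3f}")
    r, lo, hi = oe_ratio_ci(observed=40, expected=50.0)
    print(f"O/E = {r:.2f} (95 % CI {lo:.2f}-{hi:.2f})")
```

An O/E ratio below 1 means that fewer fractures were counted than the model predicted; an interval that excludes 1, as for the self-reported fractures above, indicates a statistically significant discrepancy under these assumptions.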