Comparison of 'pattern recognition' and logistic regression models for discrimination between benign and malignant pelvic masses: a prospective cross validation

Ultrasound Obstet Gynecol. 2001 Oct;18(4):357-65. doi: 10.1046/j.0960-7692.2001.00500.x.


Objectives: To test prospectively the diagnostic performance of two logistic regression models for calculation of individual risk of malignancy in adnexal tumors (the 'Tailor model' and the 'Timmerman model'), and to compare them to that of 'pattern recognition' (subjective evaluation of the gray-scale ultrasound image and color Doppler ultrasound examination).

Design: Consecutive women with a pelvic mass judged clinically to be of adnexal origin underwent preoperative ultrasound examination including color and spectral Doppler examination. The same examination techniques and definitions as those used in the studies in which the logistic regression models had been created were used. The Tailor model was tested in 133 women (35 of whom had a malignancy) and the Timmerman model in 82 women (29 of whom had a malignancy). A subset of 79 women (28 of whom had a malignancy) was used to compare the performance of the Tailor model and the Timmerman model by calculating and comparing the areas under the receiver operating characteristics curves of the two models. Sensitivity and specificity with regard to malignancy were calculated for all three methods.

Results: Pattern recognition performed better than the two logistic regression models (sensitivity around 85%, specificity around 90%). Using a risk of malignancy of > 50% to indicate malignancy (as suggested in the original publications), the sensitivity of the Tailor model was 69% and the specificity 88% (n = 133). The corresponding values for the Timmerman model were 62% and 79% (n = 82). The receiver operating characteristics curves showed the two logistic regression models to have similar diagnostic properties (area under the curve, 0.87 vs. 0.84; P = 0.25; n = 79). The diagnostic performance of the mathematical models was much poorer in this study than in those in which the models had been created.
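As an illustration of how the figures above are obtained, the following minimal sketch computes sensitivity and specificity at a risk cutoff of > 50%, and the area under the receiver operating characteristics curve as the proportion of malignant/benign pairs that are ranked correctly. The function names and the toy risks and labels are hypothetical and are not taken from the study; the coefficients of the Tailor and Timmerman models are not reproduced here.

```python
def sensitivity_specificity(risks, labels, threshold=0.5):
    """Classify as malignant when predicted risk > threshold.

    risks  : predicted probabilities of malignancy (0..1)
    labels : 1 = malignant, 0 = benign (reference diagnosis)
    Returns (sensitivity, specificity).
    """
    tp = fn = tn = fp = 0
    for risk, malignant in zip(risks, labels):
        predicted_malignant = risk > threshold
        if malignant and predicted_malignant:
            tp += 1
        elif malignant:
            fn += 1
        elif predicted_malignant:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def auc(risks, labels):
    """Area under the ROC curve via pairwise ranking.

    Counts the fraction of (malignant, benign) pairs in which the
    malignant case has the higher predicted risk; ties count as 0.5.
    """
    pos = [r for r, y in zip(risks, labels) if y]
    neg = [r for r, y in zip(risks, labels) if not y]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data for illustration only (not study data).
toy_risks = [0.92, 0.35, 0.71, 0.10, 0.55, 0.48, 0.05, 0.80]
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(toy_risks, toy_labels)
area = auc(toy_risks, toy_labels)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={area:.3f}")
```

With these toy values, three of the four malignant cases and three of the four benign cases are classified correctly at the > 50% cutoff. Sensitivity and specificity depend on the chosen cutoff, whereas the area under the curve summarizes performance across all cutoffs, which is why the abstract reports both.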

Conclusion: The poor diagnostic performance of the mathematical models can probably be explained by subtle differences in definitions and examination technique and by differences between the original tumor populations and the study population. For mathematical models to be generally useful, they probably need to be created on the basis of a very large number of tumors, and the variables in the model must be unequivocally defined and the examination technique meticulously standardized.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adenofibroma / diagnostic imaging
  • Adult
  • Aged
  • Aged, 80 and over
  • Cystadenocarcinoma, Papillary / diagnostic imaging
  • Dermoid Cyst / diagnostic imaging
  • Diagnosis, Differential
  • Female
  • Genital Diseases, Female / diagnosis*
  • Humans
  • Logistic Models
  • Middle Aged
  • Ovarian Neoplasms / diagnostic imaging*
  • Pattern Recognition, Automated*
  • Prospective Studies
  • ROC Curve
  • Sensitivity and Specificity
  • Ultrasonography, Doppler, Color*