Term identification methods for consumer health vocabulary development

J Med Internet Res. 2007 Feb 28;9(1):e4. doi: 10.2196/jmir.9.1.e4.


Background: The development of consumer health information applications, such as health education websites, has motivated research on consumer health vocabulary (CHV). Term identification is a critical task in vocabulary development. Because consumer expressions are heterogeneous and ambiguous, term identification is more challenging for CHV than for professional health vocabularies.

Objective: For the development of a CHV, we explored several term identification methods, including collaborative human review and automated term recognition methods.

Methods: A set of criteria was established to ensure consistency in the collaborative review, which analyzed 1893 strings. Using the results from the human review, we tested two automated methods: the C-value formula and a logistic regression model.
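
For readers unfamiliar with the C-value measure named above, the following is a minimal sketch of its commonly cited form (Frantzi et al.), not the implementation used in this study. The abstract does not say which variant was applied, so the formula choice, the function names `is_nested` and `c_values`, and the example strings and frequencies are all illustrative assumptions. In this form, a candidate's score is log2 of its length in words times its frequency, reduced by the mean frequency of the longer candidates that contain it.

```python
import math


def is_nested(shorter, longer):
    """Return True if word tuple `shorter` occurs contiguously inside `longer`."""
    n, m = len(shorter), len(longer)
    return n < m and any(longer[i:i + n] == shorter for i in range(m - n + 1))


def c_values(freq):
    """Score candidate strings with a simplified C-value.

    `freq` maps each candidate string to its corpus frequency.
    Assumption: this is the textbook form of the measure; single-word
    candidates score 0 because log2(1) == 0, which some variants avoid
    by using log2(length + 1) instead.
    """
    tokens = {term: tuple(term.split()) for term in freq}
    scores = {}
    for a, words in tokens.items():
        # Longer candidates that contain `a` as a contiguous word sequence.
        containing = [b for b, w in tokens.items() if is_nested(words, w)]
        adjusted = freq[a]
        if containing:
            # Subtract the mean frequency of the containing candidates.
            adjusted -= sum(freq[b] for b in containing) / len(containing)
        scores[a] = math.log2(len(words)) * adjusted
    return scores


# Hypothetical frequencies, for illustration only.
example = {
    "heart attack": 10,
    "massive heart attack": 4,
    "heart attack symptoms": 2,
}
for term, score in sorted(c_values(example).items(), key=lambda kv: -kv[1]):
    print(f"{term}: {score:.2f}")
```

In this toy input, "heart attack" is penalized because part of its frequency comes from occurrences inside longer candidate strings.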

Results: The study identified 753 consumer terms and found the logistic regression model to be highly effective for CHV term identification (area under the receiver operating characteristic curve = 95.5%).
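
As a reminder of what the reported metric measures, the sketch below computes the area under the ROC curve as the probability that a randomly chosen true term receives a higher model score than a randomly chosen non-term, with ties counted as one half. The function name `roc_auc` and the labels and scores are hypothetical; they are not data from the study, and the study's own evaluation procedure is not described in this abstract.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    `labels` holds 1 for strings judged to be terms and 0 otherwise;
    `scores` holds the corresponding model outputs (for example,
    predicted probabilities from a logistic regression model).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical review labels and model scores, for illustration only.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
print(f"AUC = {roc_auc(labels, scores):.3f}")
```

An AUC of 1.0 would mean every true term outranks every non-term, while 0.5 corresponds to chance-level ranking.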

Conclusions: The collaborative human review and logistic regression methods were effective for identifying terms for CHV development.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, N.I.H., Intramural

MeSH terms

  • Automation / methods
  • Cooperative Behavior
  • Health Education / methods*
  • Humans
  • Logistic Models
  • Models, Theoretical
  • ROC Curve
  • Vocabulary, Controlled*