Validity: on meaningful interpretation of assessment data

Med Educ. 2003 Sep;37(9):830-7. doi: 10.1046/j.1365-2923.2003.01594.x.


Context: All assessments in medical education require evidence of validity to be interpreted meaningfully. In contemporary usage, all validity is construct validity: it is the whole of validity, has multiple facets and requires multiple sources of evidence. Five sources (content, response process, internal structure, relationship to other variables and consequences) are noted by the Standards for Educational and Psychological Testing as fruitful areas to seek validity evidence.

Purpose: The purpose of this article is to discuss construct validity in the context of medical education and to summarize, through example, some typical sources of validity evidence for a written and a performance examination.

Summary: Assessments are not valid or invalid; rather, the scores or outcomes of assessments have more or less evidence to support (or refute) a specific interpretation (such as passing or failing a course). Validity is approached as a hypothesis: theory, logic and the scientific method are used to collect and assemble data that support, or fail to support, the proposed score interpretations at a given point in time. Data and logic are assembled into arguments, pro and con, for some specific interpretation of assessment data. Examples of types of validity evidence, data and information from each source are discussed in the context of a high-stakes written and performance examination in medical education.

Conclusion: All assessments require evidence of the reasonableness of the proposed interpretation, as test data in education have little or no intrinsic meaning. The constructs purported to be measured by our assessments are important to students, faculty, administrators, patients and society and require solid scientific evidence of their meaning.

MeSH terms

  • Data Interpretation, Statistical
  • Education, Medical / methods*
  • Educational Measurement / methods
  • Humans
  • Reproducibility of Results