Assessment Question Characteristics Predict Medical Student Performance in General Pathology

Arch Pathol Lab Med. 2021 Jan 15. doi: 10.5858/arpa.2020-0624-OA. Online ahead of print.


Context.—: Evaluation of medical curricula includes appraisal of student assessments in order to encourage deeper learning approaches. General pathology is our institution's 4-week, first-year course covering universal disease concepts (inflammation, neoplasia, etc).

Objective.—: To compare types of assessment questions and determine which characteristics may predict student scores, degree of difficulty, and item discrimination.

Design.—: Item-level analysis was employed to categorize questions along the following variables: type (multiple choice question or matching answer), presence of clinical vignette (if so, whether simple or complex), presence of specimen image, information depth (simple recall or interpretation), knowledge density (first or second order), Bloom taxonomy level (1-3), and, for the final exam, subject familiarity (repeated concept and, if so, whether verbatim).
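The abstract does not state how degree of difficulty and item discrimination were computed. As a minimal sketch, assuming the conventional classical test theory definitions (difficulty as the proportion of correct responses, discrimination as the point-biserial correlation between an item and the score on the remaining items), an item-level analysis might look like the following. The function names and the 0/1 response matrix layout are hypothetical and are not taken from the study.

```python
from statistics import mean, pstdev


def item_difficulty(item):
    """Proportion of students answering correctly (assumed definition).

    A lower value means a harder item.
    """
    return mean(item)


def item_discrimination(item, rest_scores):
    """Point-biserial correlation between item correctness (0/1) and the
    score on all other items (assumed definition, not from the abstract)."""
    n = len(item)
    mean_item, mean_rest = mean(item), mean(rest_scores)
    sd_item, sd_rest = pstdev(item), pstdev(rest_scores)
    if sd_item == 0 or sd_rest == 0:
        # Undefined when everyone answered the same way.
        return float("nan")
    cov = sum((x - mean_item) * (y - mean_rest)
              for x, y in zip(item, rest_scores)) / n
    return cov / (sd_item * sd_rest)


def analyze(matrix):
    """matrix[s][q] is 1 if student s answered question q correctly, else 0.

    Returns one (difficulty, discrimination) pair per question.
    """
    totals = [sum(row) for row in matrix]
    results = []
    for q in range(len(matrix[0])):
        item = [row[q] for row in matrix]
        # Exclude the item itself from the comparison score.
        rest = [t - x for t, x in zip(totals, item)]
        results.append((item_difficulty(item), item_discrimination(item, rest)))
    return results
```

Per-item values of this kind could then serve as outcome variables in a multivariate model with the question characteristics listed above as predictors.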

Results.—: Assessments comprised 3 quizzes and 1 final exam (total 125 questions), scored during a 3-year period (total 417 students) for a total of 52 125 graded attempts. Overall, 44 890 attempts (86.1%) were correct. In multivariate analysis, question type emerged as the most significant predictor of student performance, degree of difficulty, and item discrimination, with multiple choice questions being significantly associated with lower mean scores (P = .004) and higher degree of difficulty (P = .02), but also, paradoxically, poorer discrimination (P = .002). The presence of a specimen image was significantly associated with better discrimination (P = .04), and questions requiring data interpretation (versus simple recall) were significantly associated with lower mean scores (P = .003) and a higher degree of difficulty (P = .046).
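The reported totals are internally consistent, which can be checked with a few lines of arithmetic. This sketch assumes that every one of the 417 students attempted all 125 questions, as the stated total implies; the variable names are illustrative only.

```python
questions = 125
students = 417
correct_attempts = 44890

# Assumption: each student attempted every question.
graded_attempts = questions * students

# Percentage of graded attempts that were correct, to one decimal place.
percent_correct = round(100 * correct_attempts / graded_attempts, 1)
```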

Conclusions.—: Assessments in medical education should comprise combinations of questions with various characteristics in order to encourage better student performance, but also obtain optimal degrees of difficulty and levels of item discrimination.