Toward an understanding of situational judgment item validity and group differences

J Appl Psychol. 2011 Mar;96(2):327-36. doi: 10.1037/a0021983.


This paper evaluates 2 adjustments to common scoring approaches for situational judgment tests (SJTs). These adjustments can result in substantial improvements to item validity, reductions in mean racial differences, and resistance to coaching designed to improve scores. The first adjustment, applicable to SJTs that use Likert scales, controls for elevation and scatter (Cronbach & Gleser, 1953). This adjustment improves item validity. Also, because there is a White-Black mean difference in the preference for extreme responses on Likert scales (Bachman & O'Malley, 1984), this adjustment substantially reduces White-Black mean score differences. Furthermore, this adjustment often eliminates the score elevation associated with the coaching strategy of avoiding extreme responses (Cullen, Sackett, & Lievens, 2006). Item validity is shown to have a U-shaped relationship with item means. This holds both for SJTs with Likert-scale response formats and for SJTs where respondents identify the best and worst response option. Given the U-shaped relationship, the second adjustment is to drop items with midrange item means. This permits the SJT to be shortened, sometimes dramatically, without necessarily harming validity.
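
As a rough illustration of the two adjustments described above, the following Python sketch treats "controlling for elevation and scatter" as standardizing each respondent's ratings within person: subtracting that respondent's own mean (elevation) and dividing by that respondent's own spread (scatter, taken here as the population standard deviation). It then keeps only items whose means fall outside a midrange band. The function names, the example ratings, and the cutoffs 2.5 and 3.5 are hypothetical choices for illustration and are not taken from the paper, whose exact scoring procedure may differ.

```python
from statistics import mean, pstdev


def standardize_within_person(ratings):
    """Remove elevation and scatter from one respondent's Likert ratings.

    Elevation is the respondent's own mean rating; scatter is approximated
    here by the respondent's own population standard deviation (an assumption).
    """
    person_mean = mean(ratings)
    person_sd = pstdev(ratings)
    if person_sd == 0:
        # A respondent who gives the same rating everywhere has no profile shape.
        return [0.0 for _ in ratings]
    return [(r - person_mean) / person_sd for r in ratings]


def drop_midrange_items(item_means, low, high):
    """Return the indices of items whose mean lies outside [low, high]."""
    return [i for i, m in enumerate(item_means) if m < low or m > high]


# Two hypothetical respondents with the same rank ordering of options,
# one using the scale extremes and one avoiding them.
extreme_responder = [1, 3, 5]
cautious_responder = [2, 3, 4]
print(standardize_within_person(extreme_responder))
print(standardize_within_person(cautious_responder))

# Hypothetical item means on a 1-5 scale; cutoffs 2.5 and 3.5 are illustrative.
print(drop_midrange_items([1.4, 3.1, 4.7, 2.9, 1.9], low=2.5, high=3.5))  # [0, 2, 4]
```

After standardization the two respondents receive the same adjusted ratings, which is consistent with the abstract's point that a preference for, or coached avoidance of, extreme responses no longer shifts scores once elevation and scatter are controlled.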

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • African Americans / psychology*
  • Comprehension
  • European Continental Ancestry Group / psychology*
  • Humans
  • Job Application*
  • Judgment*
  • Psychometrics
  • Reproducibility of Results