Grooming a CAT: customizing CAT administration rules to increase response efficiency in specific research and clinical settings

Qual Life Res. 2018 Sep;27(9):2403-2413. doi: 10.1007/s11136-018-1870-z. Epub 2018 May 5.


Purpose: To evaluate the degree to which applying alternative stopping rules would reduce response burden while maintaining score precision in the context of computer adaptive testing (CAT).

Data: Analyses were conducted on secondary data comprising CATs administered in a clinical setting at multiple time points (baseline and up to two follow-ups) to 417 study participants who had back pain (51.3%) and/or depression (47.0%). Participants' mean age was 51.3 years (SD = 17.2; range 18 to 86). Participants tended to be white (84.7%), relatively well educated (77% with at least some college), female (63.9%), and married or living in a committed relationship (57.4%). The unit of analysis was the individual assessment history (i.e., CAT item response history) from the parent study. Data were first aggregated across all individuals, domains, and time points into an omnibus dataset of assessment histories and were then disaggregated by measure for domain-specific analyses. Finally, assessment histories within a "clinically relevant range" (score ≥ 1 SD from the mean in the direction of poorer health) were analyzed separately to explore score level-specific findings.
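The "clinically relevant range" filter can be illustrated with a minimal sketch. It assumes the usual T score convention (reference mean 50, SD 10), which the abstract does not state explicitly; the function name and the higher_is_worse flag are hypothetical and are only there to handle domains in which poorer health corresponds to either higher or lower scores.

```python
def in_clinically_relevant_range(
    t_score: float,
    higher_is_worse: bool,
    mean: float = 50.0,  # assumed T score reference mean
    sd: float = 10.0,    # assumed T score reference SD
) -> bool:
    """Return True if the score lies at least 1 SD from the mean
    in the direction of poorer health."""
    if higher_is_worse:
        # Symptom-type domains: a higher score means poorer health.
        return t_score >= mean + sd
    # Function-type domains: a lower score means poorer health.
    return t_score <= mean - sd
```

For example, under these assumptions a score of 62 on a symptom measure would fall inside the range, while a score of 55 would not.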

Method: Two different sets of CAT administration rules were compared. The original CAT (CATORIG) rules required that at least four and no more than 12 items be administered. If the score standard error (SE) reached a value < 3 points (T score metric) before 12 items were administered, the CAT was stopped. We simulated applying alternative stopping rules (CATALT) that removed the requirement that a minimum of four items be administered and stopped a CAT if responses to the first two items were both associated with best health, if the SE was < 3, if the SE changed by < 0.1 (T score metric), or if 12 items had been administered. We then compared score fidelity and response burden, defined as the number of items administered, between CATORIG and CATALT.
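The two rule sets described above can be sketched as follows. This is an illustrative reading of the abstract, not the authors' code. It assumes that se_history[i] holds the score SE after item i + 1, that "SE change" means the absolute difference between consecutive SEs, and that the best-health check is applied once the second item has been answered. All function and variable names are hypothetical, and the percent-savings formula is only one plausible way to summarize burden.

```python
from typing import Sequence

SE_THRESHOLD = 3.0         # stop once SE < 3 points (T score metric)
SE_CHANGE_THRESHOLD = 0.1  # stop once SE changes by < 0.1 between items
MAX_ITEMS = 12             # hard cap under both rule sets
MIN_ITEMS_ORIG = 4         # minimum required only under CATORIG


def items_under_orig(se_history: Sequence[float]) -> int:
    """Number of items administered under the CATORIG-style rules."""
    for n, se in enumerate(se_history, start=1):
        if n >= MAX_ITEMS:
            return n
        if n >= MIN_ITEMS_ORIG and se < SE_THRESHOLD:
            return n
    return len(se_history)  # history ended before any rule fired


def items_under_alt(
    se_history: Sequence[float],
    best_health: Sequence[bool],
) -> int:
    """Number of items administered under the CATALT-style rules.

    best_health[i] is True if the response to item i + 1 was the
    option associated with best health (assumed encoding).
    """
    for n, se in enumerate(se_history, start=1):
        if n >= MAX_ITEMS:
            return n
        if n == 2 and all(best_health[:2]):
            return n
        if se < SE_THRESHOLD:
            return n
        # Assumed definition: absolute change from the previous SE.
        if n >= 2 and abs(se_history[n - 2] - se) < SE_CHANGE_THRESHOLD:
            return n
    return len(se_history)


def percent_savings(orig_counts: Sequence[int], alt_counts: Sequence[int]) -> float:
    """Relative reduction in total items administered (assumed definition)."""
    total_orig = sum(orig_counts)
    return 100.0 * (total_orig - sum(alt_counts)) / total_orig
```

As a usage example with made-up values, an assessment with SEs of 5.8, 4.1, 3.4, and 2.9 and best-health responses to the first two items would stop after four items under the first function and after two under the second.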

Results: CATORIG and CATALT scores differed little, especially within the clinically relevant range, and response burden was substantially lower under CATALT (e.g., 41.2% savings in the omnibus dataset).

Conclusions: Alternative stopping rules result in substantial reductions in response burden with minimal sacrifice in score precision.

Keywords: CAT stopping rules; Computer adaptive testing; PROMIS®; Response burden.

MeSH terms

  • Adolescent
  • Adult
  • Aged
  • Aged, 80 and over
  • Computers / statistics & numerical data*
  • Female
  • Humans
  • Male
  • Middle Aged
  • Psychometrics / methods*
  • Quality of Life / psychology*
  • Reproducibility of Results
  • Research / instrumentation*
  • Young Adult

Grant support