Crowdsourced and AI-generated age-of-acquisition (AoA) norms for vocabulary in print: Extending the Kuperman et al. (2012) norms

Behav Res Methods. 2025 Oct 6;57(11):304. doi: 10.3758/s13428-025-02843-8.

Abstract

This paper revisits the age-of-acquisition (AoA) norms of Kuperman et al. (2012). Three studies were conducted. Study 1 reports a crowdsourcing 'megastudy' that obtained 790,024 estimates from participants of the age at which they could first read and write 11,074 early-acquired words from Kuperman et al. (2012). The study aimed to differentiate between receptive oral-language AoA and print-based AoA. The results correlate well with the original estimates and, as hypothesized, yield higher AoAs for reading/writing. These estimates are released as supplements to the original norms. Study 2 explored the potential of large language models (LLMs), specifically GPT-4o, to replicate these crowdsourced AoA estimates. The findings indicated a strong correlation between AI-generated estimates and human judgments, showing the utility of AI for estimating AoA and developing norms for psycholinguistic and educational research in lieu of crowdsourcing. Study 3 leveraged AI to extend estimates to all well-known words in Kuperman et al. (2012) and the English Crowdsourcing Project (ECP). Study 3 also investigated a model fine-tuned on 2000 ratings from Kuperman et al. (2012). Fine-tuning increased alignment with human ratings, though comparisons with untrained models suggested that fine-tuning is not essential in English for obtaining useful AoA estimates. Both trained and untrained AI-generated norms correlated highly with human ratings and performed well in accounting for word processing times and accuracy in regressions. Uses and limitations of the AI estimates are discussed. All resources are made available on the Open Science Framework and can be used freely for research and education.
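The comparisons in Studies 2 and 3 rest on correlating AI-generated AoA estimates with human norms over the words that both sets cover. The following is a minimal sketch, assuming each norm set is a plain mapping from word to mean AoA rating in years. The helper names `align` and `pearson_r` and the toy ratings are hypothetical illustrations, not the authors' code or data:

```python
import math


def align(human, ai):
    """Keep only words rated in both norm sets, in a stable (sorted) order."""
    shared = sorted(set(human) & set(ai))
    return shared, [human[w] for w in shared], [ai[w] for w in shared]


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of ratings."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one set has no variance")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy ratings in years, NOT values from the published norms.
human_aoa = {"apple": 3.5, "dog": 2.8, "justice": 9.6, "whisper": 6.1, "theory": 11.2}
ai_aoa = {"apple": 4.0, "dog": 3.0, "justice": 10.1, "whisper": 5.5,
          "theory": 12.0, "zebra": 4.5}  # "zebra" has no human rating, so it is dropped

words, h, a = align(human_aoa, ai_aoa)
r = pearson_r(h, a)
print(len(words), round(r, 3))
```

Aligning on the shared vocabulary first matters because AI-generated norms (as in Study 3) can cover words for which no human rating exists; only the overlap can be used to judge agreement.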

Keywords: AI; Age of acquisition; Crowdsourcing; Large language model; Vocabulary; Word norms.

MeSH terms

  • Adolescent
  • Adult
  • Child
  • Child, Preschool
  • Crowdsourcing* / methods
  • Female
  • Humans
  • Language Development*
  • Male
  • Psycholinguistics* / methods
  • Reading*
  • Vocabulary*
  • Young Adult