Gender Stereotypes in Natural Language: Word Embeddings Show Robust Consistency Across Child and Adult Language Corpora of More Than 65 Million Words

Psychol Sci. 2021 Feb;32(2):218-240. doi: 10.1177/0956797620963619. Epub 2021 Jan 5.


Stereotypes are associations between social groups and semantic attributes that are widely shared within societies. The spoken and written language of a society affords a unique way to measure the magnitude and prevalence of these widely shared collective representations. Here, we used word embeddings to systematically quantify gender stereotypes in language corpora that are unprecedented in size (65+ million words) and scope (child and adult conversations, books, movies, TV). Across corpora, gender stereotypes emerged consistently and robustly for both theoretically selected stereotypes (e.g., work-home) and comprehensive lists of more than 600 personality traits and more than 300 occupations. Despite underlying differences across language corpora (e.g., time periods, formats, age groups), results revealed the pervasiveness of gender stereotypes in every corpus. Using gender stereotypes as the focal issue, we unite 19th-century theories of collective representations and 21st-century evidence on implicit social cognition to understand the subtle yet persistent presence of collective representations in language.
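As a rough illustration of how word embeddings can quantify an association such as work-home with male-female, the sketch below computes a WEAT-style effect size from cosine similarities. The abstract does not state which metric the authors used, so the choice of a WEAT-style measure is an assumption, as are the function names. The two-dimensional vectors are made-up toy values, not data from the study.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(word_vec, attrs_a, attrs_b):
    """Mean similarity to attribute set A minus mean similarity to set B."""
    sim_a = mean([cosine(word_vec, a) for a in attrs_a])
    sim_b = mean([cosine(word_vec, b) for b in attrs_b])
    return sim_a - sim_b


def effect_size(targets_x, targets_y, attrs_a, attrs_b):
    """WEAT-style standardized difference in association between two target sets.

    Assumption: the pooled standard deviation is the sample standard
    deviation (n - 1 denominator); implementations differ on this detail.
    """
    sx = [association(x, attrs_a, attrs_b) for x in targets_x]
    sy = [association(y, attrs_a, attrs_b) for y in targets_y]
    return (mean(sx) - mean(sy)) / stdev(sx + sy)


# Hypothetical toy embeddings (2 dimensions), for illustration only.
male = [[1.0, 0.0], [0.9, 0.1]]    # e.g. vectors for "he", "man"
female = [[0.0, 1.0], [0.1, 0.9]]  # e.g. vectors for "she", "woman"
work = [[0.8, 0.2], [0.7, 0.3]]    # e.g. vectors for "office", "salary"
home = [[0.2, 0.8], [0.3, 0.7]]    # e.g. vectors for "kitchen", "family"

d = effect_size(work, home, male, female)
d_swapped = effect_size(home, work, male, female)
print(f"work-male / home-female effect size: {d:.2f}")
```

A positive value means the first target set (here, work words) sits closer to the first attribute set (male words) than the second target set does. Swapping the target sets flips the sign. With real corpora, the vectors would come from embeddings trained on each corpus rather than from hand-written lists.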

Keywords: collective representations; gender stereotypes; machine learning; natural-language processing; open data; open materials; word embeddings.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Child
  • Family
  • Humans
  • Language*
  • Natural Language Processing*
  • Semantics