Stereotypes are associations between social groups and semantic attributes that are widely shared within societies. The spoken and written language of a society affords a unique way to measure the magnitude and prevalence of these widely shared collective representations. Here, we used word embeddings to systematically quantify gender stereotypes in language corpora that are unprecedented in size (65+ million words) and scope (child and adult conversations, books, movies, TV). Across corpora, gender stereotypes emerged consistently and robustly for both theoretically selected stereotypes (e.g., work-home) and comprehensive lists of more than 600 personality traits and more than 300 occupations. Despite underlying differences across language corpora (e.g., time periods, formats, age groups), results revealed the pervasiveness of gender stereotypes in every corpus. Using gender stereotypes as the focal issue, we unite 19th-century theories of collective representations and 21st-century evidence on implicit social cognition to understand the subtle yet persistent presence of collective representations in language.
Keywords: collective representations; gender stereotypes; machine learning; natural-language processing; open data; open materials; word embeddings.