Moving beyond Kucera and Francis: a critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English
- PMID: 19897807
- DOI: 10.3758/BRM.41.4.977
Abstract
Word frequency is the most important variable in research on word processing and memory. Yet, the main criterion for selecting word frequency norms has been the availability of the measure, rather than its quality. As a result, much research is still based on the old Kucera and Francis frequency norms. By using the lexical decision times of recently published megastudies, we show how bad this measure is and what must be done to improve it. In particular, we investigated the size of the corpus, the language register on which the corpus is based, and the definition of the frequency measure. We observed that corpus size is of practical importance for small sizes (depending on the frequency of the word), but not for sizes above 16-30 million words. As for the language register, we found that frequencies based on television and film subtitles are better than frequencies based on written sources, certainly for the monosyllabic and bisyllabic words used in psycholinguistic research. Finally, we found that lemma frequencies are not superior to word form frequencies in English and that a measure of contextual diversity is better than a measure based on raw frequency of occurrence. Part of the superiority of the latter is due to the words that are frequently used as names. Assembling a new frequency norm on the basis of these considerations turned out to predict word processing times much better than did the existing norms (including Kucera & Francis and Celex). The new SUBTL frequency norms from the SUBTLEX(US) corpus are freely available for research purposes from http://brm.psychonomic-journals.org/content/supplemental, as well as from the University of Ghent and Lexique Web sites.
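The contrast between raw frequency of occurrence and contextual diversity can be made concrete with a small sketch. It is not the authors' pipeline. It assumes a toy corpus of three tokenized "documents" (standing in for subtitle files), a hypothetical helper named `frequency_norms`, and contextual diversity defined as the proportion of documents in which a word form appears.

```python
from collections import Counter


def frequency_norms(documents):
    """Return (per_million, diversity) for a list of token lists.

    per_million: occurrences of each word form per million tokens.
    diversity:   proportion of documents containing the word form
                 (an assumed, simplified contextual diversity measure).
    """
    raw = Counter()         # total occurrences of each word form
    doc_counts = Counter()  # number of documents containing each word form
    for tokens in documents:
        raw.update(tokens)
        doc_counts.update(set(tokens))  # count each word once per document

    total = sum(raw.values())
    per_million = {w: c * 1_000_000 / total for w, c in raw.items()}
    diversity = {w: n / len(documents) for w, n in doc_counts.items()}
    return per_million, diversity


# Toy corpus (invented for illustration): a name repeated within one document.
docs = [
    "the cat saw the dog".split(),
    "the dog ran".split(),
    "sam sam sam sam".split(),
]
per_million, diversity = frequency_norms(docs)

# "sam" has the highest raw frequency but occurs in only one document,
# so its contextual diversity is lower than that of "the" or "dog".
for word in ("sam", "the", "dog"):
    print(word, round(per_million[word]), round(diversity[word], 2))
```

In this toy case the name outranks "the" on raw frequency yet ranks below it on contextual diversity, which is the kind of divergence the abstract attributes partly to words frequently used as names.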
Similar articles
- SUBTLEX-NL: a new measure for Dutch word frequency based on film subtitles. Behav Res Methods. 2010 Aug;42(3):643-50. doi: 10.3758/BRM.42.3.643. PMID: 20805586
- Do the effects of subjective frequency and age of acquisition survive better word frequency norms? Q J Exp Psychol (Hove). 2011 Mar;64(3):545-59. doi: 10.1080/17470218.2010.503374. Epub 2010 Aug 9. PMID: 20700859
- SUBTLEX-UK: a new and improved word frequency database for British English. Q J Exp Psychol (Hove). 2014;67(6):1176-90. doi: 10.1080/17470218.2013.850521. Epub 2014 Jan 13. PMID: 24417251
- The word frequency effect: a review of recent developments and implications for the choice of frequency estimates in German. Exp Psychol. 2011;58(5):412-24. doi: 10.1027/1618-3169/a000123. PMID: 21768069 Review.
- Index of norms and ratings published in the Psychonomic Society journals. Behav Res Methods Instrum Comput. 1999 Nov;31(4):659-67. doi: 10.3758/bf03200742. PMID: 10633981 Review.
Cited by
- Better Together: Integrating Multivariate with Univariate Methods, and MEG with EEG to Study Language Comprehension. Lang Cogn Neurosci. 2024;39(8):991-1019. doi: 10.1080/23273798.2023.2223783. Epub 2023 Jun 12. PMID: 39444757 Free PMC article.
- Orthographic neighborhood density modulates the size of transposed-letter priming effects. Cogn Affect Behav Neurosci. 2021 Oct;21(5):948-959. doi: 10.3758/s13415-021-00905-w. Epub 2021 May 6. PMID: 33954926
- Neural speech recognition: continuous phoneme decoding using spatiotemporal representations of human cortical activity. J Neural Eng. 2016 Oct;13(5):056004. doi: 10.1088/1741-2560/13/5/056004. Epub 2016 Aug 3. PMID: 27484713 Free PMC article.
- The influence of 2-hop network density on spoken word recognition. Psychon Bull Rev. 2017 Apr;24(2):496-502. doi: 10.3758/s13423-016-1103-9. PMID: 27383618
- Automatic Assessment of Language Ability in Children with and without Typical Development. Annu Int Conf IEEE Eng Med Biol Soc. 2020 Jul;2020:6111-6114. doi: 10.1109/EMBC44109.2020.9175264. PMID: 33019365 Free PMC article.