PLoS One, 3 (8), e3057

Random Drift Versus Selection in Academic Vocabulary: An Evolutionary Analysis of Published Keywords

R Alexander Bentley. PLoS One.

Abstract

The evolution of vocabulary in academic publishing is characterized via keyword frequencies recorded in the ISI Web of Science citation database. In four distinct case studies, evolutionary analysis of keyword frequency change through time is compared to a model of random copying, which serves as the null hypothesis against which selection may be identified. The case studies from the physical sciences indicate greater selection in keyword choice than those from the social sciences. Similar evolutionary analyses can be applied to a wide range of phenomena, wherever the popularity of multiple items through time has been recorded, as with web searches or sales of popular music and books.
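
The random-copying null model mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the model's details, so everything here is an assumption: the function names, the fixed number of keyword slots per step (`n_slots`), the innovation rate (`mu`), and the rule that each slot either copies a keyword drawn uniformly from the previous step or introduces a new one. The second helper counts first appearances in the top 5, which is one way to read "turnover" as described in the figure captions.

```python
import random
from collections import Counter


def random_copying(n_slots=200, mu=0.02, steps=50, seed=0):
    """Simulate a simple random-copying (neutral) model.

    Hypothetical parameters, not taken from the paper:
      n_slots: keyword slots filled at each time step
      mu:      probability that a slot introduces a brand-new keyword
      steps:   number of time steps to simulate
    Returns a list of Counter objects (keyword -> frequency), one per step,
    including the initial state.
    """
    rng = random.Random(seed)
    population = list(range(n_slots))  # start with all-distinct keywords
    next_id = n_slots                  # id to assign to the next new keyword
    history = [Counter(population)]
    for _ in range(steps):
        new_population = []
        for _ in range(n_slots):
            if rng.random() < mu:
                # Innovation: a keyword never seen before.
                new_population.append(next_id)
                next_id += 1
            else:
                # Copying: pick a keyword from the previous step, so the
                # chance of being copied is proportional to its frequency.
                new_population.append(rng.choice(population))
        population = new_population
        history.append(Counter(population))
    return history


def cumulative_turnover(history, k=5):
    """Running count of keywords making a first appearance in the top k."""
    seen = {kw for kw, _ in history[0].most_common(k)}
    total = 0
    turnover = []
    for counts in history[1:]:
        newcomers = {kw for kw, _ in counts.most_common(k)} - seen
        total += len(newcomers)
        seen |= newcomers
        turnover.append(total)
    return turnover


if __name__ == "__main__":
    hist = random_copying(n_slots=200, mu=0.02, steps=50, seed=42)
    print(cumulative_turnover(hist, k=5))
```

Under this sketch, observed keyword data that turn over much faster or slower than the simulated baseline would be a candidate signal of selection, though the comparison actually used in the paper may differ.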

Conflict of interest statement

Competing Interests: The author has declared that no competing interests exist.

Figures

Figure 1. Keywords, total and new, among paradigms about (a) 10 and (b) 30 years old.
Social science cases are shown in red and physical sciences in black. Solid curves show the total number of keywords N per year, and dashed curves show the number of new keywords introduced per year.
Figure 2. Cumulative turnover in the top 5 keywords.
Social science cases are shown in red and physical sciences in black. Turnover refers to words making a first appearance in the top 5. For the older paradigms (SS77; PS81), symbols are squares and the count begins in 1994; for the newer articles (PS99; SS98), symbols are circles and the count begins the year after publication.
Figure 3. Frequencies of the top 5 keywords of 2005.
The four paradigm case studies are shown: (a) newer physical sciences (PS99); (b) newer social sciences (SS98); (c) older physical sciences (PS81); and (d) older social sciences (SS77). Logarithmic y-axes.
Figure 4. Cumulative frequency distributions of all keywords.
Double logarithmic axes. Open circles show the distribution for 2001 and filled circles the distribution for 2005. The paradigms are (a) newer physical sciences (PS99), (b) newer social sciences (SS98), (c) older physical sciences (PS81) and (d) older social sciences (SS77). Using the maximum likelihood method, the estimated power-law exponents for 2001 and 2005, respectively, are as follows: PS99: 2.11, 2.00; SS98: 2.11, 2.02; PS81: 2.09, 2.05; SS77: 2.18, 2.09. Errors (by jackknife estimate) on these exponents are <0.01.
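
The Figure 4 caption reports maximum likelihood exponents with jackknife errors but does not say which estimator was used. The sketch below assumes the standard continuous-data estimator, alpha = 1 + n / sum(ln(x_i / x_min)), and a leave-one-out jackknife for the standard error. Keyword counts are integers, so a discrete estimator may be closer to what the paper did. All function names and the synthetic sampler are hypothetical and exist only to exercise the code.

```python
import math
import random


def powerlaw_mle(values, x_min=1.0):
    """Continuous maximum likelihood estimate of a power-law exponent.

    Assumes a density proportional to x**(-alpha) for x >= x_min.
    """
    logs = [math.log(v / x_min) for v in values if v >= x_min]
    log_sum = sum(logs)
    if log_sum <= 0:
        raise ValueError("need at least one value strictly above x_min")
    return 1.0 + len(logs) / log_sum


def jackknife_se(values, x_min=1.0):
    """Leave-one-out jackknife standard error of the exponent estimate."""
    logs = [math.log(v / x_min) for v in values if v >= x_min]
    n = len(logs)
    total = sum(logs)
    # Exponent re-estimated with each observation removed in turn.
    loo = [1.0 + (n - 1) / (total - lg) for lg in logs]
    mean = sum(loo) / n
    return math.sqrt((n - 1) / n * sum((e - mean) ** 2 for e in loo))


def sample_powerlaw(n, alpha, x_min=1.0, seed=0):
    """Synthetic power-law samples by inverse-transform sampling.

    Illustration only; this is not data from the paper.
    """
    rng = random.Random(seed)
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


if __name__ == "__main__":
    data = sample_powerlaw(20000, alpha=2.1)
    print(powerlaw_mle(data), jackknife_se(data))
```

On a large synthetic sample the estimate should land close to the exponent used to generate it, and the jackknife error shrinks roughly as one over the square root of the sample size.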
