Reconceptualizing the classification of PNAS articles

Proc Natl Acad Sci U S A. 2010 Dec 7;107(49):20899-904. doi: 10.1073/pnas.1013452107. Epub 2010 Nov 15.


PNAS article classification is rooted in long-standing disciplinary divisions that do not necessarily reflect the structure of modern scientific research. We reevaluate that structure using latent pattern models from statistical machine learning, also known as mixed-membership models, which identify semantic structure in the co-occurrence of words in article abstracts and references. Our findings suggest that the latent dimensionality of patterns underlying PNAS research articles in the Biological Sciences is only slightly larger than the number of categories currently in use, but the latent patterns differ substantially in content from the current categories. Further, the number of articles listed under multiple categories is only a small fraction of what the models suggest it should be. These findings, together with the sensitivity analyses, suggest ways to reconceptualize the organization of papers published in PNAS.
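To make the "mixed-membership" idea concrete: each article carries a membership vector over latent categories, and the probability of observing a word in that article is a weighted mixture of the categories' word distributions. The sketch below illustrates only that mixture and one simple way to flag articles that belong to several categories. It is not the authors' implementation: it models words only (the paper's models also use references), and the toy distributions, the function names, and the membership `threshold` of 0.2 are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence


def word_probability(theta: Sequence[float], beta: List[Dict[str, float]], word: str) -> float:
    """Mixed-membership mixture: p(word | article) = sum_k theta_k * beta_k(word).

    theta: the article's membership weights over K latent categories (sum to 1).
    beta:  one word distribution per latent category (toy values, hypothetical).
    """
    return sum(t * category.get(word, 0.0) for t, category in zip(theta, beta))


def multi_category_articles(
    memberships: Dict[str, Sequence[float]], threshold: float = 0.2
) -> List[str]:
    """Return articles whose membership exceeds `threshold` in two or more categories.

    The threshold is a hypothetical cut-off, not a value taken from the paper.
    """
    return [
        article
        for article, theta in memberships.items()
        if sum(1 for t in theta if t > threshold) >= 2
    ]


if __name__ == "__main__":
    # Two hypothetical latent categories with toy word distributions.
    beta = [{"gene": 0.5, "protein": 0.5}, {"neuron": 0.6, "gene": 0.4}]
    print(word_probability([0.7, 0.3], beta, "gene"))  # 0.7*0.5 + 0.3*0.4

    memberships = {"A": [0.9, 0.1], "B": [0.55, 0.45], "C": [0.3, 0.7]}
    print(multi_category_articles(memberships))
```

Under this toy setup, article A is dominated by a single category, while B and C have substantial weight in both; comparing counts like these with the number of articles that actually carry multiple category listings is one way to read the abstract's observation about cross-listing.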

MeSH terms

  • Classification
  • Methods
  • National Academy of Sciences, U.S.
  • Periodicals as Topic / classification*
  • Publications / classification*
  • Statistics as Topic
  • United States