Extracting Characteristics of the Study Subjects from Full-Text Articles

AMIA Annu Symp Proc. 2015 Nov 5;2015:484-91. eCollection 2015.


Characteristics of the subjects of biomedical research are important in determining whether a publication describing the research is relevant to a search. To facilitate finding relevant publications, MEDLINE citations provide Medical Subject Headings that describe the subjects' characteristics, such as their species, gender, and age. We seek to improve the recommendation of these headings by the Medical Text Indexer (MTI), which supports manual indexing of MEDLINE. To that end, we explore the potential of the full text of the publications. Using simple, recall-oriented, rule-based methods, we determined that adding sentences extracted from the methods sections and captions to the abstracts prior to MTI processing significantly improved recall and F1 score, with only a slight drop in precision. Improvements were also achieved in directly assigning several headings extracted from the full text. These results indicate the need for further development of automated methods capable of leveraging the full text for indexing.
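The augmentation step described above (selecting sentences from the methods section and captions and appending them to the abstract before indexing) might look roughly like the following minimal sketch. The abstract does not specify the actual rules, so the cue pattern `SUBJECT_CUES`, the naive sentence splitter, and the function names here are hypothetical illustrations, not the authors' implementation; the call to MTI itself is not shown.

```python
import re

# Hypothetical recall-oriented cue list; the paper's actual rules are not
# given in the abstract, so these terms are assumptions for illustration.
SUBJECT_CUES = re.compile(
    r"\b(mice|mouse|rats?|patients?|participants?|volunteers|male|female|"
    r"men|women|children|adults?|infants?|aged|years? old)\b",
    re.IGNORECASE,
)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def select_subject_sentences(text):
    """Keep sentences that mention at least one subject-characteristic cue."""
    return [s for s in split_sentences(text) if SUBJECT_CUES.search(s)]


def augment_abstract(abstract, methods_text, captions):
    """Append cue-bearing sentences from methods and captions to the abstract."""
    extra = select_subject_sentences(methods_text)
    for caption in captions:
        extra.extend(select_subject_sentences(caption))

    # Drop exact duplicates while preserving the original order.
    seen = set()
    unique = []
    for sentence in extra:
        if sentence not in seen:
            seen.add(sentence)
            unique.append(sentence)

    return " ".join([abstract.strip()] + unique)


if __name__ == "__main__":
    augmented = augment_abstract(
        "We studied a new drug.",
        "Thirty male Wistar rats were used. Samples were stored at 4 C.",
        ["Figure 1. Body weight of female mice over time."],
    )
    print(augmented)
```

A broad cue list of this kind favors recall over precision, in line with the recall-oriented approach the abstract describes; the augmented text would then be passed to the indexer in place of the abstract alone.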

MeSH terms

  • Abstracting and Indexing / methods*
  • Algorithms
  • Animals
  • Biomedical Research / methods*
  • Data Mining*
  • Demography*
  • Humans
  • Information Storage and Retrieval / methods*
  • Information Storage and Retrieval / standards
  • Research Subjects*