Information extraction from full text scientific articles: where are the keywords?

BMC Bioinformatics. 2003 May 29:4:20. doi: 10.1186/1471-2105-4-20. Epub 2003 May 29.


Background: To date, many methods for extracting biological information from scientific articles are restricted to the abstract. However, full-text articles are now available in electronic form and offer a larger source of data. This raises two questions: whether the effort of scanning full-text articles is worthwhile, and whether the information that can be extracted from the different sections of an article is relevant.

Results: In this work we addressed these questions, showing that the keyword content of the different sections of a standard scientific article (abstract, introduction, methods, results, and discussion) is highly heterogeneous.

Conclusions: Although the abstract has the highest ratio of keywords to total words, other sections of the article may be a better source of biologically relevant data.
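
The ratio of keywords to total words mentioned in the conclusions can be illustrated with a minimal sketch. This is not the authors' method: the tokenizer, the `keyword_density` helper, the keyword list, and the two section texts are all hypothetical and chosen only to show how a short section can have a higher ratio while a longer section contains more keyword occurrences overall.

```python
import re

def keyword_density(text, keywords):
    """Return (keyword_hits, total_words, hits / total_words) for one section.

    Assumption: words are lowercase alphanumeric runs and a keyword is a
    single word; a real system would use a curated vocabulary.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    vocab = {k.lower() for k in keywords}
    hits = sum(1 for w in words if w in vocab)
    ratio = hits / len(words) if words else 0.0
    return hits, len(words), ratio

# Hypothetical section texts and keyword list, for illustration only.
sections = {
    "abstract": "The p53 protein regulates apoptosis",
    "methods": (
        "Cells were grown overnight and lysed before the p53 protein was "
        "purified and apoptosis was measured with a caspase assay"
    ),
}
keywords = ["p53", "protein", "apoptosis", "caspase"]

density = {name: keyword_density(text, keywords) for name, text in sections.items()}

for name, (hits, total, ratio) in density.items():
    print(f"{name}: {hits} keywords / {total} words = {ratio:.2f}")
```

In this toy input the abstract has the higher ratio, while the methods text contains more keyword occurrences in absolute terms, which is the kind of contrast the conclusions describe.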

Publication types

  • Evaluation Study

MeSH terms

  • Anatomy / classification
  • Anatomy / statistics & numerical data
  • Animal Population Groups / classification
  • Animals
  • Bacteria / classification
  • Genetics / classification
  • Genetics / statistics & numerical data
  • Humans
  • Information Storage and Retrieval / methods*
  • Information Storage and Retrieval / statistics & numerical data
  • Information Storage and Retrieval / trends*
  • Information Systems / standards
  • Information Systems / statistics & numerical data
  • Information Systems / trends*
  • Internet
  • Online Systems
  • Periodicals as Topic / statistics & numerical data
  • Periodicals as Topic / trends*
  • Plants / classification
  • Proteomics / classification
  • Proteomics / statistics & numerical data
  • Selection Bias
  • Species Specificity
  • Terminology as Topic*
  • Vocabulary, Controlled*