Natural language processing: an introduction

J Am Med Inform Assoc. 2011 Sep-Oct;18(5):544-51. doi: 10.1136/amiajnl-2011-000464.


Objectives: To provide an overview of, and a tutorial on, natural language processing (NLP) and modern NLP-system design.

Target audience: This tutorial targets the medical informatics generalist who has limited acquaintance with the principles behind NLP and/or limited knowledge of the current state of the art.

Scope: We describe the historical evolution of NLP and summarize common NLP sub-problems in this extensive field. We then provide a synopsis of selected highlights of medical NLP efforts. After briefly describing the machine-learning approaches commonly used for diverse NLP sub-problems, we discuss how modern NLP architectures are designed, with a summary of the Apache Software Foundation's Unstructured Information Management Architecture (UIMA). Finally, we consider possible future directions for NLP and reflect on the possible impact of IBM Watson on the medical field.

Publication types

  • Research Support, N.I.H., Extramural
  • Review

MeSH terms

  • Humans
  • Information Management
  • Information Storage and Retrieval
  • Medical Informatics / trends*
  • Models, Theoretical
  • Natural Language Processing*
  • Pattern Recognition, Automated
  • User-Computer Interface