Development of an optical character recognition pipeline for handwritten form fields from an electronic health record

Luke V Rasmussen; Peggy L Peissig; Catherine A McCarty; Justin Starren

doi:10.1136/amiajnl-2011-000182

J Am Med Inform Assoc. 2012 Jun;19(e1):e90-5. doi: 10.1136/amiajnl-2011-000182. Epub 2011 Sep 2.

Authors

Luke V Rasmussen¹, Peggy L Peissig, Catherine A McCarty, Justin Starren

Affiliation

¹ Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, Wisconsin 54449, USA. rasmussen.luke@mcrf.mfldclin.edu

Abstract

Background: Although adoption of electronic health records is increasing rapidly, much of the historical medical record is available only in handwritten notes and forms, which require labor-intensive manual chart abstraction for some clinical research. The few previous studies on automated extraction of data from these handwritten notes have focused on monolithic, custom-developed recognition systems or on third-party systems that require proprietary forms.

Methods: We present an optical character recognition processing pipeline that leverages the capabilities of existing third-party optical character recognition engines while providing the flexibility of a modular, custom-developed system. The system was configured and run on a selected set of form fields extracted from a corpus of handwritten ophthalmology forms.
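
The modular design described in Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the `Engine` type, the stub engines, and the "accept only when all engines agree" combination rule are assumptions made for illustration, and the stubs do not represent the Nuance or LEADTOOLS APIs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence

# Hypothetical interface: an engine is any callable that maps a field image
# (raw bytes here) to recognized text, or None when it produces no answer.
Engine = Callable[[bytes], Optional[str]]


def run_parallel(engines: Sequence[Engine], field_image: bytes) -> List[Optional[str]]:
    """Run every engine on the same field image; outputs keep engine order."""
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        return list(pool.map(lambda engine: engine(field_image), engines))


def accept_if_unanimous(outputs: Sequence[Optional[str]]) -> Optional[str]:
    """Assumed combination rule: accept a value only if all engines agree."""
    cleaned = [o.strip() if o else "" for o in outputs]
    if cleaned and all(cleaned) and len(set(cleaned)) == 1:
        return cleaned[0]
    return None  # disagreement or a missing answer: leave for human review


# Stub engines standing in for real third-party OCR calls (hypothetical).
def engine_a(image: bytes) -> Optional[str]:
    return "20/40"


def engine_b(image: bytes) -> Optional[str]:
    return "20/40 "


def engine_c(image: bytes) -> Optional[str]:
    return "20/46"


agreed = accept_if_unanimous(run_parallel([engine_a, engine_b], b""))
disputed = accept_if_unanimous(run_parallel([engine_a, engine_c], b""))
```

Because each engine sits behind the same callable interface, engines can be added, removed, or recombined without changing the rest of the pipeline. A strict agreement rule of this kind tends to favor precision over coverage.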

Observations: The processing pipeline allowed multiple configurations to be run. The optimal configuration, consisting of the Nuance and LEADTOOLS engines running in parallel, achieved a positive predictive value of 94.6% and a sensitivity of 13.5%.
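
For reference, the two reported metrics follow the standard definitions sketched below. The counts in the example are invented purely for illustration and are not data from the study; how the study assigned fields to each category is not stated in this abstract.

```python
def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """Fraction of values the pipeline accepted that were correct: TP / (TP + FP)."""
    accepted = true_pos + false_pos
    return true_pos / accepted if accepted else float("nan")


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of recognizable fields the pipeline got right: TP / (TP + FN)."""
    relevant = true_pos + false_neg
    return true_pos / relevant if relevant else float("nan")


# Made-up counts (not from the study): 8 correct acceptances,
# 2 incorrect acceptances, 30 fields missed or rejected.
ppv = positive_predictive_value(true_pos=8, false_pos=2)
sens = sensitivity(true_pos=8, false_neg=30)
print(f"PPV = {ppv:.1%}, sensitivity = {sens:.1%}")
```

A high positive predictive value paired with a low sensitivity means that the values the pipeline does return are usually correct, but most fields still need manual abstraction.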

Discussion: While limitations exist, preliminary experience from this project yielded insights into the generalizability and applicability of integrating multiple inexpensive, general-purpose third-party optical character recognition engines in a modular pipeline.
