Background: The association between sleep and chronic diseases needs to be better understood. In this study, we developed a natural language processing (NLP) algorithm to mine polysomnography (PSG) free-text notes from electronic medical records (EMR) and evaluated its performance. Methods: Using the Veterans Health Administration EMR, we identified 46,093 PSG studies with CPT code 95810 from 1 October 2000 to 30 September 2019. We randomly selected 200 notes to assess the accuracy of the NLP algorithm in extracting sleep parameters, including total sleep time (TST), sleep efficiency (SE), sleep onset latency (SOL), wake after sleep onset (WASO), and apnea-hypopnea index (AHI), against visual inspection by raters masked to the NLP output. Results: In both the training phase and the test phase, NLP performance was >0.90 for precision, recall, and F1 score for TST, SOL, SE, WASO, and AHI. Conclusions: This study showed that NLP is an accurate technique for extracting sleep parameters from PSG reports in the EMR. Thus, NLP can serve as an effective tool in large health care systems to evaluate and improve patient care.
Keywords: natural language processing; polysomnography; sleep parameters.
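For readers unfamiliar with the evaluation described in the abstract, the following is a minimal sketch of how sleep parameters might be pulled from a free-text PSG note and scored against rater values. It is not the algorithm used in this study: the regular expressions, the function names `extract` and `score`, and the scoring convention (a wrong value counts as both a false positive and a false negative) are all illustrative assumptions, and real PSG reports vary far more in wording than these patterns allow.

```python
import re

# Hypothetical patterns: a parameter name, up to 20 non-digit characters,
# then the first number. Real report wording is far more varied.
NUMBER = r"\D{0,20}?(\d+(?:\.\d+)?)"
PATTERNS = {
    "TST": r"total sleep time" + NUMBER,
    "SE": r"sleep efficiency" + NUMBER,
    "SOL": r"sleep onset latency" + NUMBER,
    "WASO": r"wake after sleep onset" + NUMBER,
    "AHI": r"apnea-hypopnea index" + NUMBER,
}


def extract(note):
    """Return {parameter: float or None} for one free-text note."""
    values = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, note, flags=re.IGNORECASE)
        values[name] = float(match.group(1)) if match else None
    return values


def score(predicted, gold):
    """Precision, recall, and F1 for one parameter across notes.

    Assumed convention: an extracted value equal to the rater value is a
    true positive; any other extracted value is a false positive; any
    rater value that was missed or mismatched is a false negative.
    """
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        if pred is not None and pred == ref:
            tp += 1
        else:
            if pred is not None:
                fp += 1
            if ref is not None:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    note = ("Total sleep time (TST): 372.5 minutes. Sleep efficiency was 84%. "
            "Sleep onset latency: 12 min. Wake after sleep onset (WASO) 45 min. "
            "Apnea-hypopnea index (AHI) of 12.3 events/hour.")
    print(extract(note))
    # Made-up TST values for three notes: extractor output vs. rater values.
    print(score([372.5, None, 80.0], [372.5, 400.0, 84.0]))
```

The example note and the values passed to `score` are invented for illustration only and do not come from the study data.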