Parsing Clinical Text Using the State-Of-The-Art Deep Learning Based Parsers: A Systematic Comparison

BMC Med Inform Decis Mak. 2019 Apr 4;19(Suppl 3):77. doi: 10.1186/s12911-019-0783-2.


Background: A shareable repository of clinical notes is critical for advancing natural language processing (NLP) research, and many NLP researchers therefore aim to create one that has breadth (from multiple institutions) as well as depth (as much individual data as possible).

Methods: We aimed to assess the degree to which individuals would be willing to contribute their health data to such a repository. A compact e-survey probed willingness to share demographic and clinical data categories. Participants were faculty, staff, and students at two geographically diverse major medical centers (Utah and New York). Such a sample could be expected to respond like a typical potential participant from the general public who has been fully informed, through the consent process, about the pros and cons of participating in a research study.

Results: A total of 2140 respondents completed the survey. 56% of respondents were "somewhat/definitely willing" to share clinical data with identifiers, while 89% were "somewhat" (17%) or "definitely" (72%) willing to share without identifiers. Results were consistent across gender, age, and education, but there were some differences by geographical region. Individuals were most reluctant (50-74%) to share mental health, substance abuse, and domestic violence data.

Conclusions: We conclude that a substantial fraction of potential patient participants, once educated about risks and benefits, would be willing to donate de-identified clinical data to a shared research repository. A slight majority would even be willing to share without de-identification, suggesting that perceptions about data misuse are not a major concern. Such a repository of clinical notes should be invaluable for clinical NLP research and advancement.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Adult
  • Biomedical Research
  • Confidentiality
  • Data Anonymization
  • Databases as Topic
  • Deep Learning*
  • Female
  • Humans
  • Information Dissemination*
  • Male
  • Natural Language Processing*
  • New York
  • Patient Participation
  • Surveys and Questionnaires