Natural language processing (NLP) has been widely adopted in clinical trial matching because it can process the unstructured text often found in electronic health records. Despite the growing number of tools that use NLP to match patients to eligible clinical trials, comparing these tools is difficult because they are not evaluated consistently. The ground truth, or reference, against which each tool's results are assessed varies, making it hard to compare the robustness of the tools against one another. This paper draws attention to the lack of definition and consistency in the ground truth data used to evaluate such tools and suggests two ways to define a gold standard for the ground truth in small-scale and large-scale studies.
Keywords: Clinical Trial Matching; Eligibility Criteria; Natural Language Processing.