J Biomed Inform. 2011 Dec;44(6):927-35.
doi: 10.1016/j.jbi.2011.06.001. Epub 2011 Jun 12.

Dynamic categorization of clinical research eligibility criteria by hierarchical clustering

Zhihui Luo et al. J Biomed Inform. 2011 Dec.

Abstract

Objective: To semi-automatically induce semantic categories of eligibility criteria from text and to automatically classify eligibility criteria based on their semantic similarity.

Design: The UMLS semantic types and a set of previously developed semantic preference rules were utilized to create an unambiguous semantic feature representation to induce eligibility criteria categories through hierarchical clustering and to train supervised classifiers.
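The design above can be illustrated with a minimal sketch: criteria are turned into binary vectors over UMLS semantic types and then grouped by agglomerative clustering. Everything specific here is an assumption for illustration and is not taken from the paper: the four toy criteria, their hand-assigned semantic types, the Jaccard distance, the average linkage, and the stopping threshold. The step that maps free text to semantic types (and the semantic preference rules) is not shown.

```python
from itertools import combinations

# Hypothetical toy data: each criterion is assumed to be already mapped
# to a set of UMLS semantic types (the mapping itself is not shown).
CRITERIA = {
    "Age 18 years or older": {"Organism Attribute", "Temporal Concept"},
    "Age under 65 years": {"Organism Attribute", "Temporal Concept"},
    "History of diabetes mellitus": {"Disease or Syndrome"},
    "Diagnosis of hypertension": {"Disease or Syndrome", "Finding"},
}


def feature_matrix(criteria):
    """Return (sorted feature names, one binary row per criterion)."""
    features = sorted(set().union(*criteria.values()))
    rows = [[1 if f in types else 0 for f in features]
            for types in criteria.values()]
    return features, rows


def jaccard_distance(a, b):
    """1 - |intersection| / |union| for two binary vectors (assumed metric)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 - both / either if either else 0.0


def agglomerate(rows, threshold):
    """Average-linkage agglomerative clustering with a distance cutoff."""
    clusters = [[i] for i in range(len(rows))]

    def linkage(c1, c2):
        total = sum(jaccard_distance(rows[i], rows[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        if linkage(clusters[i], clusters[j]) > threshold:
            break  # remaining clusters are too dissimilar to merge
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


features, rows = feature_matrix(CRITERIA)
print(features)
print(agglomerate(rows, threshold=0.6))
```

With this toy input the two age criteria share identical vectors and merge first, and the two disease criteria merge next, which is the kind of grouping from which category labels could then be induced by a human reviewer.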

Measurements: We induced 27 categories and measured their prevalence in 27,278 eligibility criteria from 1,578 clinical trials. We then compared classification performance (precision, recall, and F1-score) between the UMLS-based feature representation and the "bag of words" feature representation across five common classifiers in Weka: J48, Bayesian Network, Naïve Bayesian, Nearest Neighbor, and an instance-based learning classifier.
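For readers unfamiliar with the three performance measures, the following is a minimal sketch of how per-category precision, recall, and F1-score are conventionally computed from gold and predicted labels. The category names and label lists are hypothetical, and the function is a standard textbook formulation rather than the Weka routine used in the study.

```python
def precision_recall_f1(gold, predicted, category):
    """Per-category precision, recall, and F1 from parallel label lists."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == category and p == category)
    fp = sum(1 for g, p in pairs if g != category and p == category)
    fn = sum(1 for g, p in pairs if g == category and p != category)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical labels for five criteria (not data from the paper).
gold = ["Age", "Age", "Disease", "Disease", "Disease"]
predicted = ["Age", "Disease", "Disease", "Disease", "Age"]

for category in ("Age", "Disease"):
    print(category, precision_recall_f1(gold, predicted, category))
```

Computing the scores per category, as here, is what allows two feature representations to be compared category by category.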

Results: The UMLS semantic feature representation outperformed the "bag of words" feature representation in 89% of the criteria categories. Using the semantically induced categories, machine-learning classifiers required only 2000 instances to stabilize classification performance. The J48 classifier yielded the best F1-score, and the Bayesian Network classifier achieved the best learning efficiency.

Conclusion: The UMLS is an effective knowledge source and can enable an efficient feature representation for semi-automated semantic category induction and automatic categorization for clinical research eligibility criteria and possibly other clinical text.

Figures

Figure 1. A framework for dynamic categorization of free-text clinical eligibility criteria by UMLS-based hierarchical clustering: solid arrows show the machine learning process for classifier development; dotted arrows show the automatic criterion classification process using the classifier; shadowed blocks indicate the shared modules between the training and classification stage.
Figure 2. The process of transforming eligibility criteria into a UMLS-based semantic feature matrix.
Figure 3. F1-scores of all categories using the UMLS and “bag of words” feature representation respectively.
Figure 4. Time-efficiency between the “bag of words” and UMLS feature representation.
Figure 5. The learning efficiency of classifier J48 (X-axis: the number of training instances; Y-axis: F1-score of the J48 classifier).
Figure 6. Example of eligibility criteria narratives on ClinicalTrials.gov.
Figure 7. The dynamic categorization results for the example criteria in Figure 6.
