Automated extraction of reported statistical analyses: towards a logical representation of clinical trial literature

William Hsu; William Speier; Ricky K Taira

AMIA Annu Symp Proc. 2012:2012:350-9. Epub 2012 Nov 3.

Authors

William Hsu¹, William Speier, Ricky K Taira

Affiliation

¹ Medical Imaging Informatics Group, Dept of Radiological Sciences, University of California, Los Angeles, CA, USA.

PMID: 23304305
PMCID: PMC3540551

Abstract

Randomized controlled trials are an important source of evidence for guiding clinical decisions when treating a patient. However, given the large number of studies and their variability in quality, determining how to summarize reported results and formalize them as part of practice guidelines continues to be a challenge. We have developed a set of information extraction and annotation tools to automate the identification of key information from papers related to the hypothesis, sample size, statistical test, confidence interval, significance level, and conclusions. We adapted the Automated Sequence Annotation Pipeline to map extracted phrases to relevant knowledge sources. We trained and tested our system on a corpus of 42 full-text articles related to chemotherapy of non-small cell lung cancer. On our test set of 7 papers, we obtained an overall precision of 86%, recall of 78%, and an F-score of 0.82 for classifying sentences. This work represents our efforts towards utilizing this information for quality assessment, meta-analysis, and modeling.
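As a quick consistency check on the reported figures, the sketch below recomputes the F-score from the stated precision and recall. It assumes the F-score is the balanced F1 measure (the harmonic mean of precision and recall), which the abstract does not state explicitly; the function name `f_score` is hypothetical and is not part of the authors' system.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta=1.0 gives the balanced F1 measure.

    Assumption: the abstract's "F-score" is F1. The abstract does not say so.
    """
    if precision <= 0.0 and recall <= 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Figures reported in the abstract for the 7-paper test set.
reported_precision = 0.86
reported_recall = 0.78

score = f_score(reported_precision, reported_recall)
print(f"F1 = {score:.2f}")  # prints "F1 = 0.82"
```

Under that assumption, the recomputed value agrees with the reported F-score of 0.82.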

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Carcinoma, Non-Small-Cell Lung
Electronic Data Processing*
Evidence-Based Medicine
Humans
Information Storage and Retrieval / methods*
Lung Neoplasms
Natural Language Processing
Randomized Controlled Trials as Topic* / statistics & numerical data
Sensitivity and Specificity
