Extracting data from figures with software was faster, with higher interrater reliability than manual extraction

J Clin Epidemiol. 2016 Jun;74:119-23. doi: 10.1016/j.jclinepi.2016.01.002. Epub 2016 Jan 11.


Objectives: To compare speed and accuracy of graphical data extraction using manual estimation and open source software.

Study design and setting: Data points from eligible graphs/figures published in randomized controlled trials (RCTs) from 2009 to 2014 were extracted by two authors independently, both by manual estimation and with Plot Digitizer, an open source software tool. Corresponding authors of each RCT were contacted up to four times via e-mail to obtain the exact numbers used to create the graphs. The accuracy of each method was assessed against the source data from which the original graphs were produced.

Results: Software data extraction was significantly faster, reducing extraction time by 47%. Percent agreement between the two raters was 51% for manual and 53.5% for software data extraction. Percent agreement between the raters and original data was 66% vs. 75% for the first rater and 69% vs. 73% for the second rater, for manual and software extraction, respectively.
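
To make the reported measures concrete, the following is a minimal sketch of how percent agreement and relative time reduction could be computed. The abstract does not state how a match between two extracted values was defined, so the tolerance parameter `tol` is an assumption, as are the function names and the sample values, which are made up for illustration and are not data from the study.

```python
from typing import Sequence


def percent_agreement(a: Sequence[float], b: Sequence[float], tol: float = 0.0) -> float:
    """Share (in %) of paired values that differ by at most `tol`.

    `tol` is a hypothetical parameter: the abstract does not say whether
    agreement meant an exact match or a match within some margin.
    """
    if len(a) != len(b):
        raise ValueError("both sequences must have the same length")
    if not a:
        raise ValueError("at least one pair of values is required")
    matches = sum(1 for x, y in zip(a, b) if abs(x - y) <= tol)
    return 100.0 * matches / len(a)


def percent_time_reduction(manual_seconds: float, software_seconds: float) -> float:
    """Relative reduction (in %) in extraction time, manual vs. software."""
    if manual_seconds <= 0:
        raise ValueError("manual_seconds must be positive")
    return 100.0 * (manual_seconds - software_seconds) / manual_seconds


# Made-up illustrative values, NOT data from the study.
rater_1 = [12.0, 7.5, 3.0, 9.0]
rater_2 = [12.0, 7.0, 3.0, 9.5]

print(percent_agreement(rater_1, rater_2))           # exact-match agreement
print(percent_agreement(rater_1, rater_2, tol=0.5))  # agreement within +/- 0.5
print(percent_time_reduction(100.0, 53.0))           # e.g. 100 s manual vs. 53 s software
```

The same `percent_agreement` function covers both comparisons in the results: rater vs. rater (interrater agreement) and rater vs. the original data supplied by trial authors (accuracy).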

Conclusions: Data extraction from figures should be conducted using software, whereas manual estimation should be avoided. Using software to extract data presented only in figures is faster and enables higher interrater reliability.

Keywords: Accuracy; Estimation; Extraction; Figures; Graphical data; Software.

MeSH terms

  • Computer Graphics / statistics & numerical data*
  • Data Mining / methods*
  • Data Mining / statistics & numerical data*
  • Humans
  • Observer Variation
  • Randomized Controlled Trials as Topic
  • Reproducibility of Results
  • Software
  • Time Factors