Statistical issues in the comparison of quantitative imaging biomarker algorithms using pulmonary nodule volume as an example

Stat Methods Med Res. 2015 Feb;24(1):107-40. doi: 10.1177/0962280214537392. Epub 2014 Jun 11.


Quantitative imaging biomarkers are being used increasingly in medicine to diagnose and monitor patients' disease. The computer algorithms that measure quantitative imaging biomarkers have different technical performance characteristics. In this paper we illustrate the appropriate statistical methods for assessing and comparing the bias, precision, and agreement of computer algorithms. We use data from three studies of pulmonary nodules. The first study is a small phantom study used to illustrate metrics for assessing repeatability. The second study is a large phantom study allowing assessment of four algorithms' bias and reproducibility for measuring tumor volume and the change in tumor volume. The third study is a small clinical study of patients whose tumors were measured on two occasions. This study allows a direct assessment of six algorithms' performance for measuring tumor change. With these three examples we compare and contrast study designs and performance metrics, and we illustrate the advantages and limitations of various common statistical methods for quantitative imaging biomarker studies.

Keywords: agreement; bias; coverage probability; intraclass correlation coefficient; limits of agreement; repeatability; reproducibility.
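As a rough illustration of three of the metrics named above (bias, limits of agreement, and repeatability), the following minimal sketch uses their common textbook definitions. It is not taken from the paper; the function names and the nodule volumes are hypothetical, and the 1.96 multiplier assumes approximately normally distributed differences.

```python
import math
import statistics


def bias_and_limits_of_agreement(measured, reference):
    """Return the mean difference (bias) and Bland-Altman style 95% limits of agreement."""
    diffs = [m - r for m, r in zip(measured, reference)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation of the differences
    return mean_diff, (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff)


def repeatability_coefficient(replicate_pairs):
    """Return RC = 1.96 * sqrt(2) * within-subject SD, given two replicates per subject."""
    n = len(replicate_pairs)
    # With two replicates, each subject's variance is (a - b)^2 / 2;
    # the within-subject variance is the average of these over subjects.
    within_var = sum((a - b) ** 2 for a, b in replicate_pairs) / (2 * n)
    return 1.96 * math.sqrt(2) * math.sqrt(within_var)


if __name__ == "__main__":
    # Hypothetical nodule volumes in mm^3 (illustrative only, not study data).
    algorithm_volumes = [105.0, 98.0, 210.0, 190.0]
    phantom_truth = [100.0, 100.0, 200.0, 200.0]
    bias, (lower, upper) = bias_and_limits_of_agreement(algorithm_volumes, phantom_truth)
    print("bias:", bias, "limits of agreement:", (lower, upper))

    repeat_scans = [(100.0, 102.0), (200.0, 198.0)]
    print("repeatability coefficient:", repeatability_coefficient(repeat_scans))
```

Under these definitions, bias needs a known reference value (as in a phantom study), whereas the repeatability coefficient needs only repeated measurements of the same subject under identical conditions.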

Publication types

  • Comparative Study
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Bias
  • Biomarkers*
  • Diagnostic Imaging*
  • Humans
  • Phantoms, Imaging
  • Reproducibility of Results
  • Research Design
  • Solitary Pulmonary Nodule / diagnosis*
  • Statistics as Topic*
