Quantitative imaging biomarkers are being used increasingly in medicine to diagnose and monitor patients' disease. The computer algorithms that measure quantitative imaging biomarkers have different technical performance characteristics. In this paper we illustrate the appropriate statistical methods for assessing and comparing the bias, precision, and agreement of computer algorithms. We use data from three studies of pulmonary nodules. The first study is a small phantom study used to illustrate metrics for assessing repeatability. The second study is a large phantom study allowing assessment of four algorithms' bias and reproducibility for measuring tumor volume and the change in tumor volume. The third study is a small clinical study of patients whose tumors were measured on two occasions. This study allows a direct assessment of six algorithms' performance for measuring tumor change. With these three examples we compare and contrast study designs and performance metrics, and we illustrate the advantages and limitations of various common statistical methods for quantitative imaging biomarker studies.
Keywords: agreement; bias; coverage probability; intraclass correlation coefficient; limits of agreement; repeatability; reproducibility.
© The Author(s) 2014 Reprints and permissions: sagepub.co.uk/journalsPermissions.nav.