Emphysema is visible on computed tomography (CT) as low-density lesions representing the destruction of the pulmonary alveoli. Training a machine learning model on emphysema extent in CT images requires labeled image data. Providing these labels requires trained readers, who are a limited resource. The purpose of this study was to assess the reading time, inter-observer reliability, and validity of the multi-reader-multi-split method for acquiring CT image labels from radiologists. Each stack of lung CT images, comprising approximately 500 slices, was split into 1-cm chunks of 17 thin axial slices each. The chunks were randomly distributed among 26 readers (radiologists and radiology residents). Each chunk was given a quick score for emphysema type and severity, separately for the left and right lung. A cohort of 102 subjects with varying degrees of visible emphysema in the lung CT images was selected from the SCAPIS pilot, performed in 2012 in Gothenburg, Sweden. In total, the readers created 9050 labels for 2881 chunks. The image labels were compared with the regional annotations already provided at inclusion in the SCAPIS pilot. The median reading time per chunk was 15 s. The inter-observer Krippendorff's alpha was 0.40 for emphysema type and 0.53 for emphysema score, and was higher in the apical part than in the basal part of the lungs. The multi-split emphysema scores were generally consistent with the regional annotations. In conclusion, the multi-reader-multi-split method provided reasonably valid image labels, together with an estimate of the inter-observer reliability.
Keywords: Chronic Obstructive Pulmonary Disease; Computed Tomography; Image Annotation; Machine Learning; Observer Variation; Pulmonary Emphysema; X-Ray.
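The splitting and random distribution steps described in the abstract can be illustrated with a minimal Python sketch. It is not the study's implementation. The function names, the fixed number of readers per chunk (three), the handling of a final partial chunk, and the random seed are all assumptions made for illustration. Only the figures of 17 slices per chunk, roughly 500 slices per stack, and 26 readers come from the text.

```python
import random


def split_into_chunks(n_slices, slices_per_chunk=17):
    """Return (start, stop) slice index ranges covering a stack of n_slices.

    Assumption: a shorter final chunk is kept; the abstract does not say
    how a remainder was handled.
    """
    return [
        (start, min(start + slices_per_chunk, n_slices))
        for start in range(0, n_slices, slices_per_chunk)
    ]


def assign_chunks(chunk_ids, reader_ids, readers_per_chunk=3, seed=0):
    """Assign each chunk to distinct readers chosen uniformly at random.

    Assumption: a fixed number of readers per chunk; the study's actual
    distribution scheme is not specified in the abstract.
    """
    rng = random.Random(seed)
    return {
        chunk: rng.sample(reader_ids, readers_per_chunk)
        for chunk in chunk_ids
    }


# Hypothetical example: one stack of 500 slices and 26 readers.
chunks = split_into_chunks(500)
readers = [f"reader_{i:02d}" for i in range(1, 27)]
assignment = assign_chunks(range(len(chunks)), readers)

print(len(chunks), "chunks; chunk 0 read by", assignment[0])
```

Giving each chunk to several readers is what makes an inter-observer reliability estimate such as Krippendorff's alpha possible, since the same chunk receives more than one independent label.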