Rationale and objectives: Computerized classification techniques that use texture features have been developed to provide accurate and robust pattern recognition in interstitial lung disease. However, these techniques still face challenges when analyzing computed tomographic (CT) image data acquired under multiple, disparate acquisition protocols, or data from standardized multicenter clinical trials, in which noise levels vary. Our objective is to investigate whether denoising thin-section CT image data improves the classification of scleroderma disease patterns within small regions of interest (ROIs). The patterns are lung fibrosis (LF), ground-glass (GG), honeycomb (HC), and normal lung (NL).
Methods: High-resolution CT images were acquired in a multicenter clinical trial, the Scleroderma Lung Study. A thoracic radiologist contoured a training set (38 patients) of 148 ROIs (46 LF, 85 GG, 4 HC, and 13 NL) and a test set (33 new patients) of 132 ROIs (44 LF, 72 GG, 4 HC, and 12 NL). The CT slices containing each contoured ROI were denoised using Aujol's partial differential equation (PDE)-based algorithm. The algorithm's noise parameter was estimated as the standard deviation of the grey-level signal (in Hounsfield units) in a homogeneous, non-lung region: the aorta. Within each contoured ROI, pixels were sampled on a regular 4 x 4 grid, that is, one pixel from every 4 x 4 neighborhood. Every sampled pixel was assigned the disease pattern that the radiologist had assigned to its ROI. In total, 5,690 pixels (3,009 LF, 1,994 GG, 348 HC, and 339 NL) were sampled in the training set and 5,045 pixels (2,665 LF, 1,753 GG, 291 HC, and 336 NL) in the test set. Next, 58 texture features were calculated for each pixel from both the original and the denoised images. A multinomial logistic regression model was then used to select two feature subsets, one from the original images and one from the denoised images, for classifying disease patterns. Finally, pixels were classified into disease patterns using a support vector machine (SVM).
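The noise-parameter estimate described in the Methods can be sketched in a few lines of Python. This is a minimal sketch, assuming the CT slice is a 2D list of Hounsfield-unit values and the aorta contour is supplied as a boolean mask of the same shape; `estimate_noise_sd` is a hypothetical name. The source does not say whether the sample or the population standard deviation was used; the sketch assumes the sample form.

```python
from statistics import stdev


def estimate_noise_sd(image, mask):
    """Standard deviation of grey levels (HU) inside a homogeneous region.

    image: 2D list of HU values for one CT slice.
    mask:  2D list of booleans, same shape, True inside the homogeneous
           region (here a hypothetical aorta contour).
    """
    values = [
        v
        for row_v, row_m in zip(image, mask)
        for v, m in zip(row_v, row_m)
        if m  # keep only pixels inside the region
    ]
    if len(values) < 2:
        raise ValueError("region must contain at least two pixels")
    # Sample SD (n - 1 denominator); this choice is an assumption.
    return stdev(values)


image = [[40, 44, 0], [36, 40, 0]]
mask = [[True, True, False], [True, True, False]]
print(estimate_noise_sd(image, mask))
```

Swapping `stdev` for `statistics.pstdev` gives the population estimate instead; for a large aorta region the two differ very little.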
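The 4 x 4 grid sampling and label inheritance can be sketched as follows. This is a minimal sketch, assuming the ROI contour is given as a 2D boolean mask and that the grid is anchored at the mask's top-left corner, so the top-left pixel of each 4 x 4 block is the one kept; the source does not state which pixel within a neighborhood was sampled. `sample_roi` is a hypothetical name.

```python
def sample_roi(mask, label, step=4):
    """Sample one pixel from every step x step neighborhood of an ROI.

    mask:  2D list of booleans, True inside the radiologist's contour.
    label: disease pattern assigned to the whole ROI (e.g. "LF").
    Returns (row, col, label) tuples; every sampled pixel inherits the
    ROI's label.
    """
    rows, cols = len(mask), len(mask[0])
    return [
        (r, c, label)
        for r in range(0, rows, step)  # grid anchored at top-left (assumption)
        for c in range(0, cols, step)
        if mask[r][c]  # keep only grid points inside the contour
    ]


roi_mask = [[True] * 8 for _ in range(8)]
print(sample_roi(roi_mask, "GG"))
```

For an 8 x 8 ROI lying entirely inside the contour, the sketch keeps four pixels, at (0, 0), (0, 4), (4, 0), and (4, 4), each labeled "GG".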
Results: On the training set, the multinomial logistic model selected 45 features from the original images and 38 features from the denoised images for classifying disease patterns. On the test set, denoising increased the overall SVM pixel classification rate from 87.8% to 89.5%. The pattern-specific classification rates (original/denoised) were 96.3%/96.4% for LF, 88.8%/89.4% for GG, 21.3%/28.9% for HC, and 73.5%/88.4% for NL. At the ROI level, denoising significantly improved the NL and overall classification rates (P = .037 and P = .047, respectively).
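The overall test-set rates follow from the per-class rates: the overall pixel classification rate is the mean of the per-class rates weighted by each pattern's number of test pixels (counts from the Methods). The sketch below uses only numbers reported in this abstract; `overall_rate` is a hypothetical helper.

```python
# Test-set pixel counts per pattern (Methods) and per-class rates in % (Results).
counts = {"LF": 2665, "GG": 1753, "HC": 291, "NL": 336}
rates_original = {"LF": 96.3, "GG": 88.8, "HC": 21.3, "NL": 73.5}
rates_denoised = {"LF": 96.4, "GG": 89.4, "HC": 28.9, "NL": 88.4}


def overall_rate(counts, rates):
    """Overall rate (%) = per-class rates weighted by class pixel counts."""
    total = sum(counts.values())
    correct = sum(counts[k] * rates[k] / 100 for k in counts)
    return 100 * correct / total


print(f"{overall_rate(counts, rates_original):.1f}")  # 87.8
print(f"{overall_rate(counts, rates_denoised):.1f}")  # 89.5
```

Because HC and NL together account for only 627 of the 5,045 test pixels, their large per-class gains (7.6 and 14.9 percentage points) raise the overall rate by less than two points.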
Conclusions: Analyzing multicenter data with a denoising approach yielded more parsimonious classification models with higher accuracy. This approach offers a novel alternative classification strategy for data with heterogeneous technical and disease components. Furthermore, the model has the potential to correctly discriminate among the multiple patterns of scleroderma disease.