Given the increased need for consistent quantitative image analysis, variations in radiomics feature calculations due to differences in radiomics software were investigated. Two in-house radiomics packages and two freely available radiomics packages, MaZda and IBEX, were utilized. Forty regions of interest (ROIs) from 40 digital mammograms were studied along with 39 manually delineated ROIs from the head and neck (HN) computed tomography (CT) scans of 39 patients. Each package was used to calculate first-order histogram and second-order gray-level co-occurrence matrix (GLCM) features. Friedman tests determined differences in feature values across packages, whereas intraclass-correlation coefficients (ICC) quantified agreement. All first-order features computed from both mammography and HN cases (except skewness in mammography) showed significant differences across all packages due to systematic biases introduced by each package; however, based on ICC values, all but one first-order feature calculated on mammography ROIs and all but two first-order features calculated on HN CT ROIs showed excellent agreement, indicating the observed differences were small relative to the feature values but the bias was systematic. All second-order features computed from the two databases both differed significantly and showed poor agreement among packages, due largely to discrepancies in package-specific default GLCM parameters. Additional differences in radiomics features were traced to variations in image preprocessing, algorithm implementation, and naming conventions. Large variations in features among software packages indicate that increased efforts to standardize radiomics processes must be conducted.
Keywords: head and neck CT; mammography; radiomics; software packages; texture analysis.