Background: Machine learning (ML) is increasingly used to analyse pain-related data. Rather than focusing on p-values, it emphasises how well variables classify individuals, that is, how well an algorithm trained on them assigns people to predefined groups such as high versus low pain sensitivity. A challenge arises when accurate classification persists after removing the variables that feature-selection methods identified as important. This creates uncertainty about which factors are genuinely relevant to the trait of interest, as classification information may still reside in the remaining features.
Methods: An iterative ML framework is presented that repeatedly tests groups of variables, combining two established feature-selection techniques with several classification algorithms. The approach was applied to three datasets, two assessing pain traits and one artificial, and compared with classical statistical methods, including logistic regression.
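As a rough illustration of the iterative idea, the following Python sketch repeatedly selects a variable, removes it, and checks whether the unselected variables can still classify individuals. It is not the authors' implementation and rests on several assumptions: ranking by absolute Cohen's d stands in for the two feature-selection techniques, a nearest-centroid rule stands in for the classification algorithms, and the function names, the accuracy threshold of 0.65 and the artificial data are all hypothetical.

```python
import random
import statistics


def make_data(n=400, n_informative=3, n_noise=5, shift=2.0, seed=0):
    """Hypothetical artificial data: only the first n_informative columns differ by class."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2  # alternate labels so both classes are equally sized
        row = [rng.gauss(shift * label, 1.0) for _ in range(n_informative)]
        row += [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append(row)
        y.append(label)
    return X, y


def effect_size(X, y, j):
    """Absolute Cohen's d of feature j between the two classes (assumed selection criterion)."""
    a = [row[j] for row, lab in zip(X, y) if lab == 0]
    b = [row[j] for row, lab in zip(X, y) if lab == 1]
    pooled_sd = ((statistics.variance(a) + statistics.variance(b)) / 2) ** 0.5
    return abs(statistics.mean(a) - statistics.mean(b)) / pooled_sd


def holdout_accuracy(X, y, features, seed=0):
    """Nearest-centroid accuracy on a 50/50 hold-out split, using only the given features."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    half = len(idx) // 2
    train, test = idx[:half], idx[half:]
    centroids = {}
    for lab in (0, 1):
        rows = [X[i] for i in train if y[i] == lab]
        centroids[lab] = [statistics.mean(r[j] for r in rows) for j in features]
    correct = 0
    for i in test:
        dist = {
            lab: sum((X[i][j] - c) ** 2 for j, c in zip(features, cen))
            for lab, cen in centroids.items()
        }
        if min(dist, key=dist.get) == y[i]:
            correct += 1
    return correct / len(test)


def iterative_selection(X, y, acc_threshold=0.65, max_rounds=5):
    """Select one feature per round until the unselected features stop classifying."""
    remaining = list(range(len(X[0])))
    selected, history = [], []
    while remaining and len(selected) < max_rounds:
        best = max(remaining, key=lambda j: effect_size(X, y, j))
        remaining.remove(best)
        selected.append(best)
        # Key check: can the features that were NOT selected still discriminate?
        residual_acc = holdout_accuracy(X, y, remaining) if remaining else 0.5
        history.append((best, residual_acc))
        if residual_acc <= acc_threshold:
            break  # unselected features no longer carry usable information
    return selected, history


X, y = make_data()
selected, history = iterative_selection(X, y)
print("selected features:", sorted(selected))
for feature, acc in history:
    print(f"after removing feature {feature}: residual accuracy {acc:.2f}")
```

On this artificial data the loop is expected to stop once only the noise columns remain. A real analysis would separate selection from evaluation, for example with repeated or nested cross-validation, and would compare several selection methods and classifiers rather than the single stand-ins used here.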
Results: The iterative process clarified which variables were truly relevant for classification by assessing whether unselected features could still discriminate individuals. When they could not, selected variables became more interpretable in a biological context. Combining multiple ML approaches improved feature selection, addressed multicollinearity and enhanced robustness across models. Logistic regression sometimes required preselected inputs or missed known relevant variables. Variation in model performance increased interpretive complexity.
Conclusions: ML-based feature selection broadens methodological options for identifying trait-relevant variables. Iterating through variable sets supports transparent, replicable inference. ML can help identify variables related to pain traits, but selected features should not be assumed uniquely important. Testing unselected variables remains essential, although their failure to predict outcomes may reflect algorithmic limitations rather than prove that the selected variables alone carry the trait-relevant information.
Significance statement: This study presents an iterative machine learning framework that improves the identification of trait-relevant features in biomedical pain data. The framework reduces ambiguity in feature selection and helps to distinguish robust, meaningful predictors from coincidental ones. It thereby enhances the interpretability and transparency of machine learning analyses in pain research and related biomedical fields.
Keywords: data science; effect sizes; feature selection; knowledge discovery; machine learning; pain research; statistics.
© 2026 The Author(s). European Journal of Pain published by John Wiley & Sons Ltd on behalf of European Pain Federation ‐ EFIC ®.