Public health research faces challenges in recruiting socio-economically disadvantaged groups. This study evaluated whether machine learning (ML) algorithms developed using data from a general population could predict indices of diet quality in a socio-economically disadvantaged group. Data on 122 variables potentially associated with dietary intakes were used from 5367 adults (77·5 % female) in the NutriQuébec project. Dietary intakes were measured using a web-based 24-h recall. Participants were categorised into fifths of a deprivation score based on income, education, and material and social deprivation. Participants in the first four fifths formed the general NutriQuébec sample (n 4180) and those in the highest fifth formed the high deprivation sample (n 1187). Three indices of diet quality, each dichotomised as 'high' or 'low', were used: vegetable and fruit consumption (VFC, high: ≥ 5·0 reference amounts (RA)/d); consumption of 'other foods', that is, foods not recommended in Canada's Food Guide 2019 (OFC, high: > 5·0 RA/d); and overall diet quality measured using the Healthy Eating Food Index-2019 (HEFI-2019, high: > 48·9 points). The algorithms developed and tested in the general NutriQuébec sample predicted high VFC, OFC and HEFI-2019 with accuracies of 0·60 (95 % CI 0·58, 0·62), 0·58 (95 % CI 0·56, 0·60) and 0·61 (95 % CI 0·59, 0·63), respectively. In the high deprivation sample, the algorithms predicted the diet quality indices with comparable accuracies (VFC, 0·69, 95 % CI 0·67, 0·71; OFC, 0·56, 95 % CI 0·54, 0·58; HEFI-2019, 0·66, 95 % CI 0·65, 0·67). ML algorithms trained to predict three diet quality indices in the general NutriQuébec sample were therefore applicable to a high deprivation group.
Keywords: Diet quality; High deprivation; Low socio-economic status; Machine learning; NutriQuébec; Public health; Random forest.
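
The sample split and the accuracy estimates described in the abstract can be illustrated with a minimal Python sketch. It is not the study's code. The function names `split_by_deprivation_fifth` and `accuracy_with_ci` are hypothetical, the split assumes equal-sized fifths of the ranked deprivation score (the reported sample sizes suggest the study's cut-offs were defined differently), and the 95 % CI uses a normal approximation, which the abstract does not specify.

```python
import math


def split_by_deprivation_fifth(scores):
    """Split participant indices by rank of deprivation score.

    Assumption: equal-sized fifths of the ranked score. Returns
    (general_idx, high_idx), where high_idx holds the most deprived fifth.
    Ties are broken by original order.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    cut = (4 * len(scores)) // 5  # boundary between the fourth and fifth fifths
    return sorted(order[:cut]), sorted(order[cut:])


def accuracy_with_ci(y_true, y_pred, z=1.96):
    """Accuracy with a normal-approximation (Wald) confidence interval.

    Assumption: the source does not state how its CIs were computed.
    Returns (accuracy, lower, upper), with bounds clipped to [0, 1].
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / n
    half_width = z * math.sqrt(acc * (1 - acc) / n)
    return acc, max(0.0, acc - half_width), min(1.0, acc + half_width)
```

Under these assumptions, a classifier would be fitted and tested on the participants in `general_idx`, then applied unchanged to those in `high_idx`, with `accuracy_with_ci` computed separately for each sample and each diet quality index.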