The metabolite profile of green coffee beans can be influenced by various factors, including species, variety, geographical origin, and post-harvest processing method. However, previous studies have often examined only a few of these factors, each in isolation, and have used green coffee beans from a restricted area, which limits their scope. To fill this gap, we simultaneously analyzed 176 green coffee bean samples (C. arabica) sourced worldwide, spanning various continents, altitudes, post-harvest processing methods, and varieties, to comprehensively investigate the primary factors influencing coffee quality using a GC-MS-based metabolomics approach combined with machine learning analysis. Partial least squares-discriminant analysis (PLS-DA) revealed that each factor affected coffee bean characteristics differently, highlighting 56 key metabolites that varied by factor while also identifying metabolites associated with the sub-levels within each factor. Based on the F1 scores of the Random Forest model (continent: 91.5 %, altitude: 74.2 %, processing method: 81.4 %, variety: 64.7 %), continent had the greatest effect on coffee metabolite profiles, followed by post-harvest processing method, altitude, and variety. Additionally, we present comprehensive heatmap visualizations integrating all four factors, which can serve as valuable information for producing customized coffee beans aligned with consumer preferences. These findings provide comprehensive insights into the associations between the various factors affecting coffee quality and coffee metabolites.
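The factor ranking follows directly from the reported F1 scores: the better a classifier predicts a factor's class from the metabolite profile, the more strongly that factor shapes the profile. The Python sketch below illustrates how a multi-class F1 score can be computed from true and predicted labels and then used to order the factors. The abstract does not state which averaging scheme was used, so macro averaging here is an assumption, and `macro_f1` plus the toy labels are hypothetical illustrations, not the study's code or data.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores.

    Macro averaging is an assumption; the study may have used another scheme.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1  # correct prediction for class t
        else:
            fp[p] += 1  # predicted p, but the sample was not p
            fn[t] += 1  # sample was t, but it was missed
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # Per-class F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels for one factor (not data from the study)
y_true = ["Africa", "Africa", "Asia", "Americas"]
y_pred = ["Africa", "Asia", "Asia", "Americas"]
toy_score = macro_f1(y_true, y_pred)

# Random Forest F1 scores (%) as reported in the abstract
reported_f1 = {
    "continent": 91.5,
    "altitude": 74.2,
    "processing method": 81.4,
    "variety": 64.7,
}

# Order factors from strongest to weakest effect on the metabolite profile
ranking = sorted(reported_f1, key=reported_f1.get, reverse=True)
print(ranking)
```

Sorting the reported scores in descending order reproduces the ordering stated in the abstract: continent, post-harvest processing method, altitude, and variety.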
Keywords: Altitude; Coffee; Continent; Machine learning; Metabolite; Post-harvest processing; Variety.
Copyright © 2025 Elsevier Ltd. All rights reserved.