The accurate prediction of the standard vaporization enthalpy (ΔvapHm°) of volatile organic compounds (VOCs) is of paramount importance in environmental chemistry, industrial applications and regulatory compliance. As an alternative to traditional experimental methods for determining the ΔvapHm° of VOCs, machine learning (ML) models enable high-throughput, cost-effective property estimation. Despite rising momentum, however, existing ML algorithms remain limited in prediction accuracy and in the breadth of chemistry they cover. In this work, we present a data-driven, explainable supervised ML model to predict the ΔvapHm° of VOCs. The model was built on an established experimental database of 2410 unique molecules and 223 VOCs categorized by chemical groups. Using supervised ML regression algorithms, a Random Forest model predicted the ΔvapHm° of VOCs with a mean absolute error of 3.02 kJ mol⁻¹ and a test score of 95%. The model was validated by predicting ΔvapHm° for a known database of VOCs and through molecular group hold-out tests. Chemical feature importance analysis of this explainable model revealed that VOC polarizability, connectivity indices and electrotopological state are key to its prediction accuracy. We thus present a replicable and explainable model that can be further expanded to predict other thermodynamic properties of VOCs.
Keywords: Machine learning; Standard vaporization enthalpy; Supervised learning; Thermochemical predictions; VOC.
Copyright © 2024 The Authors. Published by Elsevier Ltd. All rights reserved.
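The abstract mentions two evaluation ideas that are easy to state in code: the mean absolute error (MAE) and molecular group hold-out tests. The following is a minimal sketch of both, not the authors' implementation. The function names (`mean_absolute_error`, `group_holdout_splits`, `holdout_mae`), the group labels and the numeric values are all hypothetical and made up for illustration, and a trivial "mean of the training targets" predictor stands in for the Random Forest so that the example needs only the Python standard library.

```python
from collections import defaultdict


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def group_holdout_splits(groups):
    """Yield (held_out_group, train_indices, test_indices) for each group.

    Every sample of one chemical group is withheld from training at a time,
    so the model is scored on a chemical family it has never seen.
    """
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    for g, test_idx in by_group.items():
        train_idx = [i for i in range(len(groups)) if groups[i] != g]
        yield g, train_idx, test_idx


def holdout_mae(groups, y):
    """Return {group: MAE} using a stand-in predictor.

    Assumption: the predictor is simply the mean of the training targets.
    In practice this would be replaced by the fitted regression model.
    """
    results = {}
    for g, train_idx, test_idx in group_holdout_splits(groups):
        baseline = sum(y[i] for i in train_idx) / len(train_idx)
        y_true = [y[i] for i in test_idx]
        y_pred = [baseline] * len(test_idx)
        results[g] = mean_absolute_error(y_true, y_pred)
    return results


if __name__ == "__main__":
    # Made-up labels and values (kJ/mol), NOT experimental data.
    toy_groups = ["alkane", "alkane", "alcohol", "alcohol", "ketone", "ketone"]
    toy_y = [30.0, 34.0, 42.0, 46.0, 36.0, 40.0]
    for group, mae in holdout_mae(toy_groups, toy_y).items():
        print(f"held-out group {group}: MAE = {mae:.2f} kJ/mol")
```

A per-group MAE of this kind shows whether accuracy degrades for chemical families absent from training, which a single random train/test split can hide.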