Recent advances in the SISSO method and their implementation in the SISSO++ code

J Chem Phys. 2023 Sep 21;159(11):114110. doi: 10.1063/5.0156620.

Abstract

Accurate and explainable artificial-intelligence (AI) models are promising tools for accelerating the discovery of new materials. Recently, symbolic regression has become an increasingly popular tool for explainable AI because it yields models that are relatively simple analytical descriptions of target properties. Due to its deterministic nature, the sure-independence screening and sparsifying operator (SISSO) method is a particularly promising approach for this application. Here, we describe the new advancements of the SISSO algorithm, as implemented into SISSO++, a C++ code with Python bindings. We introduce a new representation of the mathematical expressions found by SISSO. This is a first step toward introducing "grammar" rules into the feature creation step. Importantly, by introducing a controlled nonlinear optimization to the feature creation step, we expand the range of possible descriptors found by the methodology. Finally, we introduce refinements to the solver algorithms for both regression and classification, which drastically increase the reliability and efficiency of SISSO. For all these improvements to the basic SISSO algorithm, we not only illustrate their potential impact but also fully detail how they operate both mathematically and computationally.