Machine Learning-Based Screening for Potential Singlet Fission Chromophores: The Challenge of Imbalanced Data Sets

J Phys Chem Lett. 2023 Nov 16;14(45):10103-10112. doi: 10.1021/acs.jpclett.3c02365. Epub 2023 Nov 3.

Abstract

In a singlet fission (SF) material, absorption of a single photon generates two triplet excitons, potentially doubling the number of charge carriers harvested per absorbed high-energy photon and thereby raising solar cell efficiency. SF molecules are therefore regarded as promising materials for next-generation organic photovoltaics, but they are difficult to identify. Recently, molecules with low-to-intermediate diradical character (DRC) were shown to be potential SF chromophores. This suggests a low-cost strategy for finding new SF candidates with computational high-throughput workflows. We propose a machine-learning-aided screening of SF candidates based on their DRC. Our data set comprises 469 784 compounds extracted from the PubChem database; it is structurally rich but inherently imbalanced with respect to DRC values. We developed well-performing classification models that can retrieve potential SF chromophores. These candidates (∼4%) were analyzed by K-means clustering to reveal qualitative structure–property relationships and to extract molecular design strategies. The screening procedure and data set can be readily adapted to applications of diradicaloids in photonics and spintronics.
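The imbalance problem named in the title can be illustrated with a minimal sketch. It assumes a hypothetical DRC window (0.1 to 0.5) for labeling a compound as a potential SF chromophore and uses synthetic DRC values with a minority of roughly 4%. Neither the thresholds, the metrics, nor the weighting scheme are taken from the paper. The sketch shows why plain accuracy rewards a classifier that never predicts the minority class, and how inverse-frequency ("balanced") class weights, one common remedy, offset the imbalance during training.

```python
from collections import Counter

# Hypothetical DRC window for the "potential SF chromophore" label;
# these are illustrative values, not the thresholds used in the paper.
DRC_LOW, DRC_HIGH = 0.1, 0.5


def label(drc: float) -> int:
    """Return 1 for low-to-intermediate diradical character, else 0."""
    return 1 if DRC_LOW <= drc <= DRC_HIGH else 0


def balanced_class_weights(labels: list[int]) -> dict[int, float]:
    """Weights n_samples / (n_classes * n_c): each class gets equal total weight."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true: list[int], y_pred: list[int], cls: int) -> float:
    """Fraction of true members of `cls` that were predicted as `cls`."""
    hits = [p == cls for t, p in zip(y_true, y_pred) if t == cls]
    return sum(hits) / len(hits)


def balanced_accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Mean per-class recall; insensitive to how many samples each class has."""
    classes = sorted(set(y_true))
    return sum(recall(y_true, y_pred, c) for c in classes) / len(classes)


if __name__ == "__main__":
    # Synthetic data: 4 of 100 DRC values fall inside the candidate window.
    drc = [0.0] * 96 + [0.2, 0.3, 0.4, 0.45]
    y = [label(v) for v in drc]

    # A trivial classifier that always predicts "not a candidate".
    always_negative = [0] * len(y)

    print(f"accuracy          = {accuracy(y, always_negative):.2f}")  # → 0.96
    print(f"balanced accuracy = {balanced_accuracy(y, always_negative):.2f}")  # → 0.50
    print(f"minority recall   = {recall(y, always_negative, 1):.2f}")  # → 0.00
    print("class weights:", balanced_class_weights(y))
```

The trivial classifier scores 96% accuracy while retrieving none of the candidates, which is why imbalance-aware metrics (per-class recall, balanced accuracy, precision–recall) are the more informative yardstick for a screening task where the minority class is the target.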