Image2InChI: Automated Molecular Optical Image Recognition

J Chem Inf Model. 2024 Feb 15. doi: 10.1021/acs.jcim.3c02082. Online ahead of print.

Abstract

The accurate identification and analysis of chemical structures in molecular images is a prerequisite for applying artificial intelligence to drug discovery, so it is important to convert molecular images into machine-readable representations efficiently and automatically. In this paper, we propose Image2InChI, a deep-learning model for automated molecular optical image recognition. Image2InChI introduces a novel attention-based feature fusion network that integrates image patches with InChI prediction. An improved SwinTransformer encoder and a Transformer decoder, combined with patch embedding, are used to predict the InChI corresponding to the extracted image features. In the experiments, Image2InChI achieved an InChI accuracy (InChI acc) of 99.8%, a Morgan fingerprint score (Morgan FP) of 94.1%, a maximum common structure accuracy (MCS acc) of 94.8%, and a longest common subsequence accuracy (LCS acc) of 96.2%. These results show that Image2InChI improves the accuracy and efficiency of molecular image recognition and provides a useful reference for optical chemical structure recognition that targets InChI.
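The abstract reports four evaluation metrics but does not define them. The sketch below illustrates the two string-level ones under stated assumptions: InChI acc is taken to be the fraction of predicted InChI strings that exactly match the reference, and LCS acc the mean longest-common-subsequence length divided by the length of the longer string. Both definitions and all function names (`lcs_length`, `lcs_score`, `inchi_exact_accuracy`, `mean_lcs_score`) are hypothetical and are not taken from the paper. Morgan FP and MCS acc compare molecular graphs and would need a cheminformatics toolkit, so they are not shown.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Classic O(len(a) * len(b)) dynamic programming that keeps one row at a time.
    """
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_score(pred: str, ref: str) -> float:
    """Assumed normalization: LCS length over the longer string's length."""
    if not pred and not ref:
        return 1.0
    return lcs_length(pred, ref) / max(len(pred), len(ref))


def _check_lengths(preds: list[str], refs: list[str]) -> None:
    if len(preds) != len(refs) or not refs:
        raise ValueError("preds and refs must be non-empty and of equal length")


def inchi_exact_accuracy(preds: list[str], refs: list[str]) -> float:
    """Assumed InChI acc: share of predictions identical to the reference InChI."""
    _check_lengths(preds, refs)
    return sum(p == r for p, r in zip(preds, refs)) / len(refs)


def mean_lcs_score(preds: list[str], refs: list[str]) -> float:
    """Assumed LCS acc: mean normalized LCS score over all image/InChI pairs."""
    _check_lengths(preds, refs)
    return sum(lcs_score(p, r) for p, r in zip(preds, refs)) / len(refs)


if __name__ == "__main__":
    refs = ["InChI=1S/CH4/h1H4", "InChI=1S/H2O/h1H2"]  # methane, water
    preds = ["InChI=1S/CH4/h1H4", "InChI=1S/H2O/h1H"]  # second one truncated
    print(inchi_exact_accuracy(preds, refs))  # 0.5
    print(mean_lcs_score(preds, refs))
```

The example shows why a string-overlap score complements exact matching: the truncated water InChI counts as a complete miss for exact accuracy (0.5 overall) but still scores 16/17 under the assumed LCS normalization, giving a mean of 33/34.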