ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost

J S Smith; O Isayev; A E Roitberg

doi:10.1039/c6sc05720a


Chem Sci. 2017 Apr 1;8(4):3192-3203. doi: 10.1039/c6sc05720a. Epub 2017 Feb 8.

Authors

J S Smith¹, O Isayev², A E Roitberg¹

Affiliations

¹ University of Florida, Department of Chemistry, PO Box 117200, Gainesville, FL, USA 32611-7200. Email: roitberg@ufl.edu.
² University of North Carolina at Chapel Hill, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, Chapel Hill, NC, USA 27599. Email: olexandr@olexandrisayev.com.

Abstract

Deep learning is revolutionizing many areas of science and technology, especially image, text, and speech recognition. In this paper, we demonstrate how a deep neural network (NN) trained on quantum mechanical (QM) DFT calculations can learn an accurate and transferable potential for organic molecules. We introduce ANAKIN-ME (Accurate NeurAl networK engINe for Molecular Energies), or ANI for short. ANI is a new method for developing transferable neural network potentials that use a highly modified version of the Behler and Parrinello symmetry functions to build single-atom atomic environment vectors (AEVs) as a molecular representation. AEVs make it possible to train neural networks on data that spans both configurational and conformational space, a feat not previously accomplished on this scale. We used ANI to build a potential called ANI-1, which was trained on a subset of the GDB databases containing molecules with up to 8 heavy atoms, to predict total energies for organic molecules containing four atom types: H, C, N, and O. To obtain accelerated but physically relevant sampling of molecular potential energy surfaces, we also propose a Normal Mode Sampling (NMS) method for generating molecular conformations. Through a series of case studies, we show that ANI-1 is chemically accurate compared to reference DFT calculations on much larger molecular systems (up to 54 atoms) than those included in the training data set.
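To make the idea of an atomic environment vector more concrete, here is a minimal sketch of how one part of such a vector could be computed. It uses only the standard Behler and Parrinello radial symmetry term with a cosine cutoff, split into one block per neighbour element (H, C, N, O). This is an illustration under stated assumptions, not the ANI-1 implementation: it assumes NumPy, omits the angular terms and the paper's modifications, and the function names (`cutoff`, `radial_aev`) and the parameter values in the usage (shifts, `eta`, `r_c`) are hypothetical.

```python
import numpy as np


def cutoff(r, r_c):
    """Cosine cutoff f_C(r) = 0.5*cos(pi*r/r_c) + 0.5 for r <= r_c, else 0."""
    r = np.asarray(r, dtype=float)
    return np.where(r <= r_c, 0.5 * np.cos(np.pi * r / r_c) + 0.5, 0.0)


def radial_aev(coords, species, i, elements, shifts, eta, r_c):
    """Radial part of an atomic environment vector for atom i (illustrative only).

    For each neighbour element e and each radial shift R_s, this sums
    exp(-eta * (R_ij - R_s)**2) * f_C(R_ij) over all atoms j != i of element e.
    The per-element blocks are concatenated into one fixed-length vector.
    """
    coords = np.asarray(coords, dtype=float)
    shifts = np.asarray(shifts, dtype=float)
    # Distances from atom i to every atom (including itself, masked out below).
    d = np.linalg.norm(coords - coords[i], axis=1)
    blocks = []
    for e in elements:
        mask = np.asarray(species) == e
        mask[i] = False  # exclude the central atom
        r = d[mask]      # distances to neighbours of element e
        # (n_neighbours, n_shifts) Gaussian terms, damped by the cutoff.
        g = np.exp(-eta * (r[:, None] - shifts[None, :]) ** 2) * cutoff(r, r_c)[:, None]
        blocks.append(g.sum(axis=0))  # sum over neighbours -> (n_shifts,)
    return np.concatenate(blocks)


# Usage: environment of an O atom with one nearby H and one H outside the cutoff.
coords = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
species = ["O", "H", "H"]
v = radial_aev(coords, species, 0, ["H", "C", "N", "O"], [1.0, 2.0], eta=16.0, r_c=4.6)
print(v.shape)
```

Because the vector has one block per element and one entry per shift, its length stays the same no matter how many neighbours an atom has. This fixed length is what lets a single per-element network be applied to molecules of any size.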

Grants and funding

R01 GM110077/GM/NIGMS NIH HHS/United States