Optimizing thermodynamic trajectories using evolutionary and gradient-based reinforcement learning

Chris Beeler; Uladzimir Yahorau; Rory Coles; Kyle Mills; Stephen Whitelam; Isaac Tamblyn

doi:10.1103/PhysRevE.104.064128

Phys Rev E. 2021 Dec;104(6-1):064128. doi: 10.1103/PhysRevE.104.064128.

Authors

Chris Beeler¹,², Uladzimir Yahorau³, Rory Coles⁴, Kyle Mills³,⁵, Stephen Whitelam⁶, Isaac Tamblyn³,⁵,⁷

Affiliations

¹ Department of Mathematics and Statistics, University of Ottawa, Ottawa, Ontario, Canada K1N 6N5.
² National Research Council of Canada, Ottawa, Ontario, Canada K1A 0R6.
³ Department of Physics, University of Ontario Institute of Technology, Oshawa, Ontario, Canada L1G 0C5.
⁴ Department of Physics and Astronomy, University of Victoria, Victoria, British Columbia, Canada V8P 5C2.
⁵ Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada M5G 1M1.
⁶ Molecular Foundry, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA.
⁷ Department of Physics, University of Ottawa, Ottawa, Ontario, Canada K1N 6N5.

PMID: 35030917
DOI: 10.1103/PhysRevE.104.064128

Abstract

Using a model heat engine, we show that neural-network-based reinforcement learning can identify thermodynamic trajectories of maximal efficiency. We consider both gradient-based and gradient-free reinforcement learning. We use an evolutionary learning algorithm to evolve a population of neural networks, subject to a directive to maximize the efficiency of a trajectory composed of a set of elementary thermodynamic processes; the resulting networks learn to carry out the maximally efficient Carnot, Stirling, or Otto cycles. When given an additional irreversible process, this evolutionary scheme learns a previously unknown thermodynamic cycle. Gradient-based reinforcement learning is able to learn the Stirling cycle, whereas an evolutionary approach achieves the optimal Carnot cycle. Our results show how the reinforcement learning strategies developed for game playing can be applied to solve physical problems conditioned upon path-extensive order parameters.
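
To make the optimization target concrete, the following is a minimal sketch of how the efficiency of a trajectory built from elementary processes can be scored. It is not the authors' code or their engine model: it assumes an ideal monatomic gas in units where nR = 1, and the function names (`isothermal`, `adiabatic`, `carnot_efficiency`) are hypothetical. It composes two isothermal and two adiabatic steps into a Carnot cycle and evaluates efficiency as net work divided by heat absorbed.

```python
import math

# Assumption: ideal monatomic gas, units chosen so that n*R = 1.
GAMMA = 5.0 / 3.0          # heat capacity ratio Cp/Cv
CV = 1.0 / (GAMMA - 1.0)   # heat capacity at constant volume (n*R = 1)


def isothermal(T, V_start, V_end):
    """Return (work done by gas, heat absorbed) for an isothermal step."""
    work = T * math.log(V_end / V_start)
    return work, work  # internal energy is unchanged, so Q = W


def adiabatic(T_start, T_end):
    """Return (work done by gas, heat absorbed) for an adiabatic step."""
    return CV * (T_start - T_end), 0.0  # no heat exchange, W = -dU


def carnot_efficiency(T_hot, T_cold, V_a, V_b):
    """Efficiency of a Carnot cycle assembled from four elementary steps."""
    # Along an adiabat T * V**(GAMMA - 1) is constant, which fixes the
    # volumes reached after cooling from T_hot to T_cold.
    ratio = (T_hot / T_cold) ** (1.0 / (GAMMA - 1.0))
    V_c, V_d = V_b * ratio, V_a * ratio
    steps = [
        isothermal(T_hot, V_a, V_b),    # expansion in contact with hot bath
        adiabatic(T_hot, T_cold),       # expansion, gas cools
        isothermal(T_cold, V_c, V_d),   # compression in contact with cold bath
        adiabatic(T_cold, T_hot),       # compression, gas heats
    ]
    net_work = sum(w for w, _ in steps)
    heat_in = sum(q for _, q in steps if q > 0)
    return net_work / heat_in


efficiency = carnot_efficiency(T_hot=2.0, T_cold=1.0, V_a=1.0, V_b=2.0)
```

Under these assumptions the result agrees with the textbook Carnot bound 1 - T_cold/T_hot. A learning agent of the kind described in the abstract would choose the sequence of steps itself and receive an efficiency of this form as its objective.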