Quantum architecture search via truly proximal policy optimization

Xianchao Zhu; Xiaokai Hou

doi:10.1038/s41598-023-32349-2

Sci Rep. 2023 Mar 29;13(1):5157. doi: 10.1038/s41598-023-32349-2.

Authors

Xianchao Zhu¹, Xiaokai Hou²

Affiliations

¹ School of Artificial Intelligence and Big Data, Henan University of Technology, Zhengzhou, 450001, China. xczhuiffs@163.com.
² Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China.

Abstract

Quantum Architecture Search (QAS) is the process of automatically designing quantum circuit architectures using intelligent algorithms. Recently, Kuo et al. (Quantum architecture search via deep reinforcement learning. arXiv preprint arXiv:2104.07715, 2021) proposed a deep reinforcement learning-based QAS method (QAS-PPO), which uses the Proximal Policy Optimization (PPO) algorithm to automatically generate quantum circuits without any expert knowledge of physics. However, QAS-PPO can neither strictly limit the probability ratio between the old and new policies nor enforce a well-defined trust region constraint, which results in poor performance. In this paper, we present a new deep reinforcement learning-based QAS method, called Trust Region-based PPO with Rollback for QAS (QAS-TR-PPO-RB), which automatically builds the quantum gate sequence from the density matrix alone. Specifically, inspired by the work of Wang et al., we employ an improved clipping function that implements a rollback behavior to limit the probability ratio between the new and old policies. In addition, we use a trust region-based triggering condition for the clipping, which optimizes the policy while restricting it to the trust region and leads to guaranteed monotonic improvement. Experiments on several multi-qubit circuits demonstrate that the proposed method achieves better policy performance and lower running time than the original deep reinforcement learning-based QAS method.
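The two ingredients named in the abstract, a rollback clipping function and a trust region-based trigger, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the hyperparameter values (eps, alpha, delta), and the exact form of the KL-triggered penalty are assumptions chosen for illustration, and the abstract itself does not give the formulas. The standard PPO clipped surrogate is included only for comparison.

```python
import numpy as np

def ppo_clip_objective(ratio, adv, eps=0.2):
    """Standard PPO clipped surrogate, evaluated per sample."""
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return np.minimum(ratio * adv, clipped * adv)

def rollback_objective(ratio, adv, eps=0.2, alpha=0.3):
    """Ratio-based rollback (assumed form): outside [1-eps, 1+eps] the
    clipping function has slope -alpha instead of 0, so the gradient
    pushes the ratio back toward the clip range."""
    lower = -alpha * ratio + (1.0 + alpha) * (1.0 - eps)
    upper = -alpha * ratio + (1.0 + alpha) * (1.0 + eps)
    f = np.where(ratio <= 1.0 - eps, lower,
                 np.where(ratio >= 1.0 + eps, upper, ratio))
    return np.minimum(ratio * adv, f * adv)

def kl_triggered_rollback_objective(ratio, adv, kl, delta=0.03, alpha=5.0):
    """Trust region trigger (assumed form): apply a KL penalty only when
    the per-state KL divergence reaches delta AND the surrogate has
    already improved relative to the old policy (ratio * adv >= adv)."""
    trigger = (kl >= delta) & (ratio * adv >= adv)
    return np.where(trigger, ratio * adv - alpha * kl, ratio * adv)

# Illustrative per-sample inputs (hypothetical values).
ratio = np.array([1.0, 1.5, 0.5])
adv = np.array([1.0, 1.0, -1.0])
kl = np.array([0.0, 0.05, 0.05])

print(ppo_clip_objective(ratio, adv))
print(rollback_objective(ratio, adv))
print(kl_triggered_rollback_objective(ratio, adv, kl))
```

In this sketch, plain clipping flattens the objective once the ratio leaves the clip range, whereas the rollback variant gives it a negative slope there, so an out-of-range ratio is actively discouraged rather than merely ignored. The KL-based variant replaces the ratio test with a divergence test, which is one way to read "restricting the policy within the trust region".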