Leader-follower UAVs formation control based on a deep Q-network collaborative framework

Zhijun Liu; Jie Li; Jian Shen; Xiaoguang Wang; Pengyun Chen

doi:10.1038/s41598-024-54531-w

Sci Rep. 2024 Feb 26;14(1):4674. doi: 10.1038/s41598-024-54531-w.

Authors

Zhijun Liu¹ ², Jie Li¹ ², Jian Shen³ ⁴, Xiaoguang Wang⁵, Pengyun Chen⁶

Affiliations

¹ Shenzhen MSU-BIT University, Shenzhen, 518172, China.
² School of Mechatronical Engineering, Beijing Institute of Technology, Beijing, 100081, China.
³ School of Mechanical and Electrical Engineering, North University of China, Taiyuan, 030051, China. shenjian@nuc.edu.cn.
⁴ Department of Advanced Technology, Norinco Group Aviation Ammunition Research Institute, Harbin, 150030, China. shenjian@nuc.edu.cn.
⁵ Department of Advanced Technology, Norinco Group Aviation Ammunition Research Institute, Harbin, 150030, China.
⁶ School of Aerospace Engineering, North University of China, Taiyuan, 030051, China.

PMID: 38409308
DOI: 10.1038/s41598-024-54531-w

Abstract

This study examines a collaborative framework that uses a deep Q-network (DQN) to control the formation of leader-follower Unmanned Aerial Vehicles (UAVs). The aim is to address the challenges posed by the highly dynamic and uncertain flight environment of UAVs. We develop a dynamic model that captures the collective state of the UAV system. This model includes variables such as the relative positions, heading angle, roll angle, and velocity of the different nodes in the formation. We then describe the collaborative operation of the UAVs within the framework of a Markov Decision Process (MDP) and apply Reinforcement Learning (RL) to it. On this basis, a fundamental framework is presented for addressing the UAV control problem with the DQN scheme. This framework includes an action-selection technique known as [Formula: see text]-imitation, as well as the algorithmic details. Finally, the efficacy and portability of the DQN-based approach are validated by numerical simulation. The average reward curve shows a satisfactory level of convergence, and the kinematic link between the nodes in the formation satisfies the essential requirements for the creation of a controller.
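As a rough illustration of the MDP and DQN ingredients named in the abstract, the following Python sketch shows a follower state built from the listed variables, a discrete action set, an action-selection rule, and the standard one-step Q-learning target. It is a minimal sketch under stated assumptions, not the authors' implementation: the field names, the three roll commands, and the helper functions are hypothetical, and plain epsilon-greedy selection is swapped in for the paper's own imitation-based strategy, which the abstract does not specify.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class FormationState:
    """Hypothetical follower state; the fields mirror the variables the
    abstract lists (relative position, heading, roll, velocity)."""
    rel_x: float          # follower position relative to leader, x (m)
    rel_y: float          # follower position relative to leader, y (m)
    heading_diff: float   # heading angle difference (rad)
    leader_roll: float    # leader roll angle (rad)
    follower_roll: float  # follower roll angle (rad)
    speed: float          # follower velocity (m/s)


# Assumed discrete action set: decrease, hold, or increase the roll command.
ROLL_ACTIONS = (-1, 0, 1)


def epsilon_greedy(q_values, epsilon, rng):
    """Standard epsilon-greedy choice over Q-values.

    Stand-in only: this is NOT the paper's imitation-based strategy.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: uniform random action
    # exploit: index of the largest Q-value
    return max(range(len(q_values)), key=q_values.__getitem__)


def td_target(reward, next_q_values, gamma, done):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


if __name__ == "__main__":
    rng = random.Random(0)
    state = FormationState(12.0, -3.0, 0.1, 0.0, 0.05, 10.0)
    q_values = [0.1, 0.7, 0.3]  # placeholder network output for `state`
    action = ROLL_ACTIONS[epsilon_greedy(q_values, 0.1, rng)]
    print(state, action, td_target(1.0, q_values, 0.9, False))
```

In a full DQN, the placeholder `q_values` list would come from a neural network evaluated on the state, and `td_target` would supply the regression target for the action actually taken.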
