Trans-UTPA: PSO and MADDPG based multi-UAVs trajectory planning algorithm for emergency communication

Front Neurorobot. 2023 Jan 24;16:1076338. doi: 10.3389/fnbot.2022.1076338. eCollection 2022.

Abstract

When communication infrastructure is damaged by a disaster, it becomes difficult to support communication services in the affected areas, and UAVs play an important role in the emergency communication system. Because a UAV's onboard energy is limited, effectively designing flight routes to complete rescue missions is a critical technical issue. We fully consider the distribution of the rescue area, the type of mission, and the flight characteristics of the UAV. First, according to the crowd distribution, the PSO algorithm is used to cluster the target-POIs of the task area, and a neural collaborative filtering algorithm is used to prioritize them. We then design the Trans-UTPA algorithm. Building on MAPPO's policy network and value function, we introduce a transformer model so that Trans-UTPA's policy learning is not limited by the action space and can process multiple tasks in parallel, which improves the efficiency and generalization of sample processing. In three-dimensional space, each UAV selects the emergency task to perform (data acquisition or networking communication) through policy learning over state information (location, energy consumption, etc.) and action information (horizontal flight, ascent, and descent), and then plans its flight path by maximizing the global value function. Experimental results show that Trans-UTPA outperforms the USCTP algorithm in terms of the success rate of each UAV reaching its target position, the number of collisions, and the average reward: the average reward exceeds that of USCTP by 13%, and the number of collisions is reduced by 60%. Compared with a heuristic algorithm, Trans-UTPA covers more target-POIs and consumes less energy.
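The abstract's first step, PSO-based clustering of target-POIs, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes target-POIs are 2-D coordinates, encodes k cluster centers in each particle, and uses the sum of squared distances to the nearest center as the fitness; the function names and hyperparameter values are illustrative only.

    # Minimal sketch (not the paper's implementation) of PSO-based clustering
    # of target-POIs. Assumptions: POIs are 2-D points, each particle encodes
    # k cluster centers, and fitness is the sum of squared distances from each
    # POI to its nearest center. Names and hyperparameters are illustrative.
    import numpy as np

    def fitness(centers, pois):
        # Sum of squared distances from each POI to its nearest cluster center.
        d = np.linalg.norm(pois[:, None, :] - centers[None, :, :], axis=-1)
        return np.sum(d.min(axis=1) ** 2)

    def pso_cluster(pois, k=3, n_particles=30, iters=200,
                    w=0.7, c1=1.5, c2=1.5, seed=0):
        rng = np.random.default_rng(seed)
        lo, hi = pois.min(axis=0), pois.max(axis=0)
        # Each particle holds k candidate cluster centers.
        pos = rng.uniform(lo, hi, size=(n_particles, k, 2))
        vel = np.zeros_like(pos)
        pbest, pbest_fit = pos.copy(), np.array([fitness(p, pois) for p in pos])
        gbest, gbest_fit = pbest[pbest_fit.argmin()].copy(), pbest_fit.min()
        for _ in range(iters):
            r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
            # Standard PSO velocity update with inertia and cognitive/social terms.
            vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
            pos = np.clip(pos + vel, lo, hi)
            fit = np.array([fitness(p, pois) for p in pos])
            improved = fit < pbest_fit
            pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
            if fit.min() < gbest_fit:
                gbest, gbest_fit = pos[fit.argmin()].copy(), fit.min()
        # Assign each POI to its nearest final cluster center.
        labels = np.linalg.norm(pois[:, None, :] - gbest[None, :, :],
                                axis=-1).argmin(axis=1)
        return gbest, labels

In such a setup, the returned cluster centers would serve as candidate task-area waypoints and the labels as the POI-to-cluster assignment that the prioritization step could then rank; the paper's actual formulation may differ.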

Keywords: PSO; energy consumption; multi-UAVs collaboration; multi-agent reinforcement learning; trajectory planning; transformer.

Grants and funding

This work was supported by the National Key Research and Development Projects (2019YFB1802800), the Liaoning Province Science and Technology Fund Project (2020MS086), the Shenyang Science and Technology Plan Project (20206424), the Fundamental Research Funds for the Central Universities (N2116014), China University Industry-University-Research Innovation Fund (2021ITA10011), and the National Natural Science Foundation of China (61872073).