Joint Deep Reinforcement Learning and Unsupervised Learning for Channel Selection and Power Control in D2D Networks

Entropy (Basel). 2022 Nov 24;24(12):1722. doi: 10.3390/e24121722.

Abstract

Device-to-device (D2D) technology enables direct communication between devices, which can effectively alleviate the shortage of spectrum resources in 5G communication systems. However, because channels are shared among multiple D2D user pairs, severe interference can arise between them. To reduce interference, increase network capacity, and improve wireless spectrum utilization, this paper proposes a distributed resource allocation algorithm that jointly uses a deep Q-network (DQN) and an unsupervised learning network. First, a DQN is constructed to solve channel allocation in a distributed manner under a dynamic and unknown environment. Then, a deep power control neural network trained with an unsupervised learning strategy outputs an optimized power control scheme that maximizes the transmit sum-rate subject to the corresponding constraints. In contrast to traditional centralized approaches, which require collecting instantaneous global network information, the proposed algorithm treats each transmitter as a learning agent that performs channel selection and power control using only a small amount of locally collected state information. Simulation results show that the proposed algorithm converges faster and achieves a higher transmit sum-rate than traditional centralized and distributed algorithms.

Keywords: channel selection; deep reinforcement learning; device-to-device; power control; unsupervised learning.
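
The following is a minimal sketch, not the authors' code, of the two learning components the abstract describes: a per-transmitter DQN that selects a channel from local observations, and an unsupervised power-control network trained by maximizing the transmit sum-rate (i.e., minimizing its negative), with a sigmoid output enforcing the power constraint. The network sizes, observation dimension, interference model, and all hyperparameters below are illustrative assumptions.

```python
import torch
import torch.nn as nn

N_PAIRS, N_CHANNELS, OBS_DIM, P_MAX = 4, 2, 8, 1.0  # assumed toy dimensions
NOISE = 1e-3                                         # assumed noise power

class ChannelDQN(nn.Module):
    """Per-transmitter Q-network: local observation -> Q-value per channel."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.ReLU(),
            nn.Linear(64, N_CHANNELS))

    def forward(self, obs):
        return self.net(obs)

class PowerNet(nn.Module):
    """Unsupervised power-control network: local observation -> transmit power
    in [0, P_MAX]; the sigmoid output layer handles the power constraint."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid())

    def forward(self, obs):
        return P_MAX * self.net(obs)

def sum_rate(power, gain):
    """Sum-rate over co-channel D2D pairs; gain[i, j] is the channel gain from
    transmitter j to receiver i (a toy single-channel interference model)."""
    signal = torch.diagonal(gain) * power
    interference = gain @ power - signal
    sinr = signal / (interference + NOISE)
    return torch.log2(1.0 + sinr).sum()

# Unsupervised training step: no labels, the loss is simply the negative sum-rate.
obs = torch.rand(N_PAIRS, OBS_DIM)                   # locally collected state (assumed)
gain = torch.rand(N_PAIRS, N_PAIRS) + torch.eye(N_PAIRS)
power_net = PowerNet()
opt = torch.optim.Adam(power_net.parameters(), lr=1e-3)
loss = -sum_rate(power_net(obs).squeeze(-1), gain)
opt.zero_grad(); loss.backward(); opt.step()

# Channel selection: each agent acts greedily on its own Q-values
# (epsilon-greedy exploration and replay-based DQN updates are omitted).
dqn = ChannelDQN()
channels = dqn(obs).argmax(dim=-1)
print("selected channels:", channels.tolist(), "loss:", loss.item())
```

The key design point illustrated here is that the power-control network needs no labeled optimal powers: differentiating the negative sum-rate through the (assumed) interference model provides the training signal, while each agent's inputs remain purely local.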