Open radio access network (O-RAN) is a promising candidate for meeting flexibility and cost-effectiveness goals by building openness and intelligence into its architecture. In the O-RAN architecture, a central unit (O-CU) and a distributed unit (O-DU) are virtualized and executed on processing pools of general-purpose processors that can be placed at different locations. It is therefore challenging to choose a proper location for executing network functions (NFs) on these entities while accounting for propagation delay and computational capacity. In this paper, we propose a Soft Actor-Critic Energy-Aware Dynamic DU Selection algorithm (SA2C-EADDUS) that integrates two nested actor-critic agents into the O-RAN architecture. In addition, we formulate an optimization model that minimizes delay and energy consumption. We then solve this problem with a mixed-integer linear programming (MILP) solver and use the solution as a lower bound against which to compare our SA2C-EADDUS algorithm. Moreover, we compare the algorithm with recent works, including RL- and DRL-based resource allocation algorithms and a heuristic method. We show that, through collaboration among A2C agents in different layers and dynamic relocation of NFs based on service requirements, our scheme improves energy efficiency by 50% with respect to the other schemes. Our SA2C-EADDUS approach also reduces the mean delay significantly.
Keywords: O-RAN; RAN optimization; actor–critic learning; energy efficiency.