A Continuous Actor-Critic Reinforcement Learning Approach to Flocking with Fixed-Wing UAVs
; Proceedings of The Eleventh Asian Conference on Machine Learning, PMLR 101:64-79, 2019.
Controlling a squad of fixed-wing UAVs is challenging due to the kinematics complexity and the environmental dynamics. In this paper, we develop a novel actor-critic reinforcement learning approach to solve the leader-follower flocking problem in continuous state and action spaces. Specifically, we propose a CACER algorithm that uses multilayer perceptron to represent both the actor and the critic, which has a deeper structure and provides a better function approximator than the original continuous actor-critic learning automation (CACLA) algorithm. Besides, we propose a double prioritized experience replay (DPER) mechanism to further improve the training efficiency. Specifically, the state transition samples are saved into two different experience replay buffers for updating the actor and the critic separately, based on the calculation of sample priority using the temporal difference errors. We have not only compared CACER with CACLA and a benchmark deep reinforcement learning algorithm DDPG in numerical simulation, but also demonstrated the performance of CACER in semi-physical simulation by transferring the learned policy in the numerical simulation without parameter tuning.