A Continuous Actor-Critic Reinforcement Learning Approach to Flocking with Fixed-Wing UAVs

Chang Wang, Chao Yan, Xiaojia Xiang, Han Zhou
Proceedings of The Eleventh Asian Conference on Machine Learning, PMLR 101:64-79, 2019.

Abstract

Controlling a squad of fixed-wing UAVs is challenging due to the complexity of their kinematics and the dynamics of the environment. In this paper, we develop a novel actor-critic reinforcement learning approach to solve the leader-follower flocking problem in continuous state and action spaces. Specifically, we propose the CACER algorithm, which uses multilayer perceptrons to represent both the actor and the critic. This deeper structure provides a better function approximator than that of the original continuous actor-critic learning automaton (CACLA) algorithm. In addition, we propose a double prioritized experience replay (DPER) mechanism to further improve training efficiency: state transition samples are saved into two separate experience replay buffers, one for updating the actor and one for updating the critic, with sample priorities computed from the temporal difference (TD) errors. We have not only compared CACER with CACLA and with the benchmark deep reinforcement learning algorithm DDPG in numerical simulation, but also demonstrated the performance of CACER in semi-physical simulation by transferring the policy learned in numerical simulation without parameter tuning.
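The abstract describes the double prioritized experience replay (DPER) mechanism only at a high level. The following is a minimal Python sketch of the idea under our own assumptions, not the authors' implementation. It uses the proportional variant of prioritized experience replay (priority (|δ| + ε)^α, sampling probability proportional to priority) with a plain list instead of a sum tree, and omits importance-sampling weights. It also assumes that, because the CACLA rule moves the actor toward the taken action only when the TD error δ is positive, only transitions with δ > 0 enter the actor buffer, while every transition enters the critic buffer. The names `PrioritizedBuffer`, `DoublePrioritizedReplay`, and `td_error`, and the default values of `alpha`, `eps`, and `gamma`, are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical transition layout: (state, action, reward, next_state, done).
Transition = Tuple[tuple, tuple, float, tuple, bool]


def td_error(reward: float, v_s: float, v_next: float, done: bool,
             gamma: float = 0.99) -> float:
    """One-step TD error: delta = r + gamma * V(s') * (1 - done) - V(s)."""
    bootstrap = 0.0 if done else gamma * v_next
    return reward + bootstrap - v_s


class PrioritizedBuffer:
    """Proportional replay: p_i = (|delta_i| + eps)^alpha, P(i) = p_i / sum_k p_k."""

    def __init__(self, capacity: int, alpha: float = 0.6, eps: float = 1e-3):
        self.capacity = capacity
        self.alpha = alpha
        self.eps = eps
        self.data: List[Transition] = []
        self.priorities: List[float] = []

    def __len__(self) -> int:
        return len(self.data)

    def _priority(self, delta: float) -> float:
        return (abs(delta) + self.eps) ** self.alpha

    def add(self, transition: Transition, delta: float) -> None:
        # FIFO eviction once full (a sum tree would avoid the O(n) pop in practice).
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(self._priority(delta))

    def sample(self, batch_size: int) -> Tuple[List[int], List[Transition]]:
        # Draw indices with replacement, with probability proportional to priority.
        idx = random.choices(range(len(self.data)), weights=self.priorities, k=batch_size)
        return idx, [self.data[i] for i in idx]

    def update(self, indices: Sequence[int], deltas: Sequence[float]) -> None:
        # Refresh priorities with TD errors recomputed by the current critic.
        for i, d in zip(indices, deltas):
            self.priorities[i] = self._priority(d)


class DoublePrioritizedReplay:
    """Two prioritized buffers: one sampled for critic updates, one for actor updates."""

    def __init__(self, capacity: int, **kwargs):
        self.critic_buffer = PrioritizedBuffer(capacity, **kwargs)
        self.actor_buffer = PrioritizedBuffer(capacity, **kwargs)

    def add(self, transition: Transition, delta: float) -> None:
        # The critic learns a value estimate from every transition.
        self.critic_buffer.add(transition, delta)
        # Assumption: CACLA moves the actor toward the taken action only when
        # delta > 0, so only those transitions are stored for the actor.
        if delta > 0:
            self.actor_buffer.add(transition, delta)
```

In a training loop built on this sketch, each new transition would be scored with `td_error` using the current critic and passed to `DoublePrioritizedReplay.add`. Critic and actor minibatches would then be drawn from their own buffers, and their priorities refreshed with `update` after each gradient step. Keeping the buffers separate means the actor never replays transitions that, under the CACLA rule, would not change it.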

Cite this Paper


BibTeX
@InProceedings{pmlr-v101-wang19a,
  title = {A Continuous Actor-Critic Reinforcement Learning Approach to Flocking with Fixed-Wing UAVs},
  author = {Wang, Chang and Yan, Chao and Xiang, Xiaojia and Zhou, Han},
  booktitle = {Proceedings of The Eleventh Asian Conference on Machine Learning},
  pages = {64--79},
  year = {2019},
  editor = {Wee Sun Lee and Taiji Suzuki},
  volume = {101},
  series = {Proceedings of Machine Learning Research},
  address = {Nagoya, Japan},
  month = {17--19 Nov},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v101/wang19a/wang19a.pdf},
  url = {http://proceedings.mlr.press/v101/wang19a.html}
}
EndNote
%0 Conference Paper
%T A Continuous Actor-Critic Reinforcement Learning Approach to Flocking with Fixed-Wing UAVs
%A Chang Wang
%A Chao Yan
%A Xiaojia Xiang
%A Han Zhou
%B Proceedings of The Eleventh Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2019
%E Wee Sun Lee
%E Taiji Suzuki
%F pmlr-v101-wang19a
%I PMLR
%J Proceedings of Machine Learning Research
%P 64--79
%U http://proceedings.mlr.press
%V 101
%W PMLR
APA
Wang, C., Yan, C., Xiang, X., & Zhou, H. (2019). A Continuous Actor-Critic Reinforcement Learning Approach to Flocking with Fixed-Wing UAVs. Proceedings of The Eleventh Asian Conference on Machine Learning, in PMLR 101:64-79.
