ACE: Off-Policy Actor-Critic with Causality-Aware Entropy Regularization

Tianying Ji, Yongyuan Liang, Yan Zeng, Yu Luo, Guowei Xu, Jiawei Guo, Ruijie Zheng, Furong Huang, Fuchun Sun, Huazhe Xu
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:21620-21647, 2024.

Abstract

The varying significance of distinct primitive behaviors during policy learning has been overlooked by prior model-free RL algorithms. Leveraging this insight, we explore the causal relationship between different action dimensions and rewards to evaluate the significance of various primitive behaviors during training. We introduce a causality-aware entropy term that identifies and prioritizes actions with high potential impact for efficient exploration. Furthermore, to prevent excessive focus on specific primitive behaviors, we analyze the gradient-dormancy phenomenon and introduce a dormancy-guided reset mechanism that further enhances the efficacy of our method. Our proposed algorithm, ACE (Off-policy Actor-critic with Causality-aware Entropy regularization), demonstrates a substantial performance advantage over model-free RL baselines across 29 diverse continuous control tasks spanning 7 domains, underscoring the effectiveness, versatility, and sample efficiency of our approach. Benchmark results and videos are available at https://ace-rl.github.io/.
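
To make the causality-aware entropy term concrete, the sketch below weights each action dimension's policy entropy by an estimate of that dimension's causal influence on the reward. This is a minimal illustration under stated assumptions, not the paper's implementation: the diagonal-Gaussian policy head, the causal_weights input, and the sum-to-one normalization are assumptions made here, whereas ACE estimates its causal weights with a causal-discovery step during training.

import torch
from torch.distributions import Normal

def causality_aware_entropy(mean, std, causal_weights):
    # mean, std: [batch, action_dim] parameters of a diagonal Gaussian policy.
    # causal_weights: [action_dim] nonnegative estimates of each action
    # dimension's causal effect on reward (hypothetical input; ACE obtains
    # these via causal discovery during training).
    dist = Normal(mean, std)
    per_dim_entropy = dist.entropy()              # [batch, action_dim]
    w = causal_weights / causal_weights.sum()     # normalize weights to sum to 1
    return (per_dim_entropy * w).sum(dim=-1)      # weighted entropy, [batch]

# Usage: plug the weighted entropy into a SAC-style actor objective in place
# of the usual unweighted policy entropy bonus.
mean = torch.zeros(32, 4)
std = torch.ones(32, 4)
weights = torch.tensor([0.1, 0.5, 0.3, 0.1])
h_c = causality_aware_entropy(mean, std, weights)  # shape [32]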
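The dormancy-guided reset can be sketched similarly: measure how many weights receive near-zero gradients, then softly interpolate the network toward a fresh random initialization in proportion to that dormancy. The threshold tau, the weight-level dormancy proxy, and reset_strength below are illustrative assumptions; the paper defines its own gradient-dormancy metric and reset schedule.

import copy
import torch
import torch.nn as nn

def gradient_dormancy(model, tau=0.025):
    # Fraction of weights whose gradient magnitude falls below tau times the
    # mean gradient magnitude of their tensor (an illustrative proxy for the
    # paper's dormancy measure).
    dormant, total = 0, 0
    for p in model.parameters():
        if p.grad is None:
            continue
        g = p.grad.abs()
        dormant += (g < tau * g.mean()).sum().item()
        total += g.numel()
    return dormant / max(total, 1)

def dormancy_guided_reset(model, dormancy, reset_strength=0.5):
    # Soft reset: blend current weights with a freshly initialized copy,
    # resetting more aggressively when more of the network is dormant.
    alpha = 1.0 - reset_strength * dormancy
    fresh = copy.deepcopy(model)
    for m in fresh.modules():
        if isinstance(m, nn.Linear):
            m.reset_parameters()      # re-draw a random initialization
    with torch.no_grad():
        for p, q in zip(model.parameters(), fresh.parameters()):
            p.mul_(alpha).add_((1.0 - alpha) * q)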

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-ji24b,
  title     = {{ACE}: Off-Policy Actor-Critic with Causality-Aware Entropy Regularization},
  author    = {Ji, Tianying and Liang, Yongyuan and Zeng, Yan and Luo, Yu and Xu, Guowei and Guo, Jiawei and Zheng, Ruijie and Huang, Furong and Sun, Fuchun and Xu, Huazhe},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages     = {21620--21647},
  year      = {2024},
  editor    = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume    = {235},
  series    = {Proceedings of Machine Learning Research},
  month     = {21--27 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/ji24b/ji24b.pdf},
  url       = {https://proceedings.mlr.press/v235/ji24b.html}
}
Endnote
%0 Conference Paper
%T ACE: Off-Policy Actor-Critic with Causality-Aware Entropy Regularization
%A Tianying Ji
%A Yongyuan Liang
%A Yan Zeng
%A Yu Luo
%A Guowei Xu
%A Jiawei Guo
%A Ruijie Zheng
%A Furong Huang
%A Fuchun Sun
%A Huazhe Xu
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-ji24b
%I PMLR
%P 21620--21647
%U https://proceedings.mlr.press/v235/ji24b.html
%V 235
APA
Ji, T., Liang, Y., Zeng, Y., Luo, Y., Xu, G., Guo, J., Zheng, R., Huang, F., Sun, F. & Xu, H. (2024). ACE: Off-Policy Actor-Critic with Causality-Aware Entropy Regularization. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:21620-21647. Available from https://proceedings.mlr.press/v235/ji24b.html.