Functional Wasserstein Variational Policy Optimization

Junyu Xuan, Mengjing Wu, Zihe Liu, Jie Lu
Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, PMLR 244:3893-3911, 2024.

Abstract

Variational policy optimization has become increasingly attractive to the reinforcement learning community because of its strong capability in uncertainty modeling and environment generalization. However, almost all existing studies in this area rely on the Kullback-Leibler (KL) divergence, which is unfortunately ill-defined in several situations. In addition, the policy is parameterized and optimized in weight space, which may not only introduce unnecessary bias but also make policy learning harder due to the complicated dependencies in the weight posterior. In this paper, we design a novel functional Wasserstein variational policy optimization (FWVPO) based on the Wasserstein distance between function distributions. Specifically, we first parameterize the policy as a Bayesian neural network, but from a function-space view rather than a weight-space view, and then propose FWVPO to optimize and explore the functional policy posterior. We prove that FWVPO is a valid variational Bayesian objective and also guarantees monotonic expected reward improvement under certain conditions. Experimental results on multiple reinforcement learning tasks demonstrate the efficiency of our new algorithm in terms of both cumulative rewards and uncertainty modeling capability.
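The paper itself is not reproduced on this page, so the following is only a minimal, hypothetical PyTorch sketch of the function-space idea the abstract describes: a policy with a variational weight posterior is evaluated at a set of "measurement" states to obtain function-space samples, which are regularized toward a functional prior with an empirical Wasserstein distance instead of a KL term. The class name BayesianPolicy, the measurement-state set, the zero-mean functional prior, and the sorted-sample 1-Wasserstein approximation are illustrative assumptions, not the authors' released method.

import torch
import torch.nn as nn

class BayesianPolicy(nn.Module):
    """Small MLP policy whose weights follow a factorized Gaussian variational posterior."""

    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        shapes = [(state_dim, hidden), (hidden,), (hidden, action_dim), (action_dim,)]
        # Variational parameters: mean and log-std for every weight tensor.
        self.mus = nn.ParameterList(nn.Parameter(0.1 * torch.randn(*s)) for s in shapes)
        self.log_sigmas = nn.ParameterList(nn.Parameter(-3.0 * torch.ones(*s)) for s in shapes)

    def sample_forward(self, states):
        # Draw one weight sample (reparameterization trick) and evaluate the network,
        # giving samples of the policy *function* at the supplied states.
        w1, b1, w2, b2 = [mu + log_sigma.exp() * torch.randn_like(mu)
                          for mu, log_sigma in zip(self.mus, self.log_sigmas)]
        h = torch.tanh(states @ w1 + b1)
        return h @ w2 + b2  # action-preference logits (function values)

def empirical_wasserstein_1d(x, y):
    # Sorted-sample approximation of the 1-Wasserstein distance between two
    # equally sized sets of scalar function values (an assumed simplification).
    return (torch.sort(x.flatten()).values - torch.sort(y.flatten()).values).abs().mean()

# Toy usage: one optimization step with a placeholder return term plus a
# functional Wasserstein regularizer toward an assumed Gaussian functional prior.
state_dim, action_dim = 4, 2
policy = BayesianPolicy(state_dim, action_dim)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

measurement_states = torch.randn(32, state_dim)      # assumed measurement set
posterior_fvals = policy.sample_forward(measurement_states)
prior_fvals = torch.randn_like(posterior_fvals)      # samples from an assumed functional prior

pseudo_return = posterior_fvals.mean()               # placeholder for an RL objective
loss = -pseudo_return + 0.1 * empirical_wasserstein_1d(posterior_fvals, prior_fvals)
optimizer.zero_grad()
loss.backward()
optimizer.step()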

Cite this Paper


BibTeX
@InProceedings{pmlr-v244-xuan24a, title = {Functional Wasserstein Variational Policy Optimization}, author = {Xuan, Junyu and Wu, Mengjing and Liu, Zihe and Lu, Jie}, booktitle = {Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence}, pages = {3893--3911}, year = {2024}, editor = {Kiyavash, Negar and Mooij, Joris M.}, volume = {244}, series = {Proceedings of Machine Learning Research}, month = {15--19 Jul}, publisher = {PMLR}, pdf = {https://raw.githubusercontent.com/mlresearch/v244/main/assets/xuan24a/xuan24a.pdf}, url = {https://proceedings.mlr.press/v244/xuan24a.html}, abstract = {Variational policy optimization has become increasingly attractive to the reinforcement learning community because of its strong capability in uncertainty modeling and environment generalization. However, almost all existing studies in this area rely on Kullback{–}Leibler (KL) divergence which is unfortunately ill-defined in several situations. In addition, the policy is parameterized and optimized in weight space, which may not only bring additional unnecessary bias but also make the policy learning harder due to the complicatedly dependent weight posterior. In the paper, we design a novel functional Wasserstein variational policy optimization (FWVPO) based on the Wasserstein distance between function distributions. Specifically, we firstly parameterize policy as a Bayesian neural network but from a function-space view rather than a weight-space view and then propose FWVPO to optimize and explore the functional policy posterior. We prove that our FWVPO is a valid variational Bayesian objective and also guarantees the monotonic expected reward improvement under certain conditions. Experimental results on multiple reinforcement learning tasks demonstrate the efficiency of our new algorithm in terms of both cumulative rewards and uncertainty modeling capability.} }
Endnote
%0 Conference Paper %T Functional Wasserstein Variational Policy Optimization %A Junyu Xuan %A Mengjing Wu %A Zihe Liu %A Jie Lu %B Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence %C Proceedings of Machine Learning Research %D 2024 %E Negar Kiyavash %E Joris M. Mooij %F pmlr-v244-xuan24a %I PMLR %P 3893--3911 %U https://proceedings.mlr.press/v244/xuan24a.html %V 244 %X Variational policy optimization has become increasingly attractive to the reinforcement learning community because of its strong capability in uncertainty modeling and environment generalization. However, almost all existing studies in this area rely on Kullback{–}Leibler (KL) divergence which is unfortunately ill-defined in several situations. In addition, the policy is parameterized and optimized in weight space, which may not only bring additional unnecessary bias but also make the policy learning harder due to the complicatedly dependent weight posterior. In the paper, we design a novel functional Wasserstein variational policy optimization (FWVPO) based on the Wasserstein distance between function distributions. Specifically, we firstly parameterize policy as a Bayesian neural network but from a function-space view rather than a weight-space view and then propose FWVPO to optimize and explore the functional policy posterior. We prove that our FWVPO is a valid variational Bayesian objective and also guarantees the monotonic expected reward improvement under certain conditions. Experimental results on multiple reinforcement learning tasks demonstrate the efficiency of our new algorithm in terms of both cumulative rewards and uncertainty modeling capability.
APA
Xuan, J., Wu, M., Liu, Z. & Lu, J. (2024). Functional Wasserstein Variational Policy Optimization. Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research 244:3893-3911. Available from https://proceedings.mlr.press/v244/xuan24a.html.
