Risk-Sensitive Policy Optimization via Predictive CVaR Policy Gradient

Ju-Hyun Kim, Seungki Min
Proceedings of the 41st International Conference on Machine Learning, PMLR 235:24354-24369, 2024.

Abstract

This paper addresses a policy optimization task with the conditional value-at-risk (CVaR) objective. We introduce the predictive CVaR policy gradient, a novel approach that seamlessly integrates risk-neutral policy gradient algorithms with minimal modifications. Our method incorporates a reweighting strategy in gradient calculation – individual cost terms are reweighted in proportion to their predicted contribution to the objective. These weights can be easily estimated through a separate learning procedure. We provide theoretical and empirical analyses, demonstrating the validity and effectiveness of our proposed method.

Cite this Paper


BibTeX
@InProceedings{pmlr-v235-kim24x,
  title = {Risk-Sensitive Policy Optimization via Predictive {CV}a{R} Policy Gradient},
  author = {Kim, Ju-Hyun and Min, Seungki},
  booktitle = {Proceedings of the 41st International Conference on Machine Learning},
  pages = {24354--24369},
  year = {2024},
  editor = {Salakhutdinov, Ruslan and Kolter, Zico and Heller, Katherine and Weller, Adrian and Oliver, Nuria and Scarlett, Jonathan and Berkenkamp, Felix},
  volume = {235},
  series = {Proceedings of Machine Learning Research},
  month = {21--27 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v235/main/assets/kim24x/kim24x.pdf},
  url = {https://proceedings.mlr.press/v235/kim24x.html},
  abstract = {This paper addresses a policy optimization task with the conditional value-at-risk (CVaR) objective. We introduce the predictive CVaR policy gradient, a novel approach that seamlessly integrates risk-neutral policy gradient algorithms with minimal modifications. Our method incorporates a reweighting strategy in gradient calculation – individual cost terms are reweighted in proportion to their predicted contribution to the objective. These weights can be easily estimated through a separate learning procedure. We provide theoretical and empirical analyses, demonstrating the validity and effectiveness of our proposed method.}
}
Endnote
%0 Conference Paper
%T Risk-Sensitive Policy Optimization via Predictive CVaR Policy Gradient
%A Ju-Hyun Kim
%A Seungki Min
%B Proceedings of the 41st International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2024
%E Ruslan Salakhutdinov
%E Zico Kolter
%E Katherine Heller
%E Adrian Weller
%E Nuria Oliver
%E Jonathan Scarlett
%E Felix Berkenkamp
%F pmlr-v235-kim24x
%I PMLR
%P 24354--24369
%U https://proceedings.mlr.press/v235/kim24x.html
%V 235
%X This paper addresses a policy optimization task with the conditional value-at-risk (CVaR) objective. We introduce the predictive CVaR policy gradient, a novel approach that seamlessly integrates risk-neutral policy gradient algorithms with minimal modifications. Our method incorporates a reweighting strategy in gradient calculation – individual cost terms are reweighted in proportion to their predicted contribution to the objective. These weights can be easily estimated through a separate learning procedure. We provide theoretical and empirical analyses, demonstrating the validity and effectiveness of our proposed method.
APA
Kim, J. & Min, S. (2024). Risk-Sensitive Policy Optimization via Predictive CVaR Policy Gradient. Proceedings of the 41st International Conference on Machine Learning, in Proceedings of Machine Learning Research 235:24354-24369. Available from https://proceedings.mlr.press/v235/kim24x.html.

Related Material