Towards an Understanding of Default Policies in Multitask Policy Optimization

Ted Moskovitz, Michael Arbel, Jack Parker-Holder, Aldo Pacchiano
Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, PMLR 151:10661-10686, 2022.

Abstract

Much of the recent success of deep reinforcement learning has been driven by regularized policy optimization (RPO) algorithms with strong performance across multiple domains. In this family of methods, agents are trained to maximize cumulative reward while penalizing deviation in behavior from some reference, or default policy. In addition to empirical success, there is a strong theoretical foundation for understanding RPO methods applied to single tasks, with connections to natural gradient, trust region, and variational approaches. However, there is limited formal understanding of desirable properties for default policies in the multitask setting, an increasingly important domain as the field shifts towards training more generally capable agents. Here, we take a first step towards filling this gap by formally linking the quality of the default policy to its effect on optimization. Using these results, we then derive a principled RPO algorithm for multitask learning with strong performance guarantees.
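For context, the regularized objective described above (maximizing reward while penalizing deviation from a default policy) is commonly written as a KL-regularized return. The display below is only an illustrative sketch of that standard formulation, with assumed notation (default policy \pi_0, regularization weight \alpha); the paper's exact objective and notation may differ.

J(\pi) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t \ge 0} \gamma^{t}\Big(r(s_t, a_t) \;-\; \alpha\,\mathrm{KL}\big(\pi(\cdot \mid s_t)\,\|\,\pi_0(\cdot \mid s_t)\big)\Big)\right],

where \pi_0 is the reference (default) policy and \alpha > 0 controls how strongly the learned policy \pi is pulled towards it.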

Cite this Paper


BibTeX
@InProceedings{pmlr-v151-moskovitz22a,
  title     = {Towards an Understanding of Default Policies in Multitask Policy Optimization},
  author    = {Moskovitz, Ted and Arbel, Michael and Parker-Holder, Jack and Pacchiano, Aldo},
  booktitle = {Proceedings of The 25th International Conference on Artificial Intelligence and Statistics},
  pages     = {10661--10686},
  year      = {2022},
  editor    = {Camps-Valls, Gustau and Ruiz, Francisco J. R. and Valera, Isabel},
  volume    = {151},
  series    = {Proceedings of Machine Learning Research},
  month     = {28--30 Mar},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v151/moskovitz22a/moskovitz22a.pdf},
  url       = {https://proceedings.mlr.press/v151/moskovitz22a.html},
  abstract  = {Much of the recent success of deep reinforcement learning has been driven by regularized policy optimization (RPO) algorithms with strong performance across multiple domains. In this family of methods, agents are trained to maximize cumulative reward while penalizing deviation in behavior from some reference, or default policy. In addition to empirical success, there is a strong theoretical foundation for understanding RPO methods applied to single tasks, with connections to natural gradient, trust region, and variational approaches. However, there is limited formal understanding of desirable properties for default policies in the multitask setting, an increasingly important domain as the field shifts towards training more generally capable agents. Here, we take a first step towards filling this gap by formally linking the quality of the default policy to its effect on optimization. Using these results, we then derive a principled RPO algorithm for multitask learning with strong performance guarantees.}
}
Endnote
%0 Conference Paper
%T Towards an Understanding of Default Policies in Multitask Policy Optimization
%A Ted Moskovitz
%A Michael Arbel
%A Jack Parker-Holder
%A Aldo Pacchiano
%B Proceedings of The 25th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2022
%E Gustau Camps-Valls
%E Francisco J. R. Ruiz
%E Isabel Valera
%F pmlr-v151-moskovitz22a
%I PMLR
%P 10661--10686
%U https://proceedings.mlr.press/v151/moskovitz22a.html
%V 151
%X Much of the recent success of deep reinforcement learning has been driven by regularized policy optimization (RPO) algorithms with strong performance across multiple domains. In this family of methods, agents are trained to maximize cumulative reward while penalizing deviation in behavior from some reference, or default policy. In addition to empirical success, there is a strong theoretical foundation for understanding RPO methods applied to single tasks, with connections to natural gradient, trust region, and variational approaches. However, there is limited formal understanding of desirable properties for default policies in the multitask setting, an increasingly important domain as the field shifts towards training more generally capable agents. Here, we take a first step towards filling this gap by formally linking the quality of the default policy to its effect on optimization. Using these results, we then derive a principled RPO algorithm for multitask learning with strong performance guarantees.
APA
Moskovitz, T., Arbel, M., Parker-Holder, J. & Pacchiano, A. (2022). Towards an Understanding of Default Policies in Multitask Policy Optimization. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 151:10661-10686. Available from https://proceedings.mlr.press/v151/moskovitz22a.html.
