Minimax Model Learning

Cameron Voloshin, Nan Jiang, Yisong Yue
Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:1612-1620, 2021.

Abstract

We present a novel off-policy loss function for learning a transition model in model-based reinforcement learning. Notably, our loss is derived from the off-policy policy evaluation objective with an emphasis on correcting distribution shift. Compared to previous model-based techniques, our approach allows for greater robustness under model misspecification or distribution shift induced by learning/evaluating policies that are distinct from the data-generating policy. We provide a theoretical analysis and show empirical improvements over existing model-based off-policy evaluation methods. We provide further analysis showing our loss can be used for off-policy optimization (OPO) and demonstrate its integration with more recent improvements in OPO.
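
As a rough, illustrative sketch of the kind of objective the abstract describes (not the paper's exact definition), a minimax model-learning loss of this general shape trains a transition model \(\hat{P}\) adversarially against a class of test functions \(\mathcal{F}\), with an importance weight \(w\) correcting the shift from the data-generating distribution to the evaluation policy's distribution; the particular choices of \(w\), \(\mathcal{F}\), and the absolute-value form below are assumptions made for illustration:

\[
\hat{P} \;\in\; \arg\min_{\hat{P}\in\mathcal{P}} \; \max_{f\in\mathcal{F}} \;
\Bigl|\, \mathbb{E}_{(s,a,s')\sim D}\bigl[\, w(s,a)\,\bigl( \mathbb{E}_{\tilde{s}\sim \hat{P}(\cdot\mid s,a)}\!\left[f(\tilde{s})\right] - f(s') \bigr) \bigr] \Bigr|,
\]

where \(D\) is the off-policy dataset, \(w(s,a)\) is a density-ratio weight toward the evaluation policy's state-action distribution, and \(f\) stands in for candidate value functions. Minimizing the worst-case discrepancy over \(\mathcal{F}\) is what ties the learned model to the off-policy evaluation objective rather than to one-step likelihood alone.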

Cite this Paper


BibTeX
@InProceedings{pmlr-v130-voloshin21a,
  title     = {Minimax Model Learning},
  author    = {Voloshin, Cameron and Jiang, Nan and Yue, Yisong},
  booktitle = {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
  pages     = {1612--1620},
  year      = {2021},
  editor    = {Banerjee, Arindam and Fukumizu, Kenji},
  volume    = {130},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--15 Apr},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v130/voloshin21a/voloshin21a.pdf},
  url       = {https://proceedings.mlr.press/v130/voloshin21a.html}
}
APA
Voloshin, C., Jiang, N. & Yue, Y. (2021). Minimax Model Learning. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:1612-1620. Available from https://proceedings.mlr.press/v130/voloshin21a.html.
