Minimax Model Learning

Cameron Voloshin; Nan Jiang; Yisong Yue

Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:1612-1620, 2021.

Abstract

We present a novel off-policy loss function for learning a transition model in model-based reinforcement learning. Notably, our loss is derived from the off-policy policy evaluation objective with an emphasis on correcting distribution shift. Compared to previous model-based techniques, our approach allows for greater robustness under model misspecification or distribution shift induced by learning/evaluating policies that are distinct from the data-generating policy. We provide a theoretical analysis and show empirical improvements over existing model-based off-policy evaluation methods. We provide further analysis showing our loss can be used for off-policy optimization (OPO) and demonstrate its integration with more recent improvements in OPO.

Cite this Paper

BibTeX


@InProceedings{pmlr-v130-voloshin21a,
  title = 	 {Minimax Model Learning},
  author =       {Voloshin, Cameron and Jiang, Nan and Yue, Yisong},
  booktitle = 	 {Proceedings of The 24th International Conference on Artificial Intelligence and Statistics},
  pages = 	 {1612--1620},
  year = 	 {2021},
  editor = 	 {Banerjee, Arindam and Fukumizu, Kenji},
  volume = 	 {130},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {13--15 Apr},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v130/voloshin21a/voloshin21a.pdf},
  url = 	 {https://proceedings.mlr.press/v130/voloshin21a.html},
  abstract = 	 {We present a novel off-policy loss function for learning a transition model in model-based reinforcement learning. Notably, our loss is derived from the off-policy policy evaluation objective with an emphasis on correcting distribution shift. Compared to previous model-based techniques, our approach allows for greater robustness under model misspecification or distribution shift induced by learning/evaluating policies that are distinct from the data-generating policy. We provide a theoretical analysis and show empirical improvements over existing model-based off-policy evaluation methods. We provide further analysis showing our loss can be used for off-policy optimization (OPO) and demonstrate its integration with more recent improvements in OPO.}
}

Endnote

%0 Conference Paper
%T Minimax Model Learning
%A Cameron Voloshin
%A Nan Jiang
%A Yisong Yue
%B Proceedings of The 24th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2021
%E Arindam Banerjee
%E Kenji Fukumizu
%F pmlr-v130-voloshin21a
%I PMLR
%P 1612--1620
%U https://proceedings.mlr.press/v130/voloshin21a.html
%V 130
%X We present a novel off-policy loss function for learning a transition model in model-based reinforcement learning. Notably, our loss is derived from the off-policy policy evaluation objective with an emphasis on correcting distribution shift. Compared to previous model-based techniques, our approach allows for greater robustness under model misspecification or distribution shift induced by learning/evaluating policies that are distinct from the data-generating policy. We provide a theoretical analysis and show empirical improvements over existing model-based off-policy evaluation methods. We provide further analysis showing our loss can be used for off-policy optimization (OPO) and demonstrate its integration with more recent improvements in OPO.

APA


Voloshin, C., Jiang, N. & Yue, Y. (2021). Minimax Model Learning. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 130:1612-1620. Available from https://proceedings.mlr.press/v130/voloshin21a.html.
