Co-training for Policy Learning

Jialin Song; Ravi Lanka; Yisong Yue; Masahiro Ono

Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, PMLR 115:1191-1201, 2020.

Abstract

We study the problem of learning sequential decision-making policies in settings with multiple state-action representations. Such settings naturally arise in many domains, such as planning (e.g., multiple integer programming formulations) and various combinatorial optimization problems (e.g., those with both integer programming and graph-based formulations). Inspired by the classical co-training framework for classification, we study the problem of co-training for policy learning. We present sufficient conditions under which learning from two views can improve upon learning from a single view alone. Motivated by these theoretical insights, we present a meta-algorithm for co-training for sequential decision making. Our framework is compatible with both reinforcement learning and imitation learning. We validate the effectiveness of our approach across a wide range of tasks, including discrete/continuous control and combinatorial optimization.

Cite this Paper

BibTeX

@InProceedings{pmlr-v115-song20b,
  title = 	 {Co-training for Policy Learning},
  author =       {Song, Jialin and Lanka, Ravi and Yue, Yisong and Ono, Masahiro},
  booktitle = 	 {Proceedings of The 35th Uncertainty in Artificial Intelligence Conference},
  pages = 	 {1191--1201},
  year = 	 {2020},
  editor = 	 {Adams, Ryan P. and Gogate, Vibhav},
  volume = 	 {115},
  series = 	 {Proceedings of Machine Learning Research},
  month = 	 {22--25 Jul},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v115/song20b/song20b.pdf},
  url = 	 {https://proceedings.mlr.press/v115/song20b.html},
  abstract = 	 {We study the problem of learning sequential decision-making policies in settings with multiple state-action representations. Such settings naturally arise in many domains, such as planning (e.g., multiple integer programming formulations) and various combinatorial optimization problems (e.g., those with both integer programming and graph-based formulations). Inspired by the classical co-training framework for classification, we study the problem of co-training for policy learning. We present sufficient conditions under which learning from two views can improve upon learning from a single view alone. Motivated by these theoretical insights, we present a meta-algorithm for co-training for sequential decision making. Our framework is compatible with both reinforcement learning and imitation learning. We validate the effectiveness of our approach across a wide range of tasks, including discrete/continuous control and combinatorial optimization.}
}

Endnote

%0 Conference Paper
%T Co-training for Policy Learning
%A Jialin Song
%A Ravi Lanka
%A Yisong Yue
%A Masahiro Ono
%B Proceedings of The 35th Uncertainty in Artificial Intelligence Conference
%C Proceedings of Machine Learning Research
%D 2020
%E Ryan P. Adams
%E Vibhav Gogate
%F pmlr-v115-song20b
%I PMLR
%P 1191--1201
%U https://proceedings.mlr.press/v115/song20b.html
%V 115
%X We study the problem of learning sequential decision-making policies in settings with multiple state-action representations. Such settings naturally arise in many domains, such as planning (e.g., multiple integer programming formulations) and various combinatorial optimization problems (e.g., those with both integer programming and graph-based formulations). Inspired by the classical co-training framework for classification, we study the problem of co-training for policy learning. We present sufficient conditions under which learning from two views can improve upon learning from a single view alone. Motivated by these theoretical insights, we present a meta-algorithm for co-training for sequential decision making. Our framework is compatible with both reinforcement learning and imitation learning. We validate the effectiveness of our approach across a wide range of tasks, including discrete/continuous control and combinatorial optimization.

APA

Song, J., Lanka, R., Yue, Y. & Ono, M. (2020). Co-training for Policy Learning. Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, in Proceedings of Machine Learning Research 115:1191-1201. Available from https://proceedings.mlr.press/v115/song20b.html.
