Co-training for Policy Learning

Jialin Song, Ravi Lanka, Yisong Yue, Masahiro Ono
Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, PMLR 115:1191-1201, 2020.

Abstract

We study the problem of learning sequential decision-making policies in settings with multiple state-action representations. Such settings naturally arise in many domains, such as planning (e.g., multiple integer programming formulations) and various combinatorial optimization problems (e.g., those with both integer programming and graph-based formulations). Inspired by the classical co-training framework for classification, we study the problem of co-training for policy learning. We present sufficient conditions under which learning from two views can improve upon learning from a single view alone. Motivated by these theoretical insights, we present a meta-algorithm for co-training for sequential decision making. Our framework is compatible with both reinforcement learning and imitation learning. We validate the effectiveness of our approach across a wide range of tasks, including discrete/continuous control and combinatorial optimization.
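
The abstract describes the meta-algorithm only at a high level. As a rough illustration of the two-view exchange idea, here is a minimal, hypothetical Python sketch of co-training two policies with an imitation-style update. The toy chain environment, the rollout/imitate helpers, the identity translate map, and the "better trajectory teaches the other view" rule are all assumptions made for illustration; they are not the paper's actual algorithm or analysis.

import random

def rollout(policy, horizon=10):
    """Roll out a policy on a toy 1-D chain; reward 1 per rightward step."""
    state, traj, total = 0, [], 0.0
    for _ in range(horizon):
        action = policy(state)
        traj.append((state, action))
        state += action
        total += float(action)
    return traj, total

def make_policy(weights):
    """Stochastic policy over actions {0, 1}; weights[state] = P(action=1)."""
    return lambda s: 1 if random.random() < weights.get(s, 0.5) else 0

def imitate(weights, traj, lr=0.2):
    """Behavior-cloning step: nudge P(action=1 | state) toward the demo."""
    for state, action in traj:
        p = weights.get(state, 0.5)
        weights[state] = p + lr * (action - p)

translate = lambda traj: traj  # assumed map between the two views' representations

w1, w2 = {}, {}  # one policy per view
for _ in range(100):
    traj1, r1 = rollout(make_policy(w1))
    traj2, r2 = rollout(make_policy(w2))
    # Exchange step: the view that found the better trajectory teaches the other.
    if r1 >= r2:
        imitate(w2, translate(traj1))
    else:
        imitate(w1, translate(traj2))

In the paper's setting the two views would be genuinely different state-action representations of the same problem (e.g., an integer programming formulation and a graph formulation), with a problem-specific way to map trajectories across views, and the exchange and update rules would follow the paper's theoretical conditions rather than this toy heuristic.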

Cite this Paper


BibTeX
@InProceedings{pmlr-v115-song20b,
  title     = {Co-training for Policy Learning},
  author    = {Song, Jialin and Lanka, Ravi and Yue, Yisong and Ono, Masahiro},
  booktitle = {Proceedings of The 35th Uncertainty in Artificial Intelligence Conference},
  pages     = {1191--1201},
  year      = {2020},
  editor    = {Adams, Ryan P. and Gogate, Vibhav},
  volume    = {115},
  series    = {Proceedings of Machine Learning Research},
  month     = {22--25 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v115/song20b/song20b.pdf},
  url       = {https://proceedings.mlr.press/v115/song20b.html},
  abstract  = {We study the problem of learning sequential decision-making policies in settings with multiple state-action representations. Such settings naturally arise in many domains, such as planning (e.g., multiple integer programming formulations) and various combinatorial optimization problems (e.g., those with both integer programming and graph-based formulations). Inspired by the classical co-training framework for classification, we study the problem of co-training for policy learning. We present sufficient conditions under which learning from two views can improve upon learning from a single view alone. Motivated by these theoretical insights, we present a meta-algorithm for co-training for sequential decision making. Our framework is compatible with both reinforcement learning and imitation learning. We validate the effectiveness of our approach across a wide range of tasks, including discrete/continuous control and combinatorial optimization.}
}
Endnote
%0 Conference Paper
%T Co-training for Policy Learning
%A Jialin Song
%A Ravi Lanka
%A Yisong Yue
%A Masahiro Ono
%B Proceedings of The 35th Uncertainty in Artificial Intelligence Conference
%C Proceedings of Machine Learning Research
%D 2020
%E Ryan P. Adams
%E Vibhav Gogate
%F pmlr-v115-song20b
%I PMLR
%P 1191--1201
%U https://proceedings.mlr.press/v115/song20b.html
%V 115
%X We study the problem of learning sequential decision-making policies in settings with multiple state-action representations. Such settings naturally arise in many domains, such as planning (e.g., multiple integer programming formulations) and various combinatorial optimization problems (e.g., those with both integer programming and graph-based formulations). Inspired by the classical co-training framework for classification, we study the problem of co-training for policy learning. We present sufficient conditions under which learning from two views can improve upon learning from a single view alone. Motivated by these theoretical insights, we present a meta-algorithm for co-training for sequential decision making. Our framework is compatible with both reinforcement learning and imitation learning. We validate the effectiveness of our approach across a wide range of tasks, including discrete/continuous control and combinatorial optimization.
APA
Song, J., Lanka, R., Yue, Y. & Ono, M. (2020). Co-training for Policy Learning. Proceedings of The 35th Uncertainty in Artificial Intelligence Conference, in Proceedings of Machine Learning Research 115:1191-1201. Available from https://proceedings.mlr.press/v115/song20b.html.