Reinforcement learning with value advice

Mayank Daswani; Peter Sunehag; Marcus Hutter

Proceedings of the Sixth Asian Conference on Machine Learning, PMLR 39:299-314, 2015.

Abstract

The problem we consider in this paper is reinforcement learning with value advice. In this setting, the agent is given limited access to an oracle that can tell it the expected return (value) of any state-action pair with respect to the optimal policy. The agent must use this value to learn an explicit policy that performs well in the environment. We provide an algorithm called RLAdvice, based on the imitation learning algorithm DAgger. We illustrate the effectiveness of this method in the Arcade Learning Environment on three different games, using value estimates from UCT as advice.
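The setting described above can be illustrated with a minimal sketch. This is not the paper's RLAdvice algorithm: it is a generic DAgger-style loop on a hypothetical six-state chain, where the oracle is exact value iteration standing in for the UCT value estimates used in the paper, and the learned policy is a tabular regressor acting greedily on the advised values. All names (`step`, `oracle`, `fit`, `greedy`, `train`) and the choice to follow the oracle only on the first iteration are assumptions made for illustration.

```python
import random

N_STATES = 6        # hypothetical chain 0..5; state 5 is the terminal goal
ACTIONS = (0, 1)    # 0 = move left, 1 = move right
GAMMA = 0.9         # discount factor (assumed)


def step(state, action):
    """Deterministic chain dynamics: reward 1 on reaching the right end."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def optimal_q():
    """Value iteration; stands in for the value oracle in this toy example."""
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(200):
        for s in range(N_STATES - 1):
            for a in ACTIONS:
                nxt, r, done = step(s, a)
                q[s][a] = r + (0.0 if done else GAMMA * max(q[nxt]))
    return q


Q_STAR = optimal_q()


def oracle(state, action):
    """Value advice: expected return of (state, action) under the optimal policy."""
    return Q_STAR[state][action]


def fit(dataset):
    """Tabular regression: mean advised value for each (state, action) pair."""
    sums, counts = {}, {}
    for s, a, v in dataset:
        sums[(s, a)] = sums.get((s, a), 0.0) + v
        counts[(s, a)] = counts.get((s, a), 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


def greedy(q_hat, state, rng):
    """Explicit policy: act greedily on learned values, breaking ties at random."""
    values = [q_hat.get((state, a), 0.0) for a in ACTIONS]
    best = max(values)
    return rng.choice([a for a in ACTIONS if values[a] == best])


def train(iterations=5, horizon=20, seed=0):
    """DAgger-style loop: visit states, query advice, aggregate, refit."""
    rng = random.Random(seed)
    dataset, q_hat = [], {}
    for i in range(iterations):
        beta = 1.0 if i == 0 else 0.0  # follow the oracle on the first pass only
        state = 0
        for _ in range(horizon):
            # Ask the oracle for the value of every action in the visited state.
            for a in ACTIONS:
                dataset.append((state, a, oracle(state, a)))
            if rng.random() < beta:
                action = max(ACTIONS, key=lambda a: oracle(state, a))
            else:
                action = greedy(q_hat, state, rng)
            state, _, done = step(state, action)
            if done:
                break
        q_hat = fit(dataset)  # refit on the aggregated dataset
    return q_hat
```

Calling `train()` returns a dictionary of learned values keyed by `(state, action)`; passing it to `greedy` gives the explicit policy. After the first iteration the states are chosen by the learner rather than the oracle, which is the feature this sketch borrows from DAgger.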

Cite this Paper

BibTeX


@InProceedings{pmlr-v39-daswani14,
  title = 	 {Reinforcement learning with value advice},
  author = 	 {Daswani, Mayank and Sunehag, Peter and Hutter, Marcus},
  booktitle = 	 {Proceedings of the Sixth Asian Conference on Machine Learning},
  pages = 	 {299--314},
  year = 	 {2015},
  editor = 	 {Phung, Dinh and Li, Hang},
  volume = 	 {39},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Nha Trang City, Vietnam},
  month = 	 {26--28 Nov},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v39/daswani14.pdf},
  url = 	 {https://proceedings.mlr.press/v39/daswani14.html},
}

Endnote

%0 Conference Paper
%T Reinforcement learning with value advice
%A Mayank Daswani
%A Peter Sunehag
%A Marcus Hutter
%B Proceedings of the Sixth Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Dinh Phung
%E Hang Li
%F pmlr-v39-daswani14
%I PMLR
%P 299--314
%U https://proceedings.mlr.press/v39/daswani14.html
%V 39

RIS


TY  - CPAPER
TI  - Reinforcement learning with value advice
AU  - Mayank Daswani
AU  - Peter Sunehag
AU  - Marcus Hutter
BT  - Proceedings of the Sixth Asian Conference on Machine Learning
DA  - 2015/02/16
ED  - Dinh Phung
ED  - Hang Li
ID  - pmlr-v39-daswani14
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 39
SP  - 299
EP  - 314
L1  - http://proceedings.mlr.press/v39/daswani14.pdf
UR  - https://proceedings.mlr.press/v39/daswani14.html
ER  -

APA


Daswani, M., Sunehag, P., & Hutter, M. (2015). Reinforcement learning with value advice. Proceedings of the Sixth Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 39:299-314. Available from https://proceedings.mlr.press/v39/daswani14.html.
