Learning Intrinsic Rewards as a Bi-Level Optimization Problem

Bradly Stadie, Lunjun Zhang, Jimmy Ba
Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), PMLR 124:111-120, 2020.

Abstract

We reinterpret the problem of finding intrinsic rewards in reinforcement learning (RL) as a bilevel optimization problem. Using this interpretation, we can make use of recent advancements in the hyperparameter optimization literature, mainly from Self-Tuning Networks (STN), to learn intrinsic rewards. To facilitate our methods, we introduce a new general conditioning layer: Conditional Layer Normalization (CLN). We evaluate our method on several continuous control benchmarks in the Mujoco physics simulator. On all of these benchmarks, the intrinsic rewards learned on the fly lead to higher final rewards.
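The abstract names Conditional Layer Normalization (CLN), a conditioning layer used to make the policy respond to learned intrinsic-reward parameters. The paper's exact formulation is not reproduced here; the snippet below is only a minimal sketch of one plausible form, in which a conditioning vector produces the gain and bias of an otherwise standard layer normalization. The function name, the linear parameterization of the gain and bias, and all variable names are illustrative assumptions, not the paper's definitions.

```python
import numpy as np

def conditional_layer_norm(x, z, W_gamma, W_beta, eps=1e-5):
    """Layer-normalize x, with gain/bias generated from conditioning input z.

    x: (batch, d) activations; z: (batch, k) conditioning vectors.
    W_gamma, W_beta: (k, d) hypothetical projection matrices.
    """
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    x_hat = (x - mu) / np.sqrt(var + eps)   # standard layer norm
    gamma = z @ W_gamma                     # conditioning produces the gain
    beta = z @ W_beta                       # ...and the bias
    return gamma * x_hat + beta
```

With `gamma = 1` and `beta = 0` this reduces to plain layer normalization; the conditioning input only modulates the affine part, which is the usual pattern for this family of conditioning layers.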

Cite this Paper


BibTeX
@InProceedings{pmlr-v124-stadie20a,
  title = {Learning Intrinsic Rewards as a Bi-Level Optimization Problem},
  author = {Stadie, Bradly and Zhang, Lunjun and Ba, Jimmy},
  booktitle = {Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI)},
  pages = {111--120},
  year = {2020},
  editor = {Jonas Peters and David Sontag},
  volume = {124},
  series = {Proceedings of Machine Learning Research},
  month = {03--06 Aug},
  publisher = {PMLR},
  pdf = {http://proceedings.mlr.press/v124/stadie20a/stadie20a.pdf},
  url = {http://proceedings.mlr.press/v124/stadie20a.html},
  abstract = {We reinterpret the problem of finding intrinsic rewards in reinforcement learning (RL) as a bilevel optimization problem. Using this interpretation, we can make use of recent advancements in the hyperparameter optimization literature, mainly from Self-Tuning Networks (STN), to learn intrinsic rewards. To facilitate our methods, we introduce a new general conditioning layer: Conditional Layer Normalization (CLN). We evaluate our method on several continuous control benchmarks in the Mujoco physics simulator. On all of these benchmarks, the intrinsic rewards learned on the fly lead to higher final rewards.}
}
Endnote
%0 Conference Paper
%T Learning Intrinsic Rewards as a Bi-Level Optimization Problem
%A Bradly Stadie
%A Lunjun Zhang
%A Jimmy Ba
%B Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI)
%C Proceedings of Machine Learning Research
%D 2020
%E Jonas Peters
%E David Sontag
%F pmlr-v124-stadie20a
%I PMLR
%P 111--120
%U http://proceedings.mlr.press/v124/stadie20a.html
%V 124
%X We reinterpret the problem of finding intrinsic rewards in reinforcement learning (RL) as a bilevel optimization problem. Using this interpretation, we can make use of recent advancements in the hyperparameter optimization literature, mainly from Self-Tuning Networks (STN), to learn intrinsic rewards. To facilitate our methods, we introduce a new general conditioning layer: Conditional Layer Normalization (CLN). We evaluate our method on several continuous control benchmarks in the Mujoco physics simulator. On all of these benchmarks, the intrinsic rewards learned on the fly lead to higher final rewards.
APA
Stadie, B., Zhang, L. & Ba, J. (2020). Learning Intrinsic Rewards as a Bi-Level Optimization Problem. Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), in Proceedings of Machine Learning Research 124:111-120. Available from http://proceedings.mlr.press/v124/stadie20a.html.