Self-Paced Context Evaluation for Contextual Reinforcement Learning

Theresa Eimer, André Biedenkapp, Frank Hutter, Marius Lindauer
Proceedings of the 38th International Conference on Machine Learning, PMLR 139:2948-2958, 2021.

Abstract

Reinforcement learning (RL) has made significant advances in solving single problems in a given environment, but learning policies that generalize to unseen variations of a problem remains challenging. To improve sample efficiency when learning on such instances of a problem domain, we present Self-Paced Context Evaluation (SPaCE). Based on self-paced learning, SPaCE automatically generates instance curricula online with little computational overhead. To this end, SPaCE leverages information contained in state values during training to accelerate and improve training performance as well as generalization to new tasks from the same problem domain. At the same time, SPaCE is independent of the problem domain at hand and can be applied on top of any RL agent with state-value function approximation. We demonstrate SPaCE's ability to speed up learning of different value-based RL agents on two environments, showing better generalization and up to 10x faster learning compared to naive approaches such as round robin, as well as to SPDRL, the closest state-of-the-art approach.
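The core idea — ordering problem instances into a curriculum by how the agent's learned state values rate them, and growing the curriculum as progress plateaus — can be illustrated with a minimal sketch. This is a hypothetical simplification for intuition only: `value_estimates` stands in for an agent's learned value function evaluated on each instance's initial state, and the selection and pacing rules here are not the paper's exact SPaCE criterion.

```python
# Illustrative sketch of a self-paced instance curriculum driven by value
# estimates. Hypothetical names: `value_estimates` maps each instance to the
# agent's current value estimate V(s_0) for that instance's initial state.

def select_curriculum(instances, value_estimates, k):
    """Pick the k instances the agent currently rates highest (i.e. finds
    easiest) as the next training curriculum."""
    ranked = sorted(instances, key=lambda i: value_estimates[i], reverse=True)
    return ranked[:k]

def grow_curriculum(k, improvement, threshold, n_total):
    """Enlarge the curriculum by one instance once progress on the current
    subset plateaus (per-update improvement falls below a threshold)."""
    if improvement < threshold:
        k = min(k + 1, n_total)
    return k
```

For example, with four instances and value estimates `{"a": 1.0, "b": 3.0, "c": 2.0, "d": 0.5}`, `select_curriculum(..., k=2)` would train on `["b", "c"]` first; once improvement stalls, `grow_curriculum` admits a third instance. The appeal of this family of methods is that the curriculum signal is read off the value function the agent learns anyway, so the overhead is small.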

Cite this Paper


BibTeX
@InProceedings{pmlr-v139-eimer21a,
  title     = {Self-Paced Context Evaluation for Contextual Reinforcement Learning},
  author    = {Eimer, Theresa and Biedenkapp, Andr{\'e} and Hutter, Frank and Lindauer, Marius},
  booktitle = {Proceedings of the 38th International Conference on Machine Learning},
  pages     = {2948--2958},
  year      = {2021},
  editor    = {Meila, Marina and Zhang, Tong},
  volume    = {139},
  series    = {Proceedings of Machine Learning Research},
  month     = {18--24 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v139/eimer21a/eimer21a.pdf},
  url       = {http://proceedings.mlr.press/v139/eimer21a.html},
  abstract  = {Reinforcement learning (RL) has made a lot of advances for solving a single problem in a given environment; but learning policies that generalize to unseen variations of a problem remains challenging. To improve sample efficiency for learning on such instances of a problem domain, we present Self-Paced Context Evaluation (SPaCE). Based on self-paced learning, SPaCE automatically generates instance curricula online with little computational overhead. To this end, SPaCE leverages information contained in state values during training to accelerate and improve training performance as well as generalization capabilities to new tasks from the same problem domain. Nevertheless, SPaCE is independent of the problem domain at hand and can be applied on top of any RL agent with state-value function approximation. We demonstrate SPaCE's ability to speed up learning of different value-based RL agents on two environments, showing better generalization capabilities and up to 10x faster learning compared to naive approaches such as round robin or SPDRL, as the closest state-of-the-art approach.}
}
Endnote
%0 Conference Paper
%T Self-Paced Context Evaluation for Contextual Reinforcement Learning
%A Theresa Eimer
%A André Biedenkapp
%A Frank Hutter
%A Marius Lindauer
%B Proceedings of the 38th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2021
%E Marina Meila
%E Tong Zhang
%F pmlr-v139-eimer21a
%I PMLR
%P 2948--2958
%U http://proceedings.mlr.press/v139/eimer21a.html
%V 139
%X Reinforcement learning (RL) has made a lot of advances for solving a single problem in a given environment; but learning policies that generalize to unseen variations of a problem remains challenging. To improve sample efficiency for learning on such instances of a problem domain, we present Self-Paced Context Evaluation (SPaCE). Based on self-paced learning, SPaCE automatically generates instance curricula online with little computational overhead. To this end, SPaCE leverages information contained in state values during training to accelerate and improve training performance as well as generalization capabilities to new tasks from the same problem domain. Nevertheless, SPaCE is independent of the problem domain at hand and can be applied on top of any RL agent with state-value function approximation. We demonstrate SPaCE's ability to speed up learning of different value-based RL agents on two environments, showing better generalization capabilities and up to 10x faster learning compared to naive approaches such as round robin or SPDRL, as the closest state-of-the-art approach.
APA
Eimer, T., Biedenkapp, A., Hutter, F. & Lindauer, M. (2021). Self-Paced Context Evaluation for Contextual Reinforcement Learning. Proceedings of the 38th International Conference on Machine Learning, in Proceedings of Machine Learning Research 139:2948-2958. Available from http://proceedings.mlr.press/v139/eimer21a.html.
