Evaluating the Performance of Reinforcement Learning Algorithms

Scott Jordan, Yash Chandak, Daniel Cohen, Mengxue Zhang, Philip Thomas
Proceedings of the 37th International Conference on Machine Learning, PMLR 119:4962-4973, 2020.

Abstract

Performance evaluations are critical for quantifying algorithmic advances in reinforcement learning. Recent reproducibility analyses have shown that reported performance results are often inconsistent and difficult to replicate. In this work, we argue that the inconsistency of performance stems from the use of flawed evaluation metrics. Taking a step towards ensuring that reported results are consistent, we propose a new comprehensive evaluation methodology for reinforcement learning algorithms that produces reliable measurements of performance both on a single environment and when aggregated across environments. We demonstrate this method by evaluating a broad class of reinforcement learning algorithms on standard benchmark tasks.
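As a concrete (purely illustrative) picture of what aggregating performance across environments with uncertainty quantification can look like, the sketch below normalizes per-environment returns and reports a bootstrap confidence interval on the cross-environment mean. The environment names, score values, and normalization bounds are hypothetical, and this is a generic aggregation sketch, not the evaluation procedure defined in the paper.

```python
# Illustrative sketch only (not the paper's method): aggregate per-environment
# scores into one cross-environment estimate with a bootstrap confidence interval.
# Environments, returns, and normalization bounds below are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

# Final returns from several independent trials of one algorithm per environment.
scores = {
    "CartPole": np.array([180.0, 200.0, 150.0, 195.0, 170.0]),
    "Acrobot":  np.array([-95.0, -110.0, -100.0, -90.0, -120.0]),
}

# Assumed per-environment (min, max) return used to put scores on a common scale.
bounds = {"CartPole": (0.0, 200.0), "Acrobot": (-500.0, 0.0)}

def normalize(env, x):
    lo, hi = bounds[env]
    return (x - lo) / (hi - lo)

def aggregate(sample_sets):
    # Mean over environments of the mean normalized score in each environment.
    return np.mean([np.mean(s) for s in sample_sets])

normalized = [normalize(env, s) for env, s in scores.items()]
point_estimate = aggregate(normalized)

# Percentile bootstrap: resample trials within each environment and re-aggregate.
boot = []
for _ in range(10_000):
    resampled = [rng.choice(s, size=len(s), replace=True) for s in normalized]
    boot.append(aggregate(resampled))
low, high = np.percentile(boot, [2.5, 97.5])

print(f"aggregate score: {point_estimate:.3f}  95% bootstrap CI: [{low:.3f}, {high:.3f}]")
```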

Cite this Paper


BibTeX
@InProceedings{pmlr-v119-jordan20a,
  title     = {Evaluating the Performance of Reinforcement Learning Algorithms},
  author    = {Jordan, Scott and Chandak, Yash and Cohen, Daniel and Zhang, Mengxue and Thomas, Philip},
  booktitle = {Proceedings of the 37th International Conference on Machine Learning},
  pages     = {4962--4973},
  year      = {2020},
  editor    = {III, Hal Daumé and Singh, Aarti},
  volume    = {119},
  series    = {Proceedings of Machine Learning Research},
  month     = {13--18 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v119/jordan20a/jordan20a.pdf},
  url       = {https://proceedings.mlr.press/v119/jordan20a.html},
  abstract  = {Performance evaluations are critical for quantifying algorithmic advances in reinforcement learning. Recent reproducibility analyses have shown that reported performance results are often inconsistent and difficult to replicate. In this work, we argue that the inconsistency of performance stems from the use of flawed evaluation metrics. Taking a step towards ensuring that reported results are consistent, we propose a new comprehensive evaluation methodology for reinforcement learning algorithms that produces reliable measurements of performance both on a single environment and when aggregated across environments. We demonstrate this method by evaluating a broad class of reinforcement learning algorithms on standard benchmark tasks.}
}
Endnote
%0 Conference Paper
%T Evaluating the Performance of Reinforcement Learning Algorithms
%A Scott Jordan
%A Yash Chandak
%A Daniel Cohen
%A Mengxue Zhang
%A Philip Thomas
%B Proceedings of the 37th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2020
%E Hal Daumé III
%E Aarti Singh
%F pmlr-v119-jordan20a
%I PMLR
%P 4962--4973
%U https://proceedings.mlr.press/v119/jordan20a.html
%V 119
%X Performance evaluations are critical for quantifying algorithmic advances in reinforcement learning. Recent reproducibility analyses have shown that reported performance results are often inconsistent and difficult to replicate. In this work, we argue that the inconsistency of performance stems from the use of flawed evaluation metrics. Taking a step towards ensuring that reported results are consistent, we propose a new comprehensive evaluation methodology for reinforcement learning algorithms that produces reliable measurements of performance both on a single environment and when aggregated across environments. We demonstrate this method by evaluating a broad class of reinforcement learning algorithms on standard benchmark tasks.
APA
Jordan, S., Chandak, Y., Cohen, D., Zhang, M. & Thomas, P. (2020). Evaluating the Performance of Reinforcement Learning Algorithms. Proceedings of the 37th International Conference on Machine Learning, in Proceedings of Machine Learning Research 119:4962-4973. Available from https://proceedings.mlr.press/v119/jordan20a.html.