The complexity of non-stationary reinforcement learning

Binghui Peng, Christos Papadimitriou
Proceedings of The 35th International Conference on Algorithmic Learning Theory, PMLR 237:972-996, 2024.

Abstract

The problem of continual learning in the domain of reinforcement learning, often called non-stationary reinforcement learning, has been identified as an important challenge to the application of reinforcement learning. We prove a worst-case complexity result, which we believe captures this challenge: Modifying the probabilities or the reward of a single state-action pair in a reinforcement learning problem requires an amount of time almost as large as the number of states in order to keep the value function up to date, unless the strong exponential time hypothesis (SETH) is false; SETH is a widely accepted strengthening of the P $\neq$ NP conjecture. Recall that the number of states in current applications of reinforcement learning is typically astronomical. In contrast, we show that just adding a new state-action pair is considerably easier to implement.
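To make the claim concrete, here is an informal rendering of the main result; the notation ($n$ for the number of states, $\delta$ for the exponent gap) is supplied here, and the paper's exact statement, approximation regime, and constants may differ. Assuming SETH, for every constant $\delta > 0$, no algorithm can process a change to the transition probabilities or the reward of a single state-action pair and restore an (even approximately) correct value function in amortized time $O(n^{1-\delta})$ per update; that is, the update time must be $n^{1-o(1)}$. The trivial baseline, recomputing the value function from scratch, already sits essentially at this barrier: even a single sweep of value iteration touches every state. Below is a minimal sketch of that baseline, assuming a discounted tabular MDP stored as NumPy arrays (the array layout and function name are ours, not the paper's):

    import numpy as np

    def value_iteration(P, R, gamma=0.9, tol=1e-8):
        # Naive baseline: after any single (s, a) modification, recompute V*
        # from scratch. P is an (S, A, S) transition tensor; R is an (S, A)
        # reward matrix; gamma is the discount factor.
        S, A, _ = P.shape
        V = np.zeros(S)
        while True:
            Q = R + gamma * (P @ V)   # (S, A) Bellman backup; contracts the last axis of P
            V_new = Q.max(axis=1)     # greedy maximization over actions
            if np.max(np.abs(V_new - V)) < tol:
                return V_new
            V = V_new

Each sweep costs $O(S^2 A)$ time for a dense transition tensor, and even one sweep reads all $S$ states. Read this way, the lower bound says that, under SETH, no dynamic data structure can avoid paying nearly that per-state cost on every modification. The positive result is the contrast: when the update inserts a brand-new state-action pair rather than modifying an existing one, the value function can be maintained considerably faster.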

Cite this Paper


BibTeX
@InProceedings{pmlr-v237-peng24a,
  title     = {The complexity of non-stationary reinforcement learning},
  author    = {Peng, Binghui and Papadimitriou, Christos},
  booktitle = {Proceedings of The 35th International Conference on Algorithmic Learning Theory},
  pages     = {972--996},
  year      = {2024},
  editor    = {Vernade, Claire and Hsu, Daniel},
  volume    = {237},
  series    = {Proceedings of Machine Learning Research},
  month     = {25--28 Feb},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v237/peng24a/peng24a.pdf},
  url       = {https://proceedings.mlr.press/v237/peng24a.html}
}
Endnote
%0 Conference Paper
%T The complexity of non-stationary reinforcement learning
%A Binghui Peng
%A Christos Papadimitriou
%B Proceedings of The 35th International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2024
%E Claire Vernade
%E Daniel Hsu
%F pmlr-v237-peng24a
%I PMLR
%P 972--996
%U https://proceedings.mlr.press/v237/peng24a.html
%V 237
APA
Peng, B. & Papadimitriou, C. (2024). The complexity of non-stationary reinforcement learning. Proceedings of The 35th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 237:972-996. Available from https://proceedings.mlr.press/v237/peng24a.html.