Is Prior-Free Black-Box Non-Stationary Reinforcement Learning Feasible?

Argyrios Gerogiannis, Yu-Han Huang, Venugopal Veeravalli
Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, PMLR 258:2692-2700, 2025.

Abstract

We study the problem of Non-Stationary Reinforcement Learning (NS-RL) without prior knowledge about the system’s non-stationarity. A state-of-the-art black-box algorithm, known as MASTER, is considered, with a focus on identifying the conditions under which it can achieve its stated goals. Specifically, we prove that MASTER’s non-stationarity detection mechanism is not triggered for practical choices of horizon, leading to performance akin to that of a random restarting algorithm. Moreover, we show that the regret bound for MASTER, while order-optimal, stays above the worst-case linear regret until unreasonably large values of the horizon. To validate these observations, MASTER is tested in the special case of piecewise-stationary multi-armed bandits, along with methods that restart at random and methods that restart using quickest change detection. A simple, order-optimal random restarting algorithm that has prior knowledge of the non-stationarity is proposed as a baseline. The behavior of the MASTER algorithm is validated in simulations, and it is shown that methods employing quickest change detection are more robust and consistently outperform MASTER and other random restarting approaches.
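To make the restarting baseline concrete, the sketch below implements a random-restart UCB strategy for a piecewise-stationary bandit, where the restart probability is tuned using the horizon and the number of change points (the kind of prior knowledge the abstract refers to). This is a minimal illustration, not the paper's algorithm: the function name random_restart_ucb, the restart rate sqrt(L/T), and the Gaussian reward model are all assumptions made for this sketch.

import numpy as np

def random_restart_ucb(arm_means, horizon, num_changes, rng=None):
    """Illustrative random-restart UCB for a piecewise-stationary bandit.

    arm_means(t) returns the vector of arm means in force at round t.
    Each round, all statistics are discarded with a fixed probability
    tuned from the horizon and the number of change points.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    num_arms = len(arm_means(0))
    # Assumed tuning: restart with probability sqrt(L / T) per round,
    # i.e. roughly sqrt(L * T) restarts over the whole horizon.
    restart_prob = np.sqrt(num_changes / horizon)
    counts = np.zeros(num_arms)
    sums = np.zeros(num_arms)
    total_reward = 0.0
    for t in range(horizon):
        if rng.random() < restart_prob:
            counts[:] = 0.0            # forget all statistics and start over
            sums[:] = 0.0
        if counts.min() == 0:
            arm = int(np.argmin(counts))   # play each arm once after a restart
        else:
            ucb = sums / counts + np.sqrt(2.0 * np.log(t + 1) / counts)
            arm = int(np.argmax(ucb))
        reward = rng.normal(arm_means(t)[arm], 1.0)  # assumed Gaussian rewards
        counts[arm] += 1.0
        sums[arm] += reward
        total_reward += reward
    return total_reward

# Example: one change point, with the best arm switching halfway through.
means = lambda t: [0.2, 0.8] if t < 5000 else [0.8, 0.2]
print(random_restart_ucb(means, horizon=10000, num_changes=1))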

Cite this Paper


BibTeX
@InProceedings{pmlr-v258-gerogiannis25a,
  title     = {Is Prior-Free Black-Box Non-Stationary Reinforcement Learning Feasible?},
  author    = {Gerogiannis, Argyrios and Huang, Yu-Han and Veeravalli, Venugopal},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  pages     = {2692--2700},
  year      = {2025},
  editor    = {Li, Yingzhen and Mandt, Stephan and Agrawal, Shipra and Khan, Emtiyaz},
  volume    = {258},
  series    = {Proceedings of Machine Learning Research},
  month     = {03--05 May},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v258/main/assets/gerogiannis25a/gerogiannis25a.pdf},
  url       = {https://proceedings.mlr.press/v258/gerogiannis25a.html},
  abstract  = {We study the problem of Non-Stationary Reinforcement Learning (NS-RL) without prior knowledge about the system’s non-stationarity. A state-of-the-art, black-box algorithm, known as MASTER, is considered, with a focus on identifying the conditions under which it can achieve its stated goals. Specifically, we prove that MASTER’s non-stationarity detection mechanism is not triggered for practical choices of horizon, leading to performance akin to a random restarting algorithm. Moreover, we show that the regret bound for MASTER, while being order optimal, stays above the worst-case linear regret until unreasonably large values of the horizon. To validate these observations, MASTER is tested for the special case of piecewise stationary multi-armed bandits, along with methods that employ random restarting, and others that use quickest change detection to restart. A simple, order optimal random restarting algorithm, that has prior knowledge of the non-stationarity is proposed as a baseline. The behavior of the MASTER algorithm is validated in simulations, and it is shown that methods employing quickest change detection are more robust and consistently outperform MASTER and other random restarting approaches.}
}
Endnote
%0 Conference Paper
%T Is Prior-Free Black-Box Non-Stationary Reinforcement Learning Feasible?
%A Argyrios Gerogiannis
%A Yu-Han Huang
%A Venugopal Veeravalli
%B Proceedings of The 28th International Conference on Artificial Intelligence and Statistics
%C Proceedings of Machine Learning Research
%D 2025
%E Yingzhen Li
%E Stephan Mandt
%E Shipra Agrawal
%E Emtiyaz Khan
%F pmlr-v258-gerogiannis25a
%I PMLR
%P 2692--2700
%U https://proceedings.mlr.press/v258/gerogiannis25a.html
%V 258
%X We study the problem of Non-Stationary Reinforcement Learning (NS-RL) without prior knowledge about the system’s non-stationarity. A state-of-the-art, black-box algorithm, known as MASTER, is considered, with a focus on identifying the conditions under which it can achieve its stated goals. Specifically, we prove that MASTER’s non-stationarity detection mechanism is not triggered for practical choices of horizon, leading to performance akin to a random restarting algorithm. Moreover, we show that the regret bound for MASTER, while being order optimal, stays above the worst-case linear regret until unreasonably large values of the horizon. To validate these observations, MASTER is tested for the special case of piecewise stationary multi-armed bandits, along with methods that employ random restarting, and others that use quickest change detection to restart. A simple, order optimal random restarting algorithm, that has prior knowledge of the non-stationarity is proposed as a baseline. The behavior of the MASTER algorithm is validated in simulations, and it is shown that methods employing quickest change detection are more robust and consistently outperform MASTER and other random restarting approaches.
APA
Gerogiannis, A., Huang, Y.-H. & Veeravalli, V. (2025). Is Prior-Free Black-Box Non-Stationary Reinforcement Learning Feasible? Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 258:2692-2700. Available from https://proceedings.mlr.press/v258/gerogiannis25a.html.