Non-stochastic Bandits With Evolving Observations

Yogev Bar-On, Yishay Mansour
Proceedings of The 36th International Conference on Algorithmic Learning Theory, PMLR 272:204-227, 2025.

Abstract

We introduce a novel online learning framework that unifies and generalizes pre-established models, such as delayed and corrupted feedback, to encompass adversarial environments where action feedback evolves over time. In this setting, the observed loss is arbitrary and may not correlate with the true loss incurred, with each round updating previous observations adversarially. We propose regret minimization algorithms for both the full-information and bandit settings, with regret bounds quantified by the average feedback accuracy relative to the true loss. Our algorithms match the known regret bounds across many special cases, while also introducing previously unknown bounds.

Cite this Paper


BibTeX
@InProceedings{pmlr-v272-bar-on25a,
  title     = {Non-stochastic Bandits With Evolving Observations},
  author    = {Bar-On, Yogev and Mansour, Yishay},
  booktitle = {Proceedings of The 36th International Conference on Algorithmic Learning Theory},
  pages     = {204--227},
  year      = {2025},
  editor    = {Kamath, Gautam and Loh, Po-Ling},
  volume    = {272},
  series    = {Proceedings of Machine Learning Research},
  month     = {24--27 Feb},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v272/main/assets/bar-on25a/bar-on25a.pdf},
  url       = {https://proceedings.mlr.press/v272/bar-on25a.html},
  abstract  = {We introduce a novel online learning framework that unifies and generalizes pre-established models, such as delayed and corrupted feedback, to encompass adversarial environments where action feedback evolves over time. In this setting, the observed loss is arbitrary and may not correlate with the true loss incurred, with each round updating previous observations adversarially. We propose regret minimization algorithms for both the full-information and bandit settings, with regret bounds quantified by the average feedback accuracy relative to the true loss. Our algorithms match the known regret bounds across many special cases, while also introducing previously unknown bounds.}
}
Endnote
%0 Conference Paper
%T Non-stochastic Bandits With Evolving Observations
%A Yogev Bar-On
%A Yishay Mansour
%B Proceedings of The 36th International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2025
%E Gautam Kamath
%E Po-Ling Loh
%F pmlr-v272-bar-on25a
%I PMLR
%P 204--227
%U https://proceedings.mlr.press/v272/bar-on25a.html
%V 272
%X We introduce a novel online learning framework that unifies and generalizes pre-established models, such as delayed and corrupted feedback, to encompass adversarial environments where action feedback evolves over time. In this setting, the observed loss is arbitrary and may not correlate with the true loss incurred, with each round updating previous observations adversarially. We propose regret minimization algorithms for both the full-information and bandit settings, with regret bounds quantified by the average feedback accuracy relative to the true loss. Our algorithms match the known regret bounds across many special cases, while also introducing previously unknown bounds.
APA
Bar-On, Y. & Mansour, Y. (2025). Non-stochastic Bandits With Evolving Observations. Proceedings of The 36th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 272:204-227. Available from https://proceedings.mlr.press/v272/bar-on25a.html.