Regret Bounds for Robust Online Decision Making

Alexander Appel, Vanessa Kosoy
Proceedings of Thirty Eighth Conference on Learning Theory, PMLR 291:64-146, 2025.

Abstract

We propose a framework which generalizes “decision making with structured observations” from Foster et al. (2023) by allowing \emph{robust} (i.e., multivalued) models. In this framework, each model associates each decision with a \emph{convex set} of probability distributions over outcomes. Nature can choose distributions out of this set in an arbitrary (adversarial) manner that can be non-oblivious and depend on past history. The resulting framework offers much greater generality than classical bandits and reinforcement learning, since the realizability assumption becomes much weaker and more realistic. We then derive a theory of regret bounds for this framework, which extends the “decision-estimation coefficients” of Foster et al. (2023). Although our lower and upper bounds are not tight, they are sufficient to fully characterize power-law learnability. We demonstrate this theory in two special cases: robust linear bandits (previously studied in Kosoy (2024)) and tabular robust online reinforcement learning (previously studied in Tian et al. (2021)). In both cases, we derive regret bounds that improve on the state of the art (except that we do not address computational efficiency).
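To make the framework's central object concrete: below is a minimal toy sketch of a "robust" model in the bandit special case, where each decision maps to a convex set of outcome distributions (here, an interval of Bernoulli means) and nature adversarially, and non-obliviously, picks a distribution from the set each round. All names, sets, and the particular adversary rule are illustrative assumptions, not constructions from the paper.

```python
import random

# Hypothetical toy robust model (illustrative, not from the paper):
# each decision maps to a convex set of outcome distributions --
# here, an interval of Bernoulli success probabilities.
ROBUST_MODEL = {
    "a": (0.4, 0.6),   # nature may pick any mean in [0.4, 0.6]
    "b": (0.1, 0.9),   # a much wider (more uncertain) set
}

def adversarial_mean(decision, history):
    """Nature's non-oblivious choice: a mean from the decision's set,
    allowed to depend on the past history of plays."""
    lo, hi = ROBUST_MODEL[decision]
    # Example rule: give the low end of the interval to the decision
    # the learner has played most often so far.
    counts = {d: sum(1 for h in history if h == d) for d in ROBUST_MODEL}
    most_played = counts.get(decision, 0) >= max(counts.values(), default=0)
    return lo if most_played else hi

def play_round(decision, history, rng):
    """One interaction round: nature picks a distribution from the
    decision's set, then an outcome is sampled from it."""
    mean = adversarial_mean(decision, history)
    reward = 1 if rng.random() < mean else 0
    history.append(decision)
    return reward

rng = random.Random(0)
history = []
total = sum(play_round("a", history, rng) for _ in range(100))
```

Note the contrast with classical realizability: a single-valued model would pin down one distribution per decision, whereas here any history-dependent selection from the convex set is consistent with the model.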

Cite this Paper


BibTeX
@InProceedings{pmlr-v291-appel25a,
  title = {Regret Bounds for Robust Online Decision Making},
  author = {Appel, Alexander and Kosoy, Vanessa},
  booktitle = {Proceedings of Thirty Eighth Conference on Learning Theory},
  pages = {64--146},
  year = {2025},
  editor = {Haghtalab, Nika and Moitra, Ankur},
  volume = {291},
  series = {Proceedings of Machine Learning Research},
  month = {30 Jun--04 Jul},
  publisher = {PMLR},
  pdf = {https://raw.githubusercontent.com/mlresearch/v291/main/assets/appel25a/appel25a.pdf},
  url = {https://proceedings.mlr.press/v291/appel25a.html},
  abstract = {We propose a framework which generalizes “decision making with structured observations” from Foster et al. (2023) by allowing \emph{robust} (i.e., multivalued) models. In this framework, each model associates each decision with a \emph{convex set} of probability distributions over outcomes. Nature can choose distributions out of this set in an arbitrary (adversarial) manner that can be non-oblivious and depend on past history. The resulting framework offers much greater generality than classical bandits and reinforcement learning, since the realizability assumption becomes much weaker and more realistic. We then derive a theory of regret bounds for this framework, which extends the “decision-estimation coefficients” of Foster et al. (2023). Although our lower and upper bounds are not tight, they are sufficient to fully characterize power-law learnability. We demonstrate this theory in two special cases: robust linear bandits (previously studied in Kosoy (2024)) and tabular robust online reinforcement learning (previously studied in Tian et al. (2021)). In both cases, we derive regret bounds that improve on the state of the art (except that we do not address computational efficiency).}
}
Endnote
%0 Conference Paper
%T Regret Bounds for Robust Online Decision Making
%A Alexander Appel
%A Vanessa Kosoy
%B Proceedings of Thirty Eighth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2025
%E Nika Haghtalab
%E Ankur Moitra
%F pmlr-v291-appel25a
%I PMLR
%P 64--146
%U https://proceedings.mlr.press/v291/appel25a.html
%V 291
%X We propose a framework which generalizes “decision making with structured observations” from Foster et al. (2023) by allowing \emph{robust} (i.e., multivalued) models. In this framework, each model associates each decision with a \emph{convex set} of probability distributions over outcomes. Nature can choose distributions out of this set in an arbitrary (adversarial) manner that can be non-oblivious and depend on past history. The resulting framework offers much greater generality than classical bandits and reinforcement learning, since the realizability assumption becomes much weaker and more realistic. We then derive a theory of regret bounds for this framework, which extends the “decision-estimation coefficients” of Foster et al. (2023). Although our lower and upper bounds are not tight, they are sufficient to fully characterize power-law learnability. We demonstrate this theory in two special cases: robust linear bandits (previously studied in Kosoy (2024)) and tabular robust online reinforcement learning (previously studied in Tian et al. (2021)). In both cases, we derive regret bounds that improve on the state of the art (except that we do not address computational efficiency).
APA
Appel, A. & Kosoy, V. (2025). Regret Bounds for Robust Online Decision Making. Proceedings of Thirty Eighth Conference on Learning Theory, in Proceedings of Machine Learning Research 291:64-146. Available from https://proceedings.mlr.press/v291/appel25a.html.
