Best-of-Both-Worlds Algorithms for Partial Monitoring
Proceedings of The 34th International Conference on Algorithmic Learning Theory, PMLR 201:1484-1515, 2023.
Abstract
This study considers the partial monitoring problem with k actions and d outcomes and provides the first best-of-both-worlds algorithms, whose regret is favorably bounded in both the stochastic and adversarial regimes. In particular, we show that for non-degenerate locally observable games, the regret is O(m^2 k^4 \log(T) \log(k_{\Pi} T) / \Delta_{\min}) in the stochastic regime and O(m k^{3/2} \sqrt{T \log(T) \log k_{\Pi}}) in the adversarial regime, where T is the number of rounds, m is the maximum number of distinct observations per action, \Delta_{\min} is the minimum suboptimality gap, and k_{\Pi} is the number of Pareto optimal actions. Moreover, we show that for globally observable games, the regret is O(c_{\mathcal{G}}^2 \log(T) \log(k_{\Pi} T) / \Delta_{\min}^2) in the stochastic regime and O((c_{\mathcal{G}}^2 \log(T) \log(k_{\Pi} T))^{1/3} T^{2/3}) in the adversarial regime, where c_{\mathcal{G}} is a game-dependent constant. We also provide regret bounds for a stochastic regime with adversarial corruptions. Our algorithms are based on the follow-the-regularized-leader framework and are inspired by the approach of exploration by optimization and the adaptive learning rate in the field of online learning with feedback graphs.
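As a rough illustration of the follow-the-regularized-leader (FTRL) framework the abstract refers to, the sketch below runs FTRL with a negative-entropy regularizer and a time-decreasing learning rate over k actions. This is a minimal sketch under stated assumptions, not the paper's algorithm: the learning-rate schedule eta_t and the placeholder loss estimator are illustrative, and the game-specific loss estimation from partial feedback (where exploration by optimization enters) is omitted.

```python
import numpy as np

def ftrl_update(cum_loss_estimates, eta):
    """FTRL step with a negative-entropy regularizer.

    Minimizes <q, L_hat> + (1/eta) * sum_i q_i log q_i over the probability
    simplex; the closed-form minimizer is the softmax of -eta * L_hat.
    """
    w = np.exp(-eta * (cum_loss_estimates - cum_loss_estimates.min()))
    return w / w.sum()

# Hypothetical usage over T rounds.
rng = np.random.default_rng(0)
k, T = 4, 1000
cum_L = np.zeros(k)  # cumulative estimated losses per action
for t in range(1, T + 1):
    eta_t = 1.0 / np.sqrt(t)       # illustrative adaptive learning rate
    q = ftrl_update(cum_L, eta_t)  # sampling distribution over actions
    a = rng.choice(k, p=q)
    # In partial monitoring the learner observes only a feedback symbol for
    # the chosen action; building an unbiased loss estimator from it is
    # game-specific and is where the paper's exploration-by-optimization
    # machinery would enter. A random placeholder stands in for it here.
    loss_est = rng.random(k)
    cum_L += loss_est
```

The softmax update is the standard closed form for entropy-regularized FTRL (i.e., exponential weights); the best-of-both-worlds guarantees in the paper come from how the loss estimates and the learning rate are chosen, which this sketch does not attempt to reproduce.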