Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments


Gábor Bartók, Dávid Pál, Csaba Szepesvári
Proceedings of the 24th Annual Conference on Learning Theory, PMLR 19:133-154, 2011.


In a partial monitoring game, the learner repeatedly chooses an action, the environment responds with an outcome, and then the learner suffers a loss and receives a feedback signal, both of which are fixed functions of the action and the outcome. The goal of the learner is to minimize his regret, which is the difference between his total cumulative loss and the total loss of the best fixed action in hindsight. Assuming that the outcomes are generated in an i.i.d. fashion from an arbitrary and unknown probability distribution, we characterize the minimax regret of any partial monitoring game with finitely many actions and outcomes. It turns out that the minimax regret of any such game is either zero, $\widetilde{\Theta}(\sqrt{T})$, $\Theta(T^{2/3})$, or $\Theta(T)$. We provide a computationally efficient learning algorithm that achieves the minimax regret within a logarithmic factor for any game.
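
To make the protocol and the regret definition concrete, the following is a minimal Python sketch of one run of a finite partial-monitoring game against i.i.d. outcomes. It is not the algorithm of the paper: the function `play`, the matrices `LOSS` and `FEEDBACK`, and the policy interface are hypothetical names chosen for illustration, and the toy two-action game is an assumption.

```python
import random

# Hypothetical toy game: rows are actions, columns are outcomes.
LOSS = [[0.0, 1.0],
        [1.0, 0.0]]
# Feedback symbols seen by the learner (action 0 is uninformative here).
FEEDBACK = [[0, 0],
            [0, 1]]

def play(loss, feedback, policy, outcome_probs, horizon, seed=0):
    """Run one game and return the learner's regret.

    policy maps the history of (action, feedback) pairs to the next action.
    Outcomes are drawn i.i.d. from outcome_probs, which the learner never sees.
    """
    rng = random.Random(seed)
    n_actions = len(loss)
    outcomes = list(range(len(outcome_probs)))
    learner_loss = 0.0
    fixed_action_loss = [0.0] * n_actions
    history = []
    for _ in range(horizon):
        action = policy(history)
        outcome = rng.choices(outcomes, weights=outcome_probs)[0]
        learner_loss += loss[action][outcome]
        # Track what each fixed action would have suffered on this outcome.
        for a in range(n_actions):
            fixed_action_loss[a] += loss[a][outcome]
        # The learner observes only the feedback symbol, not the outcome or loss.
        history.append((action, feedback[action][outcome]))
    # Regret: cumulative loss minus the loss of the best fixed action in hindsight.
    return learner_loss - min(fixed_action_loss)

# A policy that always plays action 1, against an environment that always
# produces outcome 0, loses 1 per round while action 0 would lose nothing.
regret = play(LOSS, FEEDBACK, lambda history: 1, [1.0, 0.0], horizon=10)
```

In this sketch the regret of the constant policy grows linearly with the horizon, whereas a learning algorithm would use the feedback history to keep the regret smaller.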
