Better Rates for Any Adversarial Deterministic MDP

Ofer Dekel, Elad Hazan
Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3):675-683, 2013.

Abstract

We consider regret minimization in adversarial deterministic Markov Decision Processes (ADMDPs) with bandit feedback. We devise a new algorithm that pushes the state-of-the-art forward in two ways: First, it attains a regret of O(T^2/3) with respect to the best fixed policy in hindsight, whereas the previous best regret bound was O(T^3/4). Second, the algorithm and its analysis are compatible with any feasible ADMDP graph topology, while all previous approaches required additional restrictions on the graph topology.

Cite this Paper


BibTeX
@InProceedings{pmlr-v28-dekel13,
  title     = {Better Rates for Any Adversarial Deterministic MDP},
  author    = {Dekel, Ofer and Hazan, Elad},
  booktitle = {Proceedings of the 30th International Conference on Machine Learning},
  pages     = {675--683},
  year      = {2013},
  editor    = {Dasgupta, Sanjoy and McAllester, David},
  volume    = {28},
  number    = {3},
  series    = {Proceedings of Machine Learning Research},
  address   = {Atlanta, Georgia, USA},
  month     = {17--19 Jun},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v28/dekel13.pdf},
  url       = {https://proceedings.mlr.press/v28/dekel13.html},
  abstract  = {We consider regret minimization in adversarial deterministic Markov Decision Processes (ADMDPs) with bandit feedback. We devise a new algorithm that pushes the state-of-the-art forward in two ways: First, it attains a regret of O(T^2/3) with respect to the best fixed policy in hindsight, whereas the previous best regret bound was O(T^3/4). Second, the algorithm and its analysis are compatible with any feasible ADMDP graph topology, while all previous approaches required additional restrictions on the graph topology.}
}
Endnote
%0 Conference Paper
%T Better Rates for Any Adversarial Deterministic MDP
%A Ofer Dekel
%A Elad Hazan
%B Proceedings of the 30th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Sanjoy Dasgupta
%E David McAllester
%F pmlr-v28-dekel13
%I PMLR
%P 675--683
%U https://proceedings.mlr.press/v28/dekel13.html
%V 28
%N 3
%X We consider regret minimization in adversarial deterministic Markov Decision Processes (ADMDPs) with bandit feedback. We devise a new algorithm that pushes the state-of-the-art forward in two ways: First, it attains a regret of O(T^2/3) with respect to the best fixed policy in hindsight, whereas the previous best regret bound was O(T^3/4). Second, the algorithm and its analysis are compatible with any feasible ADMDP graph topology, while all previous approaches required additional restrictions on the graph topology.
RIS
TY  - CPAPER
TI  - Better Rates for Any Adversarial Deterministic MDP
AU  - Ofer Dekel
AU  - Elad Hazan
BT  - Proceedings of the 30th International Conference on Machine Learning
DA  - 2013/05/26
ED  - Sanjoy Dasgupta
ED  - David McAllester
ID  - pmlr-v28-dekel13
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 28
IS  - 3
SP  - 675
EP  - 683
L1  - http://proceedings.mlr.press/v28/dekel13.pdf
UR  - https://proceedings.mlr.press/v28/dekel13.html
AB  - We consider regret minimization in adversarial deterministic Markov Decision Processes (ADMDPs) with bandit feedback. We devise a new algorithm that pushes the state-of-the-art forward in two ways: First, it attains a regret of O(T^2/3) with respect to the best fixed policy in hindsight, whereas the previous best regret bound was O(T^3/4). Second, the algorithm and its analysis are compatible with any feasible ADMDP graph topology, while all previous approaches required additional restrictions on the graph topology.
ER  -
APA
Dekel, O. & Hazan, E. (2013). Better Rates for Any Adversarial Deterministic MDP. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(3):675-683. Available from https://proceedings.mlr.press/v28/dekel13.html.