Better Rates for Any Adversarial Deterministic MDP

Ofer Dekel; Elad Hazan

Proceedings of the 30th International Conference on Machine Learning, PMLR 28(3):675-683, 2013.

Abstract

We consider regret minimization in adversarial deterministic Markov Decision Processes (ADMDPs) with bandit feedback. We devise a new algorithm that pushes the state of the art forward in two ways. First, it attains a regret of O(T^{2/3}) with respect to the best fixed policy in hindsight, whereas the previous best regret bound was O(T^{3/4}). Second, the algorithm and its analysis are compatible with any feasible ADMDP graph topology, while all previous approaches required additional restrictions on the graph topology.

Cite this Paper

BibTeX

@InProceedings{pmlr-v28-dekel13,
  title = 	 {Better Rates for Any Adversarial Deterministic MDP},
  author = 	 {Dekel, Ofer and Hazan, Elad},
  booktitle = 	 {Proceedings of the 30th International Conference on Machine Learning},
  pages = 	 {675--683},
  year = 	 {2013},
  editor = 	 {Dasgupta, Sanjoy and McAllester, David},
  volume = 	 {28},
  number =       {3},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Atlanta, Georgia, USA},
  month = 	 {17--19 Jun},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v28/dekel13.pdf},
  url = 	 {https://proceedings.mlr.press/v28/dekel13.html},
  abstract = 	 {We consider regret minimization in adversarial deterministic Markov  Decision Processes (ADMDPs) with bandit feedback. We devise a new  algorithm that pushes the state-of-the-art forward in two ways: First,  it attains a regret of O(T^2/3) with respect to the best fixed  policy in hindsight, whereas the previous best regret bound was  O(T^3/4). Second, the algorithm and its analysis are compatible  with any feasible ADMDP graph topology, while all previous approaches  required additional restrictions on the graph topology.  }
}

Endnote

%0 Conference Paper
%T Better Rates for Any Adversarial Deterministic MDP
%A Ofer Dekel
%A Elad Hazan
%B Proceedings of the 30th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2013
%E Sanjoy Dasgupta
%E David McAllester
%F pmlr-v28-dekel13
%I PMLR
%P 675--683
%U https://proceedings.mlr.press/v28/dekel13.html
%V 28
%N 3
%X We consider regret minimization in adversarial deterministic Markov  Decision Processes (ADMDPs) with bandit feedback. We devise a new  algorithm that pushes the state-of-the-art forward in two ways: First,  it attains a regret of O(T^2/3) with respect to the best fixed  policy in hindsight, whereas the previous best regret bound was  O(T^3/4). Second, the algorithm and its analysis are compatible  with any feasible ADMDP graph topology, while all previous approaches  required additional restrictions on the graph topology.

RIS

TY  - CPAPER
TI  - Better Rates for Any Adversarial Deterministic MDP
AU  - Ofer Dekel
AU  - Elad Hazan
BT  - Proceedings of the 30th International Conference on Machine Learning
DA  - 2013/05/26
ED  - Sanjoy Dasgupta
ED  - David McAllester
ID  - pmlr-v28-dekel13
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 28
IS  - 3
SP  - 675
EP  - 683
L1  - http://proceedings.mlr.press/v28/dekel13.pdf
UR  - https://proceedings.mlr.press/v28/dekel13.html
AB  - We consider regret minimization in adversarial deterministic Markov  Decision Processes (ADMDPs) with bandit feedback. We devise a new  algorithm that pushes the state-of-the-art forward in two ways: First,  it attains a regret of O(T^2/3) with respect to the best fixed  policy in hindsight, whereas the previous best regret bound was  O(T^3/4). Second, the algorithm and its analysis are compatible  with any feasible ADMDP graph topology, while all previous approaches  required additional restrictions on the graph topology.  
ER  -

APA

Dekel, O. & Hazan, E. (2013). Better Rates for Any Adversarial Deterministic MDP. Proceedings of the 30th International Conference on Machine Learning, in Proceedings of Machine Learning Research 28(3):675-683. Available from https://proceedings.mlr.press/v28/dekel13.html.
