Improved Regret Bounds for Undiscounted Continuous Reinforcement Learning

K. Lakshmanan; Ronald Ortner; Daniil Ryabko

Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:524-532, 2015.

Abstract

We consider the problem of undiscounted reinforcement learning in continuous state spaces. Regret bounds in this setting usually hold under various assumptions on the structure of the reward and transition functions. Under the assumption that the rewards and transition probabilities are Lipschitz, Ortner and Ryabko (2012) gave a regret bound of O(T^(3/4)) after any T steps for 1-dimensional state space. Here we improve upon this result by using non-parametric kernel density estimation to estimate the transition probability distributions, and obtain regret bounds that depend on the smoothness of these distributions. In particular, under the assumption that the transition probability functions are smoothly differentiable, the regret bound is shown to be O(T^(2/3)) asymptotically for reinforcement learning in 1-dimensional state space. Finally, we also derive improved regret bounds for higher-dimensional state spaces.
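To make the density-estimation step concrete, the following minimal Python sketch estimates a next-state density from observed transitions with a kernel density estimate. It assumes a 1-dimensional state space [0, 1], a Gaussian kernel and a hand-picked bandwidth. All function names, parameters and the sample data are hypothetical. This is not the authors' algorithm: it only shows a kernel density estimate in isolation, without the confidence sets, bandwidth schedule or policy computation a regret-minimizing learner would need.

```python
import math
from typing import Callable, List, Sequence


def gaussian_kernel(u: float) -> float:
    """Standard normal density, used here as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde_next_state_density(
    next_states: Sequence[float], bandwidth: float
) -> Callable[[float], float]:
    """Kernel density estimate x -> p_hat(x) of the next-state density.

    next_states: successor states observed after taking one fixed action
    from (approximately) one fixed state; hypothetical input format.
    bandwidth: kernel width h > 0, chosen by hand here (an assumption).
    """
    n = len(next_states)
    if n == 0:
        raise ValueError("need at least one observed transition")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")

    def density(x: float) -> float:
        # p_hat(x) = (1 / (n h)) * sum_i K((x - s_i) / h)
        total = sum(gaussian_kernel((x - s) / bandwidth) for s in next_states)
        return total / (n * bandwidth)

    return density


def discretize(density: Callable[[float], float], num_cells: int) -> List[float]:
    """Approximate probability of each of num_cells equal cells of [0, 1].

    Uses the midpoint rule per cell, then renormalizes so the masses sum
    to 1; this crudely removes the kernel mass that falls outside [0, 1].
    """
    width = 1.0 / num_cells
    masses = [density((i + 0.5) * width) * width for i in range(num_cells)]
    z = sum(masses)
    return [m / z for m in masses]


if __name__ == "__main__":
    observed = [0.42, 0.47, 0.51, 0.55, 0.58]  # made-up next states
    p_hat = kde_next_state_density(observed, bandwidth=0.1)
    print([round(p, 3) for p in discretize(p_hat, num_cells=10)])
```

A Gaussian kernel places some probability outside [0, 1]. Renormalizing over the grid is only one simple boundary correction. Reflection or boundary kernels are common alternatives.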

Cite this Paper

BibTeX


@InProceedings{pmlr-v37-lakshmanan15,
  title     = {Improved Regret Bounds for Undiscounted Continuous Reinforcement Learning},
  author    = {Lakshmanan, K. and Ortner, Ronald and Ryabko, Daniil},
  booktitle = {Proceedings of the 32nd International Conference on Machine Learning},
  pages     = {524--532},
  year      = {2015},
  editor    = {Bach, Francis and Blei, David},
  volume    = {37},
  series    = {Proceedings of Machine Learning Research},
  address   = {Lille, France},
  month     = {07--09 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v37/lakshmanan15.pdf},
  url       = {https://proceedings.mlr.press/v37/lakshmanan15.html},
  abstract  = {We consider the problem of undiscounted reinforcement learning in continuous state spaces. Regret bounds in this setting usually hold under various assumptions on the structure of the reward and transition functions. Under the assumption that the rewards and transition probabilities are Lipschitz, Ortner and Ryabko (2012) gave a regret bound of O(T^(3/4)) after any T steps for 1-dimensional state space. Here we improve upon this result by using non-parametric kernel density estimation to estimate the transition probability distributions, and obtain regret bounds that depend on the smoothness of these distributions. In particular, under the assumption that the transition probability functions are smoothly differentiable, the regret bound is shown to be O(T^(2/3)) asymptotically for reinforcement learning in 1-dimensional state space. Finally, we also derive improved regret bounds for higher-dimensional state spaces.}
}

Endnote

%0 Conference Paper
%T Improved Regret Bounds for Undiscounted Continuous Reinforcement Learning
%A K. Lakshmanan
%A Ronald Ortner
%A Daniil Ryabko
%B Proceedings of the 32nd International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2015
%E Francis Bach
%E David Blei
%F pmlr-v37-lakshmanan15
%I PMLR
%P 524--532
%U https://proceedings.mlr.press/v37/lakshmanan15.html
%V 37
%X We consider the problem of undiscounted reinforcement learning in continuous state spaces. Regret bounds in this setting usually hold under various assumptions on the structure of the reward and transition functions. Under the assumption that the rewards and transition probabilities are Lipschitz, Ortner and Ryabko (2012) gave a regret bound of O(T^(3/4)) after any T steps for 1-dimensional state space. Here we improve upon this result by using non-parametric kernel density estimation to estimate the transition probability distributions, and obtain regret bounds that depend on the smoothness of these distributions. In particular, under the assumption that the transition probability functions are smoothly differentiable, the regret bound is shown to be O(T^(2/3)) asymptotically for reinforcement learning in 1-dimensional state space. Finally, we also derive improved regret bounds for higher-dimensional state spaces.

RIS


TY  - CPAPER
TI  - Improved Regret Bounds for Undiscounted Continuous Reinforcement Learning
AU  - K. Lakshmanan
AU  - Ronald Ortner
AU  - Daniil Ryabko
BT  - Proceedings of the 32nd International Conference on Machine Learning
DA  - 2015/06/01
ED  - Francis Bach
ED  - David Blei
ID  - pmlr-v37-lakshmanan15
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 37
SP  - 524
EP  - 532
L1  - https://proceedings.mlr.press/v37/lakshmanan15.pdf
UR  - https://proceedings.mlr.press/v37/lakshmanan15.html
AB  - We consider the problem of undiscounted reinforcement learning in continuous state spaces. Regret bounds in this setting usually hold under various assumptions on the structure of the reward and transition functions. Under the assumption that the rewards and transition probabilities are Lipschitz, Ortner and Ryabko (2012) gave a regret bound of O(T^(3/4)) after any T steps for 1-dimensional state space. Here we improve upon this result by using non-parametric kernel density estimation to estimate the transition probability distributions, and obtain regret bounds that depend on the smoothness of these distributions. In particular, under the assumption that the transition probability functions are smoothly differentiable, the regret bound is shown to be O(T^(2/3)) asymptotically for reinforcement learning in 1-dimensional state space. Finally, we also derive improved regret bounds for higher-dimensional state spaces.
ER  -

APA


Lakshmanan, K., Ortner, R., & Ryabko, D. (2015). Improved Regret Bounds for Undiscounted Continuous Reinforcement Learning. Proceedings of the 32nd International Conference on Machine Learning, in Proceedings of Machine Learning Research 37:524-532. Available from https://proceedings.mlr.press/v37/lakshmanan15.html.
