Unknown mixing times in apprenticeship and reinforcement learning

Tom Zahavy, Alon Cohen, Haim Kaplan, Yishay Mansour
Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), PMLR 124:430-439, 2020.

Abstract

We derive and analyze learning algorithms for apprenticeship learning, policy evaluation and policy gradient for average reward criteria. Existing algorithms explicitly require an upper bound on the mixing time. In contrast, we build on ideas from Markov chain theory and derive sampling algorithms that do not require such an upper bound. For these algorithms, we provide theoretical bounds on their sample-complexity and running time.

Cite this Paper


BibTeX
@InProceedings{pmlr-v124-zahavy20a, title = {Unknown mixing times in apprenticeship and reinforcement learning}, author = {Zahavy, Tom and Cohen, Alon and Kaplan, Haim and Mansour, Yishay}, booktitle = {Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI)}, pages = {430--439}, year = {2020}, editor = {Jonas Peters and David Sontag}, volume = {124}, series = {Proceedings of Machine Learning Research}, month = {03--06 Aug}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v124/zahavy20a/zahavy20a.pdf}, url = { http://proceedings.mlr.press/v124/zahavy20a.html }, abstract = {We derive and analyze learning algorithms for apprenticeship learning, policy evaluation and policy gradient for average reward criteria. Existing algorithms explicitly require an upper bound on the mixing time. In contrast, we build on ideas from Markov chain theory and derive sampling algorithms that do not require such an upper bound. For these algorithms, we provide theoretical bounds on their sample-complexity and running time.} }
Endnote
%0 Conference Paper %T Unknown mixing times in apprenticeship and reinforcement learning %A Tom Zahavy %A Alon Cohen %A Haim Kaplan %A Yishay Mansour %B Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI) %C Proceedings of Machine Learning Research %D 2020 %E Jonas Peters %E David Sontag %F pmlr-v124-zahavy20a %I PMLR %P 430--439 %U http://proceedings.mlr.press/v124/zahavy20a.html %V 124 %X We derive and analyze learning algorithms for apprenticeship learning, policy evaluation and policy gradient for average reward criteria. Existing algorithms explicitly require an upper bound on the mixing time. In contrast, we build on ideas from Markov chain theory and derive sampling algorithms that do not require such an upper bound. For these algorithms, we provide theoretical bounds on their sample-complexity and running time.
APA
Zahavy, T., Cohen, A., Kaplan, H. & Mansour, Y.. (2020). Unknown mixing times in apprenticeship and reinforcement learning. Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), in Proceedings of Machine Learning Research 124:430-439 Available from http://proceedings.mlr.press/v124/zahavy20a.html .

Related Material