Adaptive Step-size Policy Gradients with Average Reward Metric

Takamitsu Matsubara; Tetsuro Morimura; Jun Morimoto

Proceedings of 2nd Asian Conference on Machine Learning, PMLR 13:285-298, 2010.

Abstract

In this paper, we propose a novel adaptive step-size approach for policy gradient reinforcement learning. A new metric is defined for policy gradients that measures the effect of changes on average reward with respect to the policy parameters. Since the metric directly measures the effects on the average reward, the resulting policy gradient learning employs an adaptive step-size strategy that can effectively avoid falling into a stagnant phase from the complex structure of the average reward function with respect to the policy parameters. Two algorithms are derived with the metric as variants of ordinary and natural policy gradients. Their properties are compared with previously proposed policy gradients through numerical experiments with simple, but non-trivial, 3-state Markov Decision Processes (MDPs). We also show performance improvements over previous methods in on-line learning with more challenging 20-state MDPs.

Cite this Paper

BibTeX

@InProceedings{pmlr-v13-matsubara10a,
  title = 	 {Adaptive Step-size Policy Gradients with Average Reward Metric},
  author = 	 {Matsubara, Takamitsu and Morimura, Tetsuro and Morimoto, Jun},
  booktitle = 	 {Proceedings of 2nd Asian Conference on Machine Learning},
  pages = 	 {285--298},
  year = 	 {2010},
  editor = 	 {Sugiyama, Masashi and Yang, Qiang},
  volume = 	 {13},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Tokyo, Japan},
  month = 	 {08--10 Nov},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v13/matsubara10a/matsubara10a.pdf},
  url = 	 {https://proceedings.mlr.press/v13/matsubara10a.html},
  abstract = 	 {In this paper, we propose a novel adaptive step-size approach for policy gradient reinforcement learning. A new metric is defined for policy gradients that measures the effect of changes on average reward with respect to the policy parameters. Since the metric directly measures the effects on the average reward, the resulting policy gradient learning employs an adaptive step-size strategy that can effectively avoid falling into a stagnant phase from the complex structure of the average reward function with respect to the policy parameters. Two algorithms are derived with the metric as variants of ordinary and natural policy gradients. Their properties are compared with previously proposed policy gradients through numerical experiments with simple, but non-trivial, 3-state Markov Decision Processes (MDPs). We also show performance improvements over previous methods in on-line learning with more challenging 20-state MDPs.}
}

Endnote

%0 Conference Paper
%T Adaptive Step-size Policy Gradients with Average Reward Metric
%A Takamitsu Matsubara
%A Tetsuro Morimura
%A Jun Morimoto
%B Proceedings of 2nd Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2010
%E Masashi Sugiyama
%E Qiang Yang
%F pmlr-v13-matsubara10a
%I PMLR
%P 285--298
%U https://proceedings.mlr.press/v13/matsubara10a.html
%V 13
%X In this paper, we propose a novel adaptive step-size approach for policy gradient reinforcement learning. A new metric is defined for policy gradients that measures the effect of changes on average reward with respect to the policy parameters. Since the metric directly measures the effects on the average reward, the resulting policy gradient learning employs an adaptive step-size strategy that can effectively avoid falling into a stagnant phase from the complex structure of the average reward function with respect to the policy parameters. Two algorithms are derived with the metric as variants of ordinary and natural policy gradients. Their properties are compared with previously proposed policy gradients through numerical experiments with simple, but non-trivial, 3-state Markov Decision Processes (MDPs). We also show performance improvements over previous methods in on-line learning with more challenging 20-state MDPs.

RIS

TY  - CPAPER
TI  - Adaptive Step-size Policy Gradients with Average Reward Metric
AU  - Takamitsu Matsubara
AU  - Tetsuro Morimura
AU  - Jun Morimoto
BT  - Proceedings of 2nd Asian Conference on Machine Learning
DA  - 2010/10/31
ED  - Masashi Sugiyama
ED  - Qiang Yang
ID  - pmlr-v13-matsubara10a
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 13
SP  - 285
EP  - 298
L1  - http://proceedings.mlr.press/v13/matsubara10a/matsubara10a.pdf
UR  - https://proceedings.mlr.press/v13/matsubara10a.html
AB  - In this paper, we propose a novel adaptive step-size approach for policy gradient reinforcement learning. A new metric is defined for policy gradients that measures the effect of changes on average reward with respect to the policy parameters. Since the metric directly measures the effects on the average reward, the resulting policy gradient learning employs an adaptive step-size strategy that can effectively avoid falling into a stagnant phase from the complex structure of the average reward function with respect to the policy parameters. Two algorithms are derived with the metric as variants of ordinary and natural policy gradients. Their properties are compared with previously proposed policy gradients through numerical experiments with simple, but non-trivial, 3-state Markov Decision Processes (MDPs). We also show performance improvements over previous methods in on-line learning with more challenging 20-state MDPs.
ER  -

APA

Matsubara, T., Morimura, T. & Morimoto, J. (2010). Adaptive Step-size Policy Gradients with Average Reward Metric. Proceedings of 2nd Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 13:285-298. Available from https://proceedings.mlr.press/v13/matsubara10a.html.
