Improving Policy Gradient Estimates with Influence Information

Jervis Pinto; Alan Fern; Tim Bauer; Martin Erwig

Proceedings of the Asian Conference on Machine Learning, PMLR 20:1-18, 2011.

Abstract

In reinforcement learning (RL) it is often possible to obtain sound, but incomplete, information about influences and independencies among problem variables and rewards, even when an exact domain model is unknown. For example, such information can be computed based on a partial, qualitative domain model, or via domain-specific analysis techniques. While, intuitively, such information appears useful for RL, there are no algorithms that incorporate it in a sound way. In this work, we describe how to leverage such information for improving the estimation of policy gradients, which can be used to speed up gradient-based RL. We prove general conditions under which our estimator is unbiased and show that it will typically have reduced variance compared to standard unbiased gradient estimates. We evaluate the approach in the domain of Adaptation-Based Programming where RL is used to optimize the performance of programs and independence information can be computed via standard program analysis techniques. Incorporating independence information produces a large speedup in learning on a variety of adaptive programs.

Cite this Paper

BibTeX


@InProceedings{pmlr-v20-pinto11,
  title = 	 {Improving Policy Gradient Estimates with Influence Information},
  author = 	 {Pinto, Jervis and Fern, Alan and Bauer, Tim and Erwig, Martin},
  booktitle = 	 {Proceedings of the Asian Conference on Machine Learning},
  pages = 	 {1--18},
  year = 	 {2011},
  editor = 	 {Hsu, Chun-Nan and Lee, Wee Sun},
  volume = 	 {20},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {South Garden Hotels and Resorts, Taoyuan, Taiwan},
  month = 	 {14--15 Nov},
  publisher =    {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v20/pinto11/pinto11.pdf},
  url = 	 {https://proceedings.mlr.press/v20/pinto11.html},
  abstract = 	 {In reinforcement learning (RL) it is often possible to obtain sound, but incomplete, information about influences and independencies among problem variables and rewards, even when an exact domain model is unknown. For example, such information can be computed based on a partial, qualitative domain model, or via domain-specific analysis techniques. While, intuitively, such information appears useful for RL, there are no algorithms that incorporate it in a sound way. In this work, we describe how to leverage such information for improving the estimation of policy gradients, which can be used to speedup gradient-based RL. We prove general conditions under which our estimator is unbiased and show that it will typically have reduced variance compared to standard unbiased gradient estimates. We evaluate the approach in the domain of Adaptation-Based Programming where RL is used to optimize the performance of programs and independence information can be computed via standard program analysis techniques. Incorporating independence information produces a large speedup in learning on a variety of adaptive programs.}
}

Endnote

%0 Conference Paper
%T Improving Policy Gradient Estimates with Influence Information
%A Jervis Pinto
%A Alan Fern
%A Tim Bauer
%A Martin Erwig
%B Proceedings of the Asian Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2011
%E Chun-Nan Hsu
%E Wee Sun Lee
%F pmlr-v20-pinto11
%I PMLR
%P 1--18
%U https://proceedings.mlr.press/v20/pinto11.html
%V 20
%X In reinforcement learning (RL) it is often possible to obtain sound, but incomplete, information about influences and independencies among problem variables and rewards, even when an exact domain model is unknown. For example, such information can be computed based on a partial, qualitative domain model, or via domain-specific analysis techniques. While, intuitively, such information appears useful for RL, there are no algorithms that incorporate it in a sound way. In this work, we describe how to leverage such information for improving the estimation of policy gradients, which can be used to speedup gradient-based RL. We prove general conditions under which our estimator is unbiased and show that it will typically have reduced variance compared to standard unbiased gradient estimates. We evaluate the approach in the domain of Adaptation-Based Programming where RL is used to optimize the performance of programs and independence information can be computed via standard program analysis techniques. Incorporating independence information produces a large speedup in learning on a variety of adaptive programs.

RIS


TY  - CPAPER
TI  - Improving Policy Gradient Estimates with Influence Information
AU  - Jervis Pinto
AU  - Alan Fern
AU  - Tim Bauer
AU  - Martin Erwig
BT  - Proceedings of the Asian Conference on Machine Learning
DA  - 2011/11/17
ED  - Chun-Nan Hsu
ED  - Wee Sun Lee
ID  - pmlr-v20-pinto11
PB  - PMLR
DP  - Proceedings of Machine Learning Research
VL  - 20
SP  - 1
EP  - 18
L1  - http://proceedings.mlr.press/v20/pinto11/pinto11.pdf
UR  - https://proceedings.mlr.press/v20/pinto11.html
AB  - In reinforcement learning (RL) it is often possible to obtain sound, but incomplete, information about influences and independencies among problem variables and rewards, even when an exact domain model is unknown. For example, such information can be computed based on a partial, qualitative domain model, or via domain-specific analysis techniques. While, intuitively, such information appears useful for RL, there are no algorithms that incorporate it in a sound way. In this work, we describe how to leverage such information for improving the estimation of policy gradients, which can be used to speedup gradient-based RL. We prove general conditions under which our estimator is unbiased and show that it will typically have reduced variance compared to standard unbiased gradient estimates. We evaluate the approach in the domain of Adaptation-Based Programming where RL is used to optimize the performance of programs and independence information can be computed via standard program analysis techniques. Incorporating independence information produces a large speedup in learning on a variety of adaptive programs.
ER  -

APA


Pinto, J., Fern, A., Bauer, T. & Erwig, M. (2011). Improving Policy Gradient Estimates with Influence Information. Proceedings of the Asian Conference on Machine Learning, in Proceedings of Machine Learning Research 20:1-18. Available from https://proceedings.mlr.press/v20/pinto11.html.
