Improving gradient estimation by incorporating sensor data

Gregory Lawrence, Stuart Russell
Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence, PMLR R6:375-382, 2008.

Abstract

An efficient policy search algorithm should estimate the local gradient of the objective function, with respect to the policy parameters, from as few trials as possible. Whereas most policy search methods estimate this gradient by observing the rewards obtained during policy trials, we show, both theoretically and empirically, that taking into account the sensor data as well gives better gradient estimates and hence faster learning. The reason is that rewards obtained during policy execution vary from trial to trial due to noise in the environment; sensor data, which correlates with the noise, can be used to partially correct for this variation, resulting in an estimator with lower variance.
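
To make the mechanism concrete, the sketch below (in Python) illustrates the general idea on a toy one-parameter problem; it is an illustrative assumption, not the paper's estimator or experimental setup. A score-function gradient estimate is computed with and without a control-variate correction fit to a sensor reading that correlates with the environment noise. All names (run_trial, grad_estimate) and the linear correction are hypothetical.

    import numpy as np

    rng = np.random.default_rng(0)

    theta, sigma = 0.5, 0.3   # stochastic policy: action a ~ N(theta, sigma^2)
    target = 1.0              # toy task: drive a + w toward the target

    def run_trial():
        """One policy trial: returns (action, reward, sensor reading)."""
        a = rng.normal(theta, sigma)      # action noise (controlled by policy)
        w = rng.normal(0.0, 1.0)          # environment noise
        r = -(a + w - target) ** 2        # reward varies across trials via w
        s = w + rng.normal(0.0, 0.1)      # sensor reading correlated with w
        return a, r, s

    def grad_estimate(n_trials, use_sensor):
        a, r, s = map(np.array, zip(*(run_trial() for _ in range(n_trials))))
        score = (a - theta) / sigma**2    # d/dtheta of log N(a; theta, sigma^2)
        if use_sensor:
            # Control variate: regress reward on the sensor reading and subtract
            # the predictable, noise-driven part. Because s is independent of the
            # action, E[score * s] = 0, so the correction leaves the estimator
            # unbiased (up to the in-sample estimate of beta) while cutting its
            # variance.
            beta = np.cov(r, s)[0, 1] / np.var(s)
            r = r - beta * (s - s.mean())
        return np.mean(score * r)

    # Compare the spread of the two estimators over repeated experiments.
    plain     = [grad_estimate(50, use_sensor=False) for _ in range(2000)]
    corrected = [grad_estimate(50, use_sensor=True)  for _ in range(2000)]
    print("std without sensor:", np.std(plain))
    print("std with sensor:   ", np.std(corrected))

Under these assumptions, both estimators target the same gradient, but the sensor-corrected one shows a markedly smaller standard deviation, which is the variance-reduction effect the abstract describes.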

Cite this Paper


BibTeX
@InProceedings{pmlr-vR6-lawrence08a,
  title     = {Improving gradient estimation by incorporating sensor data},
  author    = {Lawrence, Gregory and Russell, Stuart},
  booktitle = {Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence},
  pages     = {375--382},
  year      = {2008},
  editor    = {McAllester, David A. and Myllymäki, Petri},
  volume    = {R6},
  series    = {Proceedings of Machine Learning Research},
  month     = {09--12 Jul},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/r6/main/assets/lawrence08a/lawrence08a.pdf},
  url       = {https://proceedings.mlr.press/r6/lawrence08a.html},
  abstract  = {An efficient policy search algorithm should estimate the local gradient of the objective function, with respect to the policy parameters, from as few trials as possible. Whereas most policy search methods estimate this gradient by observing the rewards obtained during policy trials, we show, both theoretically and empirically, that taking into account the sensor data as well gives better gradient estimates and hence faster learning. The reason is that rewards obtained during policy execution vary from trial to trial due to noise in the environment; sensor data, which correlates with the noise, can be used to partially correct for this variation, resulting in an estimator with lower variance.},
  note      = {Reissued by PMLR on 09 October 2024.}
}
Endnote
%0 Conference Paper
%T Improving gradient estimation by incorporating sensor data
%A Gregory Lawrence
%A Stuart Russell
%B Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence
%C Proceedings of Machine Learning Research
%D 2008
%E David A. McAllester
%E Petri Myllymäki
%F pmlr-vR6-lawrence08a
%I PMLR
%P 375--382
%U https://proceedings.mlr.press/r6/lawrence08a.html
%V R6
%X An efficient policy search algorithm should estimate the local gradient of the objective function, with respect to the policy parameters, from as few trials as possible. Whereas most policy search methods estimate this gradient by observing the rewards obtained during policy trials, we show, both theoretically and empirically, that taking into account the sensor data as well gives better gradient estimates and hence faster learning. The reason is that rewards obtained during policy execution vary from trial to trial due to noise in the environment; sensor data, which correlates with the noise, can be used to partially correct for this variation, resulting in an estimator with lower variance.
%Z Reissued by PMLR on 09 October 2024.
APA
Lawrence, G. & Russell, S. (2008). Improving gradient estimation by incorporating sensor data. Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence, in Proceedings of Machine Learning Research R6:375-382. Available from https://proceedings.mlr.press/r6/lawrence08a.html. Reissued by PMLR on 09 October 2024.
