Boosted Fitted Q-Iteration

Samuele Tosatto; Matteo Pirotta; Carlo D’Eramo; Marcello Restelli


Proceedings of the 34th International Conference on Machine Learning, PMLR 70:3434-3443, 2017.

Abstract

This paper studies B-FQI, an Approximated Value Iteration (AVI) algorithm that exploits a boosting procedure to estimate the action-value function in reinforcement learning problems. B-FQI is an iterative off-line algorithm that, given a dataset of transitions, builds an approximation of the optimal action-value function by summing the approximations of the Bellman residuals across all iterations. The advantage of this approach with respect to other AVI methods is twofold: (1) while keeping the same function space at each iteration, B-FQI can represent more complex functions by considering an additive model; (2) since the Bellman residual decreases as the optimal value function is approached, regression problems become easier as iterations proceed. We study B-FQI both theoretically, providing a finite-sample error upper bound for it, and empirically, by comparing its performance to that of FQI in different domains and with different regression techniques.
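The additive update described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy chain environment (`step`), the per-pair averaging regressor (`fit_table`), and all constants are assumptions chosen only to keep the example self-contained. At each iteration the sketch fits a regressor to the empirical Bellman residual of the current estimate and adds it to the model, so the estimate is the sum of all fitted residuals.

```python
from collections import defaultdict

GAMMA = 0.9        # discount factor (assumed for this toy example)
ACTIONS = (0, 1)   # 0 = move left, 1 = move right
GOAL = 3           # terminal state of the hypothetical chain


def step(state, action):
    """Toy deterministic chain: reward 1 on reaching the goal state."""
    nxt = max(state - 1, 0) if action == 0 else state + 1
    return (1.0 if nxt == GOAL else 0.0), nxt, nxt == GOAL


def fit_table(inputs, targets):
    """Stand-in regressor: average target per (state, action) pair."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in zip(inputs, targets):
        sums[x] += y
        counts[x] += 1
    table = {x: sums[x] / counts[x] for x in sums}
    return lambda s, a: table.get((s, a), 0.0)


def boosted_fqi(dataset, n_iterations):
    """Q_{k+1} = Q_k + (regressor fitted to the Bellman residual of Q_k)."""
    terms = []  # additive model: one fitted residual per iteration

    def q(s, a):
        return sum(f(s, a) for f in terms)

    inputs = [(s, a) for s, a, _, _, _ in dataset]
    for _ in range(n_iterations):
        residuals = []
        for s, a, r, s_next, done in dataset:
            # Empirical optimal Bellman operator applied to the current Q.
            bootstrap = 0.0 if done else max(q(s_next, b) for b in ACTIONS)
            residuals.append(r + GAMMA * bootstrap - q(s, a))
        terms.append(fit_table(inputs, residuals))
    return q


# One transition per (state, action) pair of the non-terminal states.
dataset = [(s, a, *step(s, a)) for s in range(GOAL) for a in ACTIONS]
q_hat = boosted_fqi(dataset, n_iterations=10)
```

With an exact regressor such as this table, the procedure reduces to value iteration on the dataset, and the residual targets shrink to zero once the estimate stops changing. A weaker regressor (for example a shallow tree) could be swapped in for `fit_table` to make the benefit of the additive model visible.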

Cite this Paper

BibTeX

@InProceedings{pmlr-v70-tosatto17a,
  title     = {Boosted Fitted Q-Iteration},
  author    = {Samuele Tosatto and Matteo Pirotta and Carlo D'Eramo and Marcello Restelli},
  booktitle = {Proceedings of the 34th International Conference on Machine Learning},
  pages     = {3434--3443},
  year      = {2017},
  editor    = {Precup, Doina and Teh, Yee Whye},
  volume    = {70},
  series    = {Proceedings of Machine Learning Research},
  month     = {06--11 Aug},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v70/tosatto17a/tosatto17a.pdf},
  url       = {https://proceedings.mlr.press/v70/tosatto17a.html},
  abstract = 	 {This paper is about the study of B-FQI, an Approximated Value Iteration (AVI) algorithm that exploits a boosting procedure to estimate the action-value function in reinforcement learning problems. B-FQI is an iterative off-line algorithm that, given a dataset of transitions, builds an approximation of the optimal action-value function by summing the approximations of the Bellman residuals across all iterations. The advantage of such approach w.r.t. to other AVI methods is twofold: (1) while keeping the same function space at each iteration, B-FQI can represent more complex functions by considering an additive model; (2) since the Bellman residual decreases as the optimal value function is approached, regression problems become easier as iterations proceed. We study B-FQI both theoretically, providing also a finite-sample error upper bound for it, and empirically, by comparing its performance to the one of FQI in different domains and using different regression techniques.}
}

Endnote

%0 Conference Paper
%T Boosted Fitted Q-Iteration
%A Samuele Tosatto
%A Matteo Pirotta
%A Carlo D’Eramo
%A Marcello Restelli
%B Proceedings of the 34th International Conference on Machine Learning
%C Proceedings of Machine Learning Research
%D 2017
%E Doina Precup
%E Yee Whye Teh
%F pmlr-v70-tosatto17a
%I PMLR
%P 3434--3443
%U https://proceedings.mlr.press/v70/tosatto17a.html
%V 70
%X This paper is about the study of B-FQI, an Approximated Value Iteration (AVI) algorithm that exploits a boosting procedure to estimate the action-value function in reinforcement learning problems. B-FQI is an iterative off-line algorithm that, given a dataset of transitions, builds an approximation of the optimal action-value function by summing the approximations of the Bellman residuals across all iterations. The advantage of such approach w.r.t. to other AVI methods is twofold: (1) while keeping the same function space at each iteration, B-FQI can represent more complex functions by considering an additive model; (2) since the Bellman residual decreases as the optimal value function is approached, regression problems become easier as iterations proceed. We study B-FQI both theoretically, providing also a finite-sample error upper bound for it, and empirically, by comparing its performance to the one of FQI in different domains and using different regression techniques.

APA

Tosatto, S., Pirotta, M., D’Eramo, C. & Restelli, M. (2017). Boosted Fitted Q-Iteration. Proceedings of the 34th International Conference on Machine Learning, in Proceedings of Machine Learning Research 70:3434-3443. Available from https://proceedings.mlr.press/v70/tosatto17a.html.
