Boosted Fitted Q-Iteration
Proceedings of the 34th International Conference on Machine Learning, PMLR 70:3434-3443, 2017.
Abstract
This paper studies Boosted Fitted Q-Iteration (BFQI), an Approximated Value Iteration (AVI) algorithm that exploits a boosting procedure to estimate the action-value function in reinforcement learning problems. BFQI is an iterative offline algorithm that, given a dataset of transitions, builds an approximation of the optimal action-value function by summing the approximations of the Bellman residuals across all iterations. The advantage of this approach w.r.t. other AVI methods is twofold: (1) while keeping the same function space at each iteration, BFQI can represent more complex functions by considering an additive model; (2) since the Bellman residual decreases as the optimal value function is approached, the regression problems become easier as iterations proceed. We study BFQI both theoretically, providing a finite-sample error upper bound for it, and empirically, by comparing its performance to that of FQI in different domains and using different regression techniques.
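The additive scheme described in the abstract can be sketched as follows. This is a minimal illustration of the boosted AVI idea, not the paper's exact algorithm: the synthetic dataset, the linear feature map, and the ridge-regularized least-squares weak learner are all assumptions made for the sketch. At each iteration the empirical Bellman residual of the current additive model is computed and regressed, and the fitted component is added to the ensemble, so the overall approximation is the sum of the residual fits across all iterations.

```python
import numpy as np

# Hypothetical toy setup: discrete actions, linear function approximation.
rng = np.random.default_rng(0)
n, d, n_actions, gamma = 200, 3, 2, 0.9

# Synthetic transition dataset (s, a, r, s') -- stand-in for a real batch.
S = rng.normal(size=(n, d))
A = rng.integers(n_actions, size=n)
R = rng.normal(size=n)
S_next = rng.normal(size=(n, d))

def features(s, a):
    """Per-action block of state features plus a bias term."""
    phi = np.zeros((len(s), n_actions * (d + 1)))
    sb = np.hstack([s, np.ones((len(s), 1))])
    for i, ai in enumerate(a):
        phi[i, ai * (d + 1):(ai + 1) * (d + 1)] = sb[i]
    return phi

def fit_least_squares(X, y):
    """Ridge-regularized least squares: the per-iteration weak learner."""
    return np.linalg.solve(X.T @ X + 1e-3 * np.eye(X.shape[1]), X.T @ y)

weights = []  # one fitted component per boosting iteration

def Q(s, a):
    """Additive model: sum of all Bellman-residual approximations so far."""
    phi = features(s, a)
    return sum(phi @ w for w in weights) if weights else np.zeros(len(s))

for k in range(20):
    # Empirical Bellman residual of the current additive model.
    q_next = np.max([Q(S_next, np.full(n, a)) for a in range(n_actions)], axis=0)
    residual = R + gamma * q_next - Q(S, A)
    # Regress the residual and add the new component to the ensemble.
    weights.append(fit_least_squares(features(S, A), residual))
```

By contrast, standard FQI refits the full targets `R + gamma * q_next` from scratch at every iteration within the same function space; here each iteration only has to fit the (shrinking) residual.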