A Fast and Reliable Policy Improvement Algorithm
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:1338-1346, 2016.
We introduce a simple, efficient method for improving stochastic policies in Markov decision processes. Its computational complexity matches that of the value estimation problem. We prove that when the value estimation error is small, the method yields a performance improvement that grows with certain variance properties of the initial policy and the transition dynamics. In numerical experiments, its performance compares favorably with that of previous policy improvement algorithms.
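As background for the abstract's claim, the classical template the paper builds on is policy evaluation followed by a policy improvement step. The sketch below is not the paper's algorithm; it is a minimal illustration of exact evaluation and greedy improvement on a hypothetical two-state MDP (all transition and reward numbers are invented for the example).

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (illustrative only, not from the paper).
# P[a, s, t] = probability of moving s -> t under action a; R[s, a] = expected reward.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # transitions under action 0
    [[0.5, 0.5], [0.7, 0.3]],   # transitions under action 1
])
R = np.array([[1.0, 0.0],       # rewards in state 0 for actions 0, 1
              [0.0, 2.0]])      # rewards in state 1 for actions 0, 1
gamma = 0.9
n_states, n_actions = 2, 2

def evaluate(pi):
    """Exact policy evaluation: solve (I - gamma * P_pi) V = r_pi."""
    P_pi = np.einsum('sa,ast->st', pi, P)   # state-to-state kernel under pi
    r_pi = np.einsum('sa,sa->s', pi, R)     # expected reward under pi
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

def improve(pi):
    """One greedy improvement step on the action-value function of pi."""
    V = evaluate(pi)
    Q = R + gamma * np.einsum('ast,t->sa', P, V)
    return np.eye(n_actions)[Q.argmax(axis=1)]  # deterministic greedy policy

pi0 = np.full((n_states, n_actions), 0.5)       # uniform stochastic policy
pi1 = improve(pi0)
# The policy improvement theorem guarantees evaluate(pi1) >= evaluate(pi0)
# componentwise when evaluation is exact; the paper's contribution concerns
# what happens when the value estimate is only approximate.
```

With approximate value estimates, the greedy step above can degrade performance, which is the failure mode the abstract's reliability guarantee addresses.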