[edit]
Identification of Blackwell Optimal Policies for Deterministic MDPs
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:7392-7424, 2023.
Abstract
This paper investigates a new learning problem, the identification of Blackwell optimal policies on deterministic MDPs (DMDPs): A learner has to return a Blackwell optimal policy with fixed confidence using a minimal number of queries. First, we characterize the maximal set of DMDPs for which the identification is possible. Then, we focus on the analysis of algorithms based on product-form confidence regions. We minimize the number of queries by efficiently visiting the state-action pairs with respect to the shape of confidence sets. Furthermore, these confidence sets are themselves optimized to achieve better performances. The performances of our methods compare to the lower bounds up to a factor $n^2$ in the worst case – where $n$ is the number of states, and constant in certain classes of DMDPs.