Identification of Blackwell Optimal Policies for Deterministic MDPs

Victor Boone, Bruno Gaujal
Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, PMLR 206:7392-7424, 2023.

Abstract

This paper investigates a new learning problem, the identification of Blackwell optimal policies on deterministic MDPs (DMDPs): A learner has to return a Blackwell optimal policy with fixed confidence using a minimal number of queries. First, we characterize the maximal set of DMDPs for which the identification is possible. Then, we focus on the analysis of algorithms based on product-form confidence regions. We minimize the number of queries by efficiently visiting the state-action pairs with respect to the shape of confidence sets. Furthermore, these confidence sets are themselves optimized to achieve better performances. The performances of our methods compare to the lower bounds up to a factor $n^2$ in the worst case – where $n$ is the number of states, and constant in certain classes of DMDPs.

Cite this Paper


BibTeX
@InProceedings{pmlr-v206-boone23a, title = {Identification of Blackwell Optimal Policies for Deterministic MDPs}, author = {Boone, Victor and Gaujal, Bruno}, booktitle = {Proceedings of The 26th International Conference on Artificial Intelligence and Statistics}, pages = {7392--7424}, year = {2023}, editor = {Ruiz, Francisco and Dy, Jennifer and van de Meent, Jan-Willem}, volume = {206}, series = {Proceedings of Machine Learning Research}, month = {25--27 Apr}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v206/boone23a/boone23a.pdf}, url = {https://proceedings.mlr.press/v206/boone23a.html}, abstract = {This paper investigates a new learning problem, the identification of Blackwell optimal policies on deterministic MDPs (DMDPs): A learner has to return a Blackwell optimal policy with fixed confidence using a minimal number of queries. First, we characterize the maximal set of DMDPs for which the identification is possible. Then, we focus on the analysis of algorithms based on product-form confidence regions. We minimize the number of queries by efficiently visiting the state-action pairs with respect to the shape of confidence sets. Furthermore, these confidence sets are themselves optimized to achieve better performances. The performances of our methods compare to the lower bounds up to a factor $n^2$ in the worst case – where $n$ is the number of states, and constant in certain classes of DMDPs.} }
Endnote
%0 Conference Paper %T Identification of Blackwell Optimal Policies for Deterministic MDPs %A Victor Boone %A Bruno Gaujal %B Proceedings of The 26th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2023 %E Francisco Ruiz %E Jennifer Dy %E Jan-Willem van de Meent %F pmlr-v206-boone23a %I PMLR %P 7392--7424 %U https://proceedings.mlr.press/v206/boone23a.html %V 206 %X This paper investigates a new learning problem, the identification of Blackwell optimal policies on deterministic MDPs (DMDPs): A learner has to return a Blackwell optimal policy with fixed confidence using a minimal number of queries. First, we characterize the maximal set of DMDPs for which the identification is possible. Then, we focus on the analysis of algorithms based on product-form confidence regions. We minimize the number of queries by efficiently visiting the state-action pairs with respect to the shape of confidence sets. Furthermore, these confidence sets are themselves optimized to achieve better performances. The performances of our methods compare to the lower bounds up to a factor $n^2$ in the worst case – where $n$ is the number of states, and constant in certain classes of DMDPs.
APA
Boone, V. & Gaujal, B.. (2023). Identification of Blackwell Optimal Policies for Deterministic MDPs. Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 206:7392-7424 Available from https://proceedings.mlr.press/v206/boone23a.html.

Related Material