Reliable Active Apprenticeship Learning

Steve Hanneke, Liu Yang, Gongju Wang, Yulun Song
Proceedings of The 36th International Conference on Algorithmic Learning Theory, PMLR 272:512-538, 2025.

Abstract

We propose a learning problem which we call reliable active apprenticeship learning, and we define a learning algorithm with optimal performance guarantees for it, which we show are sharply characterized by the eluder dimension of a policy class. In this setting, a learning algorithm is tasked with behaving optimally in an unknown environment given by a Markov decision process. The correct actions are specified by an unknown optimal policy in a given policy class. The learner initially does not know the optimal policy, but it can query an expert, which returns the optimal action for the current state. A learner is said to be reliable if, whenever it takes an action without querying the expert, its action is guaranteed to be optimal. We are then interested in designing a reliable learner that does not query the expert too often. We propose a reliable learning algorithm that provably makes the minimal possible number of queries, which we show is precisely characterized by the eluder dimension of the policy class. We further extend this setting to allow for imperfect experts, modeled as an oracle with noisy responses. We study two variants, inspired by noise conditions from classification: Massart noise and Tsybakov noise. In both cases, we propose a reliable learning strategy that achieves a nearly-minimal number of queries, and we prove upper and lower bounds on the optimal number of queries in terms of the noise condition and the eluder dimension of the policy class.
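To make the noiseless protocol concrete, below is a minimal sketch (not the paper's algorithm) of a disagreement-based reliable learner, assuming a finite policy class given as callables mapping states to actions and a perfect expert oracle. It takes an action without querying only when every policy consistent with past expert answers agrees on the current state; under realizability that agreed action is guaranteed optimal, so the learner is reliable, and the interesting quantity is how often the query branch fires.

# Minimal sketch of disagreement-based reliable selective querying
# (illustrative only; names and structure are assumptions, not the paper's code).

def reliable_step(version_space, state, query_expert):
    """Choose an action for `state`, querying the expert only on disagreement.

    version_space : list of candidate policies (callables state -> action)
                    consistent with all expert answers seen so far
    query_expert  : callable state -> optimal action (perfect expert oracle)
    Returns (action, updated version_space).
    """
    actions = {pi(state) for pi in version_space}
    if len(actions) == 1:
        # All surviving policies agree; under realizability the common action
        # is the optimal one, so no query is needed (reliability holds).
        return actions.pop(), version_space
    # Policies disagree: query the expert, act on its answer, and discard
    # every policy inconsistent with that answer.
    a_star = query_expert(state)
    version_space = [pi for pi in version_space if pi(state) == a_star]
    return a_star, version_space

The number of times the query branch can be triggered over any sequence of states is the kind of quantity the paper bounds; its results tie the optimal such query count to the eluder dimension of the policy class.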

Cite this Paper


BibTeX
@InProceedings{pmlr-v272-hanneke25a,
  title     = {Reliable Active Apprenticeship Learning},
  author    = {Hanneke, Steve and Yang, Liu and Wang, Gongju and Song, Yulun},
  booktitle = {Proceedings of The 36th International Conference on Algorithmic Learning Theory},
  pages     = {512--538},
  year      = {2025},
  editor    = {Kamath, Gautam and Loh, Po-Ling},
  volume    = {272},
  series    = {Proceedings of Machine Learning Research},
  month     = {24--27 Feb},
  publisher = {PMLR},
  pdf       = {https://raw.githubusercontent.com/mlresearch/v272/main/assets/hanneke25a/hanneke25a.pdf},
  url       = {https://proceedings.mlr.press/v272/hanneke25a.html}
}
Endnote
%0 Conference Paper
%T Reliable Active Apprenticeship Learning
%A Steve Hanneke
%A Liu Yang
%A Gongju Wang
%A Yulun Song
%B Proceedings of The 36th International Conference on Algorithmic Learning Theory
%C Proceedings of Machine Learning Research
%D 2025
%E Gautam Kamath
%E Po-Ling Loh
%F pmlr-v272-hanneke25a
%I PMLR
%P 512--538
%U https://proceedings.mlr.press/v272/hanneke25a.html
%V 272
APA
Hanneke, S., Yang, L., Wang, G., & Song, Y. (2025). Reliable Active Apprenticeship Learning. Proceedings of The 36th International Conference on Algorithmic Learning Theory, in Proceedings of Machine Learning Research 272:512-538. Available from https://proceedings.mlr.press/v272/hanneke25a.html.
