Bandit Learnability can be Undecidable

Steve Hanneke, Liu Yang
Proceedings of Thirty Sixth Conference on Learning Theory, PMLR 195:5813-5849, 2023.

Abstract

We initiate a general investigation into structured bandits. Specifically, for an abstract space $X$, we suppose a true reward function $f$ resides in a known, but arbitrary, function class $F$. The algorithm may then pull a number of arms $x$ (i.e., query for the value $f(x)$), and thereby attempts to identify an arm $\hat{x}$ of near-maximum reward: $f(\hat{x}) \geq \sup_x f(x) - \epsilon$. While special cases of this problem are well understood in the literature, our interest is in the possibility of a fully-general theory of bandit learnability, analogous to the PAC model for classification: that is, a theory which precisely characterizes which function classes $F$ admit a learning algorithm guaranteed to identify a near-optimal arm within a bounded number of pulls.

Our main result in this regard is an illuminating impossibility result. Namely, there exist well-defined function classes $F$ such that bandit learnability is \emph{undecidable} within ZFC set theory. While such undecidability results have previously been shown for a certain abstractly-defined learning problem known as EMX, this is the first example of a natural or commonly-encountered learning problem (i.e., bandits) for which learnability can be provably undecidable. Our proof is based on establishing a (rather-sophisticated) equivalence between certain subfamilies of EMX learning problems and corresponding constructed bandit problems.

Despite this general undecidability result, we also establish new general results in special cases. Specifically, we characterize the optimal query complexity in the special case of binary-valued reward functions in terms of a combinatorial complexity measure related to the teaching dimension. We also present an extension to general bounded real-valued rewards, though in this case the upper bound is not always optimal. We instantiate the new complexity measures for several important families of function classes $F$.
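As a concrete illustration of the query model above (a minimal sketch, not the paper's algorithm), the following Python snippet implements a naive version-space strategy for the special case where both the arm space $X$ and the class $F$ are finite: the learner pulls arms, discards candidate reward functions inconsistent with the revealed values, and stops as soon as every surviving candidate certifies that some already-pulled arm is $\epsilon$-optimal. All names here are illustrative and the finiteness assumption is ours; the paper concerns arbitrary classes $F$, for which the very existence of such a bounded-pull guarantee can be undecidable.

def identify_near_optimal_arm(F, X, query, eps):
    # F: finite list of candidate reward functions (the true f is assumed to lie in F)
    # X: finite list of arms; query(x) reveals the true reward f(x); eps: tolerated slack
    version_space = list(F)   # candidates still consistent with all observations
    observed = {}             # arm -> revealed reward
    for x in X:               # worst case: pull every arm once
        # Stop early if some observed arm is eps-optimal under every surviving candidate.
        for x_hat, r in observed.items():
            if all(r >= max(g(z) for z in X) - eps for g in version_space):
                return x_hat
        r = query(x)          # pull arm x and observe f(x)
        observed[x] = r
        version_space = [g for g in version_space if abs(g(x) - r) < 1e-12]
    # No early stop: every arm has been pulled, so the best observed arm is exactly optimal.
    return max(observed, key=observed.get)

# Example with three arms and two candidate reward functions; the true f is F[0].
X = [0, 1, 2]
F = [lambda x: [0.1, 0.9, 0.5][x],
     lambda x: [0.8, 0.2, 0.5][x]]
print(identify_near_optimal_arm(F, X, query=F[0], eps=0.05))   # prints 1

For finite $F$ and $X$ this strategy trivially succeeds within $|X|$ pulls; the paper's question is precisely when a uniform bound on the number of pulls exists for general, possibly infinite, classes $F$.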

Cite this Paper


BibTeX
@InProceedings{pmlr-v195-hanneke23d,
  title     = {Bandit Learnability can be Undecidable},
  author    = {Hanneke, Steve and Yang, Liu},
  booktitle = {Proceedings of Thirty Sixth Conference on Learning Theory},
  pages     = {5813--5849},
  year      = {2023},
  editor    = {Neu, Gergely and Rosasco, Lorenzo},
  volume    = {195},
  series    = {Proceedings of Machine Learning Research},
  month     = {12--15 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v195/hanneke23d/hanneke23d.pdf},
  url       = {https://proceedings.mlr.press/v195/hanneke23d.html},
  abstract  = {We initiate a general investigation into structured bandits. Specifically, for an abstract space $X$, we suppose a true reward function $f$ resides in a known, but arbitrary, function class $F$. The algorithm may then pull a number of arms $x$ (i.e., query for the value $f(x)$), and thereby attempts to identify an arm $\hat{x}$ of near-maximum reward: $f(\hat{x}) \geq \sup_x f(x) - \epsilon$. While special cases of this problem are well understood in the literature, our interest is in the possibility of a fully-general theory of bandit learnability, analogous to the PAC model for classification: that is, a theory which precisely characterizes which function classes $F$ admit a learning algorithm guaranteed to identify a near-optimal arm within a bounded number of pulls. Our main result in this regard is an illuminating impossibility result. Namely, there exist well-defined function classes $F$ such that bandit learnability is \emph{undecidable} within ZFC set theory. While such undecidability results have previously been shown for a certain abstractly-defined learning problem known as EMX, this is the first example of a natural or commonly-encountered learning problem (i.e., bandits) for which learnability can be provably undecidable. Our proof is based on establishing a (rather-sophisticated) equivalence between certain subfamilies of EMX learning problems and corresponding constructed bandit problems. Despite this general undecidability result, we also establish new general results in special cases. Specifically, we characterize the optimal query complexity in the special case of binary-valued reward functions in terms of a combinatorial complexity measure related to the teaching dimension. We also present an extension to general bounded real-valued rewards, though in this case the upper bound is not always optimal. We instantiate the new complexity measures for several important families of function classes $F$.}
}
Endnote
%0 Conference Paper
%T Bandit Learnability can be Undecidable
%A Steve Hanneke
%A Liu Yang
%B Proceedings of Thirty Sixth Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2023
%E Gergely Neu
%E Lorenzo Rosasco
%F pmlr-v195-hanneke23d
%I PMLR
%P 5813--5849
%U https://proceedings.mlr.press/v195/hanneke23d.html
%V 195
%X We initiate a general investigation into structured bandits. Specifically, for an abstract space $X$, we suppose a true reward function $f$ resides in a known, but arbitrary, function class $F$. The algorithm may then pull a number of arms $x$ (i.e., query for the value $f(x)$), and thereby attempts to identify an arm $\hat{x}$ of near-maximum reward: $f(\hat{x}) \geq \sup_x f(x) - \epsilon$. While special cases of this problem are well understood in the literature, our interest is in the possibility of a fully-general theory of bandit learnability, analogous to the PAC model for classification: that is, a theory which precisely characterizes which function classes $F$ admit a learning algorithm guaranteed to identify a near-optimal arm within a bounded number of pulls. Our main result in this regard is an illuminating impossibility result. Namely, there exist well-defined function classes $F$ such that bandit learnability is \emph{undecidable} within ZFC set theory. While such undecidability results have previously been shown for a certain abstractly-defined learning problem known as EMX, this is the first example of a natural or commonly-encountered learning problem (i.e., bandits) for which learnability can be provably undecidable. Our proof is based on establishing a (rather-sophisticated) equivalence between certain subfamilies of EMX learning problems and corresponding constructed bandit problems. Despite this general undecidability result, we also establish new general results in special cases. Specifically, we characterize the optimal query complexity in the special case of binary-valued reward functions in terms of a combinatorial complexity measure related to the teaching dimension. We also present an extension to general bounded real-valued rewards, though in this case the upper bound is not always optimal. We instantiate the new complexity measures for several important families of function classes $F$.
APA
Hanneke, S. & Yang, L. (2023). Bandit Learnability can be Undecidable. Proceedings of Thirty Sixth Conference on Learning Theory, in Proceedings of Machine Learning Research 195:5813-5849. Available from https://proceedings.mlr.press/v195/hanneke23d.html.
