Online Learning with Set-valued Feedback

Vinod Raman, Unique Subedi, Ambuj Tewari
Proceedings of Thirty Seventh Conference on Learning Theory, PMLR 247:4381-4412, 2024.

Abstract

We study a variant of online multiclass classification where the learner predicts a single label but receives a \textit{set of labels} as feedback. In this model, the learner is penalized for not outputting a label contained in the revealed set. We show that unlike online multiclass learning with single-label feedback, deterministic and randomized online learnability are \textit{not equivalent} even in the realizable setting with set-valued feedback. Accordingly, we give two new combinatorial dimensions, named the Set Littlestone and Measure Shattering dimension, that tightly characterize deterministic and randomized online learnability respectively in the realizable setting. In addition, we show that the Measure Shattering dimension characterizes online learnability in the agnostic setting and tightly quantifies the minimax regret. Finally, we use our results to establish bounds on the minimax regret for three practical learning settings: online multilabel ranking, online multilabel classification, and real-valued prediction with interval-valued response.

Cite this Paper


BibTeX
@InProceedings{pmlr-v247-raman24b, title = {Online Learning with Set-valued Feedback}, author = {Raman, Vinod and Subedi, Unique and Tewari, Ambuj}, booktitle = {Proceedings of Thirty Seventh Conference on Learning Theory}, pages = {4381--4412}, year = {2024}, editor = {Agrawal, Shipra and Roth, Aaron}, volume = {247}, series = {Proceedings of Machine Learning Research}, month = {30 Jun--03 Jul}, publisher = {PMLR}, pdf = {https://proceedings.mlr.press/v247/raman24b/raman24b.pdf}, url = {https://proceedings.mlr.press/v247/raman24b.html}, abstract = {We study a variant of online multiclass classification where the learner predicts a single label but receives a \textit{set of labels} as feedback. In this model, the learner is penalized for not outputting a label contained in the revealed set. We show that unlike online multiclass learning with single-label feedback, deterministic and randomized online learnability are \textit{not equivalent} even in the realizable setting with set-valued feedback. Accordingly, we give two new combinatorial dimensions, named the Set Littlestone and Measure Shattering dimension, that tightly characterize deterministic and randomized online learnability respectively in the realizable setting. In addition, we show that the Measure Shattering dimension characterizes online learnability in the agnostic setting and tightly quantifies the minimax regret. Finally, we use our results to establish bounds on the minimax regret for three practical learning settings: online multilabel ranking, online multilabel classification, and real-valued prediction with interval-valued response.} }
Endnote
%0 Conference Paper %T Online Learning with Set-valued Feedback %A Vinod Raman %A Unique Subedi %A Ambuj Tewari %B Proceedings of Thirty Seventh Conference on Learning Theory %C Proceedings of Machine Learning Research %D 2024 %E Shipra Agrawal %E Aaron Roth %F pmlr-v247-raman24b %I PMLR %P 4381--4412 %U https://proceedings.mlr.press/v247/raman24b.html %V 247 %X We study a variant of online multiclass classification where the learner predicts a single label but receives a \textit{set of labels} as feedback. In this model, the learner is penalized for not outputting a label contained in the revealed set. We show that unlike online multiclass learning with single-label feedback, deterministic and randomized online learnability are \textit{not equivalent} even in the realizable setting with set-valued feedback. Accordingly, we give two new combinatorial dimensions, named the Set Littlestone and Measure Shattering dimension, that tightly characterize deterministic and randomized online learnability respectively in the realizable setting. In addition, we show that the Measure Shattering dimension characterizes online learnability in the agnostic setting and tightly quantifies the minimax regret. Finally, we use our results to establish bounds on the minimax regret for three practical learning settings: online multilabel ranking, online multilabel classification, and real-valued prediction with interval-valued response.
APA
Raman, V., Subedi, U. & Tewari, A.. (2024). Online Learning with Set-valued Feedback. Proceedings of Thirty Seventh Conference on Learning Theory, in Proceedings of Machine Learning Research 247:4381-4412 Available from https://proceedings.mlr.press/v247/raman24b.html.

Related Material