Online Learning with Noisy Side Observations

Tomáš Kocák, Gergely Neu, Michal Valko
Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, PMLR 51:1186-1194, 2016.

Abstract

We propose a new partial-observability model for online learning problems where the learner, besides its own loss, also observes some noisy feedback about the other actions, depending on the underlying structure of the problem. We represent this structure by a weighted directed graph, where the edge weights are related to the quality of the feedback shared by the connected nodes. Our main contribution is an efficient algorithm that guarantees a regret of O(\sqrt(α^* T) after T rounds, where α^* is a novel graph property that we call the effective independence number. Our algorithm is completely parameter-free and does not require knowledge (or even estimation) of alpha^*. For the special case of binary edge weights, our setting reduces to the partial-observability models of Mannor & Shamir (2011) and Alon et al. (2013) and our algorithm recovers the near-optimal regret bounds.

Cite this Paper


BibTeX
@InProceedings{pmlr-v51-kocak16, title = {Online Learning with Noisy Side Observations}, author = {Kocák, Tomáš and Neu, Gergely and Valko, Michal}, booktitle = {Proceedings of the 19th International Conference on Artificial Intelligence and Statistics}, pages = {1186--1194}, year = {2016}, editor = {Gretton, Arthur and Robert, Christian C.}, volume = {51}, series = {Proceedings of Machine Learning Research}, address = {Cadiz, Spain}, month = {09--11 May}, publisher = {PMLR}, pdf = {http://proceedings.mlr.press/v51/kocak16.pdf}, url = {https://proceedings.mlr.press/v51/kocak16.html}, abstract = {We propose a new partial-observability model for online learning problems where the learner, besides its own loss, also observes some noisy feedback about the other actions, depending on the underlying structure of the problem. We represent this structure by a weighted directed graph, where the edge weights are related to the quality of the feedback shared by the connected nodes. Our main contribution is an efficient algorithm that guarantees a regret of O(\sqrt(α^* T) after T rounds, where α^* is a novel graph property that we call the effective independence number. Our algorithm is completely parameter-free and does not require knowledge (or even estimation) of alpha^*. For the special case of binary edge weights, our setting reduces to the partial-observability models of Mannor & Shamir (2011) and Alon et al. (2013) and our algorithm recovers the near-optimal regret bounds.} }
Endnote
%0 Conference Paper %T Online Learning with Noisy Side Observations %A Tomáš Kocák %A Gergely Neu %A Michal Valko %B Proceedings of the 19th International Conference on Artificial Intelligence and Statistics %C Proceedings of Machine Learning Research %D 2016 %E Arthur Gretton %E Christian C. Robert %F pmlr-v51-kocak16 %I PMLR %P 1186--1194 %U https://proceedings.mlr.press/v51/kocak16.html %V 51 %X We propose a new partial-observability model for online learning problems where the learner, besides its own loss, also observes some noisy feedback about the other actions, depending on the underlying structure of the problem. We represent this structure by a weighted directed graph, where the edge weights are related to the quality of the feedback shared by the connected nodes. Our main contribution is an efficient algorithm that guarantees a regret of O(\sqrt(α^* T) after T rounds, where α^* is a novel graph property that we call the effective independence number. Our algorithm is completely parameter-free and does not require knowledge (or even estimation) of alpha^*. For the special case of binary edge weights, our setting reduces to the partial-observability models of Mannor & Shamir (2011) and Alon et al. (2013) and our algorithm recovers the near-optimal regret bounds.
RIS
TY - CPAPER TI - Online Learning with Noisy Side Observations AU - Tomáš Kocák AU - Gergely Neu AU - Michal Valko BT - Proceedings of the 19th International Conference on Artificial Intelligence and Statistics DA - 2016/05/02 ED - Arthur Gretton ED - Christian C. Robert ID - pmlr-v51-kocak16 PB - PMLR DP - Proceedings of Machine Learning Research VL - 51 SP - 1186 EP - 1194 L1 - http://proceedings.mlr.press/v51/kocak16.pdf UR - https://proceedings.mlr.press/v51/kocak16.html AB - We propose a new partial-observability model for online learning problems where the learner, besides its own loss, also observes some noisy feedback about the other actions, depending on the underlying structure of the problem. We represent this structure by a weighted directed graph, where the edge weights are related to the quality of the feedback shared by the connected nodes. Our main contribution is an efficient algorithm that guarantees a regret of O(\sqrt(α^* T) after T rounds, where α^* is a novel graph property that we call the effective independence number. Our algorithm is completely parameter-free and does not require knowledge (or even estimation) of alpha^*. For the special case of binary edge weights, our setting reduces to the partial-observability models of Mannor & Shamir (2011) and Alon et al. (2013) and our algorithm recovers the near-optimal regret bounds. ER -
APA
Kocák, T., Neu, G. & Valko, M.. (2016). Online Learning with Noisy Side Observations. Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, in Proceedings of Machine Learning Research 51:1186-1194 Available from https://proceedings.mlr.press/v51/kocak16.html.

Related Material