Online learning with kernel losses
Proceedings of the 36th International Conference on Machine Learning, PMLR 97:971-980, 2019.
Abstract
We present a generalization of the adversarial linear bandits framework in which the underlying losses are kernel functions (with an associated reproducing kernel Hilbert space) rather than linear functions. We study a version of the exponential weights algorithm and bound its regret in this setting. Under conditions on the eigen-decay of the kernel we provide a sharp characterization of the regret of this algorithm. When the eigen-decay is polynomial (μ_j ≤ O(j^{−β})), the regret is bounded by R_n ≤ O(n^{β/(2(β−1))}), while under exponential eigen-decay (μ_j ≤ O(e^{−βj})) we obtain an even tighter bound, R_n ≤ Õ(n^{1/2}). When the eigen-decay is polynomial we also show a non-matching minimax lower bound on the regret of R_n ≥ Ω(n^{(β+1)/(2β)}), and a lower bound of R_n ≥ Ω(n^{1/2}) when the eigenvalues decay exponentially fast. We also study the full-information setting with kernel losses and present an adapted exponential weights algorithm and a conditional gradient descent algorithm.
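To make the setting concrete, here is a minimal sketch of exponential weights over a discretized action set where each round's loss is a function in the RKHS of a Gaussian kernel, i.e. ℓ_t(x) = Σ_i a_i K(z_i, x). This is an illustrative toy (full-information feedback, a hand-picked learning rate, and randomly drawn loss coefficients are all assumptions), not the paper's bandit algorithm, which additionally constructs loss estimates from single-point feedback.

```python
import numpy as np

rng = np.random.default_rng(0)

def gauss_kernel(x, z, bandwidth=0.2):
    # Gaussian (RBF) kernel evaluated pointwise.
    return np.exp(-(x - z) ** 2 / (2 * bandwidth ** 2))

actions = np.linspace(0.0, 1.0, 101)   # discretized action set in [0, 1]
weights = np.ones_like(actions)        # uniform initial weights
eta = 0.5                              # learning rate (illustrative choice)

total_loss = 0.0
cum_losses = np.zeros_like(actions)    # cumulative loss of each fixed action
for t in range(200):
    # Adversary picks an RKHS loss: random centers z_i and coefficients a_i.
    centers = rng.uniform(0.0, 1.0, size=3)
    coefs = rng.uniform(-1.0, 1.0, size=3)
    losses = sum(a * gauss_kernel(actions, z) for a, z in zip(coefs, centers))

    probs = weights / weights.sum()
    total_loss += probs @ losses       # expected loss of the sampled action
    cum_losses += losses

    weights *= np.exp(-eta * losses)   # multiplicative weights update
    weights /= weights.max()           # rescale to avoid numerical underflow

regret = total_loss - cum_losses.min()
print(f"expected regret vs. best fixed action: {regret:.3f}")
```

The regret printed is against the best fixed action on the grid; the paper's analysis works directly in the continuous domain and controls the discretization through the kernel's eigen-decay.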