ZigZag: A New Approach to Adaptive Online Learning

[edit]

Dylan J. Foster, Alexander Rakhlin, Karthik Sridharan ;
Proceedings of the 2017 Conference on Learning Theory, PMLR 65:876-924, 2017.

Abstract

We develop a new family of algorithms for the online learning setting with regret against any data sequence bounded by the empirical Rademacher complexity of that sequence. To develop a general theory of when this type of adaptive regret bound is achievable we establish a connection to the theory of decoupling inequalities for martingales in Banach spaces. When the hypothesis class is a set of linear functions bounded in some norm, such a regret bound is achievable if and only if the norm satisfies certain decoupling inequalities for martingales. Donald Burkholder’s celebrated geometric characterization of decoupling inequalities (1984) states that such an inequality holds if and only if there exists a special function called a Burkholder function satisfying certain restricted concavity properties. Our online learning algorithms are efficient in terms of queries to this function. We realize our general theory by giving new efficient and adaptive algorithms for classes including $\ell_p$ norms, group norms, and reproducing kernel Hilbert spaces. The empirical Rademacher complexity regret bound implies — when used in the i.i.d. setting — a data-dependent complexity bound for excess risk after online-to-batch conversion. To showcase the power of the empirical Rademacher complexity regret bound, we derive improved rates for a supervised learning generalization of the online learning with low rank experts task and for the online matrix prediction task. In addition to obtaining tight data-dependent regret bounds, our algorithms enjoy improved efficiency over previous techniques based on Rademacher complexity, automatically work in the infinite horizon setting, and adapt to scale. To obtain such adaptive methods, we introduce novel machinery, and the resulting algorithms are not based on the standard tools of online convex optimization. We conclude with a number of open problems and new directions, both algorithmic and information-theoretic.

Related Material