- title: 'Conference on Learning Theory 2015: Preface'
  abstract: 'Preface to COLT 2015'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Grunwald15.html
  PDF: http://proceedings.mlr.press/v40/Grunwald15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Grunwald15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1-3
  id: Grunwald15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1
  lastpage: 3
  published: 2015-06-26 00:00:00 +0000
- title: 'On Consistent Surrogate Risk Minimization and Property Elicitation'
  abstract: 'Surrogate risk minimization is a popular framework for supervised learning; property elicitation is a widely studied area in probability forecasting, machine learning, statistics and economics. In this paper, we connect these two themes by showing that calibrated surrogate losses in supervised learning can essentially be viewed as eliciting or estimating certain properties of the underlying conditional label distribution that are sufficient to construct an optimal classifier under the target loss of interest. Our study helps to shed light on the design of convex calibrated surrogates. We also give a new framework for designing convex calibrated surrogates under low-noise conditions by eliciting properties that allow one to construct ‘coarse’ estimates of the underlying distribution.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Agarwal15.html
  PDF: http://proceedings.mlr.press/v40/Agarwal15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Agarwal15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Arpit
    family: Agarwal
  - given: Shivani
    family: Agarwal
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 4-22
  id: Agarwal15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 4
  lastpage: 22
  published: 2015-06-26 00:00:00 +0000
- title: 'Online Learning with Feedback Graphs: Beyond Bandits'
  abstract: 'We study a general class of online learning problems where the feedback is specified by a graph. This class includes online prediction with expert advice and the multi-armed bandit problem, but also several learning problems where the online player does not necessarily observe his own loss. We analyze how the structure of the feedback graph controls the inherent difficulty of the induced T-round learning problem. Specifically, we show that any feedback graph belongs to one of three classes: \emphstrongly observable graphs, \emphweakly observable graphs, and \emphunobservable graphs. We prove that the first class induces learning problems with \widetildeΘ(α^1/2 T^1/2) minimax regret, where αis the independence number of the underlying graph; the second class induces problems with \widetildeΘ(δ^1/3T^2/3) minimax regret, where δis the domination number of a certain portion of the graph; and the third class induces problems with linear minimax regret. Our results subsume much of the previous work on learning with feedback graphs and reveal new connections to partial monitoring games. We also show how the regret is affected if the graphs are allowed to vary with time. '
  volume: 40
  URL: https://proceedings.mlr.press/v40/Alon15.html
  PDF: http://proceedings.mlr.press/v40/Alon15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Alon15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Noga
    family: Alon
  - given: Nicolò
    family: Cesa-Bianchi
  - given: Ofer
    family: Dekel
  - given: Tomer
    family: Koren
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 23-35
  id: Alon15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 23
  lastpage: 35
  published: 2015-06-26 00:00:00 +0000
- title: 'Learning Overcomplete Latent Variable Models through Tensor Methods'
  abstract: 'We provide guarantees for learning latent variable models  emphasizing on the overcomplete regime, where the dimensionality of the latent space exceeds  the observed dimensionality.  In particular, we consider multiview mixtures, ICA, and sparse coding models. Our main tool is a new algorithm for tensor decomposition that works in the overcomplete regime. In the semi-supervised setting, we exploit label information to get a rough estimate of the model parameters, and then refine it using the tensor method on unlabeled samples. We establish learning guarantees   when the number of components scales as k=o(d^p/2), where d is the observed dimension, and p is the order of the observed moment employed in the tensor method (usually p=3,4).  In the unsupervised setting, a simple initialization algorithm based on SVD of the tensor slices is proposed, and the guarantees are provided under the stricter condition that k ≤βd (where constant βcan be larger than 1). For the learning applications, we provide tight sample complexity bounds through novel covering arguments. '
  volume: 40
  URL: https://proceedings.mlr.press/v40/Anandkumar15.html
  PDF: http://proceedings.mlr.press/v40/Anandkumar15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Anandkumar15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Animashree
    family: Anandkumar
  - given: Rong
    family: Ge
  - given: Majid
    family: Janzamin
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 36-112
  id: Anandkumar15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 36
  lastpage: 112
  published: 2015-06-26 00:00:00 +0000
- title: 'Simple, Efficient, and Neural Algorithms for Sparse Coding'
  abstract: 'Sparse coding is a basic task in many fields including signal processing, neuroscience and machine learning where the goal is to learn a basis that enables a sparse representation of a given set of data, if one exists. Its standard formulation is as a non-convex optimization problem which is solved in practice by heuristics based on alternating minimization.  Recent work has resulted in several algorithms for sparse coding with provable guarantees, but somewhat surprisingly these are outperformed by the simple alternating minimization heuristics. Here we give a general framework for understanding alternating minimization which we leverage to analyze existing heuristics and to design new ones also with provable guarantees. Some of these algorithms seem implementable on simple neural architectures, which was the original motivation of Olshausen and Field in introducing sparse coding. We also give the first efficient algorithm for sparse coding that works almost up to the information theoretic limit for sparse recovery on incoherent dictionaries. All previous algorithms that approached or surpassed this limit run in time exponential in some natural parameter. Finally, our algorithms improve upon the sample complexity of existing approaches. We believe that our analysis framework will have applications in other settings where simple iterative algorithms are used'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Arora15.html
  PDF: http://proceedings.mlr.press/v40/Arora15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Arora15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Sanjeev
    family: Arora
  - given: Rong
    family: Ge
  - given: Tengyu
    family: Ma
  - given: Ankur
    family: Moitra
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 113-149
  id: Arora15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 113
  lastpage: 149
  published: 2015-06-26 00:00:00 +0000
- title: 'Label optimal regret bounds for online local learning'
  abstract: 'We resolve an open question from Christiano (2014b) posed in COLT’14 regarding the optimal dependency of the regret achievable for online local learning on the size of the label set. In this framework, the algorithm is shown a pair of items at each step, chosen from a set of n items. The learner then predicts a label for each item, from a label set of size L and receives a real valued payoff. This is a natural framework which captures many interesting scenarios such as online gambling and online max cut. Christiano (2014a) designed an efficient online learning algorithm for this problem achieving a regret of O(\sqrtnL^3 T), where T is the number of rounds. Information theoretically, one can achieve a regret of O(\sqrtn \log L T). One of the main open questions left in this framework concerns closing the above gap. In this work, we provide a complete answer to the question above via two main results. We show, via a tighter analysis, that the semi-definite programming based algorithm of Christiano (2014a) in fact achieves a regret of O(\sqrtnLT). Second, we show a matching computational lower bound. Namely, we show that a polynomial time algorithm for online local learning with lower regret would imply a polynomial time algorithm for the planted clique problem which is widely believed to be hard. We prove a similar hardness result under a related conjecture concerning planted dense subgraphs that we put forth. Unlike planted clique, the planted dense subgraph problem does not have any known quasi-polynomial time algorithms. Computational lower bounds for online learning are relatively rare, and we hope that the ideas developed in this work will lead to lower bounds for other online learning scenarios as well.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Awasthi15a.html
  PDF: http://proceedings.mlr.press/v40/Awasthi15a.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Awasthi15a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Pranjal
    family: Awasthi
  - given: Moses
    family: Charikar
  - given: Kevin A
    family: Lai
  - given: Andrej
    family: Risteski
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 150-166
  id: Awasthi15a
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 150
  lastpage: 166
  published: 2015-06-26 00:00:00 +0000
- title: 'Efficient Learning of Linear Separators under Bounded Noise'
  abstract: 'We study the learnability of linear separators in \Re^d in the presence of bounded  (a.k.a Massart) noise. This is a  realistic generalization of the random classification noise model, where the adversary can flip each example x with probability η(x) ≤η. We provide the first polynomial time algorithm that can learn linear separators to arbitrarily small excess error in this noise model under the uniform distribution over the unit sphere in \Re^d, for some constant value of η. While widely studied in the statistical learning theory community in the context of getting faster convergence rates, computationally efficient algorithms in this model had remained elusive. Our work provides the first evidence that one can indeed design  algorithms achieving arbitrarily small excess error in  polynomial time  under this realistic noise model and thus opens up a new and exciting line of research. We additionally provide lower bounds showing that popular algorithms such as hinge loss minimization and averaging cannot lead to arbitrarily small excess error under Massart noise, even under the uniform distribution. Our work, instead, makes use of a margin based technique developed in the context of active learning. As a result, our algorithm is also an active learning algorithm with label complexity that is only logarithmic in the desired excess error ε. '
  volume: 40
  URL: https://proceedings.mlr.press/v40/Awasthi15b.html
  PDF: http://proceedings.mlr.press/v40/Awasthi15b.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Awasthi15b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Pranjal
    family: Awasthi
  - given: Maria-Florina
    family: Balcan
  - given: Nika
    family: Haghtalab
  - given: Ruth
    family: Urner
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 167-190
  id: Awasthi15b
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 167
  lastpage: 190
  published: 2015-06-26 00:00:00 +0000
- title: 'Efficient Representations for Lifelong Learning and Autoencoding'
  abstract: 'It has been a long-standing goal in machine learning, as well as in AI more generally, to develop life-long learning systems that learn many different tasks over time, and reuse insights from tasks learned, “learning to learn” as they do so.  In this work we pose and provide efficient algorithms for several natural theoretical formulations of this goal.  Specifically, we consider the problem of learning many different target functions over time, that share certain commonalities that are initially unknown to the learning algorithm.  Our aim is to learn new internal representations as the algorithm learns new target functions,  that capture this commonality and allow subsequent learning tasks to be solved more efficiently and from less data. We develop efficient algorithms for two very different kinds of commonalities that target functions might share: one based on learning common low-dimensional and unions of low-dimensional subspaces and one based on learning  nonlinear Boolean combinations of features.  Our algorithms for learning Boolean feature combinations additionally have a dual interpretation, and can be viewed as giving an efficient procedure for constructing near-optimal sparse Boolean autoencoders under a natural “anchor-set” assumption.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Balcan15.html
  PDF: http://proceedings.mlr.press/v40/Balcan15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Balcan15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Maria-Florina
    family: Balcan
  - given: Avrim
    family: Blum
  - given: Santosh
    family: Vempala
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 191-210
  id: Balcan15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 191
  lastpage: 210
  published: 2015-06-26 00:00:00 +0000
- title: 'Optimally Combining Classifiers Using Unlabeled Data'
  abstract: 'We develop a worst-case analysis of aggregation of classifier ensembles for binary classification. The task of predicting to minimize error is formulated as a game played over a given set of unlabeled data (a transductive setting), where prior label information is encoded as constraints on the game. The minimax solution of this game identifies cases where a weighted combination of the classifiers can perform significantly better than any single classifier.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Balsubramani15.html
  PDF: http://proceedings.mlr.press/v40/Balsubramani15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Balsubramani15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Akshay
    family: Balsubramani
  - given: Yoav
    family: Freund
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 211-225
  id: Balsubramani15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 211
  lastpage: 225
  published: 2015-06-26 00:00:00 +0000
- title: 'Minimax Fixed-Design Linear Regression'
  abstract: 'We consider a linear regression game in which the covariates are known in advance: at each round, the learner predicts a real-value, the adversary reveals a label, and the learner incurs a squared error loss. The aim is to minimize the regret with respect to linear predictions.  For a variety of constraints on the adversary’s labels, we show that the minimax optimal strategy is linear, with a parameter choice that is reminiscent of ordinary least squares (and as easy to compute).  The predictions depend on all covariates, past and future, with a particular weighting assigned to future covariates corresponding to the role that they play in the minimax regret.  We study two families of label sequences: box constraints (under a covariate compatibility condition), and a weighted 2-norm constraint that emerges naturally from the analysis.  The strategy is adaptive in the sense that it requires no knowledge of the constraint set.  We obtain an explicit expression for the minimax regret for these games.  For the case of uniform box constraints, we show that, with worst case covariate sequences, the regret is O(d\log T), with no dependence on the scaling of the covariates. '
  volume: 40
  URL: https://proceedings.mlr.press/v40/Bartlett15.html
  PDF: http://proceedings.mlr.press/v40/Bartlett15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Bartlett15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Peter L.
    family: Bartlett
  - given: Wouter M.
    family: Koolen
  - given: Alan
    family: Malek
  - given: Eiji
    family: Takimoto
  - given: Manfred K.
    family: Warmuth
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 226-239
  id: Bartlett15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 226
  lastpage: 239
  published: 2015-06-26 00:00:00 +0000
- title: 'Escaping the Local Minima via Simulated Annealing: Optimization of Approximately Convex Functions'
  abstract: 'We consider the problem of optimizing an approximately convex function over a bounded convex set in \mathbbR^n  using only function evaluations. The problem is reduced to sampling from an \emphapproximately log-concave distribution using the Hit-and-Run method, which is shown to have the same \mathcalO^* complexity as sampling from log-concave distributions. In addition to extend the analysis for log-concave distributions to approximate log-concave distributions, the implementation of the 1-dimensional sampler of the Hit-and-Run walk requires new methods and analysis. The algorithm then is based on simulated annealing which does not relies on first order conditions which makes it essentially immune to local minima. We then apply the method to different motivating problems. In the context of zeroth order stochastic convex optimization, the proposed method produces an ε-minimizer after \mathcalO^*(n^7.5ε^-2) noisy function evaluations  by inducing a \mathcalO(ε/n)-approximately log concave distribution. We also consider in detail the case when the “amount of non-convexity” decays towards the optimum of the function. Other applications of the method discussed in this work include private computation of empirical risk minimizers, two-stage stochastic programming, and approximate dynamic programming for online learning.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Belloni15.html
  PDF: http://proceedings.mlr.press/v40/Belloni15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Belloni15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Alexandre
    family: Belloni
  - given: Tengyuan
    family: Liang
  - given: Hariharan
    family: Narayanan
  - given: Alexander
    family: Rakhlin
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 240-265
  id: Belloni15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 240
  lastpage: 265
  published: 2015-06-26 00:00:00 +0000
- title: 'Bandit Convex Optimization: \sqrtT Regret in One Dimension'
  abstract: 'We analyze the minimax regret of the adversarial bandit convex optimization problem.  Focusing on the one-dimensional case, we prove that the minimax regret is \widetildeΘ(\sqrtT) and partially resolve a decade-old open problem. Our analysis is non-constructive, as we do not present a concrete algorithm that attains this regret rate. Instead, we use minimax duality to reduce the problem to a Bayesian setting, where the convex loss functions are drawn from a worst-case distribution, and then we solve the Bayesian version of the problem with a variant of Thompson Sampling. Our analysis features a novel use of convexity, formalized as a “local-to-global” property of convex functions, that may be of independent interest. '
  volume: 40
  URL: https://proceedings.mlr.press/v40/Bubeck15a.html
  PDF: http://proceedings.mlr.press/v40/Bubeck15a.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Bubeck15a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Sébastien
    family: Bubeck
  - given: Ofer
    family: Dekel
  - given: Tomer
    family: Koren
  - given: Yuval
    family: Peres
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 266-278
  id: Bubeck15a
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 266
  lastpage: 278
  published: 2015-06-26 00:00:00 +0000
- title: 'The entropic barrier: a simple and optimal universal self-concordant barrier'
  abstract: 'We prove that the Fenchel dual of the log-Laplace transform of the uniform measure on a convex body in \mathbbR^n is a (1+o(1)) n-self-concordant barrier, improving a seminal result of Nesterov and Nemirovski. This gives the first explicit construction of a universal barrier for convex bodies with optimal self-concordance parameter. The proof is based on basic geometry of log-concave distributions, and elementary duality in exponential families. The result also gives a new perspective on the minimax regret for the linear bandit problem.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Bubeck15b.html
  PDF: http://proceedings.mlr.press/v40/Bubeck15b.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Bubeck15b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Sébastien
    family: Bubeck
  - given: Ronen
    family: Eldan
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 279-279
  id: Bubeck15b
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 279
  lastpage: 279
  published: 2015-06-26 00:00:00 +0000
- title: 'Optimum Statistical Estimation with Strategic Data Sources'
  abstract: 'We propose an optimum mechanism for providing monetary incentives to the data sources of a statistical estimator such as linear regression, so that high quality data is provided at low cost, in the sense that the weighted sum of payments and estimation error is minimized.  The mechanism applies to a broad range of estimators, including linear and polynomial regression, kernel regression, and, under some additional assumptions, ridge regression. It also generalizes to several objectives, including minimizing estimation error subject to budget constraints. Besides our concrete results for regression problems, we contribute a mechanism design framework through which to design and analyze statistical estimators whose examples are supplied by workers with cost for labeling said examples.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Cai15.html
  PDF: http://proceedings.mlr.press/v40/Cai15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Cai15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Yang
    family: Cai
  - given: Constantinos
    family: Daskalakis
  - given: Christos
    family: Papadimitriou
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 280-296
  id: Cai15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 280
  lastpage: 296
  published: 2015-06-26 00:00:00 +0000
- title: 'On the Complexity of Learning with Kernels'
  abstract: 'A well-recognized limitation of kernel learning is the requirement to handle a kernel matrix, whose size is quadratic in the number of training examples. Many methods have been proposed to reduce this computational cost, mostly by using a subset of the kernel matrix entries, or some form of low-rank matrix approximation, or a random projection method. In this paper, we study lower bounds on the error attainable by such methods as a function of the number of entries observed in the kernel matrix or the rank of an approximate kernel matrix. We show that there are kernel learning problems where no such method will lead to non-trivial computational savings. Our results also quantify how the problem difficulty depends on parameters such as the nature of the loss function, the regularization parameter, the norm of the desired predictor, and the kernel matrix rank. Our results also suggest cases where more efficient kernel learning might be possible.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Cesa-Bianchi15.html
  PDF: http://proceedings.mlr.press/v40/Cesa-Bianchi15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Cesa-Bianchi15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Nicolò
    family: Cesa-Bianchi
  - given: Yishay
    family: Mansour
  - given: Ohad
    family: Shamir
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 297-325
  id: Cesa-Bianchi15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 297
  lastpage: 325
  published: 2015-06-26 00:00:00 +0000
- title: 'Learnability of Solutions to Conjunctive Queries: The Full Dichotomy'
  abstract: 'The problem of learning the solution space of an unknown formula has been studied in multiple embodiments in computational learning theory. In this article, we study a family of such learning problems; this family contains, for each relational structure, the problem of learning the solution space of an unknown conjunctive query evaluated on the structure. A progression of results aimed to classify the learnability of each of the problems in this family, and thus far a culmination thereof was a positive learnability result generalizing all previous ones. This article completes the classification program towards which this progression of results strived, by presenting a negative learnability result that complements the mentioned positive learnability result. In order to obtain our negative result, we make use of universal-algebraic concepts, and our result is phrased in terms of the varietal property of non-congruence modularity.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Chen15a.html
  PDF: http://proceedings.mlr.press/v40/Chen15a.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Chen15a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Hubie
    family: Chen
  - given: Matthew
    family: Valeriote
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 326-337
  id: Chen15a
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 326
  lastpage: 337
  published: 2015-06-26 00:00:00 +0000
- title: 'Sequential Information Maximization: When is Greedy Near-optimal?'
  abstract: 'Optimal information gathering is a central challenge in machine learning and science in general.  A common objective that quantifies the usefulness of observations is Shannon’s mutual information, defined w.r.t. a probabilistic model. Greedily selecting observations that maximize the mutual information is the method of choice in numerous applications, ranging from Bayesian experimental design to automated diagnosis, to active learning in Bayesian models. Despite its importance and widespread use in applications, little is known about the theoretical properties of sequential information maximization, in particular under noisy observations.  In this paper, we analyze the widely used greedy policy for this task, and identify problem instances where it provides provably near-maximal utility, even in the challenging setting of persistent noise.  Our results depend on a natural separability condition associated with a channel injecting noise into the observations. We also identify examples where this separability parameter is necessary in the bound: if it is too small, then the greedy policy fails to select informative tests.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Chen15b.html
  PDF: http://proceedings.mlr.press/v40/Chen15b.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Chen15b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Yuxin
    family: Chen
  - given: S. Hamed
    family: Hassani
  - given: Amin
    family: Karbasi
  - given: Andreas
    family: Krause
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 338-363
  id: Chen15b
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 338
  lastpage: 363
  published: 2015-06-26 00:00:00 +0000
- title: 'Efficient Sampling for Gaussian Graphical Models  via Spectral Sparsification'
  abstract: 'Motivated by a sampling problem basic to computational statistical inference, we develop a toolset based on spectral sparsification for a family of fundamental problems involving Gaussian sampling, matrix functionals, and reversible Markov chains. Drawing on the connection between Gaussian graphical models and the recent breakthroughs in spectral graph theory, we give the first nearly linear time algorithm for the following basic matrix problem: Given an n\times n Laplacian matrix \mathbfM and a constant -1 ≤p ≤1, provide efficient access to a sparse n\times n linear operator \tilde\mathbfC such that $\mathbfM^p ≈\tilde\mathbfC \tilde\mathbfC^⊤, where ≈denotes spectral similarity. When p is set to -1, this gives the first parallel sampling algorithm that is essentially optimal both in total work and randomness for Gaussian random fields with symmetric diagonally dominant (SDD) precision matrices. It only requires \em nearly linear work and 2n \em i.i.d. random univariate Gaussian samples to generate an n-dimensional \em i.i.d. Gaussian random sample in polylogarithmic depth. The key ingredient of our approach is an integration of spectral sparsification with multilevel method: Our algorithms are based on factoring \mathbfM^p$ into a product of well-conditioned matrices, then introducing powers and replacing dense matrices with sparse approximations. We give two sparsification methods for this approach that may be of independent interest. The first invokes Maclaurin series on the factors, while the second builds on our new nearly linear time spectral sparsification algorithm for random-walk matrix polynomials. We expect these algorithmic advances will also help to strengthen the connection between machine learning and spectral graph theory, two of the most active fields in understanding large data and networks. '
  volume: 40
  URL: https://proceedings.mlr.press/v40/Cheng15.html
  PDF: http://proceedings.mlr.press/v40/Cheng15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Cheng15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Dehua
    family: Cheng
  - given: Yu
    family: Cheng
  - given: Yan
    family: Liu
  - given: Richard
    family: Peng
  - given: Shang-Hua
    family: Teng
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 364-390
  id: Cheng15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 364
  lastpage: 390
  published: 2015-06-26 00:00:00 +0000
- title: 'Stochastic Block Model and Community Detection in Sparse Graphs: A spectral algorithm with optimal rate of recovery'
  abstract: 'In this paper, we present  and analyze  a simple  and robust  spectral algorithm for the stochastic block model with k blocks, for any k fixed.  Our algorithm works with graphs having constant edge density, under an optimal condition on the gap between the density inside a block and the density between the blocks. As a co-product, we settle an open question posed by Abbe et. al. concerning  censor block models.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Chin15.html
  PDF: http://proceedings.mlr.press/v40/Chin15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Chin15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Peter
    family: Chin
  - given: Anup
    family: Rao
  - given: Van
    family: Vu
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 391-423
  id: Chin15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 391
  lastpage: 423
  published: 2015-06-26 00:00:00 +0000
- title: 'On-Line Learning Algorithms for Path Experts with Non-Additive Losses'
  abstract: 'We consider two broad families of non-additive loss functions covering a large number of applications: rational losses and tropical losses. We give new algorithms extending the Follow-the-Perturbed-Leader (FPL) algorithm to both of these families of loss functions and similarly give new algorithms extending the Randomized Weighted Majority (RWM) algorithm to both of these families. We prove that the time complexity of our extensions to rational losses of both FPL and RWM is polynomial and present regret bounds for both.  We further show that these algorithms can play a critical role in improving performance in applications such as structured prediction. '
  volume: 40
  URL: https://proceedings.mlr.press/v40/Cortes15.html
  PDF: http://proceedings.mlr.press/v40/Cortes15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Cortes15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Corinna
    family: Cortes
  - given: Vitaly
    family: Kuznetsov
  - given: Mehryar
    family: Mohri
  - given: Manfred
    family: Warmuth
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 424-447
  id: Cortes15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 424
  lastpage: 447
  published: 2015-06-26 00:00:00 +0000
- title: 'Truthful Linear Regression'
  abstract: 'We consider the problem of fitting a linear model to data held by individuals who are concerned about their privacy. Incentivizing most players to truthfully report their data to the analyst constrains our design to mechanisms that provide a privacy guarantee to the participants; we use differential privacy to model individuals’ privacy losses. This immediately poses a problem, as differentially private computation of a linear model necessarily produces a biased estimation, and existing approaches to design mechanisms to elicit data from privacy-sensitive individuals do not generalize well to biased estimators. We overcome this challenge through an appropriate design of the computation and payment scheme.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Cummings15.html
  PDF: http://proceedings.mlr.press/v40/Cummings15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Cummings15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Rachel
    family: Cummings
  - given: Stratis
    family: Ioannidis
  - given: Katrina
    family: Ligett
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 448-483
  id: Cummings15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 448
  lastpage: 483
  published: 2015-06-26 00:00:00 +0000
- title: 'A PTAS for Agnostically Learning Halfspaces'
  abstract: 'We present a PTAS for agnostically learning halfspaces w.r.t. the uniform distribution on the d dimensional sphere. Namely, we show that for every μ>0 there is an algorithm that runs in time \mathrmpoly\left(d,\frac1ε\right), and is guaranteed to return a classifier with error at most (1+μ)\mathrmopt+ε, where \mathrmopt is the error of the best halfspace classifier. This improves on Awasthi, Balcan and Long (STOC 2014) who showed an algorithm with an (unspecified) constant approximation ratio. Our algorithm combines the classical technique of polynomial regression, together with the new localization technique of Awasthi et. al.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Daniely15.html
  PDF: http://proceedings.mlr.press/v40/Daniely15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Daniely15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Amit
    family: Daniely
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 484-502
  id: Daniely15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 484
  lastpage: 502
  published: 2015-06-26 00:00:00 +0000
- title: 'S2: An Efficient Graph Based Active Learning Algorithm with Application to Nonparametric Classification'
  abstract: 'This paper investigates the problem of active learning for binary label prediction on a graph. We introduce a simple and label-efficient algorithm called S^2 for this task. At each step, S^2 selects the vertex to be labeled based on the structure of the graph and all previously gathered labels. Specifically, S^2 queries for the label of the vertex that bisects the \em shortest shortest path between any pair of oppositely labeled vertices. We present a theoretical estimate of the number of queries S^2 needs in terms of a novel  parametrization of the complexity of binary functions on graphs. We also present experimental results demonstrating the performance of S^2 on both real and synthetic data. While other graph-based active learning algorithms have shown promise in practice, our algorithm is the first with both good performance and theoretical guarantees. Finally, we demonstrate the implications of the S^2 algorithm to the theory of nonparametric active learning. In particular, we show that S^2 achieves near minimax optimal excess risk for an important class of nonparametric classification problems.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Dasarathy15.html
  PDF: http://proceedings.mlr.press/v40/Dasarathy15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Dasarathy15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Gautam
    family: Dasarathy
  - given: Robert
    family: Nowak
  - given: Xiaojin
    family: Zhu
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 503-522
  id: Dasarathy15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 503
  lastpage: 522
  published: 2015-06-26 00:00:00 +0000
- title: 'Improved Sum-of-Squares Lower Bounds for Hidden Clique and Hidden Submatrix Problems'
  abstract: 'Given a large data matrix A∈\mathbbR^n\times n, we consider the problem of determining whether its entries are i.i.d. from some known marginal distribution A_ij∼P_0, or instead A contains a principal submatrix A_\sf Q,\sf Q whose entries have marginal distribution A_ij∼P_1≠P_0. As a special case, the hidden (or planted) clique problem is finding a planted clique in an otherwise uniformly random graph. Assuming unbounded computational resources, this hypothesis testing problem is statistically solvable  provided |\sf Q|\ge C \log n for a suitable constant C. However, despite substantial effort, no polynomial time algorithm is known that succeeds with high probability when |\sf Q| = o(\sqrtn).  Recently, \citemeka2013association proposed a method to establish lower bounds for the hidden clique problem within the Sum of Squares (SOS) semidefinite hierarchy. Here we consider the degree-4 SOS relaxation, and study the construction of \citemeka2013association to prove that SOS fails unless k\ge C\,n^1/3/\log n. An argument presented by \citeBarakLectureNotes implies that this lower bound cannot be substantially  improved unless  the witness construction is changed in the proof. Our proof uses the moment method to bound the spectrum of a certain random association scheme, i.e. a symmetric random matrix whose rows and columns are indexed by the edges of an Erdös-Renyi random graph. '
  volume: 40
  URL: https://proceedings.mlr.press/v40/Deshpande15.html
  PDF: http://proceedings.mlr.press/v40/Deshpande15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Deshpande15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Yash
    family: Deshpande
  - given: Andrea
    family: Montanari
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 523-562
  id: Deshpande15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 523
  lastpage: 562
  published: 2015-06-26 00:00:00 +0000
- title: 'Contextual Dueling Bandits'
  abstract: 'We consider the problem of learning to choose actions using contextual information when provided with limited feedback in the form of relative pairwise comparisons. We study this problem in the dueling-bandits framework of Yue et al. (COLT’09), which we extend to incorporate context. Roughly, the learner’s goal is to find the best policy, or way of behaving, in some space of policies, although “best” is not always so clearly defined. Here, we propose a new and natural solution concept, rooted in game theory, called a \emphvon Neumann winner, a randomized policy that beats or ties every other policy. We show that this notion overcomes important limitations of existing solutions, particularly the Condorcet winner which has typically been used in the past, but which requires strong and often unrealistic assumptions. We then present three \emphefficient algorithms for online learning in our setting, and for approximating a von Neumann winner from batch-like data. The first of these algorithms achieves particularly low regret, even when data is adversarial, although its time and space requirements are linear in the size of the policy space. The other two algorithms require time and space only logarithmic in the size of the policy space when provided access to an oracle for solving classification problems on the space.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Dudik15.html
  PDF: http://proceedings.mlr.press/v40/Dudik15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Dudik15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Miroslav
    family: Dudík
  - given: Katja
    family: Hofmann
  - given: Robert E.
    family: Schapire
  - given: Aleksandrs
    family: Slivkins
  - given: Masrour
    family: Zoghi
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 563-587
  id: Dudik15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 563
  lastpage: 587
  published: 2015-06-26 00:00:00 +0000
- title: 'Beyond Hartigan Consistency: Merge Distortion Metric for Hierarchical Clustering'
  abstract: 'Hierarchical clustering is a popular method for analyzing data which associates a tree to a dataset. Hartigan consistency has been used extensively as a framework to analyze such clustering algorithms from a statistical point of view. Still, as we show in the paper, a tree which is Hartigan consistent with a given density can look very different than the correct limit tree. Specifically, Hartigan consistency permits two types of undesirable configurations which we term \emphover-segmentation and \emphimproper nesting.  Moreover, Hartigan consistency is a limit property and does not directly quantify difference between trees. In this paper we identify two limit properties, \emphseparation and \emphminimality, which address both over-segmentation and improper nesting and together imply (but are not implied by) Hartigan consistency. We proceed to introduce a \emphmerge distortion metric between hierarchical clusterings and show that convergence in our distance implies both separation and minimality. We also prove that uniform separation and minimality imply convergence in the merge distortion metric.  Furthermore, we show that our merge distortion metric is stable under perturbations of the density. Finally, we demonstrate applicability of these concepts by proving convergence results for two clustering algorithms. First, we show convergence (and hence separation and minimality) of the recent robust single linkage algorithm of Chaudhuri and Dasgupta (2010). Second, we provide convergence results on manifolds for  topological  split tree clustering.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Eldridge15.html
  PDF: http://proceedings.mlr.press/v40/Eldridge15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Eldridge15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Justin
    family: Eldridge
  - given: Mikhail
    family: Belkin
  - given: Yusu
    family: Wang
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 588-606
  id: Eldridge15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 588
  lastpage: 606
  published: 2015-06-26 00:00:00 +0000
- title: 'Faster Algorithms for Testing under Conditional Sampling'
  abstract: 'There has been considerable recent interest in distribution-tests whose run-time and sample requirements are sublinear in the domain-size k. We study two of the most important tests under the conditional-sampling model where each query specifies a subset S of the domain, and the response is a sample drawn from S according to the underlying distribution. For identity testing, which asks whether the underlying distribution equals a specific given distribution or ε-differs from it, we reduce the known time and sample complexities from \widetilde\mathcalO(ε^-4) to \widetilde\mathcalO(ε^-2), thereby matching the information theoretic lower bound. For closeness testing, which asks whether two distributions underlying observed data sets are equal or different, we reduce existing complexity from \widetilde\mathcalO(ε^-4 \log^5 k) to an even sub-logarithmic \widetilde\mathcalO(ε^-5 \log \log k) thus providing a better bound to an open problem in Bertinoro Workshop on Sublinear Algorithms (Fisher, 2014).'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Falahatgar15.html
  PDF: http://proceedings.mlr.press/v40/Falahatgar15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Falahatgar15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Moein
    family: Falahatgar
  - given: Ashkan
    family: Jafarpour
  - given: Alon
    family: Orlitsky
  - given: Venkatadheeraj
    family: Pichapati
  - given: Ananda Theertha
    family: Suresh
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 607-636
  id: Falahatgar15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 607
  lastpage: 636
  published: 2015-06-26 00:00:00 +0000
- title: 'Learning and inference in the presence of corrupted inputs'
  abstract: 'We consider a model where given an uncorrupted input an adversary can corrupt it to one out of m corrupted inputs. We model the classification and inference problems as a zero-sum game between a learner, minimizing the expected error, and an adversary, maximizing the expected error. The value of this game is the optimal error rate achievable. For learning using a limited hypothesis class \mathcalH over corrupted inputs, we give an efficient algorithm that given an uncorrupted sample returns a hypothesis h∈\mathcalH whose error on adversarially corrupted inputs is near optimal. Our algorithm uses as a blackbox an oracle that solves the ERM problem for the hypothesis class \mathcalH. We provide a generalization bound for our setting, showing that for a sufficiently large sample, the performance on the sample and future unseen corrupted inputs will be similar. This gives an efficient learning algorithm for our adversarial setting, based on an ERM oracle. We also consider an inference related setting of the problem, where given a corrupted input, the learner queries the  target function on various uncorrupted inputs and generates a prediction regarding the given corrupted input. There is no limitation on the prediction function the learner may generate, so implicitly the hypothesis class includes all possible hypotheses. In this setting we characterize the optimal learner policy as a minimum vertex cover in a given bipartite graph, and the optimal adversary policy as a maximum matching in the same bipartite graph. We design efficient local algorithms for approximating minimum vertex cover in bipartite graphs, which implies an efficient near optimal algorithm for the learner.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Feige15.html
  PDF: http://proceedings.mlr.press/v40/Feige15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Feige15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Uriel
    family: Feige
  - given: Yishay
    family: Mansour
  - given: Robert
    family: Schapire
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 637-657
  id: Feige15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 637
  lastpage: 657
  published: 2015-06-26 00:00:00 +0000
- title: 'From Averaging to Acceleration, There is Only a Step-size'
  abstract: 'We show that accelerated gradient descent, averaged gradient descent and the heavy-ball method for quadratic non-strongly-convex problems may be reformulated as   constant parameter second-order difference equation algorithms, where stability of the system is equivalent to convergence at rate O(1/n^2), where n is the number of iterations. We provide a detailed analysis of the eigenvalues of the corresponding linear dynamical system, showing various oscillatory and non-oscillatory behaviors, together with a sharp stability result with explicit constants. We also consider the situation where noisy gradients are available, where we extend our general convergence result, which suggests an alternative algorithm (i.e., with different step sizes) that exhibits the good aspects of both averaging and acceleration.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Flammarion15.html
  PDF: http://proceedings.mlr.press/v40/Flammarion15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Flammarion15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Nicolas
    family: Flammarion
  - given: Francis
    family: Bach
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 658-695
  id: Flammarion15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 658
  lastpage: 695
  published: 2015-06-26 00:00:00 +0000
- title: 'Variable Selection is Hard'
  abstract: 'Variable selection for sparse linear regression is the problem of finding, given an m\times p  matrix B and a target vector \bfy,  a sparse vector \bfx such that B\bfx approximately equals \bfy. Assuming a standard complexity hypothesis, we show that no polynomial-time algorithm can find a k’-sparse \bfx with \|B\bfx-\bfy\|^2\le h(m,p), where k’=k⋅2^\log ^1-δ p and h(m,p)= p^C_1 m^1-C_2, where δ>0,C_1>0,C_2>0 are arbitrary. This is true even under the promise that there is an unknown k-sparse vector \bfx^* satisfying B\bfx^*=\bfy. We prove a similar result for a statistical version of the problem in which the data are corrupted by noise. To the authors’ knowledge, these are the first hardness results for sparse regression that apply when the algorithm simultaneously has k’>k and h(m,p)>0.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Foster15.html
  PDF: http://proceedings.mlr.press/v40/Foster15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Foster15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Dean
    family: Foster
  - given: Howard
    family: Karloff
  - given: Justin
    family: Thaler
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 696-709
  id: Foster15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 696
  lastpage: 709
  published: 2015-06-26 00:00:00 +0000
- title: 'Vector-Valued Property Elicitation'
  abstract: 'The elicitation of a statistic, or property of a distribution, is the task of devising proper scoring rules, equivalently proper losses, which incentivize an agent or algorithm to truthfully estimate the desired property of the underlying probability distribution or data set.  Leveraging connections between elicitation and convex analysis, we address the vector-valued property case, which has received little attention in the literature despite its applications to both machine learning and statistics. We first provide a very general characterization of linear and ratio-of-linear properties, the first of which resolves an open problem by unifying and strengthening several previous characterizations in machine learning and statistics.  We then ask which vectors of properties admit nonseparable scores, which cannot be expressed as a sum of scores for each coordinate separately, a natural desideratum for machine learning.  We show that linear and ratio-of-linear do admit nonseparable scores, and provide evidence for a conjecture that these are the only such properties (up to link functions). Finally, we give a general method for producing identification functions and address an open problem by showing that convex maximal level sets are insufficient for elicitability in general.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Frongillo15.html
  PDF: http://proceedings.mlr.press/v40/Frongillo15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Frongillo15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Rafael
    family: Frongillo
  - given: Ian A.
    family: Kash
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 710-727
  id: Frongillo15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 710
  lastpage: 727
  published: 2015-06-26 00:00:00 +0000
- title: 'Competing with the Empirical Risk Minimizer in a Single Pass'
  abstract: 'In many estimation problems, e.g. linear and logistic regression, we wish to minimize an unknown objective given only unbiased samples of the objective function. Furthermore, we aim to achieve this using as few samples as possible.  In the absence of computational constraints, the minimizer of a sample average of observed data – commonly referred to as either the empirical risk minimizer (ERM) or the M-estimator – is widely regarded as the estimation strategy of choice due to its desirable statistical convergence properties. Our goal in this work is to perform as well as the ERM, on \emphevery problem, while minimizing the use of computational resources such as running time and space usage. We provide a simple streaming algorithm which, under standard regularity assumptions on the underlying problem, enjoys the following properties: \beginenumerate \item The algorithm can be implemented in linear time with a single pass of the observed data, using space linear in the size of a single sample. \item The algorithm achieves the same statistical rate of convergence as the empirical risk minimizer on every problem, even considering constant factors. \item The algorithm’s performance depends on the initial error at a rate that decreases super-polynomially. \item The algorithm is easily parallelizable. \endenumerate Moreover, we quantify the (finite-sample) rate at which the algorithm becomes competitive with the ERM.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Frostig15.html
  PDF: http://proceedings.mlr.press/v40/Frostig15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Frostig15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Roy
    family: Frostig
  - given: Rong
    family: Ge
  - given: Sham M.
    family: Kakade
  - given: Aaron
    family: Sidford
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 728-763
  id: Frostig15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 728
  lastpage: 763
  published: 2015-06-26 00:00:00 +0000
- title: 'A Chaining Algorithm for Online Nonparametric Regression'
  abstract: 'We consider the problem of online nonparametric regression with arbitrary deterministic sequences. Using ideas from the chaining technique, we design an algorithm that achieves a Dudley-type regret bound similar to the one obtained in a non-constructive fashion by Rakhlin and Sridharan (2014). Our regret bound is expressed in terms of the metric entropy in the sup norm, which yields optimal guarantees when the metric and sequential entropies are of the same order of magnitude. In particular our algorithm is the first one that achieves optimal rates for online regression over Hölder balls. In addition we show for this example how to adapt our chaining algorithm to get a reasonable computational efficiency with similar regret guarantees (up to a log factor).'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Gaillard15.html
  PDF: http://proceedings.mlr.press/v40/Gaillard15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Gaillard15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Pierre
    family: Gaillard
  - given: Sébastien
    family: Gerchinovitz
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 764-796
  id: Gaillard15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 764
  lastpage: 796
  published: 2015-06-26 00:00:00 +0000
- title: 'Escaping From Saddle Points — Online Stochastic Gradient for Tensor Decomposition'
  abstract: 'We analyze stochastic gradient descent for optimizing non-convex functions. In many cases for non-convex functions the goal is to find a reasonable local minimum, and the main concern is that gradient updates are trapped in \em saddle points. In this paper we identify \em strict saddle property for non-convex problem that allows for efficient optimization. Using this property we show that from an \em arbitrary starting point,  stochastic gradient descent converges to a local minimum in a polynomial number of iterations. To the best of our knowledge this is the first work that gives \em global convergence guarantees for stochastic gradient descent on non-convex functions with exponentially many local minima and saddle points. Our analysis can be applied to orthogonal tensor decomposition, which is widely used in learning a rich class of  latent variable models. We propose a new optimization formulation for the tensor decomposition problem that has strict saddle property. As a result we get the first online algorithm for orthogonal tensor decomposition with global convergence guarantee.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Ge15.html
  PDF: http://proceedings.mlr.press/v40/Ge15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Ge15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Rong
    family: Ge
  - given: Furong
    family: Huang
  - given: Chi
    family: Jin
  - given: Yang
    family: Yuan
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 797-842
  id: Ge15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 797
  lastpage: 842
  published: 2015-06-26 00:00:00 +0000
- title: 'Learning the dependence structure of rare events: a non-asymptotic study'
  abstract: 'Assessing the probability of occurrence of extreme events  is a crucial issue in various fields like finance, insurance, telecommunication or environmental sciences. In a multivariate framework, the tail dependence is characterized by the so-called \emphstable tail dependence function (\textscstdf). Learning this structure is the keystone of multivariate extremes. Although extensive studies have proved consistency and asymptotic normality for the empirical version of the \textscstdf, non-asymptotic bounds are still missing. The main purpose of this paper is to fill this gap. Taking advantage of adapted VC-type concentration inequalities, upper bounds are derived with expected rate of convergence in O(k^-1/2). The concentration tools involved in this analysis rely on a more general study of maximal deviations in low probability regions, and thus directly apply to the classification of extreme data. '
  volume: 40
  URL: https://proceedings.mlr.press/v40/Goix15.html
  PDF: http://proceedings.mlr.press/v40/Goix15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Goix15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Nicolas
    family: Goix
  - given: Anne
    family: Sabourin
  - given: Stéphan
    family: Clémen\ccon
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 843-860
  id: Goix15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 843
  lastpage: 860
  published: 2015-06-26 00:00:00 +0000
- title: 'Thompson Sampling for Learning Parameterized Markov Decision Processes'
  abstract: 'We consider reinforcement learning in parameterized Markov Decision Processes (MDPs), where the parameterization may induce correlation across transition probabilities or rewards. Consequently, observing a particular state transition might yield useful information about other, unobserved, parts of the MDP. We present a version of Thompson sampling for parameterized reinforcement learning problems, and derive a frequentist regret bound for priors over general parameter spaces. The result shows that the number of instants where suboptimal actions are chosen scales logarithmically with time, with high probability. It holds for prior distributions that put significant probability near the true model, without any additional, specific closed-form structure such as conjugate or product-form priors. The constant factor in the logarithmic scaling encodes the information complexity of learning the MDP in terms of the Kullback-Leibler geometry of the parameter space.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Gopalan15.html
  PDF: http://proceedings.mlr.press/v40/Gopalan15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Gopalan15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Aditya
    family: Gopalan
  - given: Shie
    family: Mannor
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 861-898
  id: Gopalan15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 861
  lastpage: 898
  published: 2015-06-26 00:00:00 +0000
- title: 'Computational Lower Bounds for Community Detection on Random Graphs'
  abstract: 'This paper studies the problem of detecting the presence of a small dense community planted in a large Erdős-Rényi random graph \calG(N,q), where the edge probability within the community exceeds q by a constant factor. Assuming the hardness of the planted clique detection problem, we show that the  computational  complexity of detecting the community exhibits the following phase transition phenomenon: As the graph size N grows and the graph becomes sparser according to q=N^-α, there exists a critical value of α= \frac23, below which there exists a computationally intensive procedure that can detect far smaller communities than any computationally efficient procedure, and above which a linear-time procedure is statistically optimal. The results also lead to the average-case hardness results for recovering the dense community and approximating the densest K-subgraph. '
  volume: 40
  URL: https://proceedings.mlr.press/v40/Hajek15.html
  PDF: http://proceedings.mlr.press/v40/Hajek15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Hajek15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Bruce
    family: Hajek
  - given: Yihong
    family: Wu
  - given: Jiaming
    family: Xu
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 899-928
  id: Hajek15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 899
  lastpage: 928
  published: 2015-06-26 00:00:00 +0000
- title: 'Adaptive Recovery of Signals by Convex Optimization'
  abstract: 'We present a theoretical framework for adaptive estimation and prediction of signals of unknown structure in the presence of noise. The framework allows to address two intertwined challenges: (i) designing optimal statistical estimators; (ii) designing efficient numerical algorithms. In particular, we establish oracle inequalities for the performance of adaptive procedures, which rely upon convex optimization and thus can be efficiently implemented. As an application of the proposed approach, we consider denoising of harmonic oscillations.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Harchaoui15.html
  PDF: http://proceedings.mlr.press/v40/Harchaoui15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Harchaoui15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Zaid
    family: Harchaoui
  - given: Anatoli
    family: Juditsky
  - given: Arkadi
    family: Nemirovski
  - given: Dmitry
    family: Ostrovsky
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 929-955
  id: Harchaoui15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 929
  lastpage: 955
  published: 2015-06-26 00:00:00 +0000
- title: 'Tensor principal component analysis via sum-of-square proofs'
  abstract: 'We study a statistical model for the \emphtensor principal component analysis problem introduced by Montanari and Richard: Given a order-3 tensor \mathbf T of the form \mathbf T = τ⋅v_0^⊗3 + \mathbf A, where τ≥0 is a signal-to-noise ratio, v_0 is a unit vector, and \mathbf A is a random noise tensor, the goal is to recover the planted vector v_0. For the case that \mathbf A has iid standard Gaussian entries, we give an efficient algorithm to recover v_0 whenever τ≥ω(n^3/4 \log(n)^1/4), and certify that the recovered vector is close to a maximum likelihood estimator, all with high probability over the random choice of \mathbf A. The previous best algorithms with provable guarantees required τ≥Ω(n). In the regime τ≤o(n), natural tensor-unfolding-based spectral relaxations for the underlying optimization problem break down. To go beyond this barrier, we use convex relaxations based on the sum-of-squares method. Our recovery algorithm proceeds by rounding a degree-4 sum-of-squares relaxations of the maximum-likelihood-estimation problem for the statistical model. To complement our algorithmic results, we show that degree-4 sum-of-squares relaxations break down for τ≤O(n^3/4/\log(n)^1/4), which demonstrates that improving our current guarantees (by more than logarithmic factors) would require new techniques or might even be intractable. Finally, we show how to exploit additional problem structure in order to solve our sum-of-squares relaxations, up to some approximation, very efficiently. Our fastest algorithm runs in nearly-linear time using shifted (matrix) power iteration and has similar guarantees as above. The analysis of this algorithm also confirms a variant of a conjecture of Montanari and Richard about singular vectors of tensor unfoldings.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Hopkins15.html
  PDF: http://proceedings.mlr.press/v40/Hopkins15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Hopkins15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Samuel B.
    family: Hopkins
  - given: Jonathan
    family: Shi
  - given: David
    family: Steurer
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 956-1006
  id: Hopkins15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 956
  lastpage: 1006
  published: 2015-06-26 00:00:00 +0000
- title: 'Fast Exact Matrix Completion with Finite Samples'
  abstract: 'Matrix completion is the problem of recovering a low rank matrix by observing a small fraction of its entries. A series of recent works (Keshavan 2012),(Jain et al. 2013) and (Hardt, 2013) have proposed fast non-convex optimization based iterative algorithms to solve this problem. However, the sample complexity in all these results is sub-optimal in its dependence on the rank, condition number and the desired accuracy. In this paper, we present a fast iterative algorithm that solves the matrix completion problem by observing O\left(nr^5 \log^3 n\right) entries, which is independent of the condition number and the desired accuracy. The run time of our algorithm is O\left( nr^7\log^3 n\log 1/ε\right) which is near linear in the dimension of the matrix. To the best of our knowledge, this is the first near linear time algorithm for exact matrix completion with finite sample complexity (i.e. independent of ε). Our algorithm is based on a well known projected gradient descent method, where the projection is onto the (non-convex) set of low rank matrices. There are two key ideas in our result: 1) our argument is based on a \ell_∞norm potential function (as opposed to the spectral norm) and provides a novel way to obtain perturbation bounds for it. 2) we prove and use a natural extension of the Davis-Kahan theorem to obtain perturbation bounds on the best low rank approximation of matrices with good eigen gap. Both of these ideas may be of independent interest. '
  volume: 40
  URL: https://proceedings.mlr.press/v40/Jain15.html
  PDF: http://proceedings.mlr.press/v40/Jain15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Jain15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Prateek
    family: Jain
  - given: Praneeth
    family: Netrapalli
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1007-1034
  id: Jain15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1007
  lastpage: 1034
  published: 2015-06-26 00:00:00 +0000
- title: 'Exp-Concavity of Proper Composite Losses'
  abstract: 'The goal of online prediction with expert advice is to find a decision strategy which will perform almost as well as the best expert in a given pool of experts, on any sequence of outcomes. This problem has been widely studied and O(\sqrtT) and O(\logT) regret bounds can be achieved for convex losses and strictly convex losses with bounded first and second derivatives respectively. In special cases like the Aggregating Algorithm with mixable losses and the Weighted Average Algorithm with exp-concave losses, it is possible to achieve O(1) regret bounds. But mixability and exp-concavity are roughly equivalent under certain conditions. Thus by understanding the underlying relationship between these two notions we can gain the best of both algorithms (strong theoretical performance guarantees of the Aggregating Algorithm and the computational efficiency of the Weighted Average Algorithm). In this paper we provide a complete characterization of the exp-concavity of any proper composite loss. Using this characterization and the mixability condition of proper losses, we show that it is possible to transform (re-parameterize) any β-mixable binary proper loss into a β-exp-concave composite loss with the same β. In the multi-class case, we propose an approximation approach for this transformation.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Kamalaruban15.html
  PDF: http://proceedings.mlr.press/v40/Kamalaruban15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Kamalaruban15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Parameswaran
    family: Kamalaruban
  - given: Robert
    family: Williamson
  - given: Xinhua
    family: Zhang
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1035-1065
  id: Kamalaruban15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1035
  lastpage: 1065
  published: 2015-06-26 00:00:00 +0000
- title: 'On Learning Distributions from their Samples'
  abstract: 'One of the most natural and important questions in statistical learning is: how well can a distribution be approximated from its samples. Surprisingly, this question has so far been resolved for only one loss, the KL-divergence and even in this case, the estimator used is ad hoc and not well understood. We study distribution approximations for general loss measures.  For \ell_2^2 we determine the best approximation possible, for \ell_1 and χ^2 we derive tight bounds on the best approximation, and when the probabilities are bounded away from zero, we resolve the question for all sufficiently smooth loss measures, thereby providing a coherent understanding of the rate at which distributions can be approximated from their samples.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Kamath15.html
  PDF: http://proceedings.mlr.press/v40/Kamath15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Kamath15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Sudeep
    family: Kamath
  - given: Alon
    family: Orlitsky
  - given: Dheeraj
    family: Pichapati
  - given: Ananda Theertha
    family: Suresh
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1066-1100
  id: Kamath15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1066
  lastpage: 1100
  published: 2015-06-26 00:00:00 +0000
- title: 'MCMC Learning'
  abstract: 'The theory of learning under the uniform distribution is rich and deep, with connections to cryptography, computational complexity, and the analysis of boolean functions to name a few areas. This theory however is very limited due to the fact that the uniform distribution and the corresponding Fourier basis are rarely encountered as a statistical model. A family of distributions that vastly generalizes the uniform distribution on the Boolean cube is that of distributions represented by Markov Random Fields (MRF). Markov Random Fields are one of the main tools for modeling high dimensional data in many areas of statistics and machine learning. In this paper we initiate the investigation of extending central ideas, methods and algorithms from the theory of learning under the uniform distribution to the setup of learning concepts given examples from MRF distributions. In particular, our results establish a novel connection between properties of MCMC sampling of MRFs and learning under the MRF distribution. '
  volume: 40
  URL: https://proceedings.mlr.press/v40/Kanade15.html
  PDF: http://proceedings.mlr.press/v40/Kanade15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Kanade15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Varun
    family: Kanade
  - given: Elchanan
    family: Mossel
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1101-1128
  id: Kanade15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1101
  lastpage: 1128
  published: 2015-06-26 00:00:00 +0000
- title: 'Online PCA with Spectral Bounds'
  abstract: 'This paper revisits the online PCA problem. Given a stream of n vectors x_t ∈\mathbbR^d (columns of X) the algorithm must output y_t ∈\mathbbR^\ell  (columns of Y) before receiving x_t+1. The goal of online PCA is to simultaneously minimize the target dimension \ell and the error \|X - (XY^\scriptstyle  \textrm +)Y\|^2. We describe two simple and deterministic algorithms. The first, receives a parameter ∆and guarantees that \|X - (XY^\scriptstyle  \textrm +)Y\|^2 is not significantly larger than ∆. It requires a target dimension of \ell = O(k/ε) for any k,εsuch that ∆\ge ε\sigma_1^2 + \sigma_k+1^2, with \sigma_i being the i’th singular value of X. The second receives k and εand guarantees that \|X - (XY^\scriptstyle  \textrm +)Y\|^2 \le ε\sigma_1^2 + \sigma_k+1^2. It requires a target dimension of O( k\log n/ε^2). Different models and algorithms for Online PCA were considered in the past. This is the first that achieves a bound on the spectral norm of the residual matrix. '
  volume: 40
  URL: https://proceedings.mlr.press/v40/Karnin15.html
  PDF: http://proceedings.mlr.press/v40/Karnin15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Karnin15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Zohar
    family: Karnin
  - given: Edo
    family: Liberty
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1129-1140
  id: Karnin15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1129
  lastpage: 1140
  published: 2015-06-26 00:00:00 +0000
- title: 'Regret Lower Bound and Optimal Algorithm in Dueling Bandit Problem'
  abstract: 'We study the K-armed dueling bandit problem, a variation of the standard stochastic bandit problem where the feedback is limited to relative comparisons of a pair of arms. We introduce a tight asymptotic regret lower bound that is based on the information divergence. An algorithm that is inspired by the Deterministic Minimum Empirical Divergence algorithm (Honda and Takemura, 2010) is proposed, and its regret is analyzed. The proposed algorithm is found to be the first one with a regret upper bound that matches the lower bound. Experimental comparisons of dueling bandit algorithms show that the proposed algorithm significantly outperforms existing ones.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Komiyama15.html
  PDF: http://proceedings.mlr.press/v40/Komiyama15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Komiyama15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Junpei
    family: Komiyama
  - given: Junya
    family: Honda
  - given: Hisashi
    family: Kashima
  - given: Hiroshi
    family: Nakagawa
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1141-1154
  id: Komiyama15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1141
  lastpage: 1154
  published: 2015-06-26 00:00:00 +0000
- title: 'Second-order Quantile Methods for Experts and Combinatorial Games'
  abstract: 'We aim to design strategies for sequential decision making that adjust to the difficulty of the learning problem. We study this question both in the setting of prediction with expert advice, and for more general combinatorial decision tasks. We are not satisfied with just guaranteeing minimax regret rates, but we want our algorithms to perform significantly better on easy data. Two popular ways to formalize such adaptivity are second-order regret bounds and quantile bounds. The underlying notions of ‘easy data’, which may be paraphrased as “the learning problem has small variance” and “multiple decisions are useful”, are synergetic. But even though there are sophisticated algorithms that exploit one of the two, no existing algorithm is able to adapt to both. The difficulty in combining the two notions lies in tuning a parameter called the learning rate, whose optimal value behaves non-monotonically.  We introduce a potential function for which (very surprisingly!) it is sufficient to simply put a prior on learning rates; an approach that does not work for any previous method. By choosing the right prior we construct efficient algorithms and show that they reap both benefits by proving the first bounds that are both second-order and incorporate quantiles.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Koolen15a.html
  PDF: http://proceedings.mlr.press/v40/Koolen15a.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Koolen15a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Wouter M.
    family: Koolen
  - given: Tim
    family: Van Erven
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1155-1175
  id: Koolen15a
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1155
  lastpage: 1175
  published: 2015-06-26 00:00:00 +0000
- title: 'Hierarchical Label Queries with Data-Dependent Partitions'
  abstract: 'Given a joint distribution P_X, Y over a space \Xcal and a label set \Ycal=\braces0, 1, we consider the problem of recovering the labels of an unlabeled sample with as few label queries as possible. The recovered 		     labels can be passed to a passive learner, thus turning the procedure into an active learning approach. We analyze a family of labeling procedures based on a hierarchical clustering of the data. While such labeling procedures have been studied in the past, we provide a new parametrization of P_X, Y that captures their behavior in general low-noise settings, and which accounts for data-dependent clustering, thus providing new theoretical underpinning to practically used tools. '
  volume: 40
  URL: https://proceedings.mlr.press/v40/Kpotufe15.html
  PDF: http://proceedings.mlr.press/v40/Kpotufe15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Kpotufe15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Samory
    family: Kpotufe
  - given: Ruth
    family: Urner
  - given: Shai
    family: Ben-David
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1176-1189
  id: Kpotufe15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1176
  lastpage: 1189
  published: 2015-06-26 00:00:00 +0000
- title: 'Algorithms for Lipschitz Learning on Graphs'
  abstract: 'We develop fast algorithms for solving regression problems on graphs where one is given the value of a function at some vertices, and must find its smoothest possible extension to all vertices. The extension we compute is the absolutely minimal Lipschitz extension, and is the limit for large p of p-Laplacian regularization. We present an algorithm that computes a minimal Lipschitz extension in expected linear time, and an algorithm that computes an absolutely minimal Lipschitz extension in expected time \widetildeO (m n). The latter algorithm has variants that seem to run much faster in practice. These extensions are particularly amenable to regularization: we can perform l_0-regularization on the given values in polynomial time and l_1-regularization on the initial function values and on graph edge weights in time \widetildeO (m^3/2). Our definitions and algorithms naturally extend to directed graphs.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Kyng15.html
  PDF: http://proceedings.mlr.press/v40/Kyng15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Kyng15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Rasmus
    family: Kyng
  - given: Anup
    family: Rao
  - given: Sushant
    family: Sachdeva
  - given: Daniel A.
    family: Spielman
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1190-1223
  id: Kyng15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1190
  lastpage: 1223
  published: 2015-06-26 00:00:00 +0000
- title: 'Low Rank Matrix Completion with Exponential Family Noise'
  abstract: 'The matrix completion problem consists in reconstructing a matrix from a sample of entries, possibly observed with noise. A popular class of estimator, known as nuclear norm penalized estimators, are based on minimizing the sum of a data fitting term and a nuclear norm penalization. Here, we investigate the case where the noise distribution belongs to the exponential family and is sub-exponential. Our framework allows for a general sampling scheme. We first consider an estimator defined as the minimizer of the sum of a log-likelihood term and a nuclear norm penalization and prove an upper bound on the Frobenius prediction risk. The rate obtained improves on previous works on matrix completion for exponential family. When the sampling distribution is known, we propose another estimator and prove an oracle inequality \em w.r.t. the Kullback-Leibler prediction risk, which translates immediately into an upper bound on the Frobenius prediction risk. Finally, we show that all the rates obtained are minimax optimal up to a logarithmic factor.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Lafond15.html
  PDF: http://proceedings.mlr.press/v40/Lafond15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Lafond15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Jean
    family: Lafond
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1224-1243
  id: Lafond15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1224
  lastpage: 1243
  published: 2015-06-26 00:00:00 +0000
- title: 'Bad Universal Priors and Notions of Optimality'
  abstract: 'A big open question of algorithmic information theory is the choice of the universal Turing machine (UTM). For Kolmogorov complexity and Solomonoff induction we have invariance theorems: the choice of the UTM changes bounds only by a constant. For the universally intelligent agent AIXI (Hutter, 2005) no invariance theorem is known. Our results are entirely negative: we discuss cases in which unlucky or adversarial choices of the UTM cause AIXI to misbehave drastically. We show that Legg-Hutter intelligence and thus balanced Pareto optimality is entirely subjective, and that every policy is Pareto optimal in the class of all computable environments. This undermines all existing optimality properties for AIXI. While it may still serve as a gold standard for AI, our results imply that AIXI is a \emphrelative theory, dependent on the choice of the UTM.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Leike15.html
  PDF: http://proceedings.mlr.press/v40/Leike15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Leike15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Jan
    family: Leike
  - given: Marcus
    family: Hutter
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1244-1259
  id: Leike15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1244
  lastpage: 1259
  published: 2015-06-26 00:00:00 +0000
- title: 'Learning with Square Loss: Localization through Offset Rademacher Complexity'
  abstract: 'We consider regression with square loss and general classes of functions without the boundedness assumption. We introduce a notion of offset Rademacher complexity that provides a transparent way to study localization both in expectation and in high probability. For any (possibly non-convex) class, the excess loss of a two-step estimator is shown to be upper bounded by this offset complexity through a novel geometric inequality. In the convex case, the estimator reduces to an empirical risk minimizer. The method recovers the results of \citepRakSriTsy15 for the bounded case while also providing guarantees without the boundedness assumption.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Liang15.html
  PDF: http://proceedings.mlr.press/v40/Liang15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Liang15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Tengyuan
    family: Liang
  - given: Alexander
    family: Rakhlin
  - given: Karthik
    family: Sridharan
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1260-1285
  id: Liang15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1260
  lastpage: 1285
  published: 2015-06-26 00:00:00 +0000
- title: 'Achieving All with No Parameters: AdaNormalHedge'
  abstract: 'We study the classic online learning problem of predicting with expert advice, and propose a truly parameter-free and adaptive algorithm that achieves several objectives simultaneously without using any prior information. The main component of this work is an improved version of the NormalHedge.DT algorithm (Luo and Schapire, 2014), called AdaNormalHedge. On one hand, this new algorithm ensures small regret when the competitor has small loss and almost constant regret when the losses are stochastic. On the other hand, the algorithm is able to compete with any convex combination of the experts simultaneously, with a regret in terms of the relative entropy of the prior and the competitor. This resolves an open problem proposed by Chaudhuri et al. (2009) and Chernov and Vovk (2010). Moreover, we extend the results to the sleeping expert setting and provide two applications to illustrate the power of AdaNormalHedge: 1) competing with time-varying unknown competitors and 2) predicting almost as well as the best pruning tree. Our results on these applications significantly improve previous work from different aspects, and a special case of the first application resolves another open problem proposed by Warmuth and Koolen (2014) on whether one can simultaneously achieve optimal shifting regret for both adversarial and stochastic losses.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Luo15.html
  PDF: http://proceedings.mlr.press/v40/Luo15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Luo15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Haipeng
    family: Luo
  - given: Robert E.
    family: Schapire
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1286-1304
  id: Luo15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1286
  lastpage: 1304
  published: 2015-06-26 00:00:00 +0000
- title: 'Lower and Upper Bounds on the Generalization of  Stochastic   Exponentially Concave Optimization'
  abstract: 'In this paper we derive \textithigh probability lower and upper  bounds on the excess risk of stochastic optimization of exponentially concave loss functions. Exponentially  concave loss functions encompass several fundamental problems in machine learning such as squared loss in linear regression, logistic loss in classification, and negative logarithm loss in portfolio management.  We demonstrate an O(d \log T/T) upper bound on the excess risk of stochastic online Newton step algorithm, and an O(d/T) lower bound on the excess risk of any stochastic  optimization method for \textitsquared loss, indicating that the obtained upper bound  is  optimal up to a logarithmic factor. The analysis of upper bound is based on recent advances in  concentration inequalities for bounding self-normalized martingales, which is interesting by its own right, and the proof technique used to achieve the lower bound is a probabilistic method and relies on an information-theoretic minimax analysis.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Mahdavi15.html
  PDF: http://proceedings.mlr.press/v40/Mahdavi15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Mahdavi15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Mehrdad
    family: Mahdavi
  - given: Lijun
    family: Zhang
  - given: Rong
    family: Jin
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1305-1320
  id: Mahdavi15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1305
  lastpage: 1320
  published: 2015-06-26 00:00:00 +0000
- title: 'Correlation Clustering with Noisy Partial Information'
  abstract: 'In this paper, we propose and study a semi-random model for the Correlation Clustering problem on arbitrary graphs G. We give two approximation algorithms for Correlation Clustering instances from this model. The first algorithm finds a solution of value (1+ δ)\mathrmopt-cost + O_δ(n\log^3 n) with high probability, where \mathrmopt-cost is the value of the optimal solution (for every δ> 0). The second algorithm finds the ground truth clustering with an arbitrarily small classification error η(under some additional assumptions on the instance).'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Makarychev15.html
  PDF: http://proceedings.mlr.press/v40/Makarychev15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Makarychev15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Konstantin
    family: Makarychev
  - given: Yury
    family: Makarychev
  - given: Aravindan
    family: Vijayaraghavan
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1321-1342
  id: Makarychev15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1321
  lastpage: 1342
  published: 2015-06-26 00:00:00 +0000
- title: 'Online Density Estimation of Bradley-Terry Models'
  abstract: 'We consider an online density estimation problem for the Bradley-Terry model, where each model parameter defines the probability of a match result between any pair in a set of n teams. The problem is hard because the loss function (i.e., the negative log-likelihood function in our problem setting) is not convex. To avoid the non-convexity, we can change parameters so that the loss function becomes convex with respect to the new parameter. But then the radius K of the reparameterized domain may be infinite, where K depends on the outcome sequence. So we put a mild assumption that guarantees that K is finite. We can thus employ standard online convex optimization algorithms, namely OGD and ONS, over the reparameterized domain, and get regret bounds O(n^\frac12(\ln K)\sqrtT) and O(n^\frac32K\ln T), respectively, where T is the horizon of the game. The bounds roughly means that OGD is better when K is large while ONS is better when K is small. But how large can K be? We show that K can be as large as Θ(T^n-1), which implies that the worst case regret bounds of OGD and ONS are O(n^\frac32\sqrtT\ln T) and \tildeO(n^\frac32(T)^n-1), respectively. We then propose a version of Follow the Regularized Leader, whose regret bound is close to the minimum of those of OGD and ONS. In other words, our algorithm is competitive with both for a wide range of values of K. In particular, our algorithm achieves the worst case regret bound O(n^\frac52T^\frac13 \ln T), which is slightly better than OGD with respect to T. In addition, our algorithm works without the knowledge K, which is a practical advantage.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Matsumoto15.html
  PDF: http://proceedings.mlr.press/v40/Matsumoto15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Matsumoto15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Issei
    family: Matsumoto
  - given: Kohei
    family: Hatano
  - given: Eiji
    family: Takimoto
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1343-1359
  id: Matsumoto15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1343
  lastpage: 1359
  published: 2015-06-26 00:00:00 +0000
- title: 'First-order regret bounds for combinatorial semi-bandits'
  abstract: 'We consider the problem of online combinatorial optimization under semi-bandit feedback, where a learner has to repeatedly pick actions from a combinatorial decision set in order to minimize the total losses associated with its decisions. After making each decision, the learner observes the losses associated with its action, but not other losses. For this problem, there are several learning algorithms that guarantee that the learner’s expected regret grows as \widetildeO(\sqrtT) with the number of rounds T. In this paper, we propose an algorithm that improves this scaling to \widetildeO(\sqrtL_T^*), where L_T^* is the total loss of the best action. Our algorithm is among the first to achieve such guarantees in a partial-feedback scheme, and the first one to do so in a combinatorial setting.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Neu15.html
  PDF: http://proceedings.mlr.press/v40/Neu15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Neu15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Gergely
    family: Neu
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1360-1375
  id: Neu15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1360
  lastpage: 1375
  published: 2015-06-26 00:00:00 +0000
- title: 'Norm-Based Capacity Control in Neural Networks'
  abstract: 'We investigate the capacity, convexity and characterization of a general family of norm-constrained feed-forward networks.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Neyshabur15.html
  PDF: http://proceedings.mlr.press/v40/Neyshabur15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Neyshabur15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Behnam
    family: Neyshabur
  - given: Ryota
    family: Tomioka
  - given: Nathan
    family: Srebro
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1376-1401
  id: Neyshabur15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1376
  lastpage: 1401
  published: 2015-06-26 00:00:00 +0000
- title: 'Cortical Learning via Prediction'
  abstract: 'What is the mechanism of learning in the brain?  Despite breathtaking advances in neuroscience,  and in machine learning, we do not seem close to an answer.  Using Valiant’s neuronal model as a foundation, we introduce PJOIN (for “predictive join"), a primitive that combines association and prediction. We show that PJOIN can be implemented naturally in Valiant’s conservative, formal model of cortical computation. Using PJOIN — and almost nothing else — we give a simple algorithm for unsupervised learning of arbitrary ensembles of binary patterns (solving an open problem in Valiant’s work). This algorithm relies crucially on prediction, and entails significant downward traffic (“feedback") while parsing stimuli. Prediction and feedback are well-known features of neural cognition and, as far as we know, this is the first theoretical prediction of their essential role in learning.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Papadimitriou15.html
  PDF: http://proceedings.mlr.press/v40/Papadimitriou15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Papadimitriou15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Christos H.
    family: Papadimitriou
  - given: Santosh S.
    family: Vempala
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1402-1422
  id: Papadimitriou15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1402
  lastpage: 1422
  published: 2015-06-26 00:00:00 +0000
- title: 'Partitioning Well-Clustered Graphs: Spectral Clustering Works!'
  abstract: 'In this work we study the widely used \emphspectral clustering algorithms, i.e. partition a graph into k clusters via (1) embedding the vertices of a graph into a low-dimensional space using the bottom eigenvectors of the Laplacian matrix, and (2) partitioning embedded points via k-means algorithms.  We show that, for a wide class of \emphwell-clustered graphs, spectral clustering algorithms can give a good approximation of the optimal clustering. To the best of our knowledge, it is the \emphfirst theoretical analysis of spectral clustering algorithms for a wide family of graphs, even though such approach was proposed in the early 1990s and has comprehensive applications. We also give a nearly-linear time algorithm for partitioning well-clustered graphs, which is based on heat kernel embeddings and approximate nearest neighbor data structures.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Peng15.html
  PDF: http://proceedings.mlr.press/v40/Peng15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Peng15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Richard
    family: Peng
  - given: He
    family: Sun
  - given: Luca
    family: Zanetti
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1423-1455
  id: Peng15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1423
  lastpage: 1455
  published: 2015-06-26 00:00:00 +0000
- title: 'Batched Bandit Problems'
  abstract: 'Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic multi-armed bandits under the constraint that the employed policy must split trials into a small number of batches. Our results show that a very small number of batches gives already close to minimax optimal regret bounds and we also evaluate the number of trials in each batch. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Perchet15.html
  PDF: http://proceedings.mlr.press/v40/Perchet15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Perchet15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Vianney
    family: Perchet
  - given: Philippe
    family: Rigollet
  - given: Sylvain
    family: Chassang
  - given: Erik
    family: Snowberg
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1456-1456
  id: Perchet15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1456
  lastpage: 1456
  published: 2015-06-26 00:00:00 +0000
- title: 'Hierarchies of Relaxations for Online Prediction Problems with Evolving Constraints'
  abstract: 'We study online prediction where regret of the algorithm is measured against a benchmark defined via evolving constraints. This framework captures online prediction on graphs, as well as other prediction problems with combinatorial structure. A key aspect here is that finding the optimal benchmark predictor (even in hindsight, given all the data) might be computationally hard due to the combinatorial nature of the constraints. Despite this, we provide polynomial-time prediction algorithms that achieve low regret against combinatorial benchmark sets. We do so by building improper learning algorithms based on two ideas that work together. The first is to alleviate part of the computational burden through random playout, and the second is to employ Lasserre semidefinite hierarchies to approximate the resulting integer program. Interestingly, for our prediction algorithms, we only need to compute the values of the semidefinite programs and not the rounded solutions. However, the integrality gap for Lasserre hierarchy does enter the generic regret bound in terms of Rademacher complexity of the benchmark set. This establishes a trade-off between the computation time and the regret bound of the algorithm.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Rakhlin15.html
  PDF: http://proceedings.mlr.press/v40/Rakhlin15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Rakhlin15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Alexander
    family: Rakhlin
  - given: Karthik
    family: Sridharan
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1457-1479
  id: Rakhlin15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1457
  lastpage: 1479
  published: 2015-06-26 00:00:00 +0000
- title: 'Fast Mixing for Discrete Point Processes'
  abstract: 'We investigate the systematic mechanism for designing fast mixing Markov chain Monte Carlo algorithms to sample from discrete point processes under the Dobrushin uniqueness condition for Gibbs measures. Discrete point processes are defined as probability distributions μ(S)∝\exp(βf(S)) over all subsets S∈2^V of a finite set V through a bounded set function f:2^V→\mathbbR and a parameter β>0. A subclass of discrete point processes characterized by submodular functions (which include log-submodular distributions, submodular point processes, and determinantal point processes) has recently gained a lot of interest in machine learning and shown to be effective for modeling diversity and coverage. We show that if the set function (not necessarily submodular) displays a natural notion of decay of correlation, then, for βsmall enough, it is possible to design fast mixing Markov chain Monte Carlo methods that yield error bounds on marginal approximations that do not depend on the size of the set V. The sufficient conditions that we derive involve a control on the (discrete) Hessian of set functions, a quantity that has not been previously considered in the literature. We specialize our results for submodular functions, and we discuss canonical examples where the Hessian can be easily controlled.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Rebeschini15.html
  PDF: http://proceedings.mlr.press/v40/Rebeschini15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Rebeschini15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Patrick
    family: Rebeschini
  - given: Amin
    family: Karbasi
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1480-1500
  id: Rebeschini15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1480
  lastpage: 1500
  published: 2015-06-26 00:00:00 +0000
- title: 'Generalized Mixability via Entropic Duality'
  abstract: 'Mixability is a property of a loss which characterizes when constant regret is possible in the game of prediction with expert advice. We show that a key property of mixability generalizes, and the \exp and \log operations present in the usual theory are not as special as one might have thought.  In doing so we introduce a more general notion of Φ-mixability where Φis a general entropy (\emphi.e., any convex function on probabilities). We show how a property shared by the convex dual of any such entropy yields a natural algorithm (the minimizer of a regret bound) which, analogous to the classical Aggregating Algorithm, is guaranteed a constant regret when used with Φ-mixable losses.  We characterize which Φhave non-trivial Φ-mixable losses and relate Φ-mixability and its associated Aggregating Algorithm to potential-based methods, a Blackwell-like condition, mirror descent, and risk measures from finance.  We also define a notion of “dominance” between different entropies in terms of bounds they guarantee and conjecture that classical mixability gives optimal bounds, for which we provide some supporting empirical evidence.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Reid15.html
  PDF: http://proceedings.mlr.press/v40/Reid15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Reid15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Mark D.
    family: Reid
  - given: Rafael M.
    family: Frongillo
  - given: Robert C.
    family: Williamson
  - given: Nishant
    family: Mehta
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1501-1522
  id: Reid15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1501
  lastpage: 1522
  published: 2015-06-26 00:00:00 +0000
- title: 'On the Complexity of Bandit Linear Optimization'
  abstract: 'We study the attainable regret for online linear optimization problems with bandit feedback, where unlike the full-information setting, the player can only observe its own loss rather than the full loss vector. We show that the price of bandit information in this setting can be as large as d, disproving the well-known conjecture (Danie et al. (2007)) that the regret for bandit linear optimization is at most \sqrtd times the full-information regret. Surprisingly, this is shown using “trivial” modifications of standard domains, which have no effect in the full-information setting. This and other results we present highlight some interesting differences between full-information and bandit learning, which were not considered in previous literature.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Shamir15.html
  PDF: http://proceedings.mlr.press/v40/Shamir15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Shamir15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Ohad
    family: Shamir
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1523-1551
  id: Shamir15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1523
  lastpage: 1551
  published: 2015-06-26 00:00:00 +0000
- title: 'An Almost Optimal PAC Algorithm'
  abstract: 'The best currently known general lower and upper bounds on the number of labeled examples needed for learning a concept class in the PAC framework (the realizable case) do not perfectly match: they leave a gap of order \log(1/ε) (resp. a gap which is logarithmic in another one of the relevant parameters). It is an unresolved question whether there exists an “optimal PAC algorithm” which establishes a general upper bound with precisely the same order of magnitude as the general lower bound. According to a result of Auer and Ortner, there is no way for showing that arbitrary consistent algorithms are optimal because they can provably differ from optimality by factor \log(1/ε). In contrast to this result, we show that every consistent algorithm L (even a provably suboptimal one) induces a family (L_K)_K\ge1 of PAC algorithms (with 2K-1 calls of L as a subroutine) which come very close to optimality: the number of labeled examples needed by L_K exceeds the general lower bound only by factor \ell_K(1/\epsillon) where \ell_K denotes (a truncated version of) the K-times iterated logarithm. Moreover, L_K is applicable to any concept class C of finite VC-dimension and it can be implemented efficiently whenever the consistency problem for C is feasible. We show furthermore that, for every consistent algorithm L, L_2 is an optimal PAC algorithm for precisely the same concept classes which were used by Auer and Ortner for showing the existence of suboptimal consistent algorithms.  This can be seen as an indication that L_K may have an even better performance than it is suggested by our worstcase analysis.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Simon15a.html
  PDF: http://proceedings.mlr.press/v40/Simon15a.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Simon15a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Hans U.
    family: Simon
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1552-1563
  id: Simon15a
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1552
  lastpage: 1563
  published: 2015-06-26 00:00:00 +0000
- title: 'Minimax rates for memory-bounded sparse linear regression'
  abstract: 'We establish a minimax lower bound of Ω(\frackdBε) on the sample size needed to estimate parameters in a k-sparse linear regression of dimension d under memory restrictions to B bits, where εis the \ell_2 parameter error.  When the covariance of the regressors is the identity matrix, we also provide an algorithm that uses \tildeO(B+k) bits and requires \tildeO(\frackdBε^2) observations to achieve error ε. Our lower bound also holds in the more general communication-bounded setting, where instead of a memory bound, at most B bits of information are allowed to be (adaptively) communicated about each sample. '
  volume: 40
  URL: https://proceedings.mlr.press/v40/Steinhardt15.html
  PDF: http://proceedings.mlr.press/v40/Steinhardt15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Steinhardt15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Jacob
    family: Steinhardt
  - given: John
    family: Duchi
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1564-1587
  id: Steinhardt15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1564
  lastpage: 1587
  published: 2015-06-26 00:00:00 +0000
- title: 'Interactive Fingerprinting Codes and the Hardness of Preventing False Discovery'
  abstract: 'We show an essentially tight bound on the number of adaptively chosen statistical queries that a computationally efficient algorithm can answer accurately given n samples from an unknown distribution.  A statistical query asks for the expectation of a predicate over the underlying distribution, and an answer to a statistical query is accurate if it is “close” to the correct expectation over the distribution.  This question was recently studied by Dwork et al. (2015), who showed how to answer \tildeΩ(n^2) queries efficiently, and also by Hardt and Ulman (2014), who showed that answering \tildeO(n^3) queries is hard.  We close the gap between the two bounds and show that, under a standard hardness assumption, there is no computationally efficient algorithm that, given n samples from an unknown distribution, can give valid answers to O(n^2) adaptively chosen statistical queries.  An implication of our results is that computationally efficient algorithms for answering arbitrary, adaptively chosen statistical queries may as well be \emphdifferentially private. We obtain our results using a new connection between the problem of answering adaptively chosen statistical queries and a combinatorial object called an \emphinteractive fingerprinting code Fiat and Tassa (2001).  In order to optimize our hardness result, we give a new Fourier-analytic approach to analyzing fingerprinting codes that is simpler, more flexible, and yields better parameters than previous constructions.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Steinke15.html
  PDF: http://proceedings.mlr.press/v40/Steinke15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Steinke15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Thomas
    family: Steinke
  - given: Jonathan
    family: Ullman
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1588-1628
  id: Steinke15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1588
  lastpage: 1628
  published: 2015-06-26 00:00:00 +0000
- title: 'Convex Risk Minimization and Conditional Probability Estimation'
  abstract: 'This paper proves, in very general settings, that convex risk minimization is a procedure to select a unique conditional probability model determined by the classification problem. Unlike most previous work, we give results that are general enough to include cases in which no minimum exists, as occurs typically, for instance, with standard boosting algorithms. Concretely, we first show that any sequence of predictors minimizing convex risk over the source distribution will converge to this unique model when the class of predictors is linear (but potentially of infinite dimension). Secondly, we show the same result holds for \emphempirical risk minimization whenever this class of predictors is finite dimensional, where the essential technical contribution is a norm-free generalization bound. '
  volume: 40
  URL: https://proceedings.mlr.press/v40/Telgarsky15.html
  PDF: http://proceedings.mlr.press/v40/Telgarsky15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Telgarsky15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Matus
    family: Telgarsky
  - given: Miroslav
    family: Dudík
  - given: Robert
    family: Schapire
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1629-1682
  id: Telgarsky15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1629
  lastpage: 1682
  published: 2015-06-26 00:00:00 +0000
- title: 'Regularized Linear Regression: A Precise Analysis of the Estimation Error'
  abstract: 'Non-smooth regularized convex optimization procedures have emerged as a powerful tool to recover structured signals (sparse, low-rank, etc.) from (possibly compressed) noisy linear measurements. We focus on the problem of linear regression and consider a general class of optimization methods that minimize a loss function measuring the misfit of the model to the observations with an added structured-inducing regularization term. Celebrated instances include the LASSO, Group-LASSO, Least-Absolute Deviations method, etc.. We develop a quite general framework for how to determine precise prediction performance guaranties (e.g. mean-square-error) of such methods for the case of Gaussian measurement ensemble. The  machinery builds upon  Gordon’s Gaussian min-max theorem under additional convexity assumptions that arise in many practical applications. This theorem associates with a primary optimization (PO) problem a simplified auxiliary optimization  (AO) problem from which we can tightly infer properties of the original (PO), such as the optimal cost, the norm of the optimal solution, etc. Our theory applies to general loss functions and regularization and provides guidelines on how to optimally tune the regularizer coefficient when certain structural properties (such as sparsity level, rank, etc.) are known.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Thrampoulidis15.html
  PDF: http://proceedings.mlr.press/v40/Thrampoulidis15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Thrampoulidis15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Christos
    family: Thrampoulidis
  - given: Samet
    family: Oymak
  - given: Babak
    family: Hassibi
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1683-1709
  id: Thrampoulidis15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1683
  lastpage: 1709
  published: 2015-06-26 00:00:00 +0000
- title: 'Max vs Min: Tensor Decomposition and ICA with nearly Linear Sample Complexity'
  abstract: 'We present a simple, general technique for reducing the sample complexity of matrix and tensor decomposition algorithms applied to distributions. We use the technique to give a polynomial-time algorithm for standard ICA with sample complexity nearly linear in the dimension, thereby improving substantially on previous bounds. The analysis is based on properties of random polynomials, namely the spacings of an ensemble of polynomials. Our technique also applies to other applications of tensor decompositions, including spherical Gaussian mixture models.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Vempala15.html
  PDF: http://proceedings.mlr.press/v40/Vempala15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Vempala15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Santosh S.
    family: Vempala
  - given: Ying.
    family: Xiao
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1710-1723
  id: Vempala15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1710
  lastpage: 1723
  published: 2015-06-26 00:00:00 +0000
- title: 'On Convergence of Emphatic Temporal-Difference Learning'
  abstract: 'We consider emphatic temporal-difference learning algorithms for policy evaluation in discounted Markov decision processes with finite spaces. Such algorithms were recently proposed by Sutton, Mahmood, and White (2015) as an improved solution to the problem of divergence of off-policy temporal-difference learning with linear function approximation. We present in this paper the first convergence proofs for two emphatic algorithms, ETD(λ) and ELSTD(λ). We prove, under general off-policy conditions, the convergence in L^1 for ELSTD(λ) iterates, and the almost sure convergence of the approximate value functions calculated by both algorithms using a single infinitely long trajectory. Our analysis involves new techniques with applications beyond emphatic algorithms leading, for example, to the first proof that standard TD(λ) also converges under off-policy training for λsufficiently large.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Yu15.html
  PDF: http://proceedings.mlr.press/v40/Yu15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Yu15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: H.
    family: Yu
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1724-1751
  id: Yu15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1724
  lastpage: 1751
  published: 2015-06-26 00:00:00 +0000
- title: 'Open Problem: Restricted Eigenvalue Condition for Heavy Tailed Designs'
  abstract: 'The restricted eigenvalue (RE) condition characterizes the sample complexity of accurate recovery in the context of high-dimensional estimators such as Lasso and Dantzig selector (Bickel et al., 2009). Recent work has shown that random design matrices drawn from any thin-tailed (sub-Gaussian) distributions satisfy the RE condition with high probability, when the number of samples scale as the square of the Gaussian width of the restricted set (Banerjee et al., 2014; Tropp, 2015). We pose the equivalent question for heavy-tailed distributions: Given a random design matrix drawn from a heavy-tailed distribution satisfying the smallball property (Mendelson, 2015), does the design matrix satisfy the RE condition with the same order of sample complexity as sub-Gaussian distributions? An answer to the question will guide the design of highdimensional estimators for heavy tailed problems.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Banerjee15.html
  PDF: http://proceedings.mlr.press/v40/Banerjee15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Banerjee15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Arindam
    family: Banerjee
  - given: Sheng
    family: Chen
  - given: Vidyashankar
    family: Sivakumar
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1752-1755
  id: Banerjee15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1752
  lastpage: 1755
  published: 2015-06-26 00:00:00 +0000
- title: 'Open Problem: The landscape of the loss surfaces of multilayer networks'
  abstract: 'Deep learning has enjoyed a resurgence of interest in the last few years for such applications as image and speech recognition, or natural language processing. The vast majority of practical applications of deep learning focus on supervised learning, where the supervised loss function is minimized using stochastic gradient descent. The properties of this highly non-convex loss function, such as its landscape and the behavior of critical points (maxima, minima, and saddle points), as well as the reason why large- and small-size networks achieve radically different practical performance, are however very poorly understood. It was only recently shown that new results in spin-glass theory potentially may provide an explanation for these problems by establishing a connection between the loss function of the neural networks and the Hamiltonian of the spherical spin-glass models. The connection between both models relies on a number of possibly unrealistic assumptions, yet the empirical evidence suggests that the connection may exist in real. The question we pose is whether it is possible to drop some of these assumptions to establish a stronger connection between both models.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Choromanska15.html
  PDF: http://proceedings.mlr.press/v40/Choromanska15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Choromanska15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Anna
    family: Choromanska
  - given: Yann
    family: LeCun
  - given: Gérard
    family: Ben Arous
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1756-1760
  id: Choromanska15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1756
  lastpage: 1760
  published: 2015-06-26 00:00:00 +0000
- title: 'Open Problem: The Oracle Complexity of Smooth Convex Optimization in Nonstandard Settings'
  abstract: 'First-order convex minimization algorithms are currently the methods of choice for large-scale sparse – and more generally parsimonious – regression models. We pose the question on the limits of performance of black-box oriented methods for convex minimization in \em non-standard settings, where the regularity of the objective is measured in a norm not necessarily induced by the feasible domain. This question is studied for \ell_p/\ell_q-settings, and their matrix analogues (Schatten norms), where we find surprising gaps on lower bounds compared to state of the art methods. We propose a conjecture on the optimal convergence rates for these settings, for which a positive answer would lead to significant improvements on minimization algorithms for parsimonious regression models.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Guzman15.html
  PDF: http://proceedings.mlr.press/v40/Guzman15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Guzman15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Cristóbal
    family: Guzmán
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1761-1763
  id: Guzman15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1761
  lastpage: 1763
  published: 2015-06-26 00:00:00 +0000
- title: 'Open Problem: Online Sabotaged Shortest Path'
  abstract: 'There has been much work on extending the prediction with expert advice methodology to the case when experts are composed of components and there are combinatorially many such experts. One of the core examples is the Online Shortest Path problem where the components are edges and the experts are paths. In this note we revisit this online routing problem in the case where in each trial some of the edges or components are sabotaged / blocked. In the vanilla expert setting a known method can solve this extension where experts are now awake or asleep in each trial. We ask whether this technology can be upgraded efficiently to the case when at each trial every component can be awake or asleep. It is easy get to get an initial regret bound by using combinatorially many experts. However it is open whether there are efficient algorithms achieving the same regret.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Koolen15b.html
  PDF: http://proceedings.mlr.press/v40/Koolen15b.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Koolen15b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Wouter M.
    family: Koolen
  - given: Manfred K.
    family: Warmuth
  - given: Dmitri
    family: Adamskiy
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1764-1766
  id: Koolen15b
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1764
  lastpage: 1766
  published: 2015-06-26 00:00:00 +0000
- title: 'Open Problem: Learning Quantum Circuits with Queries'
  abstract: 'We pose an open problem on the complexity of learning the behavior of a quantum circuit with value injection queries. We define the learning model for quantum circuits and give preliminary results. Using the test-path lemma of Angluin et al. (2009a), we show that new ideas are likely needed to tackle value injection queries for the quantum setting.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Kun15.html
  PDF: http://proceedings.mlr.press/v40/Kun15.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Kun15.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Jeremy
    family: Kun
  - given: Lev
    family: Reyzin
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1767-1769
  id: Kun15
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1767
  lastpage: 1769
  published: 2015-06-26 00:00:00 +0000
- title: 'Open Problem: Recursive Teaching Dimension Versus VC Dimension'
  abstract: 'The Recursive Teaching Dimension (RTD) of a concept class \mathcalC is a complexity parameter referring to the worst-case number of labelled examples needed to learn any target concept in \mathcalC from a teacher following the recursive teaching model. It is the first teaching complexity notion for which interesting relationships to the VC dimension (VCD) have been established. In particular, for finite maximum classes of a given VCD d, the RTD equals d. To date, there is no concept class known for which the ratio of RTD over VCD exceeds 3/2. However, the only known upper bound on RTD in terms of VCD is exponential in the VCD and depends on the size of the concept class. We pose the following question: is the RTD upper-bounded by a function that grows only linearly in the VCD? Answering this question would further our understanding of the relationships between the complexity of teaching and the complexity of learning from randomly chosen examples. In addition, the answer to this question, whether positive or negative, is known to have implications on the study of the long-standing open sample compression conjecture, which claims that every concept class of VCD d has a sample compression scheme in which samples for concepts in the class are compressed to subsets of size no larger than d.'
  volume: 40
  URL: https://proceedings.mlr.press/v40/Simon15b.html
  PDF: http://proceedings.mlr.press/v40/Simon15b.pdf
  edit: https://github.com/mlresearch//v40/edit/gh-pages/_posts/2015-06-26-Simon15b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 28th Conference on Learning Theory'
  publisher: 'PMLR'
  author: 
  - given: Hans U.
    family: Simon
  - given: Sandra
    family: Zilles
  editor: 
  - given: Peter
    family: Grünwald
  - given: Elad
    family: Hazan
  - given: Satyen
    family: Kale
  address: Paris, France
  page: 1770-1772
  id: Simon15b
  issued:
    date-parts: 
      - 2015
      - 6
      - 26
  firstpage: 1770
  lastpage: 1772
  published: 2015-06-26 00:00:00 +0000