- title: 'No Oops, You Won’t Do It Again: Mechanisms for Self-correction in Crowdsourcing'
  abstract: 'Crowdsourcing is a very popular means of obtaining the large amounts of labeled data that modern machine learning methods require. Although cheap and fast to obtain, crowdsourced labels suffer from significant amounts of error, thereby degrading the performance of downstream machine learning tasks. With the goal of improving the quality of the labeled data, we seek to mitigate the many errors that occur due to silly mistakes or inadvertent errors by crowdsourcing workers. We propose a two-stage setting for crowdsourcing where the worker first answers the questions, and is then allowed to change her answers after looking at a (noisy) reference answer. We mathematically formulate this process and develop mechanisms to incentivize workers to act appropriately. Our mathematical guarantees show that our mechanism incentivizes the workers to answer honestly in both stages, and refrain from answering randomly in the first stage or simply copying in the second. Numerical experiments reveal a significant boost in performance that such "self-correction" can provide when using crowdsourcing to train machine learning algorithms.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/shaha16.html
  PDF: http://proceedings.mlr.press/v48/shaha16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-shaha16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Nihar
    family: Shah
  - given: Dengyong
    family: Zhou
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1-10
  id: shaha16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1
  lastpage: 10
  published: 2016-06-11 00:00:00 +0000
- title: 'Stochastically Transitive Models for Pairwise Comparisons: Statistical and Computational Issues'
  abstract: 'There are various parametric models for analyzing pairwise comparison data, including the Bradley-Terry-Luce (BTL) and Thurstone models, but their reliance on strong parametric assumptions is limiting. In this work, we study a flexible model for pairwise comparisons, under which the probabilities of outcomes are required only to satisfy a natural form of stochastic transitivity. This class includes parametric models including the BTL and Thurstone models as special cases, but is considerably more general. We provide various examples of models in this broader stochastically transitive class for which classical parametric models provide poor fits. Despite this greater flexibility, we show that the matrix of probabilities can be estimated at the same rate as in standard parametric models. On the other hand, unlike in the BTL and Thurstone models, computing the minimax-optimal estimator in the stochastically transitive model is non-trivial, and we explore various computationally tractable alternatives. We show that a simple singular value thresholding algorithm is statistically consistent but does not achieve the minimax rate. We then propose and study algorithms that achieve the minimax rate over interesting sub-classes of the full stochastically transitive class. We complement our theoretical results with thorough numerical simulations.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/shahb16.html
  PDF: http://proceedings.mlr.press/v48/shahb16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-shahb16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Nihar
    family: Shah
  - given: Sivaraman
    family: Balakrishnan
  - given: Aditya
    family: Guntuboyina
  - given: Martin
    family: Wainwright
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 11-20
  id: shahb16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 11
  lastpage: 20
  published: 2016-06-11 00:00:00 +0000
- title: 'Uprooting and Rerooting Graphical Models'
  abstract: 'We show how any binary pairwise model may be “uprooted” to a fully symmetric model, wherein original singleton potentials are transformed to potentials on edges to an added variable, and then “rerooted” to a new model on the original number of variables. The new model is essentially equivalent to the original model, with the same partition function and allowing recovery of the original marginals or a MAP configuration, yet may have very different computational properties that allow much more efficient inference. This meta-approach deepens our understanding, may be applied to any existing algorithm to yield improved methods in practice, generalizes earlier theoretical results, and reveals a remarkable interpretation of the triplet-consistent polytope.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/weller16.html
  PDF: http://proceedings.mlr.press/v48/weller16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-weller16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Adrian
    family: Weller
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 21-29
  id: weller16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 21
  lastpage: 29
  published: 2016-06-11 00:00:00 +0000
- title: 'A Deep Learning Approach to Unsupervised Ensemble Learning'
  abstract: 'We show how deep learning methods can be applied in the context of crowdsourcing and unsupervised ensemble learning. First, we prove that the popular model of Dawid and Skene, which assumes that all classifiers are conditionally independent, is \em equivalent to a Restricted Boltzmann Machine (RBM) with a single hidden node. Hence, under this model, the posterior probabilities of the true labels can be instead estimated via a trained RBM. Next, to address the more general case, where classifiers may strongly violate the conditional independence assumption, we propose to apply RBM-based Deep Neural Net (DNN). Experimental results on various simulated and real-world datasets demonstrate that our proposed DNN approach outperforms other state-of-the-art methods, in particular when the data violates the conditional independence assumption.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/shaham16.html
  PDF: http://proceedings.mlr.press/v48/shaham16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-shaham16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Uri
    family: Shaham
  - given: Xiuyuan
    family: Cheng
  - given: Omer
    family: Dror
  - given: Ariel
    family: Jaffe
  - given: Boaz
    family: Nadler
  - given: Joseph
    family: Chang
  - given: Yuval
    family: Kluger
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 30-39
  id: shaham16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 30
  lastpage: 39
  published: 2016-06-11 00:00:00 +0000
- title: 'Revisiting Semi-Supervised Learning with Graph Embeddings'
  abstract: 'We present a semi-supervised learning framework based on graph embeddings. Given a graph between instances, we train an embedding for each instance to jointly predict the class label and the neighborhood context in the graph. We develop both transductive and inductive variants of our method. In the transductive variant of our method, the class labels are determined by both the learned embeddings and input feature vectors, while in the inductive variant, the embeddings are defined as a parametric function of the feature vectors, so predictions can be made on instances not seen during training. On a large and diverse set of benchmark tasks, including text classification, distantly supervised entity extraction, and entity classification, we show improved performance over many of the existing models.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/yanga16.html
  PDF: http://proceedings.mlr.press/v48/yanga16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-yanga16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Zhilin
    family: Yang
  - given: William
    family: Cohen
  - given: Ruslan
    family: Salakhudinov
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 40-48
  id: yanga16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 40
  lastpage: 48
  published: 2016-06-11 00:00:00 +0000
- title: 'Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization'
  abstract: 'Reinforcement learning can acquire complex behaviors from high-level specifications. However, defining a cost function that can be optimized effectively and encodes the correct task is challenging in practice. We explore how inverse optimal control (IOC) can be used to learn behaviors from demonstrations, with applications to torque control of high-dimensional robotic systems. Our method addresses two key challenges in inverse optimal control: first, the need for informative features and effective regularization to impose structure on the cost, and second, the difficulty of learning the cost function under unknown dynamics for high-dimensional continuous systems. To address the former challenge, we present an algorithm capable of learning arbitrary nonlinear cost functions, such as neural networks, without meticulous feature engineering. To address the latter challenge, we formulate an efficient sample-based approximation for MaxEnt IOC. We evaluate our method on a series of simulated tasks and real-world robotic manipulation problems, demonstrating substantial improvement over prior methods both in terms of task complexity and sample efficiency.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/finn16.html
  PDF: http://proceedings.mlr.press/v48/finn16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-finn16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Chelsea
    family: Finn
  - given: Sergey
    family: Levine
  - given: Pieter
    family: Abbeel
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 49-58
  id: finn16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 49
  lastpage: 58
  published: 2016-06-11 00:00:00 +0000
- title: 'Diversity-Promoting Bayesian Learning of Latent Variable Models'
  abstract: 'In learning latent variable models (LVMs), it is important to effectively capture infrequent patterns and shrink model size without sacrificing modeling power. Various studies have been done to “diversify” a LVM, which aim to learn a diverse set of latent components in LVMs. Most existing studies fall into a frequentist-style regularization framework, where the components are learned via point estimation. In this paper, we investigate how to “diversify” LVMs in the paradigm of Bayesian learning, which has advantages complementary to point estimation, such as alleviating overfitting via model averaging and quantifying uncertainty. We propose two approaches that have complementary advantages. One is to define diversity-promoting mutual angular priors which assign larger density to components with larger mutual angles based on Bayesian network and von Mises-Fisher distribution and use these priors to affect the posterior via Bayes rule. We develop two efficient approximate posterior inference algorithms based on variational inference and Markov chain Monte Carlo sampling. The other approach is to impose diversity-promoting regularization directly over the post-data distribution of components. These two methods are applied to the Bayesian mixture of experts model to encourage the “experts” to be diverse and experimental results demonstrate the effectiveness and efficiency of our methods.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/xiea16.html
  PDF: http://proceedings.mlr.press/v48/xiea16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-xiea16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Pengtao
    family: Xie
  - given: Jun
    family: Zhu
  - given: Eric
    family: Xing
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 59-68
  id: xiea16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 59
  lastpage: 68
  published: 2016-06-11 00:00:00 +0000
- title: 'Additive Approximations in High Dimensional Nonparametric Regression via the SALSA'
  abstract: 'High dimensional nonparametric regression is an inherently difficult problem with known lower bounds depending exponentially in dimension. A popular strategy to alleviate this curse of dimensionality has been to use additive models of \emphfirst order, which model the regression function as a sum of independent functions on each dimension. Though useful in controlling the variance of the estimate, such models are often too restrictive in practical settings. Between non-additive models which often have large variance and first order additive models which have large bias, there has been little work to exploit the trade-off in the middle via additive models of intermediate order. In this work, we propose salsa, which bridges this gap by allowing interactions between variables, but controls model capacity by limiting the order of interactions. salsas minimises the residual sum of squares with squared RKHS norm penalties. Algorithmically, it can be viewed as Kernel Ridge Regression with an additive kernel. When the regression function is additive, the excess risk is only polynomial in dimension. Using the Girard-Newton formulae, we efficiently sum over a combinatorial number of terms in the additive expansion. Via a comparison on 15 real datasets, we show that our method is competitive against 21 other alternatives.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/kandasamy16.html
  PDF: http://proceedings.mlr.press/v48/kandasamy16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-kandasamy16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Kirthevasan
    family: Kandasamy
  - given: Yaoliang
    family: Yu
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 69-78
  id: kandasamy16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 69
  lastpage: 78
  published: 2016-06-11 00:00:00 +0000
- title: 'Hawkes Processes with Stochastic Excitations'
  abstract: 'We propose an extension to Hawkes processes by treating the levels of self-excitation as a stochastic differential equation. Our new point process allows better approximation in application domains where events and intensities accelerate each other with correlated levels of contagion. We generalize a recent algorithm for simulating draws from Hawkes processes whose levels of excitation are stochastic processes, and propose a hybrid Markov chain Monte Carlo approach for model fitting. Our sampling procedure scales linearly with the number of required events and does not require stationarity of the point process. A modular inference procedure consisting of a combination between Gibbs and Metropolis Hastings steps is put forward. We recover expectation maximization as a special case. Our general approach is illustrated for contagion following geometric Brownian motion and exponential Langevin dynamics.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/leea16.html
  PDF: http://proceedings.mlr.press/v48/leea16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-leea16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Young
    family: Lee
  - given: Kar Wai
    family: Lim
  - given: Cheng Soon
    family: Ong
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 79-88
  id: leea16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 79
  lastpage: 88
  published: 2016-06-11 00:00:00 +0000
- title: 'Data-driven Rank Breaking for Efficient Rank Aggregation'
  abstract: 'Rank aggregation systems collect ordinal preferences from individuals to produce a global ranking that represents the social preference. To reduce the computational complexity of learning the global ranking, a common practice is to use rank-breaking. Individuals’ preferences are broken into pairwise comparisons and then applied to efficient algorithms tailored for independent pairwise comparisons. However, due to the ignored dependencies, naive rank-breaking approaches can result in inconsistent estimates. The key idea to produce unbiased and accurate estimates is to treat the paired comparisons outcomes unequally, depending on the topology of the collected data. In this paper, we provide the optimal rank-breaking estimator, which not only achieves consistency but also achieves the best error bound. This allows us to characterize the fundamental tradeoff between accuracy and complexity in some canonical scenarios. Further, we identify how the accuracy depends on the spectral gap of a corresponding comparison graph.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/khetan16.html
  PDF: http://proceedings.mlr.press/v48/khetan16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-khetan16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Ashish
    family: Khetan
  - given: Sewoong
    family: Oh
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 89-98
  id: khetan16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 89
  lastpage: 98
  published: 2016-06-11 00:00:00 +0000
- title: 'Dropout distillation'
  abstract: 'Dropout is a popular stochastic regularization technique for deep neural networks that works by randomly dropping (i.e. zeroing) units from the network during training. This randomization process allows to implicitly train an ensemble of exponentially many networks sharing the same parametrization, which should be averaged at test time to deliver the final prediction. A typical workaround for this intractable averaging operation consists in scaling the layers undergoing dropout randomization. This simple rule called ’standard dropout’ is efficient, but might degrade the accuracy of the prediction. In this work we introduce a novel approach, coined ’dropout distillation’, that allows us to train a predictor in a way to better approximate the intractable, but preferable, averaging process, while keeping under control its computational efficiency. We are thus able to construct models that are as efficient as standard dropout, or even more efficient, while being more accurate. Experiments on standard benchmark datasets demonstrate the validity of our method, yielding consistent improvements over conventional dropout.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/bulo16.html
  PDF: http://proceedings.mlr.press/v48/bulo16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-bulo16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Samuel Rota
    family: Bulò
  - given: Lorenzo
    family: Porzi
  - given: Peter
    family: Kontschieder
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 99-107
  id: bulo16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 99
  lastpage: 107
  published: 2016-06-11 00:00:00 +0000
- title: 'Metadata-conscious anonymous messaging'
  abstract: 'Anonymous messaging platforms like Whisper and Yik Yak allow users to spread messages over a network (e.g., a social network) without revealing message authorship to other users. The spread of messages on these platforms can be modeled by a diffusion process over a graph. Recent advances in network analysis have revealed that such diffusion processes are vulnerable to author deanonymization by adversaries with access to metadata, such as timing information. In this work, we ask the fundamental question of how to propagate anonymous messages over a graph to make it difficult for adversaries to infer the source. In particular, we study the performance of a message propagation protocol called adaptive diffusion introduced in (Fanti et al., 2015). We prove that when the adversary has access to metadata at a fraction of corrupted graph nodes, adaptive diffusion achieves asymptotically optimal source-hiding and significantly outperforms standard diffusion. We further demonstrate empirically that adaptive diffusion hides the source effectively on real social networks.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/fanti16.html
  PDF: http://proceedings.mlr.press/v48/fanti16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-fanti16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Giulia
    family: Fanti
  - given: Peter
    family: Kairouz
  - given: Sewoong
    family: Oh
  - given: Kannan
    family: Ramchandran
  - given: Pramod
    family: Viswanath
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 108-116
  id: fanti16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 108
  lastpage: 116
  published: 2016-06-11 00:00:00 +0000
- title: 'The Teaching Dimension of Linear Learners'
  abstract: 'Teaching dimension is a learning theoretic quantity that specifies the minimum training set size to teach a target model to a learner. Previous studies on teaching dimension focused on version-space learners which maintain all hypotheses consistent with the training data, and cannot be applied to modern machine learners which select a specific hypothesis via optimization. This paper presents the first known teaching dimension for ridge regression, support vector machines, and logistic regression. We also exhibit optimal training sets that match these teaching dimensions. Our approach generalizes to other linear learners.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/liua16.html
  PDF: http://proceedings.mlr.press/v48/liua16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-liua16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Ji
    family: Liu
  - given: Xiaojin
    family: Zhu
  - given: Hrag
    family: Ohannessian
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 117-126
  id: liua16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 117
  lastpage: 126
  published: 2016-06-11 00:00:00 +0000
- title: 'Truthful Univariate Estimators'
  abstract: 'We revisit the classic problem of estimating the population mean of an unknown single-dimensional distribution from samples, taking a game-theoretic viewpoint. In our setting, samples are supplied by strategic agents, who wish to pull the estimate as close as possible to their own value. In this setting, the sample mean gives rise to manipulation opportunities, whereas the sample median does not. Our key question is whether the sample median is the best (in terms of mean squared error) truthful estimator of the population mean. We show that when the underlying distribution is symmetric, there are truthful estimators that dominate the median. Our main result is a characterization of worst-case optimal truthful estimators, which provably outperform the median, for possibly asymmetric distributions with bounded support.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/caragiannis16.html
  PDF: http://proceedings.mlr.press/v48/caragiannis16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-caragiannis16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Ioannis
    family: Caragiannis
  - given: Ariel
    family: Procaccia
  - given: Nisarg
    family: Shah
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 127-135
  id: caragiannis16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 127
  lastpage: 135
  published: 2016-06-11 00:00:00 +0000
- title: 'Why Regularized Auto-Encoders learn Sparse Representation?'
  abstract: 'Sparse distributed representation is the key to learning useful features in deep learning algorithms, because not only it is an efficient mode of data representation, but also – more importantly – it captures the generation process of most real world data. While a number of regularized auto-encoders (AE) enforce sparsity explicitly in their learned representation and others don’t, there has been little formal analysis on what encourages sparsity in these models in general. Our objective is to formally study this general problem for regularized auto-encoders. We provide sufficient conditions on both regularization and activation functions that encourage sparsity. We show that multiple popular models (de-noising and contractive auto encoders, e.g.) and activations (rectified linear and sigmoid, e.g.) satisfy these conditions; thus, our conditions help explain sparsity in their learned representation. Thus our theoretical and empirical analysis together shed light on the properties of regularization/activation that are conductive to sparsity and unify a number of existing auto-encoder models and activation functions under the same analytical framework.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/arpita16.html
  PDF: http://proceedings.mlr.press/v48/arpita16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-arpita16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Devansh
    family: Arpit
  - given: Yingbo
    family: Zhou
  - given: Hung
    family: Ngo
  - given: Venu
    family: Govindaraju
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 136-144
  id: arpita16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 136
  lastpage: 144
  published: 2016-06-11 00:00:00 +0000
- title: 'k-variates++: more pluses in the k-means++'
  abstract: 'k-means++ seeding has become a de facto standard for hard clustering algorithms. In this paper, our first contribution is a two-way generalisation of this seeding, k-variates++, that includes the sampling of general densities rather than just a discrete set of Dirac densities anchored at the point locations, *and* a generalisation of the well known Arthur-Vassilvitskii (AV) approximation guarantee, in the form of a *bias+variance* approximation bound of the *global* optimum. This approximation exhibits a reduced dependency on the "noise" component with respect to the optimal potential — actually approaching the statistical lower bound. We show that k-variates++ *reduces* to efficient (biased seeding) clustering algorithms tailored to specific frameworks; these include distributed, streaming and on-line clustering, with *direct* approximation results for these algorithms. Finally, we present a novel application of k-variates++ to differential privacy. For either the specific frameworks considered here, or for the differential privacy setting, there is little to no prior results on the direct application of k-means++ and its approximation bounds — state of the art contenders appear to be significantly more complex and / or display less favorable (approximation) properties. We stress that our algorithms can still be run in cases where there is *no* closed form solution for the population minimizer. We demonstrate the applicability of our analysis via experimental evaluation on several domains and settings, displaying competitive performances vs state of the art.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/nock16.html
  PDF: http://proceedings.mlr.press/v48/nock16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-nock16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Richard
    family: Nock
  - given: Raphael
    family: Canyasse
  - given: Roksana
    family: Boreli
  - given: Frank
    family: Nielsen
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 145-154
  id: nock16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 145
  lastpage: 154
  published: 2016-06-11 00:00:00 +0000
- title: 'Multi-Player Bandits – a Musical Chairs Approach'
  abstract: 'We consider a variant of the stochastic multi-armed bandit problem, where multiple players simultaneously choose from the same set of arms and may collide, receiving no reward. This setting has been motivated by problems arising in cognitive radio networks, and is especially challenging under the realistic assumption that communication between players is limited. We provide a communication-free algorithm (Musical Chairs) which attains constant regret with high probability, as well as a sublinear-regret, communication-free algorithm (Dynamic Musical Chairs) for the more difficult setting of players dynamically entering and leaving throughout the game. Moreover, both algorithms do not require prior knowledge of the number of players. To the best of our knowledge, these are the first communication-free algorithms with these types of formal guarantees.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/rosenski16.html
  PDF: http://proceedings.mlr.press/v48/rosenski16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-rosenski16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Jonathan
    family: Rosenski
  - given: Ohad
    family: Shamir
  - given: Liran
    family: Szlak
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 155-163
  id: rosenski16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 155
  lastpage: 163
  published: 2016-06-11 00:00:00 +0000
- title: 'The Information Sieve'
  abstract: 'We introduce a new framework for unsupervised learning of representations based on a novel hierarchical decomposition of information. Intuitively, data is passed through a series of progressively fine-grained sieves. Each layer of the sieve recovers a single latent factor that is maximally informative about multivariate dependence in the data. The data is transformed after each pass so that the remaining unexplained information trickles down to the next layer. Ultimately, we are left with a set of latent factors explaining all the dependence in the original data and remainder information consisting of independent noise. We present a practical implementation of this framework for discrete variables and apply it to a variety of fundamental tasks in unsupervised learning including independent component analysis, lossy and lossless compression, and predicting missing values in data.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/steeg16.html
  PDF: http://proceedings.mlr.press/v48/steeg16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-steeg16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Greg Ver
    family: Steeg
  - given: Aram
    family: Galstyan
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 164-172
  id: steeg16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 164
  lastpage: 172
  published: 2016-06-11 00:00:00 +0000
- title: 'Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin'
  abstract: 'We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech–two vastly different languages. Because it replaces entire pipelines of hand-engineered components with neural networks, end-to-end learning allows us to handle a diverse variety of speech including noisy environments, accents and different languages. Key to our approach is our application of HPC techniques, enabling experiments that previously took weeks to now run in days. This allows us to iterate more quickly to identify superior architectures and algorithms. As a result, in several cases, our system is competitive with the transcription of human workers when benchmarked on standard datasets. Finally, using a technique called Batch Dispatch with GPUs in the data center, we show that our system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/amodei16.html
  PDF: http://proceedings.mlr.press/v48/amodei16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-amodei16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Dario
    family: Amodei
  - given: Sundaram
    family: Ananthanarayanan
  - given: Rishita
    family: Anubhai
  - given: Jingliang
    family: Bai
  - given: Eric
    family: Battenberg
  - given: Carl
    family: Case
  - given: Jared
    family: Casper
  - given: Bryan
    family: Catanzaro
  - given: Qiang
    family: Cheng
  - given: Guoliang
    family: Chen
  - given: Jie
    family: Chen
  - given: Jingdong
    family: Chen
  - given: Zhijie
    family: Chen
  - given: Mike
    family: Chrzanowski
  - given: Adam
    family: Coates
  - given: Greg
    family: Diamos
  - given: Ke
    family: Ding
  - given: Niandong
    family: Du
  - given: Erich
    family: Elsen
  - given: Jesse
    family: Engel
  - given: Weiwei
    family: Fang
  - given: Linxi
    family: Fan
  - given: Christopher
    family: Fougner
  - given: Liang
    family: Gao
  - given: Caixia
    family: Gong
  - given: Awni
    family: Hannun
  - given: Tony
    family: Han
  - given: Lappi
    family: Johannes
  - given: Bing
    family: Jiang
  - given: Cai
    family: Ju
  - given: Billy
    family: Jun
  - given: Patrick
    family: LeGresley
  - given: Libby
    family: Lin
  - given: Junjie
    family: Liu
  - given: Yang
    family: Liu
  - given: Weigao
    family: Li
  - given: Xiangang
    family: Li
  - given: Dongpeng
    family: Ma
  - given: Sharan
    family: Narang
  - given: Andrew
    family: Ng
  - given: Sherjil
    family: Ozair
  - given: Yiping
    family: Peng
  - given: Ryan
    family: Prenger
  - given: Sheng
    family: Qian
  - given: Zongfeng
    family: Quan
  - given: Jonathan
    family: Raiman
  - given: Vinay
    family: Rao
  - given: Sanjeev
    family: Satheesh
  - given: David
    family: Seetapun
  - given: Shubho
    family: Sengupta
  - given: Kavya
    family: Srinet
  - given: Anuroop
    family: Sriram
  - given: Haiyuan
    family: Tang
  - given: Liliang
    family: Tang
  - given: Chong
    family: Wang
  - given: Jidong
    family: Wang
  - given: Kaifu
    family: Wang
  - given: Yi
    family: Wang
  - given: Zhijian
    family: Wang
  - given: Zhiqian
    family: Wang
  - given: Shuang
    family: Wu
  - given: Likai
    family: Wei
  - given: Bo
    family: Xiao
  - given: Wen
    family: Xie
  - given: Yan
    family: Xie
  - given: Dani
    family: Yogatama
  - given: Bin
    family: Yuan
  - given: Jun
    family: Zhan
  - given: Zhenyao
    family: Zhu
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 173-182
  id: amodei16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 173
  lastpage: 182
  published: 2016-06-11 00:00:00 +0000
- title: 'On the Consistency of Feature Selection With Lasso for Non-linear Targets'
  abstract: 'An important question in feature selection is whether a selection strategy recovers the “true” set of features, given enough data. We study this question in the context of the popular Least Absolute Shrinkage and Selection Operator (Lasso) feature selection strategy. In particular, we consider the scenario when the model is misspecified so that the learned model is linear while the underlying real target is nonlinear. Surprisingly, we prove that under certain conditions, Lasso is still able to recover the correct features in this case. We also carry out numerical studies to empirically verify the theoretical results and explore the necessity of the conditions under which the proof holds.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/zhanga16.html
  PDF: http://proceedings.mlr.press/v48/zhanga16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-zhanga16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Yue
    family: Zhang
  - given: Weihong
    family: Guo
  - given: Soumya
    family: Ray
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 183-191
  id: zhanga16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 183
  lastpage: 191
  published: 2016-06-11 00:00:00 +0000
- title: 'Minimum Regret Search for Single- and Multi-Task Optimization'
  abstract: 'We propose minimum regret search (MRS), a novel acquisition function for Bayesian optimization. MRS bears similarities with information-theoretic approaches such as entropy search (ES). However, while ES aims in each query at maximizing the information gain with respect to the global maximum, MRS aims at minimizing the expected simple regret of its ultimate recommendation for the optimum. While empirically ES and MRS perform similar in most of the cases, MRS produces fewer outliers with high simple regret than ES. We provide empirical results both for a synthetic single-task optimization problem as well as for a simulated multi-task robotic control problem.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/metzen16.html
  PDF: http://proceedings.mlr.press/v48/metzen16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-metzen16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Jan Hendrik
    family: Metzen
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 192-200
  id: metzen16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 192
  lastpage: 200
  published: 2016-06-11 00:00:00 +0000
- title: 'CryptoNets: Applying Neural Networks to Encrypted Data with High Throughput and Accuracy'
  abstract: 'Applying machine learning to a problem which involves medical, financial, or other types of sensitive data, not only requires accurate predictions but also careful attention to maintaining data privacy and security. Legal and ethical requirements may prevent the use of cloud-based machine learning solutions for such tasks. In this work, we will present a method to convert learned neural networks to CryptoNets, neural networks that can be applied to encrypted data. This allows a data owner to send their data in an encrypted form to a cloud service that hosts the network. The encryption ensures that the data remains confidential since the cloud does not have access to the keys needed to decrypt it. Nevertheless, we will show that the cloud service is capable of applying the neural network to the encrypted data to make encrypted predictions, and also return them in encrypted form. These encrypted predictions can be sent back to the owner of the secret key who can decrypt them. Therefore, the cloud service does not gain any information about the raw data nor about the prediction it made. We demonstrate CryptoNets on the MNIST optical character recognition tasks. CryptoNets achieve 99% accuracy and can make around 59000 predictions per hour on a single PC. Therefore, they allow high throughput, accurate, and private predictions.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/gilad-bachrach16.html
  PDF: http://proceedings.mlr.press/v48/gilad-bachrach16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-gilad-bachrach16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Ran
    family: Gilad-Bachrach
  - given: Nathan
    family: Dowlin
  - given: Kim
    family: Laine
  - given: Kristin
    family: Lauter
  - given: Michael
    family: Naehrig
  - given: John
    family: Wernsing
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 201-210
  id: gilad-bachrach16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 201
  lastpage: 210
  published: 2016-06-11 00:00:00 +0000
- title: 'The Variational Nystrom method for large-scale spectral problems'
  abstract: 'Spectral methods for dimensionality reduction and clustering require solving an eigenproblem defined by a sparse affinity matrix. When this matrix is large, one seeks an approximate solution. The standard way to do this is the Nystrom method, which first solves a small eigenproblem considering only a subset of landmark points, and then applies an out-of-sample formula to extrapolate the solution to the entire dataset. We show that by constraining the original problem to satisfy the Nystrom formula, we obtain an approximation that is computationally simple and efficient, but achieves a lower approximation error using fewer landmarks and less runtime. We also study the role of normalization in the computational cost and quality of the resulting solution.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/vladymyrov16.html
  PDF: http://proceedings.mlr.press/v48/vladymyrov16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-vladymyrov16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Max
    family: Vladymyrov
  - given: Miguel
    family: Carreira-Perpinan
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 211-220
  id: vladymyrov16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 211
  lastpage: 220
  published: 2016-06-11 00:00:00 +0000
- title: 'Multi-Bias Non-linear Activation in Deep Neural Networks'
  abstract: 'As a widely used non-linear activation, Rectified Linear Unit (ReLU) separates noise and signal in a feature map by learning a threshold or bias. However, we argue that the classification of noise and signal not only depends on the magnitude of responses, but also the context of how the feature responses would be used to detect more abstract patterns in higher layers. In order to output multiple response maps with magnitude in different ranges for a particular visual pattern, existing networks employing ReLU and its variants have to learn a large number of redundant filters. In this paper, we propose a multi-bias non-linear activation (MBA) layer to explore the information hidden in the magnitudes of responses. It is placed after the convolution layer to decouple the responses to a convolution kernel into multiple maps by multi-thresholding magnitudes, thus generating more patterns in the feature space at a low computational cost. It provides great flexibility of selecting responses to different visual patterns in different magnitude ranges to form rich representations in higher layers. Such a simple and yet effective scheme achieves the state-of-the-art performance on several benchmarks.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/lia16.html
  PDF: http://proceedings.mlr.press/v48/lia16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-lia16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Hongyang
    family: Li
  - given: Wanli
    family: Ouyang
  - given: Xiaogang
    family: Wang
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 221-229
  id: lia16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 221
  lastpage: 229
  published: 2016-06-11 00:00:00 +0000
- title: 'Asymmetric Multi-task Learning Based on Task Relatedness and Loss'
  abstract: 'We propose a novel multi-task learning method that can minimize the effect of negative transfer by allowing asymmetric transfer between the tasks based on task relatedness as well as the amount of individual task losses, which we refer to as Asymmetric Multi-task Learning (AMTL). To tackle this problem, we couple multiple tasks via a sparse, directed regularization graph, that enforces each task parameter to be reconstructed as a sparse combination of other tasks, which are selected based on the task-wise loss. We present two different algorithms to solve this joint learning of the task predictors and the regularization graph. The first algorithm solves for the original learning objective using alternative optimization, and the second algorithm solves an approximation of it using curriculum learning strategy, that learns one task at a time. We perform experiments on multiple datasets for classification and regression, on which we obtain significant improvements in performance over the single task learning and symmetric multitask learning baselines.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/leeb16.html
  PDF: http://proceedings.mlr.press/v48/leeb16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-leeb16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Giwoong
    family: Lee
  - given: Eunho
    family: Yang
  - given: Sung
    family: Hwang
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 230-238
  id: leeb16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 230
  lastpage: 238
  published: 2016-06-11 00:00:00 +0000
- title: 'Accurate Robust and Efficient Error Estimation for Decision Trees'
  abstract: 'This paper illustrates a novel approach to the estimation of generalization error of decision tree classifiers. We set out the study of decision tree errors in the context of consistency analysis theory, which proved that the Bayes error can be achieved only if when the number of data samples thrown into each leaf node goes to infinity. For the more challenging and practical case where the sample size is finite or small, a novel sampling error term is introduced in this paper to cope with the small sample problem effectively and efficiently. Extensive experimental results show that the proposed error estimate is superior to the well known K-fold cross validation methods in terms of robustness and accuracy. Moreover it is orders of magnitudes more efficient than cross validation methods.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/fan16.html
  PDF: http://proceedings.mlr.press/v48/fan16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-fan16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Lixin
    family: Fan
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 239-247
  id: fan16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 239
  lastpage: 247
  published: 2016-06-11 00:00:00 +0000
- title: 'Fast Stochastic Algorithms for SVD and PCA: Convergence Properties and Convexity'
  abstract: 'We study the convergence properties of the VR-PCA algorithm introduced by (Shamir, 2015) for fast computation of leading singular vectors. We prove several new results, including a formal analysis of a block version of the algorithm, and convergence from random initialization. We also make a few observations of independent interest, such as how pre-initializing with just a single exact power iteration can significantly improve the analysis, and what are the convexity and non-convexity properties of the underlying optimization problem.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/shamira16.html
  PDF: http://proceedings.mlr.press/v48/shamira16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-shamira16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Ohad
    family: Shamir
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 248-256
  id: shamira16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 248
  lastpage: 256
  published: 2016-06-11 00:00:00 +0000
- title: 'Convergence of Stochastic Gradient Descent for PCA'
  abstract: 'We consider the problem of principal component analysis (PCA) in a streaming stochastic setting, where our goal is to find a direction of approximate maximal variance, based on a stream of i.i.d. data points in R^d. A simple and computationally cheap algorithm for this is stochastic gradient descent (SGD), which incrementally updates its estimate based on each new data point. However, due to the non-convex nature of the problem, analyzing its performance has been a challenge. In particular, existing guarantees rely on a non-trivial eigengap assumption on the covariance matrix, which is intuitively unnecessary. In this paper, we provide (to the best of our knowledge) the first eigengap-free convergence guarantees for SGD in the context of PCA. This also partially resolves an open problem posed in (Hardt & Price, 2014). Moreover, under an eigengap assumption, we show that the same techniques lead to new SGD convergence guarantees with better dependence on the eigengap.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/shamirb16.html
  PDF: http://proceedings.mlr.press/v48/shamirb16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-shamirb16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Ohad
    family: Shamir
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 257-265
  id: shamirb16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 257
  lastpage: 265
  published: 2016-06-11 00:00:00 +0000
- title: 'Dealbreaker: A Nonlinear Latent Variable Model for Educational Data'
  abstract: 'Statistical models of student responses on assessment questions, such as those in homeworks and exams, enable educators and computer-based personalized learning systems to gain insights into students’ knowledge using machine learning. Popular student-response models, including the Rasch model and item response theory models, represent the probability of a student answering a question correctly using an affine function of latent factors. While such models can accurately predict student responses, their ability to interpret the underlying knowledge structure (which is certainly nonlinear) is limited. In response, we develop a new, nonlinear latent variable model that we call the dealbreaker model, in which a student’s success probability is determined by their weakest concept mastery. We develop efficient parameter inference algorithms for this model using novel methods for nonconvex optimization. We show that the dealbreaker model achieves comparable or better prediction performance as compared to affine models with real-world educational datasets. We further demonstrate that the parameters learned by the dealbreaker model are interpretable—they provide key insights into which concepts are critical (i.e., the “dealbreaker”) to answering a question correctly. We conclude by reporting preliminary results for a movie-rating dataset, which illustrate the broader applicability of the dealbreaker model.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/lan16.html
  PDF: http://proceedings.mlr.press/v48/lan16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-lan16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Andrew
    family: Lan
  - given: Tom
    family: Goldstein
  - given: Richard
    family: Baraniuk
  - given: Christoph
    family: Studer
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 266-275
  id: lan16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 266
  lastpage: 275
  published: 2016-06-11 00:00:00 +0000
- title: 'A Kernelized Stein Discrepancy for Goodness-of-fit Tests'
  abstract: 'We derive a new discrepancy statistic for measuring differences between two probability distributions based on combining Stein’s identity and the reproducing kernel Hilbert space theory. We apply our result to test how well a probabilistic model fits a set of observations, and derive a new class of powerful goodness-of-fit tests that are widely applicable for complex and high dimensional distributions, even for those with computationally intractable normalization constants. Both theoretical and empirical properties of our methods are studied thoroughly.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/liub16.html
  PDF: http://proceedings.mlr.press/v48/liub16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-liub16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Qiang
    family: Liu
  - given: Jason
    family: Lee
  - given: Michael
    family: Jordan
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 276-284
  id: liub16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 276
  lastpage: 284
  published: 2016-06-11 00:00:00 +0000
- title: 'Variable Elimination in the Fourier Domain'
  abstract: 'The ability to represent complex high dimensional probability distributions in a compact form is one of the key insights in the field of graphical models. Factored representations are ubiquitous in machine learning and lead to major computational advantages. We explore a different type of compact representation based on discrete Fourier representations, complementing the classical approach based on conditional independencies. We show that a large class of probabilistic graphical models have a compact Fourier representation. This theoretical result opens up an entirely new way of approximating a probability distribution. We demonstrate the significance of this approach by applying it to the variable elimination algorithm. Compared with the traditional bucket representation and other approximate inference algorithms, we obtain significant improvements.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/xue16.html
  PDF: http://proceedings.mlr.press/v48/xue16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-xue16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Yexiang
    family: Xue
  - given: Stefano
    family: Ermon
  - given: Ronan Le
    family: Bras
  - given: 
    family: Carla
  - given: Bart
    family: Selman
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 285-294
  id: xue16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 285
  lastpage: 294
  published: 2016-06-11 00:00:00 +0000
- title: 'Low-Rank Matrix Approximation with Stability'
  abstract: 'Low-rank matrix approximation has been widely adopted in machine learning applications with sparse data, such as recommender systems. However, the sparsity of the data, incomplete and noisy, introduces challenges to the algorithm stability – small changes in the training data may significantly change the models. As a result, existing low-rank matrix approximation solutions yield low generalization performance, exhibiting high error variance on the training dataset, and minimizing the training error may not guarantee error reduction on the testing dataset. In this paper, we investigate the algorithm stability problem of low-rank matrix approximations. We present a new algorithm design framework, which (1) introduces new optimization objectives to guide stable matrix approximation algorithm design, and (2) solves the optimization problem to obtain stable low-rank approximation solutions with good generalization performance. Experimental results on real-world datasets demonstrate that the proposed work can achieve better prediction accuracy compared with both state-of-the-art low-rank matrix approximation methods and ensemble methods in recommendation task.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/lib16.html
  PDF: http://proceedings.mlr.press/v48/lib16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-lib16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Dongsheng
    family: Li
  - given: Chao
    family: Chen
  - given: Qin
    family: Lv
  - given: Junchi
    family: Yan
  - given: Li
    family: Shang
  - given: Stephen
    family: Chu
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 295-303
  id: lib16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 295
  lastpage: 303
  published: 2016-06-11 00:00:00 +0000
- title: 'Linking losses for density ratio and class-probability estimation'
  abstract: 'Given samples from two densities p and q, density ratio estimation (DRE) is the problem of estimating the ratio p/q. Two popular discriminative approaches to DRE are KL importance estimation (KLIEP), and least squares importance fitting (LSIF). In this paper, we show that KLIEP and LSIF both employ class-probability estimation (CPE) losses. Motivated by this, we formally relate DRE and CPE, and demonstrate the viability of using existing losses from one problem for the other. For the DRE problem, we show that essentially any CPE loss (eg logistic, exponential) can be used, as this equivalently minimises a Bregman divergence to the true density ratio. We show how different losses focus on accurately modelling different ranges of the density ratio, and use this to design new CPE losses for DRE. For the CPE problem, we argue that the LSIF loss is useful in the regime where one wishes to rank instances with maximal accuracy at the head of the ranking. In the course of our analysis, we establish a Bregman divergence identity that may be of independent interest.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/menon16.html
  PDF: http://proceedings.mlr.press/v48/menon16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-menon16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Aditya
    family: Menon
  - given: Cheng Soon
    family: Ong
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 304-313
  id: menon16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 304
  lastpage: 313
  published: 2016-06-11 00:00:00 +0000
- title: 'Stochastic Variance Reduction for Nonconvex Optimization'
  abstract: 'We study nonconvex finite-sum problems and analyze stochastic variance reduced gradient (SVRG) methods for them. SVRG and related methods have recently surged into prominence for convex optimization given their edge over stochastic gradient descent (SGD); but their theoretical analysis almost exclusively assumes convexity. In contrast, we prove non-asymptotic rates of convergence (to stationary points) of SVRG for nonconvex optimization, and show that it is provably faster than SGD and gradient descent. We also analyze a subclass of nonconvex problems on which SVRG attains linear convergence to the global optimum. We extend our analysis to mini-batch variants of SVRG, showing (theoretical) linear speedup due to minibatching in parallel settings.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/reddi16.html
  PDF: http://proceedings.mlr.press/v48/reddi16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-reddi16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Sashank J.
    family: Reddi
  - given: Ahmed
    family: Hefny
  - given: Suvrit
    family: Sra
  - given: Barnabas
    family: Poczos
  - given: Alex
    family: Smola
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 314-323
  id: reddi16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 314
  lastpage: 323
  published: 2016-06-11 00:00:00 +0000
- title: 'Hierarchical Variational Models'
  abstract: 'Black box variational inference allows researchers to easily prototype and evaluate an array of models. Recent advances allow such algorithms to scale to high dimensions. However, a central question remains: How to specify an expressive variational distribution that maintains efficient computation? To address this, we develop hierarchical variational models (HVMs). HVMs augment a variational approximation with a prior on its parameters, which allows it to capture complex structure for both discrete and continuous latent variables. The algorithm we develop is black box, can be used for any HVM, and has the same computational efficiency as the original approximation. We study HVMs on a variety of deep discrete latent variable models. HVMs generalize other expressive variational distributions and maintains higher fidelity to the posterior.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/ranganath16.html
  PDF: http://proceedings.mlr.press/v48/ranganath16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-ranganath16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Rajesh
    family: Ranganath
  - given: Dustin
    family: Tran
  - given: David
    family: Blei
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 324-333
  id: ranganath16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 324
  lastpage: 333
  published: 2016-06-11 00:00:00 +0000
- title: 'Hierarchical Span-Based Conditional Random Fields for Labeling and Segmenting Events in Wearable Sensor Data Streams'
  abstract: 'The field of mobile health (mHealth) has the potential to yield new insights into health and behavior through the analysis of continuously recorded data from wearable health and activity sensors. In this paper, we present a hierarchical span-based conditional random field model for the key problem of jointly detecting discrete events in such sensor data streams and segmenting these events into high-level activity sessions. Our model includes higher-order cardinality factors and inter-event duration factors to capture domain-specific structure in the label space. We show that our model supports exact MAP inference in quadratic time via dynamic programming, which we leverage to perform learning in the structured support vector machine framework. We apply the model to the problems of smoking and eating detection using four real data sets. Our results show statistically significant improvements in segmentation performance relative to a hierarchical pairwise CRF.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/adams16.html
  PDF: http://proceedings.mlr.press/v48/adams16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-adams16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Roy
    family: Adams
  - given: Nazir
    family: Saleheen
  - given: Edison
    family: Thomaz
  - given: Abhinav
    family: Parate
  - given: Santosh
    family: Kumar
  - given: Benjamin
    family: Marlin
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 334-343
  id: adams16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 334
  lastpage: 343
  published: 2016-06-11 00:00:00 +0000
- title: 'Binary embeddings with structured hashed projections'
  abstract: 'We consider the hashing mechanism for constructing binary embeddings, that involves pseudo-random projections followed by nonlinear (sign function) mappings. The pseudo-random projection is described by a matrix, where not all entries are independent random variables but instead a fixed “budget of randomness” is distributed across the matrix. Such matrices can be edfficiently stored in sub-quadratic or even linear space, provide reduction in randomness usage (i.e. number of required random values), and very often lead to computational speed ups. We prove several theoretical results showing that projections via various structured matrices followed by nonlinear mappings accurately preserve the angular distance between input high-dimensional vectors. To the best of our knowledge, these results are the first that give theoretical ground for the use of general structured matrices in the nonlinear setting. We empirically verify our theoretical findings and show the dependence of learning via structured hashed projections on the performance of neural network as well as nearest neighbor classifier.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/choromanska16.html
  PDF: http://proceedings.mlr.press/v48/choromanska16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-choromanska16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Anna
    family: Choromanska
  - given: Krzysztof
    family: Choromanski
  - given: Mariusz
    family: Bojarski
  - given: Tony
    family: Jebara
  - given: Sanjiv
    family: Kumar
  - given: Yann
    family: LeCun
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 344-353
  id: choromanska16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 344
  lastpage: 353
  published: 2016-06-11 00:00:00 +0000
- title: 'A Variational Analysis of Stochastic Gradient Algorithms'
  abstract: 'Stochastic Gradient Descent (SGD) is an important algorithm in machine learning. With constant learning rates, it is a stochastic process that, after an initial phase of convergence, generates samples from a stationary distribution. We show that SGD with constant rates can be effectively used as an approximate posterior inference algorithm for probabilistic modeling. Specifically, we show how to adjust the tuning parameters of SGD such as to match the resulting stationary distribution to the posterior. This analysis rests on interpreting SGD as a continuous-time stochastic process and then minimizing the Kullback-Leibler divergence between its stationary distribution and the target posterior. (This is in the spirit of variational inference.) In more detail, we model SGD as a multivariate Ornstein-Uhlenbeck process and then use properties of this process to derive the optimal parameters. This theoretical framework also connects SGD to modern scalable inference algorithms; we analyze the recently proposed stochastic gradient Fisher scoring under this perspective. We demonstrate that SGD with properly chosen constant rates gives a new way to optimize hyperparameters in probabilistic models.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/mandt16.html
  PDF: http://proceedings.mlr.press/v48/mandt16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-mandt16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Stephan
    family: Mandt
  - given: Matthew
    family: Hoffman
  - given: David
    family: Blei
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 354-363
  id: mandt16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 354
  lastpage: 363
  published: 2016-06-11 00:00:00 +0000
- title: 'Adaptive Sampling for SGD by Exploiting Side Information'
  abstract: 'This paper proposes a new mechanism for sampling training instances for stochastic gradient descent (SGD) methods by exploiting any side-information associated with the instances (for e.g. class-labels) to improve convergence. Previous methods have either relied on sampling from a distribution defined over training instances or from a static distribution that fixed before training. This results in two problems a) any distribution that is set apriori is independent of how the optimization progresses and b) maintaining a distribution over individual instances could be infeasible in large-scale scenarios. In this paper, we exploit the side information associated with the instances to tackle both problems. More specifically, we maintain a distribution over classes (instead of individual instances) that is adaptively estimated during the course of optimization to give the maximum reduction in the variance of the gradient. Intuitively, we sample more from those regions in space that have a \textitlarger gradient contribution. Our experiments on highly multiclass datasets show that our proposal converge significantly faster than existing techniques.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/gopal16.html
  PDF: http://proceedings.mlr.press/v48/gopal16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-gopal16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Siddharth
    family: Gopal
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 364-372
  id: gopal16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 364
  lastpage: 372
  published: 2016-06-11 00:00:00 +0000
- title: 'Learning from Multiway Data: Simple and Efficient Tensor Regression'
  abstract: 'Tensor regression has shown to be advantageous in learning tasks with multi-directional relatedness. Given massive multiway data, traditional methods are often too slow to operate on or suffer from memory bottleneck. In this paper, we introduce subsampled tensor projected gradient to solve the problem. Our algorithm is impressively simple and efficient. It is built upon projected gradient method with fast tensor power iterations, leveraging randomized sketching for further acceleration. Theoretical analysis shows that our algorithm converges to the correct solution in fixed number of iterations. The memory requirement grows linearly with the size of the problem. We demonstrate superior empirical performance on both multi-linear multi-task learning and spatio-temporal applications.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/yu16.html
  PDF: http://proceedings.mlr.press/v48/yu16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-yu16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Rose
    family: Yu
  - given: Yan
    family: Liu
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 373-381
  id: yu16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 373
  lastpage: 381
  published: 2016-06-11 00:00:00 +0000
- title: 'A Distributed Variational Inference Framework for Unifying Parallel Sparse Gaussian Process Regression Models'
  abstract: 'This paper presents a novel distributed variational inference framework that unifies many parallel sparse Gaussian process regression (SGPR) models for scalable hyperparameter learning with big data. To achieve this, our framework exploits a structure of correlated noise process model that represents the observation noises as a finite realization of a high-order Gaussian Markov random process. By varying the Markov order and covariance function for the noise process model, different variational SGPR models result. This consequently allows the correlation structure of the noise process model to be characterized for which a particular variational SGPR model is optimal. We empirically evaluate the predictive performance and scalability of the distributed variational SGPR models unified by our framework on two real-world datasets.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/hoang16.html
  PDF: http://proceedings.mlr.press/v48/hoang16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-hoang16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Trong Nghia
    family: Hoang
  - given: Quang Minh
    family: Hoang
  - given: Bryan Kian Hsiang
    family: Low
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 382-391
  id: hoang16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 382
  lastpage: 391
  published: 2016-06-11 00:00:00 +0000
- title: 'Online Stochastic Linear Optimization under One-bit Feedback'
  abstract: 'In this paper, we study a special bandit setting of online stochastic linear optimization, where only one-bit of information is revealed to the learner at each round. This problem has found many applications including online advertisement and online recommendation. We assume the binary feedback is a random variable generated from the logit model, and aim to minimize the regret defined by the unknown linear function. Although the existing method for generalized linear bandit can be applied to our problem, the high computational cost makes it impractical for real-world applications. To address this challenge, we develop an efficient online learning algorithm by exploiting particular structures of the observation model. Specifically, we adopt online Newton step to estimate the unknown parameter and derive a tight confidence region based on the exponential concavity of the logistic loss. Our analysis shows that the proposed algorithm achieves a regret bound of O(d\sqrtT), which matches the optimal result of stochastic linear bandits.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/zhangb16.html
  PDF: http://proceedings.mlr.press/v48/zhangb16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-zhangb16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Lijun
    family: Zhang
  - given: Tianbao
    family: Yang
  - given: Rong
    family: Jin
  - given: Yichi
    family: Xiao
  - given: Zhi-hua
    family: Zhou
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 392-401
  id: zhangb16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 392
  lastpage: 401
  published: 2016-06-11 00:00:00 +0000
- title: 'Adaptive Algorithms for Online Convex Optimization with Long-term Constraints'
  abstract: 'We present an adaptive online gradient descent algorithm to solve online convex optimization problems with long-term constraints, which are constraints that need to be satisfied when accumulated over a finite number of rounds T, but can be violated in intermediate rounds. For some user-defined trade-off parameter βin (0, 1), the proposed algorithm achieves cumulative regret bounds of O(T^maxβ,1_β) and O(T^1_β/2), respectively for the loss and the constraint violations. Our results hold for convex losses, can handle arbitrary convex constraints and rely on a single computationally efficient algorithm. Our contributions improve over the best known cumulative regret bounds of Mahdavi et al. (2012), which are respectively O(T^1/2) and O(T^3/4) for general convex domains, and respectively O(T^2/3) and O(T^2/3) when the domain is further restricted to be a polyhedral set. We supplement the analysis with experiments validating the performance of our algorithm in practice.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/jenatton16.html
  PDF: http://proceedings.mlr.press/v48/jenatton16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-jenatton16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Rodolphe
    family: Jenatton
  - given: Jim
    family: Huang
  - given: Cedric
    family: Archambeau
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 402-411
  id: jenatton16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 402
  lastpage: 411
  published: 2016-06-11 00:00:00 +0000
- title: 'Actively Learning Hemimetrics with Applications to Eliciting User Preferences'
  abstract: 'Motivated by an application of eliciting users’ preferences, we investigate the problem of learning hemimetrics, i.e., pairwise distances among a set of n items that satisfy triangle inequalities and non-negativity constraints. In our application, the (asymmetric) distances quantify private costs a user incurs when substituting one item by another. We aim to learn these distances (costs) by asking the users whether they are willing to switch from one item to another for a given incentive offer. Without exploiting structural constraints of the hemimetric polytope, learning the distances between each pair of items requires Θ(n^2) queries. We propose an active learning algorithm that substantially reduces this sample complexity by exploiting the structural constraints on the version space of hemimetrics. Our proposed algorithm achieves provably-optimal sample complexity for various instances of the task. For example, when the items are embedded into K tight clusters, the sample complexity of our algorithm reduces to O(n K). Extensive experiments on a restaurant recommendation data set support the conclusions of our theoretical analysis.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/singla16.html
  PDF: http://proceedings.mlr.press/v48/singla16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-singla16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Adish
    family: Singla
  - given: Sebastian
    family: Tschiatschek
  - given: Andreas
    family: Krause
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 412-420
  id: singla16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 412
  lastpage: 420
  published: 2016-06-11 00:00:00 +0000
- title: 'Learning Simple Algorithms from Examples'
  abstract: 'We present an approach for learning simple algorithms such as copying, multi-digit addition and single digit multiplication directly from examples. Our framework consists of a set of interfaces, accessed by a controller. Typical interfaces are 1-D tapes or 2-D grids that hold the input and output data. For the controller, we explore a range of neural network-based models which vary in their ability to abstract the underlying algorithm from training instances and generalize to test examples with many thousands of digits. The controller is trained using Q-learning with several enhancements and we show that the bottleneck is in the capabilities of the controller rather than in the search incurred by Q-learning.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/zaremba16.html
  PDF: http://proceedings.mlr.press/v48/zaremba16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-zaremba16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Wojciech
    family: Zaremba
  - given: Tomas
    family: Mikolov
  - given: Armand
    family: Joulin
  - given: Rob
    family: Fergus
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 421-429
  id: zaremba16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 421
  lastpage: 429
  published: 2016-06-11 00:00:00 +0000
- title: 'Learning Physical Intuition of Block Towers by Example'
  abstract: 'Wooden blocks are a common toy for infants, allowing them to develop motor skills and gain intuition about the physical behavior of the world. In this paper, we explore the ability of deep feed-forward models to learn such intuitive physics. Using a 3D game engine, we create small towers of wooden blocks whose stability is randomized and render them collapsing (or remaining upright). This data allows us to train large convolutional network models which can accurately predict the outcome, as well as estimating the trajectories of the blocks. The models are also able to generalize in two important ways: (i) to new physical scenarios, e.g. towers with an additional block and (ii) to images of real wooden blocks, where it obtains a performance comparable to human subjects.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/lerer16.html
  PDF: http://proceedings.mlr.press/v48/lerer16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-lerer16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Adam
    family: Lerer
  - given: Sam
    family: Gross
  - given: Rob
    family: Fergus
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 430-438
  id: lerer16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 430
  lastpage: 438
  published: 2016-06-11 00:00:00 +0000
- title: 'Structure Learning of Partitioned Markov Networks'
  abstract: 'We learn the structure of a Markov Network between two groups of random variables from joint observations. Since modelling and learning the full MN structure may be hard, learning the links between two groups directly may be a preferable option. We introduce a novel concept called the \emphpartitioned ratio whose factorization directly associates with the Markovian properties of random variables across two groups. A simple one-shot convex optimization procedure is proposed for learning the \emphsparse factorizations of the partitioned ratio and it is theoretically guaranteed to recover the correct inter-group structure under mild conditions. The performance of the proposed method is experimentally compared with the state of the art MN structure learning methods using ROC curves. Real applications on analyzing bipartisanship in US congress and pairwise DNA/time-series alignments are also reported.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/liuc16.html
  PDF: http://proceedings.mlr.press/v48/liuc16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-liuc16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Song
    family: Liu
  - given: Taiji
    family: Suzuki
  - given: Masashi
    family: Sugiyama
  - given: Kenji
    family: Fukumizu
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 439-448
  id: liuc16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 439
  lastpage: 448
  published: 2016-06-11 00:00:00 +0000
- title: 'Tracking Slowly Moving Clairvoyant: Optimal Dynamic Regret of Online Learning with True and Noisy Gradient'
  abstract: 'This work focuses on dynamic regret of online convex optimization that compares the performance of online learning to a clairvoyant who knows the sequence of loss functions in advance and hence selects the minimizer of the loss function at each step. By assuming that the clairvoyant moves slowly (i.e., the minimizers change slowly), we present several improved variation-based upper bounds of the dynamic regret under the true and noisy gradient feedback, which are \it optimal in light of the presented lower bounds. The key to our analysis is to explore a regularity metric that measures the temporal changes in the clairvoyant’s minimizers, to which we refer as path variation. Firstly, we present a general lower bound in terms of the path variation, and then show that under full information or gradient feedback we are able to achieve an optimal dynamic regret. Secondly, we present a lower bound with noisy gradient feedback and then show that we can achieve optimal dynamic regrets under a stochastic gradient feedback and two-point bandit feedback. Moreover, for a sequence of smooth loss functions that admit a small variation in the gradients, our dynamic regret under the two-point bandit feedback matches that is achieved with full information.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/yangb16.html
  PDF: http://proceedings.mlr.press/v48/yangb16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-yangb16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Tianbao
    family: Yang
  - given: Lijun
    family: Zhang
  - given: Rong
    family: Jin
  - given: Jinfeng
    family: Yi
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 449-457
  id: yangb16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 449
  lastpage: 457
  published: 2016-06-11 00:00:00 +0000
- title: 'Beyond CCA: Moment Matching for Multi-View Models'
  abstract: 'We introduce three novel semi-parametric extensions of probabilistic canonical correlation analysis with identifiability guarantees. We consider moment matching techniques for estimation in these models. For that, by drawing explicit links between the new models and a discrete version of independent component analysis (DICA), we first extend the DICA cumulant tensors to the new discrete version of CCA. By further using a close connection with independent component analysis, we introduce generalized covariance matrices, which can replace the cumulant tensors in the moment matching framework, and, therefore, improve sample complexity and simplify derivations and algorithms significantly. As the tensor power method or orthogonal joint diagonalization are not applicable in the new setting, we use non-orthogonal joint diagonalization techniques for matching the cumulants. We demonstrate performance of the proposed models and estimation techniques on experiments with both synthetic and real datasets.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/podosinnikova16.html
  PDF: http://proceedings.mlr.press/v48/podosinnikova16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-podosinnikova16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Anastasia
    family: Podosinnikova
  - given: Francis
    family: Bach
  - given: Simon
    family: Lacoste-Julien
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 458-467
  id: podosinnikova16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 458
  lastpage: 467
  published: 2016-06-11 00:00:00 +0000
- title: 'Fast methods for estimating the Numerical rank of large matrices'
  abstract: 'We present two computationally inexpensive techniques for estimating the numerical rank of a matrix, combining powerful tools from computational linear algebra. These techniques exploit three key ingredients. The first is to approximate the projector on the non-null invariant subspace of the matrix by using a polynomial filter. Two types of filters are discussed, one based on Hermite interpolation and the other based on Chebyshev expansions. The second ingredient employs stochastic trace estimators to compute the rank of this wanted eigen-projector, which yields the desired rank of the matrix. In order to obtain a good filter, it is necessary to detect a gap between the eigenvalues that correspond to noise and the relevant eigenvalues that correspond to the non-null invariant subspace. The third ingredient of the proposed approaches exploits the idea of spectral density, popular in physics, and the Lanczos spectroscopic method to locate this gap.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/ubaru16.html
  PDF: http://proceedings.mlr.press/v48/ubaru16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-ubaru16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Shashanka
    family: Ubaru
  - given: Yousef
    family: Saad
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 468-477
  id: ubaru16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 468
  lastpage: 477
  published: 2016-06-11 00:00:00 +0000
- title: 'Unsupervised Deep Embedding for Clustering Analysis'
  abstract: 'Clustering is central to many data-driven application domains and has been studied extensively in terms of distance functions and grouping algorithms. Relatively little work has focused on learning representations for clustering. In this paper, we propose Deep Embedded Clustering (DEC), a method that simultaneously learns feature representations and cluster assignments using deep neural networks. DEC learns a mapping from the data space to a lower-dimensional feature space in which it iteratively optimizes a clustering objective. Our experimental evaluations on image and text corpora show significant improvement over state-of-the-art methods.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/xieb16.html
  PDF: http://proceedings.mlr.press/v48/xieb16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-xieb16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Junyuan
    family: Xie
  - given: Ross
    family: Girshick
  - given: Ali
    family: Farhadi
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 478-487
  id: xieb16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 478
  lastpage: 487
  published: 2016-06-11 00:00:00 +0000
- title: 'Efficient Private Empirical Risk Minimization for High-dimensional Learning'
  abstract: 'Dimensionality reduction is a popular approach for dealing with high dimensional data that leads to substantial computational savings. Random projections are a simple and effective method for universal dimensionality reduction with rigorous theoretical guarantees. In this paper, we theoretically study the problem of differentially private empirical risk minimization in the projected subspace (compressed domain). We ask: is it possible to design differentially private algorithms with small excess risk given access to only projected data? In this paper, we answer this question in affirmative, by showing that for the class of generalized linear functions, given only the projected data and the projection matrix, we can obtain excess risk bounds of $O(w(Theta)^2/3/n^1/3) under eps-differential privacy, and O((w(Theta)/n)^1/2)$ under (eps,delta)-differential privacy, where n is the sample size and w(Theta) is the Gaussian width of the parameter space that we optimize over. A simple consequence of these results is that, for a large class of ERM problems, in the traditional setting (i.e., with access to the original data), under eps-differential privacy, we improve the worst-case risk bounds of Bassily et al. (FOCS 2014).'
  volume: 48
  URL: https://proceedings.mlr.press/v48/kasiviswanathan16.html
  PDF: http://proceedings.mlr.press/v48/kasiviswanathan16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-kasiviswanathan16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Shiva Prasad
    family: Kasiviswanathan
  - given: Hongxia
    family: Jin
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 488-497
  id: kasiviswanathan16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 488
  lastpage: 497
  published: 2016-06-11 00:00:00 +0000
- title: 'Parameter Estimation for Generalized Thurstone Choice Models'
  abstract: 'We consider the maximum likelihood parameter estimation problem for a generalized Thurstone choice model, where choices are from comparison sets of two or more items. We provide tight characterizations of the mean square error, as well as necessary and sufficient conditions for correct classification when each item belongs to one of two classes. These results provide insights into how the estimation accuracy depends on the choice of a generalized Thurstone choice model and the structure of comparison sets. We find that for a priori unbiased structures of comparisons, e.g., when comparison sets are drawn independently and uniformly at random, the number of observations needed to achieve a prescribed estimation accuracy depends on the choice of a generalized Thurstone choice model. For a broad set of generalized Thurstone choice models, which includes all popular instances used in practice, the estimation error is shown to be largely insensitive to the cardinality of comparison sets. On the other hand, we found that there exist generalized Thurstone choice models for which the estimation error decreases much faster with the cardinality of comparison sets.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/vojnovic16.html
  PDF: http://proceedings.mlr.press/v48/vojnovic16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-vojnovic16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Milan
    family: Vojnovic
  - given: Seyoung
    family: Yun
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 498-506
  id: vojnovic16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 498
  lastpage: 506
  published: 2016-06-11 00:00:00 +0000
- title: 'Large-Margin Softmax Loss for Convolutional Neural Networks'
  abstract: 'Cross-entropy loss together with softmax is arguably one of the most common used supervision components in convolutional neural networks (CNNs). Despite its simplicity, popularity and excellent performance, the component does not explicitly encourage discriminative learning of features. In this paper, we propose a generalized large-margin softmax (L-Softmax) loss which explicitly encourages intra-class compactness and inter-class separability between learned features. Moreover, L-Softmax not only can adjust the desired margin but also can avoid overfitting. We also show that the L-Softmax loss can be optimized by typical stochastic gradient descent. Extensive experiments on four benchmark datasets demonstrate that the deeply-learned features with L-softmax loss become more discriminative, hence significantly boosting the performance on a variety of visual classification and verification tasks.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/liud16.html
  PDF: http://proceedings.mlr.press/v48/liud16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-liud16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Weiyang
    family: Liu
  - given: Yandong
    family: Wen
  - given: Zhiding
    family: Yu
  - given: Meng
    family: Yang
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 507-516
  id: liud16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 507
  lastpage: 516
  published: 2016-06-11 00:00:00 +0000
- title: 'A Random Matrix Approach to Echo-State Neural Networks'
  abstract: 'Recurrent neural networks, especially in their linear version, have provided many qualitative insights on their performance under different configurations. This article provides, through a novel random matrix framework, the quantitative counterpart of these performance results, specifically in the case of echo-state networks. Beyond mere insights, our approach conveys a deeper understanding on the core mechanism under play for both training and testing.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/couillet16.html
  PDF: http://proceedings.mlr.press/v48/couillet16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-couillet16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Romain
    family: Couillet
  - given: Gilles
    family: Wainrib
  - given: Hafiz Tiomoko
    family: Ali
  - given: Harry
    family: Sevi
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 517-525
  id: couillet16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 517
  lastpage: 525
  published: 2016-06-11 00:00:00 +0000
- title: 'Supervised and Semi-Supervised Text Categorization using LSTM for Region Embeddings'
  abstract: 'One-hot CNN (convolutional neural network) has been shown to be effective for text categorization (Johnson & Zhang, 2015). We view it as a special case of a general framework which jointly trains a linear model with a non-linear feature generator consisting of ‘text region embedding + pooling’. Under this framework, we explore a more sophisticated region embedding method using Long Short-Term Memory (LSTM). LSTM can embed text regions of variable (and possibly large) sizes, whereas the region size needs to be fixed in a CNN. We seek effective and efficient use of LSTM for this purpose in the supervised and semi-supervised settings. The best results were obtained by combining region embeddings in the form of LSTM and convolution layers trained on unlabeled data. The results indicate that on this task, embeddings of text regions, which can convey complex concepts, are more useful than embeddings of single words in isolation. We report performances exceeding the previous best results on four benchmark datasets.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/johnson16.html
  PDF: http://proceedings.mlr.press/v48/johnson16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-johnson16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Rie
    family: Johnson
  - given: Tong
    family: Zhang
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 526-534
  id: johnson16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 526
  lastpage: 534
  published: 2016-06-11 00:00:00 +0000
- title: 'Optimality of Belief Propagation for Crowdsourced Classification'
  abstract: 'Crowdsourcing systems are popular for solving large-scale labelling tasks with low-paid (or even non-paid) workers. We study the problem of recovering the true labels from noisy crowdsourced labels under the popular Dawid-Skene model. To address this inference problem, several algorithms have recently been proposed, but the best known guarantee is still significantly larger than the fundamental limit. We close this gap under a simple but canonical scenario where each worker is assigned at most two tasks. In particular, we introduce a tighter lower bound on the fundamental limit and prove that Belief Propagation (BP) exactly matches this lower bound. The guaranteed optimality of BP is the strongest in the sense that it is information-theoretically impossible for any other algorithm to correctly la- bel a larger fraction of the tasks. In the general setting, when more than two tasks are assigned to each worker, we establish the dominance result on BP that it outperforms other existing algorithms with known provable guarantees. Experimental results suggest that BP is close to optimal for all regimes considered, while existing state-of-the-art algorithms exhibit suboptimal performances.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/ok16.html
  PDF: http://proceedings.mlr.press/v48/ok16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-ok16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Jungseul
    family: Ok
  - given: Sewoong
    family: Oh
  - given: Jinwoo
    family: Shin
  - given: Yung
    family: Yi
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 535-544
  id: ok16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 535
  lastpage: 544
  published: 2016-06-11 00:00:00 +0000
- title: 'Stability of Controllers for Gaussian Process Forward Models'
  abstract: 'Learning control has become an appealing alternative to the derivation of control laws based on classic control theory. However, a major shortcoming of learning control is the lack of performance guarantees which prevents its application in many real-world scenarios. As a step in this direction, we provide a stability analysis tool for controllers acting on dynamics represented by Gaussian processes (GPs). We consider arbitrary Markovian control policies and system dynamics given as (i) the mean of a GP, and (ii) the full GP distribution. For the first case, our tool finds a state space region, where the closed-loop system is provably stable. In the second case, it is well known that infinite horizon stability guarantees cannot exist. Instead, our tool analyzes finite time stability. Empirical evaluations on simulated benchmark problems support our theoretical results.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/vinogradska16.html
  PDF: http://proceedings.mlr.press/v48/vinogradska16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-vinogradska16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Julia
    family: Vinogradska
  - given: Bastian
    family: Bischoff
  - given: Duy
    family: Nguyen-Tuong
  - given: Anne
    family: Romer
  - given: Henner
    family: Schmidt
  - given: Jan
    family: Peters
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 545-554
  id: vinogradska16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 545
  lastpage: 554
  published: 2016-06-11 00:00:00 +0000
- title: 'Learning privately from multiparty data'
  abstract: 'Learning a classifier from private data distributed across multiple parties is an important problem that has many potential applications. How can we build an accurate and differentially private global classifier by combining locally-trained classifiers from different parties, without access to any party’s private data? We propose to transfer the “knowledge” of the local classifier ensemble by first creating labeled data from auxiliary unlabeled data, and then train a global differentially private classifier. We show that majority voting is too sensitive and therefore propose a new risk weighted by class probabilities estimated from the ensemble. Relative to a non-private solution, our private solution has a generalization error bounded by O(ε^-2 M^-2). This allows strong privacy without performance loss when the number of participating parties M is large, such as in crowdsensing applications. We demonstrate the performance of our framework with realistic tasks of activity recognition, network intrusion detection, and malicious URL detection.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/hamm16.html
  PDF: http://proceedings.mlr.press/v48/hamm16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-hamm16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Jihun
    family: Hamm
  - given: Yingjun
    family: Cao
  - given: Mikhail
    family: Belkin
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 555-563
  id: hamm16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 555
  lastpage: 563
  published: 2016-06-11 00:00:00 +0000
- title: 'Network Morphism'
  abstract: 'We present a systematic study on how to morph a well-trained neural network to a new one so that its network function can be completely preserved. We define this as network morphism in this research. After morphing a parent network, the child network is expected to inherit the knowledge from its parent network and also has the potential to continue growing into a more powerful one with much shortened training time. The first requirement for this network morphism is its ability to handle diverse morphing types of networks, including changes of depth, width, kernel size, and even subnet. To meet this requirement, we first introduce the network morphism equations, and then develop novel morphing algorithms for all these morphing types for both classic and convolutional neural networks. The second requirement is its ability to deal with non-linearity in a network. We propose a family of parametric-activation functions to facilitate the morphing of any continuous non-linear activation neurons. Experimental results on benchmark datasets and typical neural networks demonstrate the effectiveness of the proposed network morphism scheme.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/wei16.html
  PDF: http://proceedings.mlr.press/v48/wei16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-wei16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Tao
    family: Wei
  - given: Changhu
    family: Wang
  - given: Yong
    family: Rui
  - given: Chang Wen
    family: Chen
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 564-572
  id: wei16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 564
  lastpage: 572
  published: 2016-06-11 00:00:00 +0000
- title: 'A Kronecker-factored approximate Fisher matrix for convolution layers'
  abstract: 'Second-order optimization methods such as natural gradient descent have the potential to speed up training of neural networks by correcting for the curvature of the loss function. Unfortunately, the exact natural gradient is impractical to compute for large models, and most approximations either require an expensive iterative procedure or make crude approximations to the curvature. We present Kronecker Factors for Convolution (KFC), a tractable approximation to the Fisher matrix for convolutional networks based on a structured probabilistic model for the distribution over backpropagated derivatives. Similarly to the recently proposed Kronecker-Factored Approximate Curvature (K-FAC), each block of the approximate Fisher matrix decomposes as the Kronecker product of small matrices, allowing for efficient inversion. KFC captures important curvature information while still yielding comparably efficient updates to stochastic gradient descent (SGD). We show that the updates are invariant to commonly used reparameterizations, such as centering of the activations. In our experiments, approximate natural gradient descent with KFC was able to train convolutional networks several times faster than carefully tuned SGD. Furthermore, it was able to train the networks in 10-20 times fewer iterations than SGD, suggesting its potential applicability in a distributed setting.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/grosse16.html
  PDF: http://proceedings.mlr.press/v48/grosse16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-grosse16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Roger
    family: Grosse
  - given: James
    family: Martens
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 573-582
  id: grosse16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 573
  lastpage: 582
  published: 2016-06-11 00:00:00 +0000
- title: 'Experimental Design on a Budget for Sparse Linear Models and Applications'
  abstract: 'Budget constrained optimal design of experiments is a classical problem in statistics. Although the optimal design literature is very mature, few efficient strategies are available when these design problems appear in the context of sparse linear models commonly encountered in high dimensional machine learning and statistics. In this work, we study experimental design for the setting where the underlying regression model is characterized by a \ell_1-regularized linear function. We propose two novel strategies: the first is motivated geometrically whereas the second is algebraic in nature. We obtain tractable algorithms for this problem and also hold for a more general class of sparse linear models. We perform an extensive set of experiments, on benchmarks and a large multi-site neuroscience study, showing that the proposed models are effective in practice. The latter experiment suggests that these ideas may play a small role in informing enrollment strategies for similar scientific studies in the short-to-medium term future.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/ravi16.html
  PDF: http://proceedings.mlr.press/v48/ravi16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-ravi16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Sathya Narayanan
    family: Ravi
  - given: Vamsi
    family: Ithapu
  - given: Sterling
    family: Johnson
  - given: Vikas
    family: Singh
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 583-592
  id: ravi16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 583
  lastpage: 592
  published: 2016-06-11 00:00:00 +0000
- title: 'Minding the Gaps for Block Frank-Wolfe Optimization of Structured SVMs'
  abstract: 'In this paper, we propose several improvements on the block-coordinate Frank-Wolfe (BCFW) algorithm from Lacoste-Julien et al. (2013) recently used to optimize the structured support vector machine (SSVM) objective in the context of structured prediction, though it has wider applications. The key intuition behind our improvements is that the estimates of block gaps maintained by BCFW reveal the block suboptimality that can be used as an *adaptive* criterion. First, we sample objects at each iteration of BCFW in an adaptive non-uniform way via gap-based sampling. Second, we incorporate pairwise and away-step variants of Frank-Wolfe into the block-coordinate setting. Third, we cache oracle calls with a cache-hit criterion based on the block gaps. Fourth, we provide the first method to compute an approximate regularization path for SSVM. Finally, we provide an exhaustive empirical evaluation of all our methods on four structured prediction datasets.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/osokin16.html
  PDF: http://proceedings.mlr.press/v48/osokin16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-osokin16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Anton
    family: Osokin
  - given: Jean-Baptiste
    family: Alayrac
  - given: Isabella
    family: Lukasewitz
  - given: Puneet
    family: Dokania
  - given: Simon
    family: Lacoste-Julien
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 593-602
  id: osokin16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 593
  lastpage: 602
  published: 2016-06-11 00:00:00 +0000
- title: 'Exact Exponent in Optimal Rates for Crowdsourcing'
  abstract: 'Crowdsourcing has become a popular tool for labeling large datasets. This paper studies the optimal error rate for aggregating crowdsourced labels provided by a collection of amateur workers. Under the Dawid-Skene probabilistic model, we establish matching upper and lower bounds with an exact exponent mI(\pi), where m is the number of workers and I(\pi) is the average Chernoff information that characterizes the workers’ collective ability. Such an exact characterization of the error exponent allows us to state a precise sample size requirement m \ge \frac1I(\pi)\log\frac1ε in order to achieve an εmisclassification error. In addition, our results imply optimality of various forms of EM algorithms given accurate initializers of the model parameters.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/gaoa16.html
  PDF: http://proceedings.mlr.press/v48/gaoa16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-gaoa16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Chao
    family: Gao
  - given: Yu
    family: Lu
  - given: Dengyong
    family: Zhou
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 603-611
  id: gaoa16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 603
  lastpage: 611
  published: 2016-06-11 00:00:00 +0000
- title: 'Augmenting Supervised Neural Networks with Unsupervised Objectives for Large-scale Image Classification'
  abstract: 'Unsupervised learning and supervised learning are key research topics in deep learning. However, as high-capacity supervised neural networks trained with a large amount of labels have achieved remarkable success in many computer vision tasks, the availability of large-scale labeled images reduced the significance of unsupervised learning. Inspired by the recent trend toward revisiting the importance of unsupervised learning, we investigate joint supervised and unsupervised learning in a large-scale setting by augmenting existing neural networks with decoding pathways for reconstruction. First, we demonstrate that the intermediate activations of pretrained large-scale classification networks preserve almost all the information of input images except a portion of local spatial details. Then, by end-to-end training of the entire augmented architecture with the reconstructive objective, we show improvement of the network performance for supervised tasks. We evaluate several variants of autoencoders, including the recently proposed “what-where" autoencoder that uses the encoder pooling switches, to study the importance of the architecture design. Taking the 16-layer VGGNet trained under the ImageNet ILSVRC 2012 protocol as a strong baseline for image classification, our methods improve the validation-set accuracy by a noticeable margin.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/zhangc16.html
  PDF: http://proceedings.mlr.press/v48/zhangc16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-zhangc16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Yuting
    family: Zhang
  - given: Kibok
    family: Lee
  - given: Honglak
    family: Lee
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 612-621
  id: zhangc16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 612
  lastpage: 621
  published: 2016-06-11 00:00:00 +0000
- title: 'Online Low-Rank Subspace Clustering by Basis Dictionary Pursuit'
  abstract: 'Low-Rank Representation (LRR) has been a significant method for segmenting data that are generated from a union of subspaces. It is also known that solving LRR is challenging in terms of time complexity and memory footprint, in that the size of the nuclear norm regularized matrix is n-by-n (where n is the number of samples). In this paper, we thereby develop a novel online implementation of LRR that reduces the memory cost from O(n^2) to O(pd), with p being the ambient dimension and d being some estimated rank (d < p < n). We also establish the theoretical guarantee that the sequence of solutions produced by our algorithm converges to a stationary point of the expected loss function asymptotically. Extensive experiments on synthetic and realistic datasets further substantiate that our algorithm is fast, robust and memory efficient.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/shen16.html
  PDF: http://proceedings.mlr.press/v48/shen16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-shen16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Jie
    family: Shen
  - given: Ping
    family: Li
  - given: Huan
    family: Xu
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 622-631
  id: shen16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 622
  lastpage: 631
  published: 2016-06-11 00:00:00 +0000
- title: 'A Self-Correcting Variable-Metric Algorithm for Stochastic Optimization'
  abstract: 'An algorithm for stochastic (convex or nonconvex) optimization is presented. The algorithm is variable-metric in the sense that, in each iteration, the step is computed through the product of a symmetric positive definite scaling matrix and a stochastic (mini-batch) gradient of the objective function, where the sequence of scaling matrices is updated dynamically by the algorithm. A key feature of the algorithm is that it does not overly restrict the manner in which the scaling matrices are updated. Rather, the algorithm exploits fundamental self-correcting properties of BFGS-type updating—properties that have been over-looked in other attempts to devise quasi-Newton methods for stochastic optimization. Numerical experiments illustrate that the method and a limited memory variant of it are stable and outperform (mini-batch) stochastic gradient and other quasi-Newton methods when employed to solve a few machine learning problems.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/curtis16.html
  PDF: http://proceedings.mlr.press/v48/curtis16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-curtis16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Frank
    family: Curtis
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 632-641
  id: curtis16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 632
  lastpage: 641
  published: 2016-06-11 00:00:00 +0000
- title: 'Stochastic Quasi-Newton Langevin Monte Carlo'
  abstract: 'Recently, Stochastic Gradient Markov Chain Monte Carlo (SG-MCMC) methods have been proposed for scaling up Monte Carlo computations to large data problems. Whilst these approaches have proven useful in many applications, vanilla SG-MCMC might suffer from poor mixing rates when random variables exhibit strong couplings under the target densities or big scale differences. In this study, we propose a novel SG-MCMC method that takes the local geometry into account by using ideas from Quasi-Newton optimization methods. These second order methods directly approximate the inverse Hessian by using a limited history of samples and their gradients. Our method uses dense approximations of the inverse Hessian while keeping the time and memory complexities linear with the dimension of the problem. We provide a formal theoretical analysis where we show that the proposed method is asymptotically unbiased and consistent with the posterior expectations. We illustrate the effectiveness of the approach on both synthetic and real datasets. Our experiments on two challenging applications show that our method achieves fast convergence rates similar to Riemannian approaches while at the same time having low computational requirements similar to diagonal preconditioning approaches.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/simsekli16.html
  PDF: http://proceedings.mlr.press/v48/simsekli16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-simsekli16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Umut
    family: Simsekli
  - given: Roland
    family: Badeau
  - given: Taylan
    family: Cemgil
  - given: Gaël
    family: Richard
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 642-651
  id: simsekli16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 642
  lastpage: 651
  published: 2016-06-11 00:00:00 +0000
- title: 'Doubly Robust Off-policy Value Evaluation for Reinforcement Learning'
  abstract: 'We study the problem of off-policy value evaluation in reinforcement learning (RL), where one aims to estimate the value of a new policy based on data collected by a different policy. This problem is often a critical step when applying RL to real-world problems. Despite its importance, existing general methods either have uncontrolled bias or suffer high variance. In this work, we extend the doubly robust estimator for bandits to sequential decision-making problems, which gets the best of both worlds: it is guaranteed to be unbiased and can have a much lower variance than the popular importance sampling estimators. We demonstrate the estimator’s accuracy in several benchmark problems, and illustrate its use as a subroutine in safe policy improvement. We also provide theoretical results on the inherent hardness of the problem, and show that our estimator can match the lower bound in certain scenarios.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/jiang16.html
  PDF: http://proceedings.mlr.press/v48/jiang16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-jiang16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Nan
    family: Jiang
  - given: Lihong
    family: Li
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 652-661
  id: jiang16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 652
  lastpage: 661
  published: 2016-06-11 00:00:00 +0000
- title: 'Fast Rate Analysis of Some Stochastic Optimization Algorithms'
  abstract: 'In this paper, we revisit three fundamental and popular stochastic optimization algorithms (namely, Online Proximal Gradient, Regularized Dual Averaging method and ADMM with online proximal gradient) and analyze their convergence speed under conditions weaker than those in literature. In particular, previous works showed that these algorithms converge at a rate of O (\ln T/T) when the loss function is strongly convex, and O (1 /\sqrtT) in the weakly convex case. In contrast, we relax the strong convexity assumption of the loss function, and show that the algorithms converge at a rate O (\ln T/T) if the \em expectation of the loss function is \em locally strongly convex. This is a much weaker assumption and is satisfied by many practical formulations including Lasso and Logistic Regression. Our analysis thus extends the applicability of these three methods, as well as provides a general recipe for improving analysis of convergence rate for stochastic and online optimization algorithms.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/qua16.html
  PDF: http://proceedings.mlr.press/v48/qua16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-qua16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Chao
    family: Qu
  - given: Huan
    family: Xu
  - given: Chong
    family: Ong
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 662-670
  id: qua16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 662
  lastpage: 670
  published: 2016-06-11 00:00:00 +0000
- title: 'Fast k-Nearest Neighbour Search via Dynamic Continuous Indexing'
  abstract: 'Existing methods for retrieving k-nearest neighbours suffer from the curse of dimensionality. We argue this is caused in part by inherent deficiencies of space partitioning, which is the underlying strategy used by most existing methods. We devise a new strategy that avoids partitioning the vector space and present a novel randomized algorithm that runs in time linear in dimensionality of the space and sub-linear in the intrinsic dimensionality and the size of the dataset and takes space constant in dimensionality of the space and linear in the size of the dataset. The proposed algorithm allows fine-grained control over accuracy and speed on a per-query basis, automatically adapts to variations in data density, supports dynamic updates to the dataset and is easy-to-implement. We show appealing theoretical properties and demonstrate empirically that the proposed algorithm outperforms locality-sensitivity hashing (LSH) in terms of approximation quality, speed and space efficiency.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/lic16.html
  PDF: http://proceedings.mlr.press/v48/lic16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-lic16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Ke
    family: Li
  - given: Jitendra
    family: Malik
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 671-679
  id: lic16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 671
  lastpage: 679
  published: 2016-06-11 00:00:00 +0000
- title: 'Smooth Imitation Learning for Online Sequence Prediction'
  abstract: 'We study the problem of smooth imitation learning for online sequence prediction, where the goal is to train a policy that can smoothly imitate demonstrated behavior in a dynamic and continuous environment in response to online, sequential context input. Since the mapping from context to behavior is often complex, we take a learning reduction approach to reduce smooth imitation learning to a regression problem using complex function classes that are regularized to ensure smoothness. We present a learning meta-algorithm that achieves fast and stable convergence to a good policy. Our approach enjoys several attractive properties, including being fully deterministic, employing an adaptive learning rate that can provably yield larger policy improvements compared to previous approaches, and the ability to ensure stable convergence. Our empirical results demonstrate significant performance gains over previous approaches.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/le16.html
  PDF: http://proceedings.mlr.press/v48/le16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-le16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Hoang
    family: Le
  - given: Andrew
    family: Kang
  - given: Yisong
    family: Yue
  - given: Peter
    family: Carr
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 680-688
  id: le16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 680
  lastpage: 688
  published: 2016-06-11 00:00:00 +0000
- title: 'Community Recovery in Graphs with Locality'
  abstract: 'Motivated by applications in domains such as social networks and computational biology, we study the problem of community recovery in graphs with locality. In this problem, pairwise noisy measurements of whether two nodes are in the same community or different communities come mainly or exclusively from nearby nodes rather than uniformly sampled between all node pairs, as in most existing models. We present two algorithms that run nearly linearly in the number of measurements and which achieve the information limits for exact recovery.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/chena16.html
  PDF: http://proceedings.mlr.press/v48/chena16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-chena16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Yuxin
    family: Chen
  - given: Govinda
    family: Kamath
  - given: Changho
    family: Suh
  - given: David
    family: Tse
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 689-698
  id: chena16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 689
  lastpage: 698
  published: 2016-06-11 00:00:00 +0000
- title: 'Variance Reduction for Faster Non-Convex Optimization'
  abstract: 'We consider the fundamental problem in non-convex optimization of efficiently reaching a stationary point. In contrast to the convex case, in the long history of this basic problem, the only known theoretical results on first-order non-convex optimization remain to be full gradient descent that converges in O(1/\varepsilon) iterations for smooth objectives, and stochastic gradient descent that converges in O(1/\varepsilon^2) iterations for objectives that are sum of smooth functions. We provide the first improvement in this line of research. Our result is based on the variance reduction trick recently introduced to convex optimization, as well as a brand new analysis of variance reduction that is suitable for non-convex optimization. For objectives that are sum of smooth functions, our first-order minibatch stochastic method converges with an O(1/\varepsilon) rate, and is faster than full gradient descent by Ω(n^1/3). We demonstrate the effectiveness of our methods on empirical risk minimizations with non-convex loss functions and training neural nets.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/allen-zhua16.html
  PDF: http://proceedings.mlr.press/v48/allen-zhua16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-allen-zhua16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Zeyuan
    family: Allen-Zhu
  - given: Elad
    family: Hazan
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 699-707
  id: allen-zhua16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 699
  lastpage: 707
  published: 2016-06-11 00:00:00 +0000
- title: 'Loss factorization, weakly supervised learning and label noise robustness'
  abstract: 'We prove that the empirical risk of most well-known loss functions factors into a linear term aggregating all labels with a term that is label free, and can further be expressed by sums of the same loss. This holds true even for non-smooth, non-convex losses and in any RKHS. The first term is a (kernel) mean operator — the focal quantity of this work — which we characterize as the sufficient statistic for the labels. The result tightens known generalization bounds and sheds new light on their interpretation. Factorization has a direct application on weakly supervised learning. In particular, we demonstrate that algorithms like SGD and proximal methods can be adapted with minimal effort to handle weak supervision, once the mean operator has been estimated. We apply this idea to learning with asymmetric noisy labels, connecting and extending prior work. Furthermore, we show that most losses enjoy a data-dependent (by the mean operator) form of noise robustness, in contrast with known negative results.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/patrini16.html
  PDF: http://proceedings.mlr.press/v48/patrini16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-patrini16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Giorgio
    family: Patrini
  - given: Frank
    family: Nielsen
  - given: Richard
    family: Nock
  - given: Marcello
    family: Carioni
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 708-717
  id: patrini16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 708
  lastpage: 717
  published: 2016-06-11 00:00:00 +0000
- title: 'Analysis of Deep Neural Networks with Extended Data Jacobian Matrix'
  abstract: 'Deep neural networks have achieved great successes on various machine learning tasks, however, there are many open fundamental questions to be answered. In this paper, we tackle the problem of quantifying the quality of learned wights of different networks with possibly different architectures, going beyond considering the final classification error as the only metric. We introduce \emphExtended Data Jacobian Matrix to help analyze properties of networks of various structures, finding that, the spectrum of the extended data jacobian matrix is a strong discriminating factor for networks of different structures and performance. Based on such observation, we propose a novel regularization method, which manages to improve the network performance comparably to dropout, which in turn verifies the observation.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/wanga16.html
  PDF: http://proceedings.mlr.press/v48/wanga16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-wanga16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Shengjie
    family: Wang
  - given: Abdel-rahman
    family: Mohamed
  - given: Rich
    family: Caruana
  - given: Jeff
    family: Bilmes
  - given: Matthai
    family: Plilipose
  - given: Matthew
    family: Richardson
  - given: Krzysztof
    family: Geras
  - given: Gregor
    family: Urban
  - given: Ozlem
    family: Aslan
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 718-726
  id: wanga16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 718
  lastpage: 726
  published: 2016-06-11 00:00:00 +0000
- title: 'Doubly Decomposing Nonparametric Tensor Regression'
  abstract: 'Nonparametric extension of tensor regression is proposed. Nonlinearity in a high-dimensional tensor space is broken into simple local functions by incorporating low-rank tensor decomposition. Compared to naive nonparametric approaches, our formulation considerably improves the convergence rate of estimation while maintaining consistency with the same function class under specific conditions. To estimate local functions, we develop a Bayesian estimator with the Gaussian process prior. Experimental results show its theoretical properties and high performance in terms of predicting a summary statistic of a real complex network.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/imaizumi16.html
  PDF: http://proceedings.mlr.press/v48/imaizumi16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-imaizumi16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Masaaki
    family: Imaizumi
  - given: Kohei
    family: Hayashi
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 727-736
  id: imaizumi16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 727
  lastpage: 736
  published: 2016-06-11 00:00:00 +0000
- title: 'Hyperparameter optimization with approximate gradient'
  abstract: 'Most models in machine learning contain at least one hyperparameter to control for model complexity. Choosing an appropriate set of hyperparameters is both crucial in terms of model accuracy and computationally challenging. In this work we propose an algorithm for the optimization of continuous hyperparameters using inexact gradient information. An advantage of this method is that hyperparameters can be updated before model parameters have fully converged. We also give sufficient conditions for the global convergence of this method, based on regularity conditions of the involved functions and summability of errors. Finally, we validate the empirical performance of this method on the estimation of regularization constants of L2-regularized logistic regression and kernel Ridge regression. Empirical benchmarks indicate that our approach is highly competitive with respect to state of the art methods.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/pedregosa16.html
  PDF: http://proceedings.mlr.press/v48/pedregosa16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-pedregosa16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Fabian
    family: Pedregosa
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 737-746
  id: pedregosa16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 737
  lastpage: 746
  published: 2016-06-11 00:00:00 +0000
- title: 'SDCA without Duality, Regularization, and Individual Convexity'
  abstract: 'Stochastic Dual Coordinate Ascent is a popular method for solving regularized loss minimization for the case of convex losses. We describe variants of SDCA that do not require explicit regularization and do not rely on duality. We prove linear convergence rates even if individual loss functions are non-convex, as long as the expected loss is strongly convex.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/shalev-shwartza16.html
  PDF: http://proceedings.mlr.press/v48/shalev-shwartza16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-shalev-shwartza16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Shai
    family: Shalev-Shwartz
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 747-754
  id: shalev-shwartza16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 747
  lastpage: 754
  published: 2016-06-11 00:00:00 +0000
- title: 'Heteroscedastic Sequences: Beyond Gaussianity'
  abstract: 'We address the problem of sequential prediction in the heteroscedastic setting, when both the signal and its variance are assumed to depend on explanatory variables. By applying regret minimization techniques, we devise an efficient online learning algorithm for the problem, without assuming that the error terms comply with a specific distribution. We show that our algorithm can be adjusted to provide confidence bounds for its predictions, and provide an application to ARCH models. The theoretic results are corroborated by an empirical study.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/anava16.html
  PDF: http://proceedings.mlr.press/v48/anava16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-anava16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Oren
    family: Anava
  - given: Shie
    family: Mannor
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 755-763
  id: anava16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 755
  lastpage: 763
  published: 2016-06-11 00:00:00 +0000
- title: 'A Neural Autoregressive Approach to Collaborative Filtering'
  abstract: 'This paper proposes CF-NADE, a neural autoregressive architecture for collaborative filtering (CF) tasks, which is inspired by the Restricted Boltzmann Machine (RBM) based CF model and the Neural Autoregressive Distribution Estimator (NADE). We first describe the basic CF-NADE model for CF tasks. Then we propose to improve the model by sharing parameters between different ratings. A factored version of CF-NADE is also proposed for better scalability. Furthermore, we take the ordinal nature of the preferences into consideration and propose an ordinal cost to optimize CF-NADE, which shows superior performance. Finally, CF-NADE can be extended to a deep model, with only moderately increased computational complexity. Experimental results show that CF-NADE with a single hidden layer beats all previous state-of-the-art methods on MovieLens 1M, MovieLens 10M, and Netflix datasets, and adding more hidden layers can further improve the performance.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/zheng16.html
  PDF: http://proceedings.mlr.press/v48/zheng16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-zheng16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Yin
    family: Zheng
  - given: Bangsheng
    family: Tang
  - given: Wenkui
    family: Ding
  - given: Hanning
    family: Zhou
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 764-773
  id: zheng16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 764
  lastpage: 773
  published: 2016-06-11 00:00:00 +0000
- title: 'On the Quality of the Initial Basin in Overspecified Neural Networks'
  abstract: 'Deep learning, in the form of artificial neural networks, has achieved remarkable practical success in recent years, for a variety of difficult machine learning applications. However, a theoretical explanation for this remains a major open problem, since training neural networks involves optimizing a highly non-convex objective function, and is known to be computationally hard in the worst case. In this work, we study the \emphgeometric structure of the associated non-convex objective function, in the context of ReLU networks and starting from a random initialization of the network parameters. We identify some conditions under which it becomes more favorable to optimization, in the sense of (i) High probability of initializing at a point from which there is a monotonically decreasing path to a global minimum; and (ii) High probability of initializing at a basin (suitably defined) with a small minimal objective value. A common theme in our results is that such properties are more likely to hold for larger (“overspecified”) networks, which accords with some recent empirical and theoretical observations.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/safran16.html
  PDF: http://proceedings.mlr.press/v48/safran16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-safran16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Itay
    family: Safran
  - given: Ohad
    family: Shamir
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 774-782
  id: safran16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 774
  lastpage: 782
  published: 2016-06-11 00:00:00 +0000
- title: 'Primal-Dual Rates and Certificates'
  abstract: 'We propose an algorithm-independent framework to equip existing optimization methods with primal-dual certificates. Such certificates and corresponding rate of convergence guarantees are important for practitioners to diagnose progress, in particular in machine learning applications. We obtain new primal-dual convergence rates, e.g., for the Lasso as well as many L1, Elastic Net, group Lasso and TV-regularized problems. The theory applies to any norm-regularized generalized linear model. Our approach provides efficiently computable duality gaps which are globally defined, without modifying the original problems in the region of interest.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/dunner16.html
  PDF: http://proceedings.mlr.press/v48/dunner16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-dunner16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Celestine
    family: Dünner
  - given: Simone
    family: Forte
  - given: Martin
    family: Takac
  - given: Martin
    family: Jaggi
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 783-792
  id: dunner16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 783
  lastpage: 792
  published: 2016-06-11 00:00:00 +0000
- title: 'Minimizing the Maximal Loss: How and Why'
  abstract: 'A commonly used learning rule is to approximately minimize the \emphaverage loss over the training set. Other learning algorithms, such as AdaBoost and hard-SVM, aim at minimizing the \emphmaximal loss over the training set. The average loss is more popular, particularly in deep learning, due to three main reasons. First, it can be conveniently minimized using online algorithms, that process few examples at each iteration. Second, it is often argued that there is no sense to minimize the loss on the training set too much, as it will not be reflected in the generalization loss. Last, the maximal loss is not robust to outliers. In this paper we describe and analyze an algorithm that can convert any online algorithm to a minimizer of the maximal loss. We show, theoretically and empirically, that in some situations better accuracy on the training set is crucial to obtain good performance on unseen examples. Last, we propose robust versions of the approach that can handle outliers.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/shalev-shwartzb16.html
  PDF: http://proceedings.mlr.press/v48/shalev-shwartzb16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-shalev-shwartzb16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Shai
    family: Shalev-Shwartz
  - given: Yonatan
    family: Wexler
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 793-801
  id: shalev-shwartzb16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 793
  lastpage: 801
  published: 2016-06-11 00:00:00 +0000
- title: 'The Information-Theoretic Requirements of Subspace Clustering with Missing Data'
  abstract: 'Subspace clustering with missing data (SCMD) is a useful tool for analyzing incomplete datasets. Let d be the ambient dimension, and r the dimension of the subspaces. Existing theory shows that Nk = O(r d) columns per subspace are necessary for SCMD, and Nk =O(min d^(log d), d^(r+1) ) are sufficient. We close this gap, showing that Nk =O(r d) is also sufficient. To do this we derive deterministic sampling conditions for SCMD, which give precise information theoretic requirements and determine sampling regimes. These results explain the performance of SCMD algorithms from the literature. Finally, we give a practical algorithm to certify the output of any SCMD method deterministically.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/pimentel-alarcon16.html
  PDF: http://proceedings.mlr.press/v48/pimentel-alarcon16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-pimentel-alarcon16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Daniel
    family: Pimentel-Alarcon
  - given: Robert
    family: Nowak
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 802-810
  id: pimentel-alarcon16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 802
  lastpage: 810
  published: 2016-06-11 00:00:00 +0000
- title: 'Online Learning with Feedback Graphs Without the Graphs'
  abstract: 'We study an online learning framework introduced by Mannor and Shamir (2011) in which the feedback is specified by a graph, in a setting where the graph may vary from round to round and is \emphnever fully revealed to the learner. We show a large gap between the adversarial and the stochastic cases. In the adversarial case, we prove that even for dense feedback graphs, the learner cannot improve upon a trivial regret bound obtained by ignoring any additional feedback besides her own loss. In contrast, in the stochastic case we give an algorithm that achieves \widetildeΘ(\sqrtαT) regret over T rounds, provided that the independence numbers of the hidden feedback graphs are at most α. We also extend our results to a more general feedback model, in which the learner does not necessarily observe her own loss, and show that, even in simple cases, concealing the feedback graphs might render the problem unlearnable.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/cohena16.html
  PDF: http://proceedings.mlr.press/v48/cohena16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-cohena16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Alon
    family: Cohen
  - given: Tamir
    family: Hazan
  - given: Tomer
    family: Koren
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 811-819
  id: cohena16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 811
  lastpage: 819
  published: 2016-06-11 00:00:00 +0000
- title: 'PAC learning of Probabilistic Automaton based on the Method of Moments'
  abstract: 'Probabilitic Finite Automata (PFA) are generative graphical models that define distributions with latent variables over finite sequences of symbols, a.k.a. stochastic languages. Traditionally, unsupervised learning of PFA is performed through algorithms that iteratively improves the likelihood like the Expectation-Maximization (EM) algorithm. Recently, learning algorithms based on the so-called Method of Moments (MoM) have been proposed as a much faster alternative that comes with PAC-style guarantees. However, these algorithms do not ensure the learnt automata to model a proper distribution, limiting their applicability and preventing them to serve as an initialization to iterative algorithms. In this paper, we propose a new MoM-based algorithm with PAC-style guarantees that learns automata defining proper distributions. We assess its performances on synthetic problems from the PAutomaC challenge and real datasets extracted from Wikipedia against previous MoM-based algorithms and EM algorithm.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/glaude16.html
  PDF: http://proceedings.mlr.press/v48/glaude16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-glaude16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Hadrien
    family: Glaude
  - given: Olivier
    family: Pietquin
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 820-829
  id: glaude16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 820
  lastpage: 829
  published: 2016-06-11 00:00:00 +0000
- title: 'Estimating Structured Vector Autoregressive Models'
  abstract: 'While considerable advances have been made in estimating high-dimensional structured models from independent data using Lasso-type models, limited progress has been made for settings when the samples are dependent. We consider estimating structured VAR (vector auto-regressive model), where the structure can be captured by any suitable norm, e.g., Lasso, group Lasso, order weighted Lasso, etc. In VAR setting with correlated noise, although there is strong dependence over time and covariates, we establish bounds on the non-asymptotic estimation error of structured VAR parameters. The estimation error is of the same order as that of the corresponding Lasso-type estimator with independent samples, and the analysis holds for any norm. Our analysis relies on results in generic chaining, sub-exponential martingales, and spectral representation of VAR models. Experimental results on synthetic and real data with a variety of structures are presented, validating theoretical results.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/melnyk16.html
  PDF: http://proceedings.mlr.press/v48/melnyk16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-melnyk16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Igor
    family: Melnyk
  - given: Arindam
    family: Banerjee
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 830-839
  id: melnyk16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 830
  lastpage: 839
  published: 2016-06-11 00:00:00 +0000
- title: 'Mixing Rates for the Alternating Gibbs Sampler over Restricted Boltzmann Machines and Friends'
  abstract: 'Alternating Gibbs sampling is a modification of classical Gibbs sampling where several variables are simultaneously sampled from their joint conditional distribution. In this work, we investigate the mixing rate of alternating Gibbs sampling with a particular emphasis on Restricted Boltzmann Machines (RBMs) and variants.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/tosh16.html
  PDF: http://proceedings.mlr.press/v48/tosh16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-tosh16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Christopher
    family: Tosh
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 840-849
  id: tosh16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 840
  lastpage: 849
  published: 2016-06-11 00:00:00 +0000
- title: 'Polynomial Networks and Factorization Machines: New Insights and Efficient Training Algorithms'
  abstract: 'Polynomial networks and factorization machines are two recently-proposed models that can efficiently use feature interactions in classification and regression tasks. In this paper, we revisit both models from a unified perspective. Based on this new view, we study the properties of both models and propose new efficient training algorithms. Key to our approach is to cast parameter learning as a low-rank symmetric tensor estimation problem, which we solve by multi-convex optimization. We demonstrate our approach on regression and recommender system tasks.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/blondel16.html
  PDF: http://proceedings.mlr.press/v48/blondel16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-blondel16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Mathieu
    family: Blondel
  - given: Masakazu
    family: Ishihata
  - given: Akinori
    family: Fujino
  - given: Naonori
    family: Ueda
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 850-858
  id: blondel16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 850
  lastpage: 858
  published: 2016-06-11 00:00:00 +0000
- title: 'A New PAC-Bayesian Perspective on Domain Adaptation'
  abstract: 'We study the issue of PAC-Bayesian domain adaptation: We want to learn, from a source domain, a majority vote model dedicated to a target one. Our theoretical contribution brings a new perspective by deriving an upper-bound on the target risk where the distributions’ divergence - expressed as a ratio - controls the trade-off between a source error measure and the target voters’ disagreement. Our bound suggests that one has to focus on regions where the source data is informative. From this result, we derive a PAC-Bayesian generalization bound, and specialize it to linear classifiers. Then, we infer a learning algorithm and perform experiments on real data.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/germain16.html
  PDF: http://proceedings.mlr.press/v48/germain16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-germain16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Pascal
    family: Germain
  - given: Amaury
    family: Habrard
  - given: François
    family: Laviolette
  - given: Emilie
    family: Morvant
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 859-868
  id: germain16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 859
  lastpage: 868
  published: 2016-06-11 00:00:00 +0000
- title: 'Correlation Clustering and Biclustering with Locally Bounded Errors'
  abstract: 'We consider a generalized version of the correlation clustering problem, defined as follows. Given a complete graph G whose edges are labeled with + or -, we wish to partition the graph into clusters while trying to avoid errors: + edges between clusters or - edges within clusters. Classically, one seeks to minimize the total number of such errors. We introduce a new framework that allows the objective to be a more general function of the number of errors at each vertex (for example, we may wish to minimize the number of errors at the worst vertex) and provide a rounding algorithm which converts “fractional clusterings” into discrete clusterings while causing only a constant-factor blowup in the number of errors at each vertex. This rounding algorithm yields constant-factor approximation algorithms for the discrete problem under a wide variety of objective functions.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/puleo16.html
  PDF: http://proceedings.mlr.press/v48/puleo16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-puleo16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Gregory
    family: Puleo
  - given: Olgica
    family: Milenkovic
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 869-877
  id: puleo16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 869
  lastpage: 877
  published: 2016-06-11 00:00:00 +0000
- title: 'PAC Lower Bounds and Efficient Algorithms for The Max K-Armed Bandit Problem'
  abstract: 'We consider the Max K-Armed Bandit problem, where a learning agent is faced with several stochastic arms, each a source of i.i.d. rewards of unknown distribution. At each time step the agent chooses an arm, and observes the reward of the obtained sample. Each sample is considered here as a separate item with the reward designating its value, and the goal is to find an item with the highest possible value. Our basic assumption is a known lower bound on the \em tail function of the reward distributions. Under the PAC framework, we provide a lower bound on the sample complexity of any (ε,δ)-correct algorithm, and propose an algorithm that attains this bound up to logarithmic factors. We provide an analysis of the robustness of the proposed algorithm to the model assumptions, and further compare its performance to the simple non-adaptive variant, in which the arms are chosen randomly at each stage.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/david16.html
  PDF: http://proceedings.mlr.press/v48/david16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-david16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Yahel
    family: David
  - given: Nahum
    family: Shimkin
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 878-887
  id: david16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 878
  lastpage: 887
  published: 2016-06-11 00:00:00 +0000
- title: 'A Comparative Analysis and Study of Multiview CNN Models for Joint Object Categorization and Pose Estimation'
  abstract: 'In the Object Recognition task, there exists a dichotomy between the categorization of objects and estimating object pose, where the former necessitates a view-invariant representation, while the latter requires a representation capable of capturing pose information over different categories of objects. With the rise of deep architectures, the prime focus has been on object category recognition. Deep learning methods have achieved wide success in this task. In contrast, object pose estimation using these approaches has received relatively less attention. In this work, we study how Convolutional Neural Networks (CNN) architectures can be adapted to the task of simultaneous object recognition and pose estimation. We investigate and analyze the layers of various CNN models and extensively compare between them with the goal of discovering how the layers of distributed representations within CNNs represent object pose information and how this contradicts with object category representations. We extensively experiment on two recent large and challenging multi-view datasets and we achieve better than the state-of-the-art.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/elhoseiny16.html
  PDF: http://proceedings.mlr.press/v48/elhoseiny16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-elhoseiny16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Mohamed
    family: Elhoseiny
  - given: Tarek
    family: El-Gaaly
  - given: Amr
    family: Bakry
  - given: Ahmed
    family: Elgammal
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 888-897
  id: elhoseiny16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 888
  lastpage: 897
  published: 2016-06-11 00:00:00 +0000
- title: 'BASC: Applying Bayesian Optimization to the Search for Global Minima on Potential Energy Surfaces'
  abstract: 'We present a novel application of Bayesian optimization to the field of surface science: rapidly and accurately searching for the global minimum on potential energy surfaces. Controlling molecule-surface interactions is key for applications ranging from environmental catalysis to gas sensing. We present pragmatic techniques, including exploration/exploitation scheduling and a custom covariance kernel that encodes the properties of our objective function. Our method, the Bayesian Active Site Calculator (BASC), outperforms differential evolution and constrained minima hopping – two state-of-the-art approaches – in trial examples of carbon monoxide adsorption on a hematite substrate, both with and without a defect.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/carr16.html
  PDF: http://proceedings.mlr.press/v48/carr16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-carr16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Shane
    family: Carr
  - given: Roman
    family: Garnett
  - given: Cynthia
    family: Lo
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 898-907
  id: carr16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 898
  lastpage: 907
  published: 2016-06-11 00:00:00 +0000
- title: 'On the Iteration Complexity of Oblivious First-Order Optimization Algorithms'
  abstract: 'We consider a broad class of first-order optimization algorithms which are \emphoblivious, in the sense that their step sizes are scheduled regardless of the function under consideration, except for limited side-information such as smoothness or strong convexity parameters. With the knowledge of these two parameters, we show that any such algorithm attains an iteration complexity lower bound of Ω(\sqrtL/ε) for L-smooth convex functions, and \tildeΩ(\sqrtL/μ\ln(1/ε)) for L-smooth μ-strongly convex functions. These lower bounds are stronger than those in the traditional oracle model, as they hold independently of the dimension. To attain these, we abandon the oracle model in favor of a structure-based approach which builds upon a framework recently proposed in Arjevani et al. (2015). We further show that without knowing the strong convexity parameter, it is impossible to attain an iteration complexity better than \tildeΩ\sqrt(L/μ)\ln(1/ε). This result is then used to formalize an observation regarding L-smooth convex functions, namely, that the iteration complexity of algorithms employing time-invariant step sizes must be at least Ω(L/ε).'
  volume: 48
  URL: https://proceedings.mlr.press/v48/arjevani16.html
  PDF: http://proceedings.mlr.press/v48/arjevani16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-arjevani16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Yossi
    family: Arjevani
  - given: Ohad
    family: Shamir
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 908-916
  id: arjevani16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 908
  lastpage: 916
  published: 2016-06-11 00:00:00 +0000
- title: 'Stochastic Variance Reduced Optimization for Nonconvex Sparse Learning'
  abstract: 'We propose a stochastic variance reduced optimization algorithm for solving a class of large-scale nonconvex optimization problems with cardinality constraints, and provide sufficient conditions under which the proposed algorithm enjoys strong linear convergence guarantees and optimal estimation accuracy in high dimensions. Numerical experiments demonstrate the efficiency of our method in terms of both parameter estimation and computational performance.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/lid16.html
  PDF: http://proceedings.mlr.press/v48/lid16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-lid16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Xingguo
    family: Li
  - given: Tuo
    family: Zhao
  - given: Raman
    family: Arora
  - given: Han
    family: Liu
  - given: Jarvis
    family: Haupt
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 917-925
  id: lid16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 917
  lastpage: 925
  published: 2016-06-11 00:00:00 +0000
- title: 'Analysis of Variational Bayesian Factorizations for Sparse and Low-Rank Estimation'
  abstract: 'Variational Bayesian (VB) approximations anchor a wide variety of probabilistic models, where tractable posterior inference is almost never possible. Typically based on the so-called VB mean-field approximation to the Kullback-Leibler divergence, a posterior distribution is sought that factorizes across groups of latent variables such that, with the distributions of all but one group of variables held fixed, an optimal closed-form distribution can be obtained for the remaining group, with differing algorithms distinguished by how different variables are grouped and ultimately factored. This basic strategy is particularly attractive when estimating structured low-dimensional models of high-dimensional data, exemplified by the search for minimal rank and/or sparse approximations to observed data. To this end, VB models are frequently deployed across applications including multi-task learning, robust PCA, subspace clustering, matrix completion, affine rank minimization, source localization, compressive sensing, and assorted combinations thereof. Perhaps surprisingly however, there exists almost no attendant theoretical explanation for how various VB factorizations operate, and in which situations one may be preferable to another. We address this relative void by comparing arguably two of the most popular factorizations, one built upon Gaussian scale mixture priors, the other bilinear Gaussian priors, both of which can favor minimal rank or sparsity depending on the context. More specifically, by reexpressing the respective VB objective functions, we weigh multiple factors related to local minima avoidance, feature transformation invariance and correlation, and computational complexity to arrive at insightful conclusions useful in explaining performance and deciding which VB flavor is advantageous. We also envision that the principles explored here are quite relevant to other structured inverse problems where VB serves as a viable solution.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/wipf16.html
  PDF: http://proceedings.mlr.press/v48/wipf16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-wipf16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: David
    family: Wipf
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 926-935
  id: wipf16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 926
  lastpage: 935
  published: 2016-06-11 00:00:00 +0000
- title: 'Fast k-means with accurate bounds'
  abstract: 'We propose a novel accelerated exact k-means algorithm, which outperforms the current state-of-the-art low-dimensional algorithm in 18 of 22 experiments, running up to 3 times faster. We also propose a general improvement of existing state-of-the-art accelerated exact k-means algorithms through better estimates of the distance bounds used to reduce the number of distance calculations, obtaining speedups in 36 of 44 experiments, of up to 1.8 times. We have conducted experiments with our own implementations of existing methods to ensure homogeneous evaluation of performance, and we show that our implementations perform as well or better than existing available implementations. Finally, we propose simplified variants of standard approaches and show that they are faster than their fully-fledged counterparts in 59 of 62 experiments.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/newling16.html
  PDF: http://proceedings.mlr.press/v48/newling16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-newling16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: James
    family: Newling
  - given: Francois
    family: Fleuret
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 936-944
  id: newling16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 936
  lastpage: 944
  published: 2016-06-11 00:00:00 +0000
- title: 'Boolean Matrix Factorization and Noisy Completion via Message Passing'
  abstract: 'Boolean matrix factorization and Boolean matrix completion from noisy observations are desirable unsupervised data-analysis methods due to their interpretability, but hard to perform due to their NP-hardness. We treat these problems as maximum a posteriori inference problems in a graphical model and present a message passing approach that scales linearly with the number of observations and factors. Our empirical study demonstrates that message passing is able to recover low-rank Boolean matrices, in the boundaries of theoretically possible recovery and compares favorably with state-of-the-art in real-world applications, such collaborative filtering with large-scale Boolean data.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/ravanbakhsha16.html
  PDF: http://proceedings.mlr.press/v48/ravanbakhsha16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-ravanbakhsha16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Siamak
    family: Ravanbakhsh
  - given: Barnabas
    family: Poczos
  - given: Russell
    family: Greiner
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 945-954
  id: ravanbakhsha16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 945
  lastpage: 954
  published: 2016-06-11 00:00:00 +0000
- title: 'Convolutional Rectifier Networks as Generalized Tensor Decompositions'
  abstract: 'Convolutional rectifier networks, i.e. convolutional neural networks with rectified linear activation and max or average pooling, are the cornerstone of modern deep learning. However, despite their wide use and success, our theoretical understanding of the expressive properties that drive these networks is partial at best. On the other hand, we have a much firmer grasp of these issues in the world of arithmetic circuits. Specifically, it is known that convolutional arithmetic circuits possess the property of "complete depth efficiency", meaning that besides a negligible set, all functions realizable by a deep network of polynomial size, require exponential size in order to be realized (or approximated) by a shallow network. In this paper we describe a construction based on generalized tensor decompositions, that transforms convolutional arithmetic circuits into convolutional rectifier networks. We then use mathematical tools available from the world of arithmetic circuits to prove new results. First, we show that convolutional rectifier networks are universal with max pooling but not with average pooling. Second, and more importantly, we show that depth efficiency is weaker with convolutional rectifier networks than it is with convolutional arithmetic circuits. This leads us to believe that developing effective methods for training convolutional arithmetic circuits, thereby fulfilling their expressive potential, may give rise to a deep learning architecture that is provably superior to convolutional rectifier networks but has so far been overlooked by practitioners.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/cohenb16.html
  PDF: http://proceedings.mlr.press/v48/cohenb16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-cohenb16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Nadav
    family: Cohen
  - given: Amnon
    family: Shashua
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 955-963
  id: cohenb16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 955
  lastpage: 963
  published: 2016-06-11 00:00:00 +0000
- title: 'Low-rank Solutions of Linear Matrix Equations via Procrustes Flow'
  abstract: 'In this paper we study the problem of recovering a low-rank matrix from linear measurements. Our algorithm, which we call Procrustes Flow, starts from an initial estimate obtained by a thresholding scheme followed by gradient descent on a non-convex objective. We show that as long as the measurements obey a standard restricted isometry property, our algorithm converges to the unknown matrix at a geometric rate. In the case of Gaussian measurements, such convergence occurs for a n1 \times n2 matrix of rank r when the number of measurements exceeds a constant times (n1 + n2)r.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/tu16.html
  PDF: http://proceedings.mlr.press/v48/tu16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-tu16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Stephen
    family: Tu
  - given: Ross
    family: Boczar
  - given: Max
    family: Simchowitz
  - given: Mahdi
    family: Soltanolkotabi
  - given: Ben
    family: Recht
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 964-973
  id: tu16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 964
  lastpage: 973
  published: 2016-06-11 00:00:00 +0000
- title: 'Anytime Exploration for Multi-armed Bandits using Confidence Information'
  abstract: 'We introduce anytime Explore-m, a pure exploration problem for multi-armed bandits (MAB) that requires making a prediction of the top-m arms at every time step. Anytime Explore-m is more practical than fixed budget or fixed confidence formulations of the top-m problem, since many applications involve a finite, but unpredictable, budget. However, the development and analysis of anytime algorithms present many challenges. We propose AT-LUCB (AnyTime Lower and Upper Confidence Bound), the first nontrivial algorithm that provably solves anytime Explore-m. Our analysis shows that the sample complexity of AT-LUCB is competitive to anytime variants of existing algorithms. Moreover, our empirical evaluation on AT-LUCB shows that AT-LUCB performs as well as or better than state-of-the-art baseline methods for anytime Explore-m.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/jun16.html
  PDF: http://proceedings.mlr.press/v48/jun16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-jun16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Kwang-Sung
    family: Jun
  - given: Robert
    family: Nowak
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 974-982
  id: jun16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 974
  lastpage: 982
  published: 2016-06-11 00:00:00 +0000
- title: 'Structured Prediction Energy Networks'
  abstract: 'We introduce structured prediction energy networks (SPENs), a flexible framework for structured prediction. A deep architecture is used to define an energy function of candidate labels, and then predictions are produced by using back-propagation to iteratively optimize the energy with respect to the labels. This deep architecture captures dependencies between labels that would lead to intractable graphical models, and performs structure learning by automatically learning discriminative features of the structured output. One natural application of our technique is multi-label classification, which traditionally has required strict prior assumptions about the interactions between labels to ensure tractable learning and prediction. We are able to apply SPENs to multi-label problems with substantially larger label sets than previous applications of structured prediction, while modeling high-order interactions using minimal structural assumptions. Overall, deep learning provides remarkable tools for learning features of the inputs to a prediction problem, and this work extends these techniques to learning features of structured outputs. Our experiments provide impressive performance on a variety of benchmark multi-label classification tasks, demonstrate that our technique can be used to provide interpretable structure learning, and illuminate fundamental trade-offs between feed-forward and iterative structured prediction.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/belanger16.html
  PDF: http://proceedings.mlr.press/v48/belanger16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-belanger16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: David
    family: Belanger
  - given: Andrew
    family: McCallum
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 983-992
  id: belanger16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 983
  lastpage: 992
  published: 2016-06-11 00:00:00 +0000
- title: 'L1-regularized Neural Networks are Improperly Learnable in Polynomial Time'
  abstract: 'We study the improper learning of multi-layer neural networks. Suppose that the neural network to be learned has k hidden layers and that the \ell_1-norm of the incoming weights of any neuron is bounded by L. We present a kernel-based method, such that with probability at least 1 - δ, it learns a predictor whose generalization error is at most εworse than that of the neural network. The sample complexity and the time complexity of the presented method are polynomial in the input dimension and in (1/ε,\log(1/δ),F(k,L)), where F(k,L) is a function depending on (k,L) and on the activation function, independent of the number of neurons. The algorithm applies to both sigmoid-like activation functions and ReLU-like activation functions. It implies that any sufficiently sparse neural network is learnable in polynomial time.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/zhangd16.html
  PDF: http://proceedings.mlr.press/v48/zhangd16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-zhangd16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Yuchen
    family: Zhang
  - given: Jason D.
    family: Lee
  - given: Michael I.
    family: Jordan
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 993-1001
  id: zhangd16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 993
  lastpage: 1001
  published: 2016-06-11 00:00:00 +0000
- title: 'Compressive Spectral Clustering'
  abstract: 'Spectral clustering has become a popular technique due to its high performance in many contexts. It comprises three main steps: create a similarity graph between N objects to cluster, compute the first k eigenvectors of its Laplacian matrix to define a feature vector for each object, and run k-means on these features to separate objects into k classes. Each of these three steps becomes computationally intensive for large N and/or k. We propose to speed up the last two steps based on recent results in the emerging field of graph signal processing: graph filtering of random signals, and random sampling of bandlimited graph signals. We prove that our method, with a gain in computation time that can reach several orders of magnitude, is in fact an approximation of spectral clustering, for which we are able to control the error. We test the performance of our method on artificial and real-world network data.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/tremblay16.html
  PDF: http://proceedings.mlr.press/v48/tremblay16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-tremblay16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Nicolas
    family: Tremblay
  - given: Gilles
    family: Puy
  - given: Remi
    family: Gribonval
  - given: Pierre
    family: Vandergheynst
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1002-1011
  id: tremblay16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1002
  lastpage: 1011
  published: 2016-06-11 00:00:00 +0000
- title: 'Low-rank tensor completion: a Riemannian manifold preconditioning approach'
  abstract: 'We propose a novel Riemannian manifold preconditioning approach for the tensor completion problem with rank constraint. A novel Riemannian metric or inner product is proposed that exploits the least-squares structure of the cost function and takes into account the structured symmetry that exists in Tucker decomposition. The specific metric allows to use the versatile framework of Riemannian optimization on quotient manifolds to develop preconditioned nonlinear conjugate gradient and stochastic gradient descent algorithms in batch and online setups, respectively. Concrete matrix representations of various optimization-related ingredients are listed. Numerical comparisons suggest that our proposed algorithms robustly outperform state-of-the-art algorithms across different synthetic and real-world datasets.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/kasai16.html
  PDF: http://proceedings.mlr.press/v48/kasai16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-kasai16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Hiroyuki
    family: Kasai
  - given: Bamdev
    family: Mishra
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1012-1021
  id: kasai16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1012
  lastpage: 1021
  published: 2016-06-11 00:00:00 +0000
- title: 'Provable Non-convex Phase Retrieval with Outliers: Median TruncatedWirtinger Flow'
  abstract: 'Solving systems of quadratic equations is a central problem in machine learning and signal processing. One important example is phase retrieval, which aims to recover a signal from only magnitudes of its linear measurements. This paper focuses on the situation when the measurements are corrupted by arbitrary outliers, for which the recently developed non-convex gradient descent Wirtinger flow (WF) and truncated Wirtinger flow (TWF) algorithms likely fail. We develop a novel median-TWF algorithm that exploits robustness of sample median to resist arbitrary outliers in the initialization and the gradient update in each iteration. We show that such a non-convex algorithm provably recovers the signal from a near-optimal number of measurements composed of i.i.d. Gaussian entries, up to a logarithmic factor, even when a constant portion of the measurements are corrupted by arbitrary outliers. We further show that median-TWF is also robust when measurements are corrupted by both arbitrary outliers and bounded noise. Our analysis of performance guarantee is accomplished by development of non-trivial concentration measures of median-related quantities, which may be of independent interest. We further provide numerical experiments to demonstrate the effectiveness of the approach.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/zhange16.html
  PDF: http://proceedings.mlr.press/v48/zhange16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-zhange16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Huishuai
    family: Zhang
  - given: Yuejie
    family: Chi
  - given: Yingbin
    family: Liang
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1022-1031
  id: zhange16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1022
  lastpage: 1031
  published: 2016-06-11 00:00:00 +0000
- title: 'Estimating Maximum Expected Value through Gaussian Approximation'
  abstract: 'This paper is about the estimation of the maximum expected value of a set of independent random variables. The performance of several learning algorithms (e.g., Q-learning) is affected by the accuracy of such estimation. Unfortunately, no unbiased estimator exists. The usual approach of taking the maximum of the sample means leads to large overestimates that may significantly harm the performance of the learning algorithm. Recent works have shown that the cross validation estimator—which is negatively biased—outperforms the maximum estimator in many sequential decision-making scenarios. On the other hand, the relative performance of the two estimators is highly problem-dependent. In this paper, we propose a new estimator for the maximum expected value, based on a weighted average of the sample means, where the weights are computed using Gaussian approximations for the distributions of the sample means. We compare the proposed estimator with the other state-of-the-art methods both theoretically, by deriving upper bounds to the bias and the variance of the estimator, and empirically, by testing the performance on different sequential learning problems.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/deramo16.html
  PDF: http://proceedings.mlr.press/v48/deramo16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-deramo16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Carlo
    family: D’Eramo
  - given: Marcello
    family: Restelli
  - given: Alessandro
    family: Nuara
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1032-1040
  id: deramo16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1032
  lastpage: 1040
  published: 2016-06-11 00:00:00 +0000
- title: 'Representational Similarity Learning with Application to Brain Networks'
  abstract: 'Representational Similarity Learning (RSL) aims to discover features that are important in representing (human-judged) similarities among objects. RSL can be posed as a sparsity-regularized multi-task regression problem. Standard methods, like group lasso, may not select important features if they are strongly correlated with others. To address this shortcoming we present a new regularizer for multitask regression called Group Ordered Weighted \ell_1 (GrOWL). Another key contribution of our paper is a novel application to fMRI brain imaging. Representational Similarity Analysis (RSA) is a tool for testing whether localized brain regions encode perceptual similarities. Using GrOWL, we propose a new approach called Network RSA that can discover arbitrarily structured brain networks (possibly widely distributed and non-local) that encode similarity information. We show, in theory and fMRI experiments, how GrOWL deals with strongly correlated covariates.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/oswal16.html
  PDF: http://proceedings.mlr.press/v48/oswal16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-oswal16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Urvashi
    family: Oswal
  - given: Christopher
    family: Cox
  - given: Matthew
    family: Lambon-Ralph
  - given: Timothy
    family: Rogers
  - given: Robert
    family: Nowak
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1041-1049
  id: oswal16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1041
  lastpage: 1049
  published: 2016-06-11 00:00:00 +0000
- title: 'Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning'
  abstract: 'Deep learning tools have gained tremendous attention in applied machine learning. However such tools for regression and classification do not capture model uncertainty. In comparison, Bayesian models offer a mathematically grounded framework to reason about model uncertainty, but usually come with a prohibitive computational cost. In this paper we develop a new theoretical framework casting dropout training in deep neural networks (NNs) as approximate Bayesian inference in deep Gaussian processes. A direct result of this theory gives us tools to model uncertainty with dropout NNs – extracting information from existing models that has been thrown away so far. This mitigates the problem of representing uncertainty in deep learning without sacrificing either computational complexity or test accuracy. We perform an extensive study of the properties of dropout’s uncertainty. Various network architectures and non-linearities are assessed on tasks of regression and classification, using MNIST as an example. We show a considerable improvement in predictive log-likelihood and RMSE compared to existing state-of-the-art methods, and finish by using dropout’s uncertainty in deep reinforcement learning.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/gal16.html
  PDF: http://proceedings.mlr.press/v48/gal16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-gal16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Yarin
    family: Gal
  - given: Zoubin
    family: Ghahramani
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1050-1059
  id: gal16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1050
  lastpage: 1059
  published: 2016-06-11 00:00:00 +0000
- title: 'Generative Adversarial Text to Image Synthesis'
  abstract: 'Automatic synthesis of realistic images from text would be interesting and useful, but current AI systems are still far from this goal. However, in recent years generic and powerful recurrent neural network architectures have been developed to learn discriminative text feature representations. Meanwhile, deep convolutional generative adversarial networks (GANs) have begun to generate highly compelling images of specific categories such as faces, album covers, room interiors and flowers. In this work, we develop a novel deep architecture and GAN formulation to effectively bridge these advances in text and image modeling, translating visual concepts from characters to pixels. We demonstrate the capability of our model to generate plausible images of birds and flowers from detailed text descriptions.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/reed16.html
  PDF: http://proceedings.mlr.press/v48/reed16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-reed16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Scott
    family: Reed
  - given: Zeynep
    family: Akata
  - given: Xinchen
    family: Yan
  - given: Lajanugen
    family: Logeswaran
  - given: Bernt
    family: Schiele
  - given: Honglak
    family: Lee
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1060-1069
  id: reed16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1060
  lastpage: 1069
  published: 2016-06-11 00:00:00 +0000
- title: 'Dirichlet Process Mixture Model for Correcting Technical Variation in Single-Cell Gene Expression Data'
  abstract: 'We introduce an iterative normalization and clustering method for single-cell gene expression data. The emerging technology of single-cell RNA-seq gives access to gene expression measurements for thousands of cells, allowing discovery and characterization of cell types. However, the data is confounded by technical variation emanating from experimental errors and cell type-specific biases. Current approaches perform a global normalization prior to analyzing biological signals, which does not resolve missing data or variation dependent on latent cell types. Our model is formulated as a hierarchical Bayesian mixture model with cell-specific scalings that aid the iterative normalization and clustering of cells, teasing apart technical variation from biological signals. We demonstrate that this approach is superior to global normalization followed by clustering. We show identifiability and weak convergence guarantees of our method and present a scalable Gibbs inference algorithm. This method improves cluster inference in both synthetic and real single-cell data compared with previous methods, and allows easy interpretation and recovery of the underlying structure and cell types.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/prabhakaran16.html
  PDF: http://proceedings.mlr.press/v48/prabhakaran16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-prabhakaran16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Sandhya
    family: Prabhakaran
  - given: Elham
    family: Azizi
  - given: Ambrose
    family: Carr
  - given: Dana
    family: Pe’er
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1070-1079
  id: prabhakaran16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1070
  lastpage: 1079
  published: 2016-06-11 00:00:00 +0000
- title: 'Improved SVRG for Non-Strongly-Convex or Sum-of-Non-Convex Objectives'
  abstract: 'Many classical algorithms are found until several years later to outlive the confines in which they were conceived, and continue to be relevant in unforeseen settings. In this paper, we show that SVRG is one such method: being originally designed for strongly convex objectives, it is also very robust in non-strongly convex or sum-of-non-convex settings. More precisely, we provide new analysis to improve the state-of-the-art running times in both settings by either applying SVRG or its novel variant. Since non-strongly convex objectives include important examples such as Lasso or logistic regression, and sum-of-non-convex objectives include famous examples such as stochastic PCA and is even believed to be related to training deep neural nets, our results also imply better performances in these applications.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/allen-zhub16.html
  PDF: http://proceedings.mlr.press/v48/allen-zhub16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-allen-zhub16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Zeyuan
    family: Allen-Zhu
  - given: Yang
    family: Yuan
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1080-1089
  id: allen-zhub16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1080
  lastpage: 1089
  published: 2016-06-11 00:00:00 +0000
- title: 'Sparse Parameter Recovery from Aggregated Data'
  abstract: 'Data aggregation is becoming an increasingly common technique for sharing sensitive information, and for reducing data size when storage and/or communication costs are high. Aggregate quantities such as group-average are a form of semi-supervision as they do not directly provide information of individual values, but despite their wide-spread use, prior literature on learning individual-level models from aggregated data is extremely limited. This paper investigates the effect of data aggregation on parameter recovery for a sparse linear model, when known results are no longer applicable. In particular, we consider a scenario where the data are collected into groups e.g. aggregated patient records, and first-order empirical moments are available only at the group level. Despite this obfuscation of individual data values, we can show that the true parameter is recoverable with high probability using these aggregates when the collection of true group moments is an incoherent matrix, and the empirical moment estimates have been computed from a sufficiently large number of samples. To the best of our knowledge, ours are the first results on structured parameter recovery using only aggregated data. Experimental results on synthetic data are provided in support of these theoretical claims. We also show that parameter estimation from aggregated data approaches the accuracy of parameter estimation obtainable from non-aggregated or “individual" samples, when applied to two real world healthcare applications- predictive modeling of CMS Medicare reimbursement claims, and modeling of Texas State healthcare charges.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/bhowmik16.html
  PDF: http://proceedings.mlr.press/v48/bhowmik16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-bhowmik16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Avradeep
    family: Bhowmik
  - given: Joydeep
    family: Ghosh
  - given: Oluwasanmi
    family: Koyejo
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1090-1099
  id: bhowmik16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1090
  lastpage: 1099
  published: 2016-06-11 00:00:00 +0000
- title: 'Deep Structured Energy Based Models for Anomaly Detection'
  abstract: 'In this paper, we attack the anomaly detection problem by directly modeling the data distribution with deep architectures. We hence propose deep structured energy based models (DSEBMs), where the energy function is the output of a deterministic deep neural network with structure. We develop novel model architectures to integrate EBMs with different types of data such as static data, sequential data, and spatial data, and apply appropriate model architectures to adapt to the data structure. Our training algorithm is built upon the recent development of score matching (Hyvarinen, 2005), which connects an EBM with a regularized autoencoder, eliminating the need for complicated sampling method. Statistically sound decision criterion can be derived for anomaly detection purpose from the perspective of the energy landscape of the data distribution. We investigate two decision criteria for performing anomaly detection: the energy score and the reconstruction error. Extensive empirical studies on benchmark anomaly detection tasks demonstrate that our proposed model consistently matches or outperforms all the competing methods.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/zhai16.html
  PDF: http://proceedings.mlr.press/v48/zhai16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-zhai16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Shuangfei
    family: Zhai
  - given: Yu
    family: Cheng
  - given: Weining
    family: Lu
  - given: Zhongfei
    family: Zhang
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1100-1109
  id: zhai16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1100
  lastpage: 1109
  published: 2016-06-11 00:00:00 +0000
- title: 'Even Faster Accelerated Coordinate Descent Using Non-Uniform Sampling'
  abstract: 'Accelerated coordinate descent is widely used in optimization due to its cheap per-iteration cost and scalability to large-scale problems. Up to a primal-dual transformation, it is also the same as accelerated stochastic gradient descent that is one of the central methods used in machine learning. In this paper, we improve the best known running time of accelerated coordinate descent by a factor up to \sqrtn. Our improvement is based on a clean, novel non-uniform sampling that selects each coordinate with a probability proportional to the square root of its smoothness parameter. Our proof technique also deviates from the classical estimation sequence technique used in prior work. Our speed-up applies to important problems such as empirical risk minimization and solving linear systems, both in theory and in practice.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/allen-zhuc16.html
  PDF: http://proceedings.mlr.press/v48/allen-zhuc16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-allen-zhuc16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Zeyuan
    family: Allen-Zhu
  - given: Zheng
    family: Qu
  - given: Peter
    family: Richtarik
  - given: Yang
    family: Yuan
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1110-1119
  id: allen-zhuc16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1110
  lastpage: 1119
  published: 2016-06-11 00:00:00 +0000
- title: 'Unitary Evolution Recurrent Neural Networks'
  abstract: 'Recurrent neural networks (RNNs) are notoriously difficult to train. When the eigenvalues of the hidden to hidden weight matrix deviate from absolute value 1, optimization becomes difficult due to the well studied issue of vanishing and exploding gradients, especially when trying to learn long-term dependencies. To circumvent this problem, we propose a new architecture that learns a unitary weight matrix, with eigenvalues of absolute value exactly 1. The challenge we address is that of parametrizing unitary matrices in a way that does not require expensive computations (such as eigendecomposition) after each weight update. We construct an expressive unitary weight matrix by composing several structured matrices that act as building blocks with parameters to be learned. Optimization with this parameterization becomes feasible only when considering hidden states in the complex domain. We demonstrate the potential of this architecture by achieving state of the art results in several hard tasks involving very long-term dependencies.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/arjovsky16.html
  PDF: http://proceedings.mlr.press/v48/arjovsky16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-arjovsky16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Martin
    family: Arjovsky
  - given: Amar
    family: Shah
  - given: Yoshua
    family: Bengio
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1120-1128
  id: arjovsky16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1120
  lastpage: 1128
  published: 2016-06-11 00:00:00 +0000
- title: 'Markov Latent Feature Models'
  abstract: 'We introduce Markov latent feature models (MLFM), a sparse latent feature model that arises naturally from a simple sequential construction. The key idea is to interpret each state of a sequential process as corresponding to a latent feature, and the set of states visited between two null-state visits as picking out features for an observation. We show that, given some natural constraints, we can represent this stochastic process as a mixture of recurrent Markov chains. In this way we can perform correlated latent feature modeling for the sparse coding problem. We demonstrate two cases in which we define finite and infinite latent feature models constructed from first-order Markov chains, and derive their associated scalable inference algorithms. We show empirical results on a genome analysis task and an image denoising task.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/zhangf16.html
  PDF: http://proceedings.mlr.press/v48/zhangf16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-zhangf16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Aonan
    family: Zhang
  - given: John
    family: Paisley
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1129-1137
  id: zhangf16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1129
  lastpage: 1137
  published: 2016-06-11 00:00:00 +0000
- title: 'The Knowledge Gradient for Sequential Decision Making with Stochastic Binary Feedbacks'
  abstract: 'We consider the problem of sequentially making decisions that are rewarded by “successes” and “failures” which can be predicted through an unknown relationship that depends on a partially controllable vector of attributes for each instance. The learner takes an active role in selecting samples from the instance pool. The goal is to maximize the probability of success, either after the offline training phase or minimizing regret in online learning. Our problem is motivated by real-world applications where observations are time consuming and/or expensive. With the adaptation of an online Bayesian linear classifier, we develop a knowledge-gradient type policy to guide the experiment by maximizing the expected value of information of labeling each alternative, in order to reduce the number of expensive physical experiments. We provide a finite-time analysis of the estimated error and demonstrate the performance of the proposed algorithm on both synthetic problems and benchmark UCI datasets.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/wangb16.html
  PDF: http://proceedings.mlr.press/v48/wangb16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-wangb16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Yingfei
    family: Wang
  - given: Chu
    family: Wang
  - given: Warren
    family: Powell
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1138-1147
  id: wangb16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1138
  lastpage: 1147
  published: 2016-06-11 00:00:00 +0000
- title: 'A Simple and Provable Algorithm for Sparse Diagonal CCA'
  abstract: 'Given two sets of variables, derived from a common set of samples, sparse Canonical Correlation Analysis (CCA) seeks linear combinations of a small number of variables in each set, such that the induced \emphcanonical variables are maximally correlated. Sparse CCA is NP-hard. We propose a novel combinatorial algorithm for sparse diagonal CCA, \textiti.e., sparse CCA under the additional assumption that variables within each set are standardized and uncorrelated. Our algorithm operates on a low rank approximation of the input data and its computational complexity scales linearly with the number of input variables. It is simple to implement, and parallelizable. In contrast to most existing approaches, our algorithm administers precise control on the sparsity of the extracted canonical vectors, and comes with theoretical data-dependent global approximation guarantees, that hinge on the spectrum of the input data. Finally, it can be straightforwardly adapted to other constrained variants of CCA enforcing structure beyond sparsity. We empirically evaluate the proposed scheme and apply it on a real neuroimaging dataset to investigate associations between brain activity and behavior measurements.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/asteris16.html
  PDF: http://proceedings.mlr.press/v48/asteris16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-asteris16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Megasthenis
    family: Asteris
  - given: Anastasios
    family: Kyrillidis
  - given: Oluwasanmi
    family: Koyejo
  - given: Russell
    family: Poldrack
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1148-1157
  id: asteris16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1148
  lastpage: 1157
  published: 2016-06-11 00:00:00 +0000
- title: 'Quadratic Optimization with Orthogonality Constraints: Explicit Lojasiewicz Exponent and Linear Convergence of Line-Search Methods'
  abstract: 'A fundamental class of matrix optimization problems that arise in many areas of science and engineering is that of quadratic optimization with orthogonality constraints. Such problems can be solved using line-search methods on the Stiefel manifold, which are known to converge globally under mild conditions. To determine the convergence rates of these methods, we give an explicit estimate of the exponent in a Lojasiewicz inequality for the (non-convex) set of critical points of the aforementioned class of problems. This not only allows us to establish the linear convergence of a large class of line-search methods but also answers an important and intriguing problem in mathematical analysis and numerical optimization. A key step in our proof is to establish a local error bound for the set of critical points, which may be of independent interest.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/liue16.html
  PDF: http://proceedings.mlr.press/v48/liue16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-liue16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Huikang
    family: Liu
  - given: Weijie
    family: Wu
  - given: Anthony Man-Cho
    family: So
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1158-1167
  id: liue16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1158
  lastpage: 1167
  published: 2016-06-11 00:00:00 +0000
- title: 'Normalization Propagation: A Parametric Technique for Removing Internal Covariate Shift in Deep Networks'
  abstract: 'While the authors of Batch Normalization (BN) identify and address an important problem involved in training deep networks– \textitInternal Covariate Shift– the current solution has certain drawbacks. For instance, BN depends on batch statistics for layerwise input normalization during training which makes the estimates of mean and standard deviation of input (distribution) to hidden layers inaccurate due to shifting parameter values (especially during initial training epochs). Another fundamental problem with BN is that it cannot be used with batch-size  1  during training. We address these drawbacks of BN by proposing a non-adaptive normalization technique for removing covariate shift, that we call \textitNormalization Propagation. Our approach does not depend on batch statistics, but rather uses a data-independent parametric estimate of mean and standard-deviation in every layer thus being computationally faster compared with BN. We exploit the observation that the pre-activation before Rectified Linear Units follow Gaussian distribution in deep networks, and that once the first and second order statistics of any given dataset are normalized, we can forward propagate this normalization without the need for recalculating the approximate statistics for hidden layers.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/arpitb16.html
  PDF: http://proceedings.mlr.press/v48/arpitb16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-arpitb16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Devansh
    family: Arpit
  - given: Yingbo
    family: Zhou
  - given: Bhargava
    family: Kota
  - given: Venu
    family: Govindaraju
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1168-1176
  id: arpitb16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1168
  lastpage: 1176
  published: 2016-06-11 00:00:00 +0000
- title: 'Learning to Generate with Memory'
  abstract: 'Memory units have been widely used to enrich the capabilities of deep networks on capturing long-term dependencies in reasoning and prediction tasks, but little investigation exists on deep generative models (DGMs) which are good at inferring high-level invariant representations from unlabeled data. This paper presents a deep generative model with a possibly large external memory and an attention mechanism to capture the local detail information that is often lost in the bottom-up abstraction process in representation learning. By adopting a smooth attention model, the whole network is trained end-to-end by optimizing a variational bound of data likelihood via auto-encoding variational Bayesian methods, where an asymmetric recognition network is learnt jointly to infer high-level invariant representations. The asymmetric architecture can reduce the competition between bottom-up invariant feature extraction and top-down generation of instance details. Our experiments on several datasets demonstrate that memory can significantly boost the performance of DGMs on various tasks, including density estimation, image generation, and missing value imputation, and DGMs with memory can achieve state-of-the-art quantitative results.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/lie16.html
  PDF: http://proceedings.mlr.press/v48/lie16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-lie16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Chongxuan
    family: Li
  - given: Jun
    family: Zhu
  - given: Bo
    family: Zhang
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1177-1186
  id: lie16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1177
  lastpage: 1186
  published: 2016-06-11 00:00:00 +0000
- title: 'Learning End-to-end Video Classification with Rank-Pooling'
  abstract: 'We introduce a new model for representation learning and classification of video sequences. Our model is based on a convolutional neural network coupled with a novel temporal pooling layer. The temporal pooling layer relies on an inner-optimization problem to efficiently encode temporal semantics over arbitrarily long video clips into a fixed-length vector representation. Importantly, the representation and classification parameters of our model can be estimated jointly in an end-to-end manner by formulating learning as a bilevel optimization problem. Furthermore, the model can make use of any existing convolutional neural network architecture (e.g., AlexNet or VGG) without modification or introduction of additional parameters. We demonstrate our approach on action and activity recognition tasks.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/fernando16.html
  PDF: http://proceedings.mlr.press/v48/fernando16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-fernando16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Basura
    family: Fernando
  - given: Stephen
    family: Gould
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1187-1196
  id: fernando16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1187
  lastpage: 1196
  published: 2016-06-11 00:00:00 +0000
- title: 'Learning to Filter with Predictive State Inference Machines'
  abstract: 'Latent state space models are a fundamental and widely used tool for modeling dynamical systems. However, they are difficult to learn from data and learned models often lack performance guarantees on inference tasks such as filtering and prediction. In this work, we present the PREDICTIVE STATE INFERENCE MACHINE (PSIM), a data-driven method that considers the inference procedure on a dynamical system as a composition of predictors. The key idea is that rather than first learning a latent state space model, and then using the learned model for inference, PSIM directly learns predictors for inference in predictive state space. We provide theoretical guarantees for inference, in both realizable and agnostic settings, and showcase practical performance on a variety of simulated and real world robotics benchmarks.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/sun16.html
  PDF: http://proceedings.mlr.press/v48/sun16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-sun16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Wen
    family: Sun
  - given: Arun
    family: Venkatraman
  - given: Byron
    family: Boots
  - given: J.Andrew
    family: Bagnell
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1197-1205
  id: sun16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1197
  lastpage: 1205
  published: 2016-06-11 00:00:00 +0000
- title: 'A Subspace Learning Approach for High Dimensional Matrix Decomposition with Efficient Column/Row Sampling'
  abstract: 'This paper presents a new randomized approach to high-dimensional low rank (LR) plus sparse matrix decomposition. For a data matrix D ∈R^N_1 \times N_2, the complexity of conventional decomposition methods is O(N_1 N_2 r), which limits their usefulness in big data settings (r is the rank of the LR component). In addition, the existing randomized approaches rely for the most part on uniform random sampling, which may be inefficient for many real world data matrices. The proposed subspace learning based approach recovers the LR component using only a small subset of the columns/rows of data and reduces complexity to O(\max(N_1,N_2) r^2). Even when the columns/rows are sampled uniformly at random, the sufficient number of sampled columns/rows is shown to be roughly O(r μ), where μis the coherency parameter of the LR component. In addition, efficient sampling algorithms are proposed to address the problem of column/row sampling from structured data.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/rahmani16.html
  PDF: http://proceedings.mlr.press/v48/rahmani16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-rahmani16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Mostafa
    family: Rahmani
  - given: Geroge
    family: Atia
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1206-1214
  id: rahmani16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1206
  lastpage: 1214
  published: 2016-06-11 00:00:00 +0000
- title: 'DCM Bandits: Learning to Rank with Multiple Clicks'
  abstract: 'A search engine recommends to the user a list of web pages. The user examines this list, from the first page to the last, and clicks on all attractive pages until the user is satisfied. This behavior of the user can be described by the dependent click model (DCM). We propose DCM bandits, an online learning variant of the DCM where the goal is to maximize the probability of recommending satisfactory items, such as web pages. The main challenge of our learning problem is that we do not observe which attractive item is satisfactory. We propose a computationally-efficient learning algorithm for solving our problem, dcmKL-UCB; derive gap-dependent upper bounds on its regret under reasonable assumptions; and also prove a matching lower bound up to logarithmic factors. We evaluate our algorithm on synthetic and real-world problems, and show that it performs well even when our model is misspecified. This work presents the first practical and regret-optimal online algorithm for learning to rank with multiple clicks in a cascade-like click model.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/katariya16.html
  PDF: http://proceedings.mlr.press/v48/katariya16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-katariya16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Sumeet
    family: Katariya
  - given: Branislav
    family: Kveton
  - given: Csaba
    family: Szepesvari
  - given: Zheng
    family: Wen
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1215-1224
  id: katariya16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1215
  lastpage: 1224
  published: 2016-06-11 00:00:00 +0000
- title: 'Train faster, generalize better: Stability of stochastic gradient descent'
  abstract: 'We show that parametric models trained by a stochastic gradient method (SGM) with few iterations have vanishing generalization error. We prove our results by arguing that SGM is algorithmically stable in the sense of Bousquet and Elisseeff. Our analysis only employs elementary tools from convex and continuous optimization. We derive stability bounds for both convex and non-convex optimization under standard Lipschitz and smoothness assumptions. Applying our results to the convex case, we provide new insights for why multiple epochs of stochastic gradient methods generalize well in practice. In the non-convex case, we give a new interpretation of common practices in neural networks, and formally show that popular techniques for training large deep models are indeed stability-promoting. Our findings conceptually underscore the importance of reducing training time beyond its obvious benefit.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/hardt16.html
  PDF: http://proceedings.mlr.press/v48/hardt16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-hardt16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Moritz
    family: Hardt
  - given: Ben
    family: Recht
  - given: Yoram
    family: Singer
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1225-1234
  id: hardt16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1225
  lastpage: 1234
  published: 2016-06-11 00:00:00 +0000
- title: 'Copeland Dueling Bandit Problem: Regret Lower Bound, Optimal Algorithm, and Computationally Efficient Algorithm'
  abstract: 'We study the K-armed dueling bandit problem, a variation of the standard stochastic bandit problem where the feedback is limited to relative comparisons of a pair of arms. The hardness of recommending Copeland winners, the arms that beat the greatest number of other arms, is characterized by deriving an asymptotic regret bound. We propose Copeland Winners Deterministic Minimum Empirical Divergence (CW-RMED), an algorithm inspired by the DMED algorithm (Honda and Takemura, 2010), and derive an asymptotically optimal regret bound for it. However, it is not known whether the algorithm can be efficiently computed or not. To address this issue, we devise an efficient version (ECW-RMED) and derive its asymptotic regret bound. Experimental comparisons of dueling bandit algorithms show that ECW-RMED significantly outperforms existing ones.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/komiyama16.html
  PDF: http://proceedings.mlr.press/v48/komiyama16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-komiyama16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Junpei
    family: Komiyama
  - given: Junya
    family: Honda
  - given: Hiroshi
    family: Nakagawa
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1235-1244
  id: komiyama16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1235
  lastpage: 1244
  published: 2016-06-11 00:00:00 +0000
- title: 'Contextual Combinatorial Cascading Bandits'
  abstract: 'We propose the contextual combinatorial cascading bandits, a combinatorial online learning game, where at each time step a learning agent is given a set of contextual information, then selects a list of items, and observes stochastic outcomes of a prefix in the selected items by some stopping criterion. In online recommendation, the stopping criterion might be the first item a user selects; in network routing, the stopping criterion might be the first edge blocked in a path. We consider position discounts in the list order, so that the agent’s reward is discounted depending on the position where the stopping criterion is met. We design a UCB-type algorithm, C^3-UCB, for this problem, prove an n-step regret bound \tildeO(\sqrtn) in the general setting, and give finer analysis for two special cases. Our work generalizes existing studies in several directions, including contextual information, position discounts, and a more general cascading bandit model. Experiments on synthetic and real datasets demonstrate the advantage of involving contextual information and position discounts.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/lif16.html
  PDF: http://proceedings.mlr.press/v48/lif16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-lif16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Shuai
    family: Li
  - given: Baoxiang
    family: Wang
  - given: Shengyu
    family: Zhang
  - given: Wei
    family: Chen
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1245-1253
  id: lif16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1245
  lastpage: 1253
  published: 2016-06-11 00:00:00 +0000
- title: 'Conservative Bandits'
  abstract: 'We study a novel multi-armed bandit problem that models the challenge faced by a company wishing to explore new strategies to maximize revenue whilst simultaneously maintaining their revenue above a fixed baseline, uniformly over time. While previous work addressed the problem under the weaker requirement of maintaining the revenue constraint only at a given fixed time in the future, the design of those algorithms makes them unsuitable under the more stringent constraints. We consider both the stochastic and the adversarial settings, where we propose natural yet novel strategies and analyze the price for maintaining the constraints. Amongst other things, we prove both high probability and expectation bounds on the regret, while we also consider both the problem of maintaining the constraints with high probability or expectation. For the adversarial setting the price of maintaining the constraint appears to be higher, at least for the algorithm considered. A lower bound is given showing that the algorithm for the stochastic setting is almost optimal. Empirical results obtained in synthetic environments complement our theoretical findings.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/wu16.html
  PDF: http://proceedings.mlr.press/v48/wu16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-wu16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Yifan
    family: Wu
  - given: Roshan
    family: Shariff
  - given: Tor
    family: Lattimore
  - given: Csaba
    family: Szepesvari
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1254-1262
  id: wu16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1254
  lastpage: 1262
  published: 2016-06-11 00:00:00 +0000
- title: 'Variance-Reduced and Projection-Free Stochastic Optimization'
  abstract: 'The Frank-Wolfe optimization algorithm has recently regained popularity for machine learning applications due to its projection-free property and its ability to handle structured constraints. However, in the stochastic learning setting, it is still relatively understudied compared to the gradient descent counterpart. In this work, leveraging a recent variance reduction technique, we propose two stochastic Frank-Wolfe variants which substantially improve previous results in terms of the number of stochastic gradient evaluations needed to achieve 1-εaccuracy. For example, we improve from O(\frac1ε) to O(\ln\frac1ε) if the objective function is smooth and strongly convex, and from O(\frac1ε^2) to O(\frac1ε^1.5) if the objective function is smooth and Lipschitz. The theoretical improvement is also observed in experiments on real-world datasets for a multiclass classification application.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/hazana16.html
  PDF: http://proceedings.mlr.press/v48/hazana16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-hazana16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Elad
    family: Hazan
  - given: Haipeng
    family: Luo
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1263-1271
  id: hazana16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1263
  lastpage: 1271
  published: 2016-06-11 00:00:00 +0000
- title: 'Factored Temporal Sigmoid Belief Networks for Sequence Learning'
  abstract: 'Deep conditional generative models are developed to simultaneously learn the temporal dependencies of multiple sequences. The model is designed by introducing a three-way weight tensor to capture the multiplicative interactions between side information and sequences. The proposed model builds on the Temporal Sigmoid Belief Network (TSBN), a sequential stack of Sigmoid Belief Networks (SBNs). The transition matrices are further factored to reduce the number of parameters and improve generalization. When side information is not available, a general framework for semi-supervised learning based on the proposed model is constituted, allowing robust sequence classification. Experimental results show that the proposed approach achieves state-of-the-art predictive and classification performance on sequential data, and has the capacity to synthesize sequences, with controlled style transitioning and blending.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/songa16.html
  PDF: http://proceedings.mlr.press/v48/songa16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-songa16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Jiaming
    family: Song
  - given: Zhe
    family: Gan
  - given: Lawrence
    family: Carin
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1272-1281
  id: songa16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1272
  lastpage: 1281
  published: 2016-06-11 00:00:00 +0000
- title: 'False Discovery Rate Control and Statistical Quality Assessment of Annotators in Crowdsourced Ranking'
  abstract: 'With the rapid growth of crowdsourcing platforms it has become easy and relatively inexpensive to collect a dataset labeled by multiple annotators in a short time. However due to the lack of control over the quality of the annotators, some abnormal annotators may be affected by position bias which can potentially degrade the quality of the final consensus labels. In this paper we introduce a statistical framework to model and detect annotator’s position bias in order to control the false discovery rate (FDR) without a prior knowledge on the amount of biased annotators–the expected fraction of false discoveries among all discoveries being not too high, in order to assure that most of the discoveries are indeed true and replicable. The key technical development relies on some new knockoff filters adapted to our problem and new algorithms based on the Inverse Scale Space dynamics whose discretization is potentially suitable for large scale crowdsourcing data analysis. Our studies are supported by experiments with both simulated examples and real-world data. The proposed framework provides us a useful tool for quantitatively studying annotator’s abnormal behavior in crowdsourcing.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/xua16.html
  PDF: http://proceedings.mlr.press/v48/xua16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-xua16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: QianQian
    family: Xu
  - given: Jiechao
    family: Xiong
  - given: Xiaochun
    family: Cao
  - given: Yuan
    family: Yao
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1282-1291
  id: xua16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1282
  lastpage: 1291
  published: 2016-06-11 00:00:00 +0000
- title: 'Strongly-Typed Recurrent Neural Networks'
  abstract: 'Recurrent neural networks are increasing popular models for sequential learning. Unfortunately, although the most effective RNN architectures are perhaps excessively complicated, extensive searches have not found simpler alternatives. This paper imports ideas from physics and functional programming into RNN design to provide guiding principles. From physics, we introduce type constraints, analogous to the constraints that forbids adding meters to seconds. From functional programming, we require that strongly-typed architectures factorize into stateless learnware and state-dependent firmware, reducing the impact of side-effects. The features learned by strongly-typed nets have a simple semantic interpretation via dynamic average-pooling on one-dimensional convolutions. We also show that strongly-typed gradients are better behaved than in classical architectures, and characterize the representational power of strongly-typed nets. Finally, experiments show that, despite being more constrained, strongly-typed architectures achieve lower training and comparable generalization error to classical architectures.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/balduzzi16.html
  PDF: http://proceedings.mlr.press/v48/balduzzi16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-balduzzi16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: David
    family: Balduzzi
  - given: Muhammad
    family: Ghifary
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1292-1300
  id: balduzzi16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1292
  lastpage: 1300
  published: 2016-06-11 00:00:00 +0000
- title: 'Distributed Clustering of Linear Bandits in Peer to Peer Networks'
  abstract: 'We provide two distributed confidence ball algorithms for solving linear bandit problems in peer to peer networks with limited communication capabilities. For the first, we assume that all the peers are solving the same linear bandit problem, and prove that our algorithm achieves the optimal asymptotic regret rate of any centralised algorithm that can instantly communicate information between the peers. For the second, we assume that there are clusters of peers solving the same bandit problem within each cluster, and we prove that our algorithm discovers these clusters, while achieving the optimal asymptotic regret rate within each one. Through experiments on several real-world datasets, we demonstrate the performance of proposed algorithms compared to the state-of-the-art.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/korda16.html
  PDF: http://proceedings.mlr.press/v48/korda16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-korda16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Nathan
    family: Korda
  - given: Balazs
    family: Szorenyi
  - given: Shuai
    family: Li
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1301-1309
  id: korda16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1301
  lastpage: 1309
  published: 2016-06-11 00:00:00 +0000
- title: 'Collapsed Variational Inference for Sum-Product Networks'
  abstract: 'Sum-Product Networks (SPNs) are probabilistic inference machines that admit exact inference in linear time in the size of the network. Existing parameter learning approaches for SPNs are largely based on the maximum likelihood principle and are subject to overfitting compared to more Bayesian approaches. Exact Bayesian posterior inference for SPNs is computationally intractable. Even approximation techniques such as standard variational inference and posterior sampling for SPNs are computationally infeasible even for networks of moderate size due to the large number of local latent variables per instance. In this work, we propose a novel deterministic collapsed variational inference algorithm for SPNs that is computationally efficient, easy to implement and at the same time allows us to incorporate prior information into the optimization formulation. Extensive experiments show a significant improvement in accuracy compared with a maximum likelihood based approach.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/zhaoa16.html
  PDF: http://proceedings.mlr.press/v48/zhaoa16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-zhaoa16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Han
    family: Zhao
  - given: Tameem
    family: Adel
  - given: Geoff
    family: Gordon
  - given: Brandon
    family: Amos
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1310-1318
  id: zhaoa16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1310
  lastpage: 1318
  published: 2016-06-11 00:00:00 +0000
- title: 'On the Analysis of Complex Backup Strategies in Monte Carlo Tree Search'
  abstract: 'Over the past decade, Monte Carlo Tree Search (MCTS) and specifically Upper Confidence Bound in Trees (UCT) have proven to be quite effective in large probabilistic planning domains. In this paper, we focus on how values are backpropagated in the MCTS tree, and apply complex return strategies from the Reinforcement Learning (RL) literature to MCTS, producing 4 new MCTS variants. We demonstrate that in some probabilistic planning benchmarks from the International Planning Competition (IPC), selecting a MCTS variant with a backup strategy different from Monte Carlo averaging can lead to substantially better results. We also propose a hypothesis for why different backup strategies lead to different performance in particular environments, and manipulate a carefully structured grid-world domain to provide empirical evidence supporting our hypothesis.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/khandelwal16.html
  PDF: http://proceedings.mlr.press/v48/khandelwal16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-khandelwal16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Piyush
    family: Khandelwal
  - given: Elad
    family: Liebman
  - given: Scott
    family: Niekum
  - given: Peter
    family: Stone
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1319-1328
  id: khandelwal16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1319
  lastpage: 1328
  published: 2016-06-11 00:00:00 +0000
- title: 'Benchmarking Deep Reinforcement Learning for Continuous Control'
  abstract: 'Recently, researchers have made significant progress combining the advances in deep learning for learning feature representations with reinforcement learning. Some notable examples include training agents to play Atari games based on raw pixel data and to acquire advanced manipulation skills using raw sensory inputs. However, it has been difficult to quantify progress in the domain of continuous control due to the lack of a commonly adopted benchmark. In this work, we present a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure. We report novel findings based on the systematic evaluation of a range of implemented reinforcement learning algorithms. Both the benchmark and reference implementations are released at https://github.com/rllab/rllab in order to facilitate experimental reproducibility and to encourage adoption by other researchers.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/duan16.html
  PDF: http://proceedings.mlr.press/v48/duan16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-duan16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Yan
    family: Duan
  - given: Xi
    family: Chen
  - given: Rein
    family: Houthooft
  - given: John
    family: Schulman
  - given: Pieter
    family: Abbeel
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1329-1338
  id: duan16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1329
  lastpage: 1338
  published: 2016-06-11 00:00:00 +0000
- title: 'K-Means Clustering with Distributed Dimensions'
  abstract: 'Distributed clustering has attracted significant attention in recent years. In this paper, we study the k-means problem in the distributed dimension setting, where the dimensions of the data are partitioned across multiple machines. We provide new approximation algorithms, which incur low communication costs and achieve constant approximation ratios. The communication complexity of our algorithms significantly improve on existing algorithms. We also provide the first communication lower bound, which nearly matches our upper bound in a certain range of parameter setting. Our experimental results show that our algorithms outperform existing algorithms on real data-sets in the distributed dimension setting.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/ding16.html
  PDF: http://proceedings.mlr.press/v48/ding16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-ding16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Hu
    family: Ding
  - given: Yu
    family: Liu
  - given: Lingxiao
    family: Huang
  - given: Jian
    family: Li
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1339-1348
  id: ding16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1339
  lastpage: 1348
  published: 2016-06-11 00:00:00 +0000
- title: 'Texture Networks: Feed-forward Synthesis of Textures and Stylized Images'
  abstract: 'Gatys et al. recently demonstrated that deep networks can generate beautiful textures and stylized images from a single texture example. However, their methods requires a slow and memory-consuming optimization process. We propose here an alternative approach that moves the computational burden to a learning stage. Given a single example of a texture, our approach trains compact feed-forward convolutional networks to generate multiple samples of the same texture of arbitrary size and to transfer artistic style from a given image to any other image. The resulting networks are remarkably light-weight and can generate textures of quality comparable to Gatys et al., but hundreds of times faster. More generally, our approach highlights the power and flexibility of generative feed-forward models trained with complex and expressive loss functions.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/ulyanov16.html
  PDF: http://proceedings.mlr.press/v48/ulyanov16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-ulyanov16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Dmitry
    family: Ulyanov
  - given: Vadim
    family: Lebedev
  - given: 
    family: Andrea
  - given: Victor
    family: Lempitsky
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1349-1357
  id: ulyanov16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1349
  lastpage: 1357
  published: 2016-06-11 00:00:00 +0000
- title: 'Fast Constrained Submodular Maximization: Personalized Data Summarization'
  abstract: 'Can we summarize multi-category data based on user preferences in a scalable manner? Many utility functions used for data summarization satisfy submodularity, a natural diminishing returns property. We cast personalized data summarization as an instance of a general submodular maximization problem subject to multiple constraints. We develop the first practical and FAst coNsTrained submOdular Maximization algorithm, FANTOM, with strong theoretical guarantees. FANTOM maximizes a submodular function (not necessarily monotone) subject to intersection of a p-system and l knapsacks constrains. It achieves a (1 + ε)(p + 1)(2p + 2l + 1)/p approximation guarantee with only O(nrp log(n)/ε) query complexity (n and r indicate the size of the ground set and the size of the largest feasible solution, respectively). We then show how we can use FANTOM for personalized data summarization. In particular, a p-system can model different aspects of data, such as categories or time stamps, from which the users choose. In addition, knapsacks encode users’ constraints including budget or time. In our set of experiments, we consider several concrete applications: movie recommendation over 11K movies, personalized image summarization with 10K images, and revenue maximization on the YouTube social networks with 5000 communities. We observe that FANTOM constantly provides the highest utility against all the baselines.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/mirzasoleiman16.html
  PDF: http://proceedings.mlr.press/v48/mirzasoleiman16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-mirzasoleiman16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Baharan
    family: Mirzasoleiman
  - given: Ashwinkumar
    family: Badanidiyuru
  - given: Amin
    family: Karbasi
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1358-1367
  id: mirzasoleiman16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1358
  lastpage: 1367
  published: 2016-06-11 00:00:00 +0000
- title: 'On the Statistical Limits of Convex Relaxations'
  abstract: 'Many high dimensional sparse learning problems are formulated as nonconvex optimization. A popular approach to solve these nonconvex optimization problems is through convex relaxations such as linear and semidefinite programming. In this paper, we study the statistical limits of convex relaxations. Particularly, we consider two problems: Mean estimation for sparse principal submatrix and edge probability estimation for stochastic block model. We exploit the sum-of-squares relaxation hierarchy to sharply characterize the limits of a broad class of convex relaxations. Our result shows statistical optimality needs to be compromised for achieving computational tractability using convex relaxations. Compared with existing results on computational lower bounds for statistical problems, which consider general polynomial-time algorithms and rely on computational hardness hypotheses on problems like planted clique detection, our theory focuses on a broad class of convex relaxations and does not rely on unproven hypotheses.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/wangc16.html
  PDF: http://proceedings.mlr.press/v48/wangc16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-wangc16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Zhaoran
    family: Wang
  - given: Quanquan
    family: Gu
  - given: Han
    family: Liu
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1368-1377
  id: wangc16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1368
  lastpage: 1377
  published: 2016-06-11 00:00:00 +0000
- title: 'Ask Me Anything: Dynamic Memory Networks for Natural Language Processing'
  abstract: 'Most tasks in natural language processing can be cast into question answering (QA) problems over language input. We introduce the dynamic memory network (DMN), a neural network architecture which processes input sequences and questions, forms episodic memories, and generates relevant answers. Questions trigger an iterative attention process which allows the model to condition its attention on the inputs and the result of previous iterations. These results are then reasoned over in a hierarchical recurrent sequence model to generate answers. The DMN can be trained end-to-end and obtains state-of-the-art results on several types of tasks and datasets: question answering (Facebook’s bAbI dataset), text classification for sentiment analysis (Stanford Sentiment Treebank) and sequence modeling for part-of-speech tagging (WSJ-PTB). The training for these different tasks relies exclusively on trained word vector representations and input-question-answer triplets.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/kumar16.html
  PDF: http://proceedings.mlr.press/v48/kumar16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-kumar16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Ankit
    family: Kumar
  - given: Ozan
    family: Irsoy
  - given: Peter
    family: Ondruska
  - given: Mohit
    family: Iyyer
  - given: James
    family: Bradbury
  - given: Ishaan
    family: Gulrajani
  - given: Victor
    family: Zhong
  - given: Romain
    family: Paulus
  - given: Richard
    family: Socher
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1378-1387
  id: kumar16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1378
  lastpage: 1387
  published: 2016-06-11 00:00:00 +0000
- title: 'Gossip Dual Averaging for Decentralized Optimization of Pairwise Functions'
  abstract: 'In decentralized networks (of sensors, connected objects, etc.), there is an important need for efficient algorithms to optimize a global cost function, for instance to learn a global model from the local data collected by each computing unit. In this paper, we address the problem of decentralized minimization of pairwise functions of the data points, where these points are distributed over the nodes of a graph defining the communication topology of the network. This general problem finds applications in ranking, distance metric learning and graph inference, among others. We propose new gossip algorithms based on dual averaging which aims at solving such problems both in synchronous and asynchronous settings. The proposed framework is flexible enough to deal with constrained and regularized variants of the optimization problem. Our theoretical analysis reveals that the proposed algorithms preserve the convergence rate of centralized dual averaging up to an additive bias term. We present numerical simulations on Area Under the ROC Curve (AUC) maximization and metric learning problems which illustrate the practical interest of our approach.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/colin16.html
  PDF: http://proceedings.mlr.press/v48/colin16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-colin16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Igor
    family: Colin
  - given: Aurelien
    family: Bellet
  - given: Joseph
    family: Salmon
  - given: Stéphan
    family: Clémençon
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1388-1396
  id: colin16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1388
  lastpage: 1396
  published: 2016-06-11 00:00:00 +0000
- title: 'Solving Ridge Regression using Sketched Preconditioned SVRG'
  abstract: 'We develop a novel preconditioning method for ridge regression, based on recent linear sketching methods. By equipping Stochastic Variance Reduced Gradient (SVRG) with this preconditioning process, we obtain a significant speed-up relative to fast stochastic methods such as SVRG, SDCA and SAG.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/gonen16.html
  PDF: http://proceedings.mlr.press/v48/gonen16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-gonen16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Alon
    family: Gonen
  - given: Francesco
    family: Orabona
  - given: Shai
    family: Shalev-Shwartz
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1397-1405
  id: gonen16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1397
  lastpage: 1405
  published: 2016-06-11 00:00:00 +0000
- title: 'Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control'
  abstract: 'Cumulative prospect theory (CPT) is known to model human decisions well, with substantial empirical evidence supporting this claim. CPT works by distorting probabilities and is more general than the classic expected utility and coherent risk measures. We bring this idea to a risk-sensitive reinforcement learning (RL) setting and design algorithms for both estimation and control. The RL setting presents two particular challenges when CPT is applied: estimating the CPT objective requires estimations of the entire distribution of the value function and finding a randomized optimal policy. The estimation scheme that we propose uses the empirical distribution to estimate the CPT-value of a random variable. We then use this scheme in the inner loop of a CPT-value optimization procedure that is based on the well-known simulation optimization idea of simultaneous perturbation stochastic approximation (SPSA). We provide theoretical convergence guarantees for all the proposed algorithms and also empirically demonstrate the usefulness of our algorithms.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/la16.html
  PDF: http://proceedings.mlr.press/v48/la16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-la16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Prashanth
    family: L.A.
  - given: Cheng
    family: Jie
  - given: Michael
    family: Fu
  - given: Steve
    family: Marcus
  - given: Csaba
    family: Szepesvari
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1406-1415
  id: la16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1406
  lastpage: 1415
  published: 2016-06-11 00:00:00 +0000
- title: 'Estimating Accuracy from Unlabeled Data: A Bayesian Approach'
  abstract: 'We consider the question of how unlabeled data can be used to estimate the true accuracy of learned classifiers, and the related question of how outputs from several classifiers performing the same task can be combined based on their estimated accuracies. To answer these questions, we first present a simple graphical model that performs well in practice. We then provide two nonparametric extensions to it that improve its performance. Experiments on two real-world data sets produce accuracy estimates within a few percent of the true accuracy, using solely unlabeled data. Our models also outperform existing state-of-the-art solutions in both estimating accuracies, and combining multiple classifier outputs.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/platanios16.html
  PDF: http://proceedings.mlr.press/v48/platanios16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-platanios16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Emmanouil Antonios
    family: Platanios
  - given: Avinava
    family: Dubey
  - given: Tom
    family: Mitchell
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1416-1425
  id: platanios16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1416
  lastpage: 1425
  published: 2016-06-11 00:00:00 +0000
- title: 'Non-negative Matrix Factorization under Heavy Noise'
  abstract: 'The Noisy Non-negative Matrix factorization (NMF) is: given a data matrix A (d x n), find non-negative matrices B;C (d x k, k x n respy.) so that A = BC +N, where N is a noise matrix. Existing polynomial time algorithms with proven error guarantees require EACH column N_⋅j to have l1 norm much smaller than ||(BC)_⋅j ||_1, which could be very restrictive. In important applications of NMF such as Topic Modeling as well as theoretical noise models (e.g. Gaussian with high sigma), almost EVERY column of N_.j violates this condition. We introduce the heavy noise model which only requires the average noise over large subsets of columns to be small. We initiate a study of Noisy NMF under the heavy noise model. We show that our noise model subsumes noise models of theoretical and practical interest (for e.g. Gaussian noise of maximum possible sigma). We then devise an algorithm TSVDNMF which under certain assumptions on B,C, solves the problem under heavy noise. Our error guarantees match those of previous algorithms. Our running time of O(k.(d+n)^2) is substantially better than the O(d.n^3) for the previous best. Our assumption on B is weaker than the “Separability” assumption made by all previous results. We provide empirical justification for our assumptions on C. We also provide the first proof of identifiability (uniqueness of B) for noisy NMF which is not based on separability and does not use hard to check geometric conditions. Our algorithm outperforms earlier polynomial time algorithms both in time and error, particularly in the presence of high noise.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/bhattacharya16.html
  PDF: http://proceedings.mlr.press/v48/bhattacharya16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-bhattacharya16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Chiranjib
    family: Bhattacharya
  - given: Navin
    family: Goyal
  - given: Ravindran
    family: Kannan
  - given: Jagdeep
    family: Pani
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1426-1434
  id: bhattacharya16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1426
  lastpage: 1434
  published: 2016-06-11 00:00:00 +0000
- title: 'Extreme F-measure Maximization using Sparse Probability Estimates'
  abstract: 'We consider the problem of (macro) F-measure maximization in the context of extreme multi-label classification (XMLC), i.e., multi-label classification with extremely large label spaces. We investigate several approaches based on recent results on the maximization of complex performance measures in binary classification. According to these results, the F-measure can be maximized by properly thresholding conditional class probability estimates. We show that a naive adaptation of this approach can be very costly for XMLC and propose to solve the problem by classifiers that efficiently deliver sparse probability estimates (SPEs), that is, probability estimates restricted to the most probable labels. Empirical results provide evidence for the strong practical performance of this approach.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/jasinska16.html
  PDF: http://proceedings.mlr.press/v48/jasinska16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-jasinska16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Kalina
    family: Jasinska
  - given: Krzysztof
    family: Dembczynski
  - given: Robert
    family: Busa-Fekete
  - given: Karlson
    family: Pfannschmidt
  - given: Timo
    family: Klerx
  - given: Eyke
    family: Hullermeier
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1435-1444
  id: jasinska16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1435
  lastpage: 1444
  published: 2016-06-11 00:00:00 +0000
- title: 'Auxiliary Deep Generative Models'
  abstract: 'Deep generative models parameterized by neural networks have recently achieved state-of-the-art performance in unsupervised and semi-supervised learning. We extend deep generative models with auxiliary variables which improves the variational approximation. The auxiliary variables leave the generative model unchanged but make the variational distribution more expressive. Inspired by the structure of the auxiliary variable we also propose a model with two stochastic layers and skip connections. Our findings suggest that more expressive and properly specified deep generative models converge faster with better results. We show state-of-the-art performance within semi-supervised learning on MNIST, SVHN and NORB datasets.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/maaloe16.html
  PDF: http://proceedings.mlr.press/v48/maaloe16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-maaloe16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Lars
    family: Maaløe
  - given: Casper Kaae
    family: Sønderby
  - given: Søren Kaae
    family: Sønderby
  - given: Ole
    family: Winther
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1445-1453
  id: maaloe16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1445
  lastpage: 1453
  published: 2016-06-11 00:00:00 +0000
- title: 'Importance Sampling Tree for Large-scale Empirical Expectation'
  abstract: 'We propose a tree-based procedure inspired by the Monte-Carlo Tree Search that dynamically modulates an importance-based sampling to prioritize computation, while getting unbiased estimates of weighted sums. We apply this generic method to learning on very large training sets, and to the evaluation of large-scale SVMs. The core idea is to reformulate the estimation of a score - whether a loss or a prediction estimate - as an empirical expectation, and to use such a tree whose leaves carry the samples to focus efforts over the problematic "heavy weight" ones. We illustrate the potential of this approach on three problems: to improve Adaboost and a multi-layer perceptron on 2D synthetic tasks with several million points, to train a large-scale convolution network on several millions deformations of the CIFAR data-set, and to compute the response of a SVM with several hundreds of thousands of support vectors. In each case, we show how it either cuts down computation by more than one order of magnitude and/or allows to get better loss estimates.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/canevet16.html
  PDF: http://proceedings.mlr.press/v48/canevet16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-canevet16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Olivier
    family: Canevet
  - given: Cijo
    family: Jose
  - given: Francois
    family: Fleuret
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1454-1462
  id: canevet16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1454
  lastpage: 1462
  published: 2016-06-11 00:00:00 +0000
- title: 'Starting Small - Learning with Adaptive Sample Sizes'
  abstract: 'For many machine learning problems, data is abundant and it may be prohibitive to make multiple passes through the full training set. In this context, we investigate strategies for dynamically increasing the effective sample size, when using iterative methods such as stochastic gradient descent. Our interest is motivated by the rise of variance-reduced methods, which achieve linear convergence rates that scale favorably for smaller sample sizes. Exploiting this feature, we show - theoretically and empirically - how to obtain significant speed-ups with a novel algorithm that reaches statistical accuracy on an n-sample in 2n, instead of n log n steps.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/daneshmand16.html
  PDF: http://proceedings.mlr.press/v48/daneshmand16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-daneshmand16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Hadi
    family: Daneshmand
  - given: Aurelien
    family: Lucchi
  - given: Thomas
    family: Hofmann
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1463-1471
  id: daneshmand16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1463
  lastpage: 1471
  published: 2016-06-11 00:00:00 +0000
- title: 'Deep Gaussian Processes for Regression using Approximate Expectation Propagation'
  abstract: 'Deep Gaussian processes (DGPs) are multi-layer hierarchical generalisations of Gaussian processes (GPs) and are formally equivalent to neural networks with multiple, infinitely wide hidden layers. DGPs are nonparametric probabilistic models and as such are arguably more flexible, have a greater capacity to generalise, and provide better calibrated uncertainty estimates than alternative deep models. This paper develops a new approximate Bayesian learning scheme that enables DGPs to be applied to a range of medium to large scale regression problems for the first time. The new method uses an approximate Expectation Propagation procedure and a novel and efficient extension of the probabilistic backpropagation algorithm for learning. We evaluate the new method for non-linear regression on eleven real-world datasets, showing that it always outperforms GP regression and is almost always better than state-of-the-art deterministic and sampling-based approximate inference methods for Bayesian neural networks. As a by-product, this work provides a comprehensive analysis of six approximate Bayesian methods for training neural networks.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/bui16.html
  PDF: http://proceedings.mlr.press/v48/bui16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-bui16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Thang
    family: Bui
  - given: Daniel
    family: Hernandez-Lobato
  - given: Jose
    family: Hernandez-Lobato
  - given: Yingzhen
    family: Li
  - given: Richard
    family: Turner
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1472-1481
  id: bui16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1472
  lastpage: 1481
  published: 2016-06-11 00:00:00 +0000
- title: 'DR-ABC: Approximate Bayesian Computation with Kernel-Based Distribution Regression'
  abstract: 'Performing exact posterior inference in complex generative models is often difficult or impossible due to an expensive to evaluate or intractable likelihood function. Approximate Bayesian computation (ABC) is an inference framework that constructs an approximation to the true likelihood based on the similarity between the observed and simulated data as measured by a predefined set of summary statistics. Although the choice of informative problem-specific summary statistics crucially influences the quality of the likelihood approximation and hence also the quality of the posterior sample in ABC, there are only few principled general-purpose approaches to the selection or construction of such summary statistics. In this paper, we develop a novel framework for solving this problem. We model the functional relationship between the data and the optimal choice (with respect to a loss function) of summary statistics using kernel-based distribution regression. Furthermore, we extend our approach to incorporate kernel-based regression from conditional distributions, thus appropriately taking into account the specific structure of the posited generative model. We show that our approach can be implemented in a computationally and statistically efficient way using the random Fourier features framework for large-scale kernel learning. In addition to that, our framework outperforms related methods by a large margin on toy and real-world data, including hierarchical and time series models.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/mitrovic16.html
  PDF: http://proceedings.mlr.press/v48/mitrovic16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-mitrovic16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Jovana
    family: Mitrovic
  - given: Dino
    family: Sejdinovic
  - given: Yee-Whye
    family: Teh
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1482-1491
  id: mitrovic16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1482
  lastpage: 1491
  published: 2016-06-11 00:00:00 +0000
- title: 'Predictive Entropy Search for Multi-objective Bayesian Optimization'
  abstract: 'We present \small PESMO, a Bayesian method for identifying the Pareto set of multi-objective optimization problems, when the functions are expensive to evaluate. \small PESMO chooses the evaluation points to maximally reduce the entropy of the posterior distribution over the Pareto set. The \small PESMO acquisition function is decomposed as a sum of objective-specific acquisition functions, which makes it possible to use the algorithm in \emphdecoupled scenarios in which the objectives can be evaluated separately and perhaps with different costs. This decoupling capability is useful to identify difficult objectives that require more evaluations. \small PESMO also offers gains in efficiency, as its cost scales linearly with the number of objectives, in comparison to the exponential cost of other methods. We compare \small PESMO with other methods on synthetic and real-world problems. The results show that \small PESMO produces better recommendations with a smaller number of evaluations, and that a decoupled evaluation can lead to improvements in performance, particularly when the number of objectives is large.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/hernandez-lobatoa16.html
  PDF: http://proceedings.mlr.press/v48/hernandez-lobatoa16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-hernandez-lobatoa16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Daniel
    family: Hernandez-Lobato
  - given: Jose
    family: Hernandez-Lobato
  - given: Amar
    family: Shah
  - given: Ryan
    family: Adams
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1492-1501
  id: hernandez-lobatoa16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1492
  lastpage: 1501
  published: 2016-06-11 00:00:00 +0000
- title: 'Rich Component Analysis'
  abstract: 'In many settings, we have multiple data sets (also called views) that capture different and overlapping aspects of the same phenomenon. We are often interested in finding patterns that are unique to one or to a subset of the views. For example, we might have one set of molecular observations and one set of physiological observations on the same group of individuals, and we want to quantify molecular patterns that are uncorrelated with physiology. Despite being a common problem, this is highly challenging when the correlations come from complex distributions. In this paper, we develop the general framework of Rich Component Analysis (RCA) to model settings where the observations from different views are driven by different sets of latent components, and each component can be a complex, high-dimensional distribution. We introduce algorithms based on cumulant extraction that provably learn each of the components without having to model the other components. We show how to integrate RCA with stochastic gradient descent into a meta-algorithm for learning general models, and demonstrate substantial improvement in accuracy on several synthetic and real datasets in both supervised and unsupervised tasks. Our method makes it possible to learn latent variable models when we don’t have samples from the true model but only samples after complex perturbations.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/gea16.html
  PDF: http://proceedings.mlr.press/v48/gea16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-gea16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Rong
    family: Ge
  - given: James
    family: Zou
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1502-1510
  id: gea16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1502
  lastpage: 1510
  published: 2016-06-11 00:00:00 +0000
- title: 'Black-Box Alpha Divergence Minimization'
  abstract: 'Black-box alpha (BB-α) is a new approximate inference method based on the minimization of α-divergences. BB-αscales to large datasets because it can be implemented using stochastic gradient descent. BB-αcan be applied to complex probabilistic models with little effort since it only requires as input the likelihood function and its gradients. These gradients can be easily obtained using automatic differentiation. By changing the divergence parameter α, the method is able to interpolate between variational Bayes (VB) (α→0) and an algorithm similar to expectation propagation (EP) (α= 1). Experiments on probit regression and neural network regression and classification problems show that BB-αwith non-standard settings of α, such as α= 0.5, usually produces better predictions than with α→0 (VB) or α= 1 (EP).'
  volume: 48
  URL: https://proceedings.mlr.press/v48/hernandez-lobatob16.html
  PDF: http://proceedings.mlr.press/v48/hernandez-lobatob16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-hernandez-lobatob16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Jose
    family: Hernandez-Lobato
  - given: Yingzhen
    family: Li
  - given: Mark
    family: Rowland
  - given: Thang
    family: Bui
  - given: Daniel
    family: Hernandez-Lobato
  - given: Richard
    family: Turner
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1511-1520
  id: hernandez-lobatob16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1511
  lastpage: 1520
  published: 2016-06-11 00:00:00 +0000
- title: 'One-Shot Generalization in Deep Generative Models'
  abstract: 'Humans have an impressive ability to reason about new concepts and experiences from just a single example. In particular, humans have an ability for one-shot generalization: an ability to encounter a new concept, understand its structure, and then be able to generate compelling alternative variations of the concept. We develop machine learning systems with this important capacity by developing new deep generative models, models that combine the representational power of deep learning with the inferential power of Bayesian reasoning. We develop a class of sequential generative models that are built on the principles of feedback and attention. These two characteristics lead to generative models that are among the state-of-the art in density estimation and image generation. We demonstrate the one-shot generalization ability of our models using three tasks: unconditional sampling, generating new exemplars of a given concept, and generating new exemplars of a family of concepts. In all cases our models are able to generate compelling and diverse samples—having seen new examples just once—providing an important class of general-purpose models for one-shot machine learning.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/rezende16.html
  PDF: http://proceedings.mlr.press/v48/rezende16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-rezende16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Danilo
    family: Rezende
  - given: 
    family: Shakir
  - given: Ivo
    family: Danihelka
  - given: Karol
    family: Gregor
  - given: Daan
    family: Wierstra
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1521-1529
  id: rezende16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1521
  lastpage: 1529
  published: 2016-06-11 00:00:00 +0000
- title: 'Optimal Classification with Multivariate Losses'
  abstract: 'Multivariate loss functions are extensively employed in several prediction tasks arising in Information Retrieval. Often, the goal in the tasks is to minimize expected loss when retrieving relevant items from a presented set of items, where the expectation is with respect to the joint distribution over item sets. Our key result is that for most multivariate losses, the expected loss is provably optimized by sorting the items by the conditional probability of label being positive and then selecting top k items. Such a result was previously known only for the F-measure. Leveraging on the optimality characterization, we give an algorithm for estimating optimal predictions in practice with runtime quadratic in size of item sets for many losses. We provide empirical results on benchmark datasets, comparing the proposed algorithm to state-of-the-art methods for optimizing multivariate losses.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/natarajan16.html
  PDF: http://proceedings.mlr.press/v48/natarajan16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-natarajan16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Nagarajan
    family: Natarajan
  - given: Oluwasanmi
    family: Koyejo
  - given: Pradeep
    family: Ravikumar
  - given: Inderjit
    family: Dhillon
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1530-1538
  id: natarajan16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1530
  lastpage: 1538
  published: 2016-06-11 00:00:00 +0000
- title: 'A ranking approach to global optimization'
  abstract: 'We consider the problem of maximizing an unknown function f over a compact and convex set using as few observations f(x) as possible. We observe that the optimization of the function f essentially relies on learning the induced bipartite ranking rule of f. Based on this idea, we relate global optimization to bipartite ranking which allows to address problems with high dimensional input space, as well as cases of functions with weak regularity properties. The paper introduces novel meta-algorithms for global optimization which rely on the choice of any bipartite ranking method. Theoretical properties are provided as well as convergence guarantees and equivalences between various optimization methods are obtained as a by-product. Eventually, numerical evidence is given to show that the main algorithm of the paper which adapts empirically to the underlying ranking structure essentially outperforms existing state-of-the-art global optimization algorithms in typical benchmarks.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/malherbe16.html
  PDF: http://proceedings.mlr.press/v48/malherbe16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-malherbe16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Cedric
    family: Malherbe
  - given: Emile
    family: Contal
  - given: Nicolas
    family: Vayatis
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1539-1547
  id: malherbe16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1539
  lastpage: 1547
  published: 2016-06-11 00:00:00 +0000
- title: 'Parallel and Distributed Block-Coordinate Frank-Wolfe Algorithms'
  abstract: 'We study parallel and distributed Frank-Wolfe algorithms; the former on shared memory machines with mini-batching, and the latter in a delayed update framework. In both cases, we perform computations asynchronously whenever possible. We assume block-separable constraints as in Block-Coordinate Frank-Wolfe (BCFW) method (Lacoste et. al., 2013) , but our analysis subsumes BCFW and reveals problem-dependent quantities that govern the speedups of our methods over BCFW. A notable feature of our algorithms is that they do not depend on worst-case bounded delays, but only (mildly) on **expected** delays, making them robust to stragglers and faulty worker threads. We present experiments on structural SVM and Group Fused Lasso, and observe significant speedups over competing state-of-the-art (and synchronous) methods.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/wangd16.html
  PDF: http://proceedings.mlr.press/v48/wangd16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-wangd16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Yu-Xiang
    family: Wang
  - given: Veeranjaneyulu
    family: Sadhanala
  - given: Wei
    family: Dai
  - given: Willie
    family: Neiswanger
  - given: Suvrit
    family: Sra
  - given: Eric
    family: Xing
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1548-1557
  id: wangd16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1548
  lastpage: 1557
  published: 2016-06-11 00:00:00 +0000
- title: 'Autoencoding beyond pixels using a learned similarity metric'
  abstract: 'We present an autoencoder that leverages learned representations to better measure similarities in data space. By combining a variational autoencoder (VAE) with a generative adversarial network (GAN) we can use learned feature representations in the GAN discriminator as basis for the VAE reconstruction objective. Thereby, we replace element-wise errors with feature-wise errors to better capture the data distribution while offering invariance towards e.g. translation. We apply our method to images of faces and show that it outperforms VAEs with element-wise similarity measures in terms of visual fidelity. Moreover, we show that the method learns an embedding in which high-level abstract visual features (e.g. wearing glasses) can be modified using simple arithmetic.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/larsen16.html
  PDF: http://proceedings.mlr.press/v48/larsen16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-larsen16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Anders Boesen Lindbo
    family: Larsen
  - given: Søren Kaae
    family: Sønderby
  - given: Hugo
    family: Larochelle
  - given: Ole
    family: Winther
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1558-1566
  id: larsen16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1558
  lastpage: 1566
  published: 2016-06-11 00:00:00 +0000
- title: 'Ensuring Rapid Mixing and Low Bias for Asynchronous Gibbs Sampling'
  abstract: 'Gibbs sampling is a Markov chain Monte Carlo technique commonly used for estimating marginal distributions. To speed up Gibbs sampling, there has recently been interest in parallelizing it by executing asynchronously. While empirical results suggest that many models can be efficiently sampled asynchronously, traditional Markov chain analysis does not apply to the asynchronous case, and thus asynchronous Gibbs sampling is poorly understood. In this paper, we derive a better understanding of the two main challenges of asynchronous Gibbs: bias and mixing time. We show experimentally that our theoretical results match practical outcomes.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/sa16.html
  PDF: http://proceedings.mlr.press/v48/sa16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-sa16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Christopher De
    family: Sa
  - given: Chris
    family: Re
  - given: Kunle
    family: Olukotun
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1567-1576
  id: sa16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1567
  lastpage: 1576
  published: 2016-06-11 00:00:00 +0000
- title: 'Simultaneous Safe Screening of Features and Samples in Doubly Sparse Modeling'
  abstract: 'The problem of learning a sparse model is conceptually interpreted as the process of identifying active features/samples and then optimizing the model over them. Recently introduced safe screening allows us to identify a part of non-active features/samples. So far, safe screening has been individually studied either for feature screening or for sample screening. In this paper, we introduce a new approach for safely screening features and samples simultaneously by alternatively iterating feature and sample screening steps. A significant advantage of considering them simultaneously rather than individually is that they have a synergy effect in the sense that the results of the previous safe feature screening can be exploited for improving the next safe sample screening performances, and vice-versa. We first theoretically investigate the synergy effect, and then illustrate the practical advantage through intensive numerical experiments for problems with large numbers of features and samples.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/shibagaki16.html
  PDF: http://proceedings.mlr.press/v48/shibagaki16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-shibagaki16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Atsushi
    family: Shibagaki
  - given: Masayuki
    family: Karasuyama
  - given: Kohei
    family: Hatano
  - given: Ichiro
    family: Takeuchi
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1577-1586
  id: shibagaki16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1577
  lastpage: 1586
  published: 2016-06-11 00:00:00 +0000
- title: 'Anytime optimal algorithms in stochastic multi-armed bandits'
  abstract: 'We introduce an anytime algorithm for stochastic multi-armed bandit with optimal distribution free and distribution dependent bounds (for a specific family of parameters). The performances of this algorithm (as well as another one motivated by the conjectured optimal bound) are evaluated empirically. A similar analysis is provided with full information, to serve as a benchmark.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/degenne16.html
  PDF: http://proceedings.mlr.press/v48/degenne16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-degenne16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Rémy
    family: Degenne
  - given: Vianney
    family: Perchet
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1587-1595
  id: degenne16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1587
  lastpage: 1595
  published: 2016-06-11 00:00:00 +0000
- title: 'Bounded Off-Policy Evaluation with Missing Data for Course Recommendation and Curriculum Design'
  abstract: 'Successfully recommending personalized course schedules is a difficult problem given the diversity of students knowledge, learning behaviour, and goals. This paper presents personalized course recommendation and curriculum design algorithms that exploit logged student data. The algorithms are based on the regression estimator for contextual multi-armed bandits with a penalized variance term. Guarantees on the predictive performance of the algorithms are provided using empirical Bernstein bounds. We also provide guidelines for including expert domain knowledge into the recommendations. Using undergraduate engineering logged data from a post-secondary institution we illustrate the performance of these algorithms.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/hoiles16.html
  PDF: http://proceedings.mlr.press/v48/hoiles16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-hoiles16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: William
    family: Hoiles
  - given: Mihaela
    family: Schaar
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1596-1604
  id: hoiles16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1596
  lastpage: 1604
  published: 2016-06-11 00:00:00 +0000
- title: 'On collapsed representation of hierarchical Completely Random Measures'
  abstract: 'The aim of the paper is to provide an exact approach for generating a Poisson process sampled from a hierarchical CRM, without having to instantiate the infinitely many atoms of the random measures. We use completely random measures (CRM) and hierarchical CRM to define a prior for Poisson processes. We derive the marginal distribution of the resultant point process, when the underlying CRM is marginalized out. Using well known properties unique to Poisson processes, we were able to derive an exact approach for instantiating a Poisson process with a hierarchical CRM prior. Furthermore, we derive Gibbs sampling strategies for hierarchical CRM models based on Chinese restaurant franchise sampling scheme. As an example, we present the sum of generalized gamma process (SGGP), and show its application in topic-modelling. We show that one can determine the power-law behaviour of the topics and words in a Bayesian fashion, by defining a prior on the parameters of SGGP.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/pandey16.html
  PDF: http://proceedings.mlr.press/v48/pandey16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-pandey16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Gaurav
    family: Pandey
  - given: Ambedkar
    family: Dukkipati
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1605-1613
  id: pandey16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1605
  lastpage: 1613
  published: 2016-06-11 00:00:00 +0000
- title: 'From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification'
  abstract: 'We propose sparsemax, a new activation function similar to the traditional softmax, but able to output sparse probabilities. After deriving its properties, we show how its Jacobian can be efficiently computed, enabling its use in a network trained with backpropagation. Then, we propose a new smooth and convex loss function which is the sparsemax analogue of the logistic loss. We reveal an unexpected connection between this new loss and the Huber classification loss. We obtain promising empirical results in multi-label classification problems and in attention-based neural networks for natural language inference. For the latter, we achieve a similar performance as the traditional softmax, but with a selective, more compact, attention focus.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/martins16.html
  PDF: http://proceedings.mlr.press/v48/martins16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-martins16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Andre
    family: Martins
  - given: Ramon
    family: Astudillo
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1614-1623
  id: martins16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1614
  lastpage: 1623
  published: 2016-06-11 00:00:00 +0000
- title: 'Black-box Optimization with a Politician'
  abstract: 'We propose a new framework for black-box convex optimization which is well-suited for situations where gradient computations are expensive. We derive a new method for this framework which leverages several concepts from convex optimization, from standard first-order methods (e.g. gradient descent or quasi-Newton methods) to analytical centers (i.e. minimizers of self-concordant barriers). We demonstrate empirically that our new technique compares favorably with state of the art algorithms (such as BFGS).'
  volume: 48
  URL: https://proceedings.mlr.press/v48/bubeck16.html
  PDF: http://proceedings.mlr.press/v48/bubeck16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-bubeck16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Sebastien
    family: Bubeck
  - given: Yin Tat
    family: Lee
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1624-1631
  id: bubeck16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1624
  lastpage: 1631
  published: 2016-06-11 00:00:00 +0000
- title: 'Gaussian process nonparametric tensor estimator and its minimax optimality'
  abstract: 'We investigate the statistical efficiency of a nonparametric Gaussian process method for a nonlinear tensor estimation problem. Low-rank tensor estimation has been used as a method to learn higher order relations among several data sources in a wide range of applications, such as multi-task learning, recommendation systems, and spatiotemporal analysis. We consider a general setting where a common linear tensor learning is extended to a nonlinear learning problem in reproducing kernel Hilbert space and propose a nonparametric Bayesian method based on the Gaussian process method. We prove its statistical convergence rate without assuming any strong convexity, such as restricted strong convexity. Remarkably, it is shown that our convergence rate achieves the minimax optimal rate. We apply our proposed method to multi-task learning and show that our method significantly outperforms existing methods through numerical experiments on real-world data sets.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/kanagawa16.html
  PDF: http://proceedings.mlr.press/v48/kanagawa16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-kanagawa16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Heishiro
    family: Kanagawa
  - given: Taiji
    family: Suzuki
  - given: Hayato
    family: Kobayashi
  - given: Nobuyuki
    family: Shimizu
  - given: Yukihiro
    family: Tagami
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1632-1641
  id: kanagawa16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1632
  lastpage: 1641
  published: 2016-06-11 00:00:00 +0000
- title: 'No-Regret Algorithms for Heavy-Tailed Linear Bandits'
  abstract: 'We analyze the problem of linear bandits under heavy tailed noise. Most of of the work on linear bandits has been based on the assumption of bounded or sub-Gaussian noise. However, this assumption is often violated in common scenarios such as financial markets. We present two algorithms to tackle this problem: one based on dynamic truncation and one based on a median of means estimator. We show that, when the noise admits admits only a 1 + εmoment, these algorithms are still able to achieve regret in \widetildeO(T^\frac2 + ε2(1 + ε)) and \widetildeO(T^\frac1+ 2ε1 + 3 ε) respectively. In particular, they guarantee sublinear regret as long as the noise has finite variance. We also present empirical results showing that our algorithms achieve a better performance than the current state of the art for bounded noise when the L_∞bound on the noise is large yet the 1 + εmoment of the noise is small.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/medina16.html
  PDF: http://proceedings.mlr.press/v48/medina16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-medina16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Andres Munoz
    family: Medina
  - given: Scott
    family: Yang
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1642-1650
  id: medina16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1642
  lastpage: 1650
  published: 2016-06-11 00:00:00 +0000
- title: 'Extended and Unscented Kitchen Sinks'
  abstract: 'We propose a scalable multiple-output generalization of unscented and extended Gaussian processes. These algorithms have been designed to handle general likelihood models by linearizing them using a Taylor series or the Unscented Transform in a variational inference framework. We build upon random feature approximations of Gaussian process covariance functions and show that, on small-scale single-task problems, our methods can attain similar performance as the original algorithms while having less computational cost. We also evaluate our methods at a larger scale on MNIST and on a seismic inversion which is inherently a multi-task problem.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/bonilla16.html
  PDF: http://proceedings.mlr.press/v48/bonilla16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-bonilla16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Edwin
    family: Bonilla
  - given: Daniel
    family: Steinberg
  - given: Alistair
    family: Reid
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1651-1659
  id: bonilla16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1651
  lastpage: 1659
  published: 2016-06-11 00:00:00 +0000
- title: 'Matrix Eigen-decomposition via Doubly Stochastic Riemannian Optimization'
  abstract: 'Matrix eigen-decomposition is a classic and long-standing problem that plays a fundamental role in scientific computing and machine learning. Despite some existing algorithms for this inherently non-convex problem, the study remains inadequate for the need of large data nowadays. To address this gap, we propose a Doubly Stochastic Riemannian Gradient EIGenSolver, DSRG-EIGS, where the double stochasticity comes from the generalization of the stochastic Euclidean gradient ascent and the stochastic Euclidean coordinate ascent to Riemannian manifolds. As a result, it induces a greatly reduced complexity per iteration, enables the algorithm to completely avoid the matrix inversion, and consequently makes it well-suited to large-scale applications. We theoretically analyze its convergence properties and empirically validate it on real-world datasets. Encouraging experimental results demonstrate its advantages over the deterministic counterparts.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/xub16.html
  PDF: http://proceedings.mlr.press/v48/xub16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-xub16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Zhiqiang
    family: Xu
  - given: Peilin
    family: Zhao
  - given: Jianneng
    family: Cao
  - given: Xiaoli
    family: Li
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1660-1669
  id: xub16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1660
  lastpage: 1669
  published: 2016-06-11 00:00:00 +0000
- title: 'Recommendations as Treatments: Debiasing Learning and Evaluation'
  abstract: 'Most data for evaluating and training recommender systems is subject to selection biases, either through self-selection by the users or through the actions of the recommendation system itself. In this paper, we provide a principled approach to handle selection biases by adapting models and estimation techniques from causal inference. The approach leads to unbiased performance estimators despite biased data, and to a matrix factorization method that provides substantially improved prediction performance on real-world data. We theoretically and empirically characterize the robustness of the approach, and find that it is highly practical and scalable.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/schnabel16.html
  PDF: http://proceedings.mlr.press/v48/schnabel16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-schnabel16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Tobias
    family: Schnabel
  - given: Adith
    family: Swaminathan
  - given: Ashudeep
    family: Singh
  - given: Navin
    family: Chandak
  - given: Thorsten
    family: Joachims
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1670-1679
  id: schnabel16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1670
  lastpage: 1679
  published: 2016-06-11 00:00:00 +0000
- title: 'ForecastICU: A Prognostic Decision Support System for Timely Prediction of Intensive Care Unit Admission'
  abstract: 'We develop ForecastICU: a prognostic decision support system that monitors hospitalized patients and prompts alarms for intensive care unit (ICU) admissions. ForecastICU is first trained in an offline stage by constructing a Bayesian belief system that corresponds to its belief about how trajectories of physiological data streams of the patient map to a clinical status. After that, ForecastICU monitors a new patient in real-time by observing her physiological data stream, updating its belief about her status over time, and prompting an alarm whenever its belief process hits a predefined threshold (confidence). Using a real-world dataset obtained from UCLA Ronald Reagan Medical Center, we show that ForecastICU can predict ICU admissions 9 hours before a physician’s decision (for a sensitivity of 40% and a precision of 50%). Also, ForecastICU performs consistently better than other state-of-the-art machine learning algorithms in terms of sensitivity, precision, and timeliness: it can predict ICU admissions 3 hours earlier, and offers a 7.8% gain in sensitivity and a 5.1% gain in precision compared to the best state-of-the-art algorithm. Moreover, ForecastICU offers an area under curve (AUC) gain of 22.3% compared to the Rothman index, which is the currently deployed technology in most hospital wards.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/yoon16.html
  PDF: http://proceedings.mlr.press/v48/yoon16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-yoon16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Jinsung
    family: Yoon
  - given: Ahmed
    family: Alaa
  - given: Scott
    family: Hu
  - given: Mihaela
    family: Schaar
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1680-1689
  id: yoon16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1680
  lastpage: 1689
  published: 2016-06-11 00:00:00 +0000
- title: 'An optimal algorithm for the Thresholding Bandit Problem'
  abstract: 'We study a specific combinatorial pure exploration stochastic bandit problem where the learner aims at finding the set of arms whose means are above a given threshold, up to a given precision, and for a fixed time horizon. We propose a parameter-free algorithm based on an original heuristic, and prove that it is optimal for this problem by deriving matching upper and lower bounds. To the best of our knowledge, this is the first non-trivial pure exploration setting with fixed budget for which provably optimal strategies are constructed.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/locatelli16.html
  PDF: http://proceedings.mlr.press/v48/locatelli16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-locatelli16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Andrea
    family: Locatelli
  - given: Maurilio
    family: Gutzeit
  - given: Alexandra
    family: Carpentier
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1690-1698
  id: locatelli16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1690
  lastpage: 1698
  published: 2016-06-11 00:00:00 +0000
- title: 'Fast Parameter Inference in Nonlinear Dynamical Systems using Iterative Gradient Matching'
  abstract: 'Parameter inference in mechanistic models of coupled differential equations is a topical and challenging problem. We propose a new method based on kernel ridge regression and gradient matching, and an objective function that simultaneously encourages goodness of fit and penalises inconsistencies with the differential equations. Fast minimisation is achieved by exploiting partial convexity inherent in this function, and setting up an iterative algorithm in the vein of the EM algorithm. An evaluation of the proposed method on various benchmark data suggests that it compares favourably with state-of-the-art alternatives.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/niu16.html
  PDF: http://proceedings.mlr.press/v48/niu16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-niu16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Mu
    family: Niu
  - given: Simon
    family: Rogers
  - given: Maurizio
    family: Filippone
  - given: Dirk
    family: Husmeier
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1699-1707
  id: niu16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1699
  lastpage: 1707
  published: 2016-06-11 00:00:00 +0000
- title: 'Structured and Efficient Variational Deep Learning with Matrix Gaussian Posteriors'
  abstract: 'We introduce a variational Bayesian neural network where the parameters are governed via a probability distribution on random matrices. Specifically, we employ a matrix variate Gaussian (Gupta & Nagar ’99) parameter posterior distribution where we explicitly model the covariance among the input and output dimensions of each layer. Furthermore, with approximate covariance matrices we can achieve a more efficient way to represent those correlations that is also cheaper than fully factorized parameter posteriors. We further show that with the “local reprarametrization trick" (Kingma & Welling ’15) on this posterior distribution we arrive at a Gaussian Process (Rasmussen ’06) interpretation of the hidden units in each layer and we, similarly with (Gal & Ghahramani ’15), provide connections with deep Gaussian processes. We continue in taking advantage of this duality and incorporate “pseudo-data” (Snelson & Ghahramani ’05) in our model, which in turn allows for more efficient posterior sampling while maintaining the properties of the original model. The validity of the proposed approach is verified through extensive experiments.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/louizos16.html
  PDF: http://proceedings.mlr.press/v48/louizos16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-louizos16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Christos
    family: Louizos
  - given: Max
    family: Welling
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1708-1716
  id: louizos16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1708
  lastpage: 1716
  published: 2016-06-11 00:00:00 +0000
- title: 'Learning Granger Causality for Hawkes Processes'
  abstract: 'Learning Granger causality for general point processes is a very challenging task. We propose an effective method learning Granger causality for a special but significant type of point processes — Hawkes processes. Focusing on Hawkes processes, we reveal the relationship between Hawkes process’s impact functions and its Granger causality graph. Specifically, our model represents impact functions using a series of basis functions and recovers the Granger causality graph via group sparsity of the impact functions’ coefficients. We propose an effective learning algorithm combining a maximum likelihood estimator (MLE) with a sparse-group-lasso (SGL) regularizer. Additionally, the pairwise similarity between the dimensions of the process is considered when their clustering structure is available. We analyze our learning method and discuss the selection of the basis functions. Experiments on synthetic data and real-world data show that our method can learn the Granger causality graph and the triggering patterns of Hawkes processes simultaneously.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/xuc16.html
  PDF: http://proceedings.mlr.press/v48/xuc16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-xuc16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Hongteng
    family: Xu
  - given: Mehrdad
    family: Farajtabar
  - given: Hongyuan
    family: Zha
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1717-1726
  id: xuc16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1717
  lastpage: 1726
  published: 2016-06-11 00:00:00 +0000
- title: 'Neural Variational Inference for Text Processing'
  abstract: 'Recent advances in neural variational inference have spawned a renaissance in deep latent variable models. In this paper we introduce a generic variational inference framework for generative and conditional models of text. While traditional variational methods derive an analytic approximation for the intractable distributions over latent variables, here we construct an inference network conditioned on the discrete text input to provide the variational distribution. We validate this framework on two very different text modelling applications, generative document modelling and supervised question answering. Our neural variational document model combines a continuous stochastic document representation with a bag-of-words generative model and achieves the lowest reported perplexities on two standard test corpora. The neural answer selection model employs a stochastic representation layer within an attention mechanism to extract the semantics between a question and answer pair. On two question answering benchmarks this model exceeds all previous published benchmarks.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/miao16.html
  PDF: http://proceedings.mlr.press/v48/miao16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-miao16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Yishu
    family: Miao
  - given: Lei
    family: Yu
  - given: Phil
    family: Blunsom
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1727-1736
  id: miao16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1727
  lastpage: 1736
  published: 2016-06-11 00:00:00 +0000
- title: 'Dictionary Learning for Massive Matrix Factorization'
  abstract: 'Sparse matrix factorization is a popular tool to obtain interpretable data decompositions, which are also effective to perform data completion or denoising. Its applicability to large datasets has been addressed with online and randomized methods, that reduce the complexity in one of the matrix dimension, but not in both of them. In this paper, we tackle very large matrices in both dimensions. We propose a new factorization method that scales gracefully to terabyte-scale datasets. Those could not be processed by previous algorithms in a reasonable amount of time. We demonstrate the efficiency of our approach on massive functional Magnetic Resonance Imaging (fMRI) data, and on matrix completion problems for recommender systems, where we obtain significant speed-ups compared to state-of-the art coordinate descent methods.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/mensch16.html
  PDF: http://proceedings.mlr.press/v48/mensch16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-mensch16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Arthur
    family: Mensch
  - given: Julien
    family: Mairal
  - given: Bertrand
    family: Thirion
  - given: Gael
    family: Varoquaux
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1737-1746
  id: mensch16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1737
  lastpage: 1746
  published: 2016-06-11 00:00:00 +0000
- title: 'Pixel Recurrent Neural Networks'
  abstract: 'Modeling the distribution of natural images is a landmark problem in unsupervised learning. This task requires an image model that is at once expressive, tractable and scalable. We present a deep neural network that sequentially predicts the pixels in an image along the two spatial dimensions. Our method models the discrete probability of the raw pixel values and encodes the complete set of dependencies in the image. Architectural novelties include fast two-dimensional recurrent layers and an effective use of residual connections in deep recurrent networks. We achieve log-likelihood scores on natural images that are considerably better than the previous state of the art. Our main results also provide benchmarks on the diverse ImageNet dataset. Samples generated from the model appear crisp, varied and globally coherent.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/oord16.html
  PDF: http://proceedings.mlr.press/v48/oord16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-oord16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Aäron
    prefix: van den
    family: Oord
  - given: Nal
    family: Kalchbrenner
  - given: Koray
    family: Kavukcuoglu
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1747-1756
  id: oord16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1747
  lastpage: 1756
  published: 2016-06-11 00:00:00 +0000
- title: 'Why Most Decisions Are Easy in Tetris—And Perhaps in Other Sequential Decision Problems, As Well'
  abstract: 'We examined the sequence of decision problems that are encountered in the game of Tetris and found that most of the problems are easy in the following sense: One can choose well among the available actions without knowing an evaluation function that scores well in the game. This is a consequence of three conditions that are prevalent in the game: simple dominance, cumulative dominance, and noncompensation. These conditions can be exploited to develop faster and more effective learning algorithms. In addition, they allow certain types of domain knowledge to be incorporated with ease into a learning algorithm. Among the sequential decision problems we encounter, it is unlikely that Tetris is unique or rare in having these properties.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/simsek16.html
  PDF: http://proceedings.mlr.press/v48/simsek16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-simsek16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Özgür
    family: Şimşek
  - given: Simón
    family: Algorta
  - given: Amit
    family: Kothiyal
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1757-1765
  id: simsek16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1757
  lastpage: 1765
  published: 2016-06-11 00:00:00 +0000
- title: 'Gaussian quadrature for matrix inverse forms with applications'
  abstract: 'We present a framework for accelerating a spectrum of machine learning algorithms that require computation of \emphbilinear inverse forms u^T A^-1u, where A is a positive definite matrix and u a given vector. Our framework is built on Gauss-type quadrature and easily scales to large, sparse matrices. Further, it allows retrospective computation of lower and upper bounds on u^T A^-1u, which in turn accelerates several algorithms. We prove that these bounds tighten iteratively and converge at a linear (geometric) rate. To our knowledge, ours is the first work to demonstrate these key properties of Gauss-type quadrature, which is a classical and deeply studied topic. We illustrate empirical consequences of our results by using quadrature to accelerate machine learning tasks involving determinantal point processes and submodular optimization, and observe tremendous speedups in several instances.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/lig16.html
  PDF: http://proceedings.mlr.press/v48/lig16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-lig16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Chengtao
    family: Li
  - given: Suvrit
    family: Sra
  - given: Stefanie
    family: Jegelka
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1766-1775
  id: lig16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1766
  lastpage: 1775
  published: 2016-06-11 00:00:00 +0000
- title: 'Train and Test Tightness of LP Relaxations in Structured Prediction'
  abstract: 'Structured prediction is used in areas such as computer vision and natural language processing to predict structured outputs such as segmentations or parse trees. In these settings, prediction is performed by MAP inference or, equivalently, by solving an integer linear program. Because of the complex scoring functions required to obtain accurate predictions, both learning and inference typically require the use of approximate solvers. We propose a theoretical explanation to the striking observation that approximations based on linear programming (LP) relaxations are often tight on real-world instances. In particular, we show that learning with LP relaxed inference encourages integrality of training instances, and that tightness generalizes from train to test data.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/meshi16.html
  PDF: http://proceedings.mlr.press/v48/meshi16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-meshi16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Ofer
    family: Meshi
  - given: Mehrdad
    family: Mahdavi
  - given: Adrian
    family: Weller
  - given: David
    family: Sontag
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1776-1785
  id: meshi16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1776
  lastpage: 1785
  published: 2016-06-11 00:00:00 +0000
- title: 'Stochastic Optimization for Multiview Representation Learning using Partial Least Squares'
  abstract: 'Partial Least Squares (PLS) is a ubiquitous statistical technique for bilinear factor analysis. It is used in many data analysis, machine learning, and information retrieval applications to model the covariance structure between a pair of data matrices. In this paper, we consider PLS for representation learning in a multiview setting where we have more than one view in data at training time. Furthermore, instead of framing PLS as a problem about a fixed given data set, we argue that PLS should be studied as a stochastic optimization problem, especially in a "big data" setting, with the goal of optimizing a population objective based on sample. This view suggests using Stochastic Approximation (SA) approaches, such as Stochastic Gradient Descent (SGD) and enables a rigorous analysis of their benefits. In this paper, we develop SA approaches to PLS and provide iteration complexity bounds for the proposed algorithms.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/aroraa16.html
  PDF: http://proceedings.mlr.press/v48/aroraa16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-aroraa16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Raman
    family: Arora
  - given: Poorya
    family: Mianjy
  - given: Teodor
    family: Marinov
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1786-1794
  id: aroraa16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1786
  lastpage: 1794
  published: 2016-06-11 00:00:00 +0000
- title: 'Hierarchical Compound Poisson Factorization'
  abstract: 'Non-negative matrix factorization models based on a hierarchical Gamma-Poisson structure capture user and item behavior effectively in extremely sparse data sets, making them the ideal choice for collaborative filtering applications. Hierarchical Poisson factorization (HPF) in particular has proved successful for scalable recommendation systems with extreme sparsity. HPF, however, suffers from a tight coupling of sparsity model (absence of a rating) and response model (the value of the rating), which limits the expressiveness of the latter. Here, we introduce hierarchical compound Poisson factorization (HCPF) that has the favorable Gamma-Poisson structure and scalability of HPF to high-dimensional extremely sparse matrices. More importantly, HCPF decouples the sparsity model from the response model, allowing us to choose the most suitable distribution for the response. HCPF can capture binary, non-negative discrete, non-negative continuous, and zero-inflated continuous responses. We compare HCPF with HPF on nine discrete and three continuous data sets and conclude that HCPF captures the relationship between sparsity and response better than HPF.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/basbug16.html
  PDF: http://proceedings.mlr.press/v48/basbug16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-basbug16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Mehmet
    family: Basbug
  - given: Barbara
    family: Engelhardt
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1795-1803
  id: basbug16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1795
  lastpage: 1803
  published: 2016-06-11 00:00:00 +0000
- title: 'Opponent Modeling in Deep Reinforcement Learning'
  abstract: 'Opponent modeling is necessary in multi-agent settings where secondary agents with competing goals also adapt their strategies, yet it remains challenging because of strategies’ complex interaction and the non-stationary nature. Most previous work focuses on developing probabilistic models or parameterized strategies for specific applications. Inspired by the recent success of deep reinforcement learning, we present neural-based models that jointly learn a policy and the behavior of opponents. Instead of explicitly predicting the opponent’s action, we encode observation of the opponents into a deep Q-Network (DQN), while retaining explicit modeling under multitasking. By using a Mixture-of-Experts architecture, our model automatically discovers different strategy patterns of opponents even without extra supervision. We evaluate our models on a simulated soccer game and a popular trivia game, showing superior performance over DQN and its variants.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/he16.html
  PDF: http://proceedings.mlr.press/v48/he16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-he16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: He
    family: He
  - given: Jordan
    family: Boyd-Graber
  - given: Kevin
    family: Kwok
  - given: Hal
    family: Daumé
    suffix: III
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1804-1813
  id: he16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1804
  lastpage: 1813
  published: 2016-06-11 00:00:00 +0000
- title: 'No penalty no tears: Least squares in high-dimensional linear models'
  abstract: 'Ordinary least squares (OLS) is the default method for fitting linear models, but is not applicable for problems with dimensionality larger than the sample size. For these problems, we advocate the use of a generalized version of OLS motivated by ridge regression, and propose two novel three-step algorithms involving least squares fitting and hard thresholding. The algorithms are methodologically simple to understand intuitively, computationally easy to implement efficiently, and theoretically appealing for choosing models consistently. Numerical exercises comparing our methods with penalization-based approaches in simulations and data analyses illustrate the great potential of the proposed algorithms.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/wange16.html
  PDF: http://proceedings.mlr.press/v48/wange16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-wange16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Xiangyu
    family: Wang
  - given: David
    family: Dunson
  - given: Chenlei
    family: Leng
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1814-1822
  id: wange16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1814
  lastpage: 1822
  published: 2016-06-11 00:00:00 +0000
- title: 'SDNA: Stochastic Dual Newton Ascent for Empirical Risk Minimization'
  abstract: 'We propose a new algorithm for minimizing regularized empirical loss: Stochastic Dual Newton Ascent (SDNA). Our method is dual in nature: in each iteration we update a random subset of the dual variables. However, unlike existing methods such as stochastic dual coordinate ascent, SDNA is capable of utilizing all local curvature information contained in the examples, which leads to striking improvements in both theory and practice – sometimes by orders of magnitude. In the special case when an L2-regularizer is used in the primal, the dual problem is a concave quadratic maximization problem plus a separable term. In this regime, SDNA in each step solves a proximal subproblem involving a random principal submatrix of the Hessian of the quadratic function; whence the name of the method.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/qub16.html
  PDF: http://proceedings.mlr.press/v48/qub16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-qub16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Zheng
    family: Qu
  - given: Peter
    family: Richtarik
  - given: Martin
    family: Takac
  - given: Olivier
    family: Fercoq
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1823-1832
  id: qub16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1823
  lastpage: 1832
  published: 2016-06-11 00:00:00 +0000
- title: 'On Graduated Optimization for Stochastic Non-Convex Problems'
  abstract: 'The graduated optimization approach, also known as the continuation method, is a popular heuristic to solving non-convex problems that has received renewed interest over the last decade.Despite being popular, very little is known in terms of its theoretical convergence analysis. In this paper we describe a new first-order algorithm based on graduated optimization and analyze its performance. We characterize a family of non-convex functions for which this algorithm provably converges to a global optimum. In particular, we prove that the algorithm converges to an ε-approximate solution within O(1 / ε^2) gradient-based steps. We extend our algorithm and analysis to the setting of stochastic non-convex optimization with noisy gradient feedback, attaining the same convergence rate. Additionally, we discuss the setting of “zero-order optimization", and devise a variant of our algorithm which converges at rate of O(d^2/ ε^4).'
  volume: 48
  URL: https://proceedings.mlr.press/v48/hazanb16.html
  PDF: http://proceedings.mlr.press/v48/hazanb16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-hazanb16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Elad
    family: Hazan
  - given: Kfir Yehuda
    family: Levy
  - given: Shai
    family: Shalev-Shwartz
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1833-1841
  id: hazanb16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1833
  lastpage: 1841
  published: 2016-06-11 00:00:00 +0000
- title: 'Meta-Learning with Memory-Augmented Neural Networks'
  abstract: 'Despite recent breakthroughs in the applications of deep neural networks, one setting that presents a persistent challenge is that of "one-shot learning." Traditional gradient-based networks require a lot of data to learn, often through extensive iterative training. When new data is encountered, the models must inefficiently relearn their parameters to adequately incorporate the new information without catastrophic interference. Architectures with augmented memory capacities, such as Neural Turing Machines (NTMs), offer the ability to quickly encode and retrieve new information, and hence can potentially obviate the downsides of conventional models. Here, we demonstrate the ability of a memory-augmented neural network to rapidly assimilate new data, and leverage this data to make accurate predictions after only a few samples. We also introduce a new method for accessing an external memory that focuses on memory content, unlike previous methods that additionally use memory location-based focusing mechanisms.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/santoro16.html
  PDF: http://proceedings.mlr.press/v48/santoro16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-santoro16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Adam
    family: Santoro
  - given: Sergey
    family: Bartunov
  - given: Matthew
    family: Botvinick
  - given: Daan
    family: Wierstra
  - given: Timothy
    family: Lillicrap
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1842-1850
  id: santoro16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1842
  lastpage: 1850
  published: 2016-06-11 00:00:00 +0000
- title: 'The knockoff filter for FDR control in group-sparse and multitask regression'
  abstract: 'We propose the group knockoff filter, a method for false discovery rate control in a linear regression setting where the features are grouped, and we would like to select a set of relevant groups which have a nonzero effect on the response. By considering the set of true and false discoveries at the group level, this method gains power relative to sparse regression methods. We also apply our method to the multitask regression problem where multiple response variables share similar sparsity patterns across the set of possible features. Empirically, the group knockoff filter successfully controls false discoveries at the group level in both settings, with substantially more discoveries made by leveraging the group structure.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/daia16.html
  PDF: http://proceedings.mlr.press/v48/daia16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-daia16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Ran
    family: Dai
  - given: Rina
    family: Barber
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1851-1859
  id: daia16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1851
  lastpage: 1859
  published: 2016-06-11 00:00:00 +0000
- title: 'Softened Approximate Policy Iteration for Markov Games'
  abstract: 'This paper reports theoretical and empirical investigations on the use of quasi-Newton methods to minimize the Optimal Bellman Residual (OBR) of zero-sum two-player Markov Games. First, it reveals that state-of-the-art algorithms can be derived by the direct application of Newton’s method to different norms of the OBR. More precisely, when applied to the norm of the OBR, Newton’s method results in the Bellman Residual Minimization Policy Iteration (BRMPI) and, when applied to the norm of the Projected OBR (POBR), it results into the standard Least Squares Policy Iteration (LSPI) algorithm. Consequently, new algorithms are proposed, making use of quasi-Newton methods to minimize the OBR and the POBR so as to take benefit of enhanced empirical performances at low cost. Indeed, using a quasi-Newton method approach introduces slight modifications in term of coding of LSPI and BRMPI but improves significantly both the stability and the performance of those algorithms. These phenomena are illustrated on an experiment conducted on artificially constructed games called Garnets.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/perolat16.html
  PDF: http://proceedings.mlr.press/v48/perolat16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-perolat16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Julien
    family: Pérolat
  - given: Bilal
    family: Piot
  - given: Matthieu
    family: Geist
  - given: Bruno
    family: Scherrer
  - given: Olivier
    family: Pietquin
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1860-1868
  id: perolat16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1860
  lastpage: 1868
  published: 2016-06-11 00:00:00 +0000
- title: 'Stochastic Block BFGS: Squeezing More Curvature out of Data'
  abstract: 'We propose a novel limited-memory stochastic block BFGS update for incorporating enriched curvature information in stochastic approximation methods. In our method, the estimate of the inverse Hessian matrix that is maintained by it, is updated at each iteration using a sketch of the Hessian, i.e., a randomly generated compressed form of the Hessian. We propose several sketching strategies, present a new quasi-Newton method that uses stochastic block BFGS updates combined with the variance reduction approach SVRG to compute batch stochastic gradients, and prove linear convergence of the resulting method. Numerical tests on large-scale logistic regression problems reveal that our method is more robust and substantially outperforms current state-of-the-art methods.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/gower16.html
  PDF: http://proceedings.mlr.press/v48/gower16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-gower16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Robert
    family: Gower
  - given: Donald
    family: Goldfarb
  - given: Peter
    family: Richtarik
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1869-1878
  id: gower16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1869
  lastpage: 1878
  published: 2016-06-11 00:00:00 +0000
- title: 'Differential Geometric Regularization for Supervised Learning of Classifiers'
  abstract: 'We study the problem of supervised learning for both binary and multiclass classification from a unified geometric perspective. In particular, we propose a geometric regularization technique to find the submanifold corresponding to an estimator of the class probability P(y|\vec x). The regularization term measures the volume of this submanifold, based on the intuition that overfitting produces rapid local oscillations and hence large volume of the estimator. This technique can be applied to regularize any classification function that satisfies two requirements: firstly, an estimator of the class probability can be obtained; secondly, first and second derivatives of the class probability estimator can be calculated. In experiments, we apply our regularization technique to standard loss functions for classification, our RBF-based implementation compares favorably to widely used regularization methods for both binary and multiclass classification.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/baia16.html
  PDF: http://proceedings.mlr.press/v48/baia16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-baia16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Qinxun
    family: Bai
  - given: Steven
    family: Rosenberg
  - given: Zheng
    family: Wu
  - given: Stan
    family: Sclaroff
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1879-1888
  id: baia16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1879
  lastpage: 1888
  published: 2016-06-11 00:00:00 +0000
- title: 'Exploiting Cyclic Symmetry in Convolutional Neural Networks'
  abstract: 'Many classes of images exhibit rotational symmetry. Convolutional neural networks are sometimes trained using data augmentation to exploit this, but they are still required to learn the rotation equivariance properties from the data. Encoding these properties into the network architecture, as we are already used to doing for translation equivariance by using convolutional layers, could result in a more efficient use of the parameter budget by relieving the model from learning them. We introduce four operations which can be inserted into neural network models as layers, and which can be combined to make these models partially equivariant to rotations. They also enable parameter sharing across different orientations. We evaluate the effect of these architectural modifications on three datasets which exhibit rotational symmetry and demonstrate improved performance with smaller models.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/dieleman16.html
  PDF: http://proceedings.mlr.press/v48/dieleman16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-dieleman16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Sander
    family: Dieleman
  - given: Jeffrey De
    family: Fauw
  - given: Koray
    family: Kavukcuoglu
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1889-1898
  id: dieleman16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1889
  lastpage: 1898
  published: 2016-06-11 00:00:00 +0000
- title: 'Graying the black box: Understanding DQNs'
  abstract: 'In recent years there is a growing interest in using deep representations for reinforcement learning. In this paper, we present a methodology and tools to analyze Deep Q-networks (DQNs) in a non-blind matter. Using our tools we reveal that the features learned by DQNs aggregate the state space in a hierarchical fashion, explaining its success. Moreover we are able to understand and describe the policies learned by DQNs for three different Atari2600 games and suggest ways to interpret, debug and optimize of deep neural networks in Reinforcement Learning.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/zahavy16.html
  PDF: http://proceedings.mlr.press/v48/zahavy16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-zahavy16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Tom
    family: Zahavy
  - given: Nir
    family: Ben-Zrihem
  - given: Shie
    family: Mannor
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1899-1908
  id: zahavy16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1899
  lastpage: 1908
  published: 2016-06-11 00:00:00 +0000
- title: 'The Sum-Product Theorem: A Foundation for Learning Tractable Models'
  abstract: 'Inference in expressive probabilistic models is generally intractable, which makes them difficult to learn and limits their applicability. Sum-product networks are a class of deep models where, surprisingly, inference remains tractable even when an arbitrary number of hidden layers are present. In this paper, we generalize this result to a much broader set of learning problems: all those where inference consists of summing a function over a semiring. This includes satisfiability, constraint satisfaction, optimization, integration, and others. In any semiring, for summation to be tractable it suffices that the factors of every product have disjoint scopes. This unifies and extends many previous results in the literature. Enforcing this condition at learning time thus ensures that the learned models are tractable. We illustrate the power and generality of this approach by applying it to a new type of structured prediction problem: learning a nonconvex function that can be globally optimized in polynomial time. We show empirically that this greatly outperforms the standard approach of learning without regard to the cost of optimization.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/friesen16.html
  PDF: http://proceedings.mlr.press/v48/friesen16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-friesen16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Abram
    family: Friesen
  - given: Pedro
    family: Domingos
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1909-1918
  id: friesen16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1909
  lastpage: 1918
  published: 2016-06-11 00:00:00 +0000
- title: 'Pareto Frontier Learning with Expensive Correlated Objectives'
  abstract: 'There has been a surge of research interest in developing tools and analysis for Bayesian optimization, the task of finding the global maximizer of an unknown, expensive function through sequential evaluation using Bayesian decision theory. However, many interesting problems involve optimizing multiple, expensive to evaluate objectives simultaneously, and relatively little research has addressed this setting from a Bayesian theoretic standpoint. A prevailing choice when tackling this problem, is to model the multiple objectives as being independent, typically for ease of computation. In practice, objectives are correlated to some extent. In this work, we incorporate the modelling of inter-task correlations, developing an approximation to overcome intractable integrals. We illustrate the power of modelling dependencies between objectives on a range of synthetic and real world multi-objective optimization problems.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/shahc16.html
  PDF: http://proceedings.mlr.press/v48/shahc16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-shahc16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Amar
    family: Shah
  - given: Zoubin
    family: Ghahramani
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1919-1927
  id: shahc16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1919
  lastpage: 1927
  published: 2016-06-11 00:00:00 +0000
- title: 'Asynchronous Methods for Deep Reinforcement Learning'
  abstract: 'We propose a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers. We present asynchronous variants of four standard reinforcement learning algorithms and show that parallel actor-learners have a stabilizing effect on training allowing all four methods to successfully train neural network controllers. The best performing method, an asynchronous variant of actor-critic, surpasses the current state-of-the-art on the Atari domain while training for half the time on a single multi-core CPU instead of a GPU. Furthermore, we show that asynchronous actor-critic succeeds on a wide variety of continuous motor control problems as well as on a new task of navigating random 3D mazes using a visual input.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/mniha16.html
  PDF: http://proceedings.mlr.press/v48/mniha16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-mniha16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Volodymyr
    family: Mnih
  - given: Adria Puigdomenech
    family: Badia
  - given: Mehdi
    family: Mirza
  - given: Alex
    family: Graves
  - given: Timothy
    family: Lillicrap
  - given: Tim
    family: Harley
  - given: David
    family: Silver
  - given: Koray
    family: Kavukcuoglu
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1928-1937
  id: mniha16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1928
  lastpage: 1937
  published: 2016-06-11 00:00:00 +0000
- title: 'A Simple and Strongly-Local Flow-Based Method for Cut Improvement'
  abstract: 'Many graph-based learning problems can be cast as finding a good set of vertices nearby a seed set, and a powerful methodology for these problems is based on minimum cuts and maximum flows. We introduce and analyze a new method for locally-biased graph-based learning called SimpleLocal, which finds good conductance cuts near a set of seed vertices. An important feature of our algorithm is that it is strongly-local, meaning it does not need to explore the entire graph to find cuts that are locally optimal. This method is related to other strongly-local flow-based methods, but it enables a simple implementation. We also show how it achieves localization through an implicit l1-norm penalty term. As a flow-based method, our algorithm exhibits several advantages in terms of cut optimality and accurate identification of target regions in a graph. We demonstrate the power of SimpleLocal solving segmentation problems on a 467 million edge graph based on an MRI scan.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/veldt16.html
  PDF: http://proceedings.mlr.press/v48/veldt16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-veldt16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Nate
    family: Veldt
  - given: David
    family: Gleich
  - given: Michael
    family: Mahoney
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1938-1947
  id: veldt16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1938
  lastpage: 1947
  published: 2016-06-11 00:00:00 +0000
- title: 'Nonlinear Statistical Learning with Truncated Gaussian Graphical Models'
  abstract: 'We introduce the truncated Gaussian graphical model (TGGM) as a novel framework for designing statistical models for nonlinear learning. A TGGM is a Gaussian graphical model (GGM) with a subset of variables truncated to be nonnegative. The truncated variables are assumed latent and integrated out to induce a marginal model. We show that the variables in the marginal model are non-Gaussian distributed and their expected relations are nonlinear. We use expectation-maximization to break the inference of the nonlinear model into a sequence of TGGM inference problems, each of which is efficiently solved by using the properties and numerical methods of multivariate Gaussian distributions. We use the TGGM to design models for nonlinear regression and classification, with the performances of these models demonstrated on extensive benchmark datasets and compared to state-of-the-art competing results.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/su16.html
  PDF: http://proceedings.mlr.press/v48/su16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-su16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Qinliang
    family: Su
  - given: Xuejun
    family: Liao
  - given: Changyou
    family: Chen
  - given: Lawrence
    family: Carin
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1948-1957
  id: su16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1948
  lastpage: 1957
  published: 2016-06-11 00:00:00 +0000
- title: 'Barron and Cover’s Theory in Supervised Learning and its Application to Lasso'
  abstract: 'We study Barron and Cover’s theory (BC theory) in supervised learning. The original BC theory can be applied to supervised learning only approximately and limitedly. Though Barron (2008) and Chatterjee and Barron (2014) succeeded in removing the approximation, their idea cannot be essentially applied to supervised learning in general. By solving this issue, we propose an extension of BC theory to supervised learning. The extended theory has several advantages inherited from the original BC theory. First, it holds for finite sample number n. Second, it requires remarkably few assumptions. Third, it gives a justification of the MDL principle in supervised learning. We also derive new risk and regret bounds of lasso with random design as its application. The derived risk bound hold for any finite n without boundedness of features in contrast to past work. Behavior of the regret bound is investigated by numerical simulations. We believe that this is the first extension of BC theory to general supervised learning without approximation.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/kawakita16.html
  PDF: http://proceedings.mlr.press/v48/kawakita16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-kawakita16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Masanori
    family: Kawakita
  - given: Jun’ichi
    family: Takeuchi
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1958-1966
  id: kawakita16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1958
  lastpage: 1966
  published: 2016-06-11 00:00:00 +0000
- title: 'Nonparametric Canonical Correlation Analysis'
  abstract: 'Canonical correlation analysis (CCA) is a classical representation learning technique for finding correlated variables in multi-view data. Several nonlinear extensions of the original linear CCA have been proposed, including kernel and deep neural network methods. These approaches seek maximally correlated projections among families of functions, which the user specifies (by choosing a kernel or neural network structure), and are computationally demanding. Interestingly, the theory of nonlinear CCA, without functional restrictions, had been studied in the population setting by Lancaster already in the 1950s, but these results have not inspired practical algorithms. We revisit Lancaster’s theory to devise a practical algorithm for nonparametric CCA (NCCA). Specifically, we show that the solution can be expressed in terms of the singular value decomposition of a certain operator associated with the joint density of the views. Thus, by estimating the population density from data, NCCA reduces to solving an eigenvalue system, superficially like kernel CCA but, importantly, without requiring the inversion of any kernel matrix. We also derive a partially linear CCA (PLCCA) variant in which one of the views undergoes a linear projection while the other is nonparametric. Using a kernel density estimate based on a small number of nearest neighbors, our NCCA and PLCCA algorithms are memory-efficient, often run much faster, and perform better than kernel CCA and comparable to deep CCA.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/michaeli16.html
  PDF: http://proceedings.mlr.press/v48/michaeli16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-michaeli16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Tomer
    family: Michaeli
  - given: Weiran
    family: Wang
  - given: Karen
    family: Livescu
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1967-1976
  id: michaeli16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1967
  lastpage: 1976
  published: 2016-06-11 00:00:00 +0000
- title: 'BISTRO: An Efficient Relaxation-Based Method for Contextual Bandits'
  abstract: 'We present efficient algorithms for the problem of contextual bandits with i.i.d. covariates, an arbitrary sequence of rewards, and an arbitrary class of policies. Our algorithm BISTRO requires d calls to the empirical risk minimization (ERM) oracle per round, where d is the number of actions. The method uses unlabeled data to make the problem computationally simple. When the ERM problem itself is computationally hard, we extend the approach by employing multiplicative approximation algorithms for the ERM. The integrality gap of the relaxation only enters in the regret bound rather than the benchmark. Finally, we show that the adversarial version of the contextual bandit problem is learnable (and efficient) whenever the full-information supervised online learning problem has a non-trivial regret bound (and efficient).'
  volume: 48
  URL: https://proceedings.mlr.press/v48/rakhlin16.html
  PDF: http://proceedings.mlr.press/v48/rakhlin16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-rakhlin16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Alexander
    family: Rakhlin
  - given: Karthik
    family: Sridharan
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1977-1985
  id: rakhlin16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1977
  lastpage: 1985
  published: 2016-06-11 00:00:00 +0000
- title: 'Associative Long Short-Term Memory'
  abstract: 'We investigate a new method to augment recurrent neural networks with extra memory without increasing the number of network parameters. The system has an associative memory based on complex-valued vectors and is closely related to Holographic Reduced Representations and Long Short-Term Memory networks. Holographic Reduced Representations have limited capacity: as they store more information, each retrieval becomes noisier due to interference. Our system in contrast creates redundant copies of stored information, which enables retrieval with reduced noise. Experiments demonstrate faster learning on multiple memorization tasks.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/danihelka16.html
  PDF: http://proceedings.mlr.press/v48/danihelka16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-danihelka16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Ivo
    family: Danihelka
  - given: Greg
    family: Wayne
  - given: Benigno
    family: Uria
  - given: Nal
    family: Kalchbrenner
  - given: Alex
    family: Graves
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1986-1994
  id: danihelka16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1986
  lastpage: 1994
  published: 2016-06-11 00:00:00 +0000
- title: 'Dueling Network Architectures for Deep Reinforcement Learning'
  abstract: 'In recent years there have been many successes of using deep representations in reinforcement learning. Still, many of these applications use conventional architectures, such as convolutional networks, LSTMs, or auto-encoders. In this paper, we present a new neural network architecture for model-free reinforcement learning. Our dueling network represents two separate estimators: one for the state value function and one for the state-dependent action advantage function. The main benefit of this factoring is to generalize learning across actions without imposing any change to the underlying reinforcement learning algorithm. Our results show that this architecture leads to better policy evaluation in the presence of many similar-valued actions. Moreover, the dueling architecture enables our RL agent to outperform the state-of-the-art on the Atari 2600 domain.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/wangf16.html
  PDF: http://proceedings.mlr.press/v48/wangf16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-wangf16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Ziyu
    family: Wang
  - given: Tom
    family: Schaul
  - given: Matteo
    family: Hessel
  - given: Hado
    family: Hasselt
  - given: Marc
    family: Lanctot
  - given: Nando
    family: Freitas
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 1995-2003
  id: wangf16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 1995
  lastpage: 2003
  published: 2016-06-11 00:00:00 +0000
- title: 'Persistence weighted Gaussian kernel for topological data analysis'
  abstract: 'Topological data analysis (TDA) is an emerging mathematical concept for characterizing shapes in complex data. In TDA, persistence diagrams are widely recognized as a useful descriptor of data, and can distinguish robust and noisy topological properties. This paper proposes a kernel method on persistence diagrams to develop a statistical framework in TDA. The proposed kernel satisfies the stability property and provides explicit control on the effect of persistence. Furthermore, the method allows a fast approximation technique. The method is applied into practical data on proteins and oxide glasses, and the results show the advantage of our method compared to other relevant methods on persistence diagrams.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/kusano16.html
  PDF: http://proceedings.mlr.press/v48/kusano16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-kusano16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Genki
    family: Kusano
  - given: Yasuaki
    family: Hiraoka
  - given: Kenji
    family: Fukumizu
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2004-2013
  id: kusano16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2004
  lastpage: 2013
  published: 2016-06-11 00:00:00 +0000
- title: 'Learning Convolutional Neural Networks for Graphs'
  abstract: 'Numerous important problems can be framed as learning from graph data. We propose a framework for learning convolutional neural networks for arbitrary graphs. These graphs may be undirected, directed, and with both discrete and continuous node and edge attributes. Analogous to image-based convolutional networks that operate on locally connected regions of the input, we present a general approach to extracting locally connected regions from graphs. Using established benchmark data sets, we demonstrate that the learned feature representations are competitive with state of the art graph kernels and that their computation is highly efficient.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/niepert16.html
  PDF: http://proceedings.mlr.press/v48/niepert16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-niepert16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Mathias
    family: Niepert
  - given: Mohamed
    family: Ahmed
  - given: Konstantin
    family: Kutzkov
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2014-2023
  id: niepert16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2014
  lastpage: 2023
  published: 2016-06-11 00:00:00 +0000
- title: 'Persistent RNNs: Stashing Recurrent Weights On-Chip'
  abstract: 'This paper introduces a new technique for mapping Deep Recurrent Neural Networks (RNN) efficiently onto GPUs. We show how it is possi- ble to achieve substantially higher computational throughput at low mini-batch sizes than direct implementations of RNNs based on matrix multiplications. The key to our approach is the use of persistent computational kernels that exploit the GPU’s inverted memory hierarchy to reuse network weights over multiple timesteps. Our initial implementation sustains 2.8 TFLOP/s at a mini-batch size of 4 on an NVIDIA TitanX GPU. This provides a 16x reduction in activation memory footprint, enables model training with 12x more parameters on the same hardware, allows us to strongly scale RNN training to 128 GPUs, and allows us to efficiently explore end-to-end speech recognition models with over 100 layers.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/diamos16.html
  PDF: http://proceedings.mlr.press/v48/diamos16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-diamos16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Greg
    family: Diamos
  - given: Shubho
    family: Sengupta
  - given: Bryan
    family: Catanzaro
  - given: Mike
    family: Chrzanowski
  - given: Adam
    family: Coates
  - given: Erich
    family: Elsen
  - given: Jesse
    family: Engel
  - given: Awni
    family: Hannun
  - given: Sanjeev
    family: Satheesh
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2024-2033
  id: diamos16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2024
  lastpage: 2033
  published: 2016-06-11 00:00:00 +0000
- title: 'Recurrent Orthogonal Networks and Long-Memory Tasks'
  abstract: 'Although RNNs have been shown to be power- ful tools for processing sequential data, finding architectures or optimization strategies that al- low them to model very long term dependencies is still an active area of research. In this work, we carefully analyze two synthetic datasets orig- inally outlined in (Hochreiter & Schmidhuber, 1997) which are used to evaluate the ability of RNNs to store information over many time steps. We explicitly construct RNN solutions to these problems, and using these constructions, illumi- nate both the problems themselves and the way in which RNNs store different types of information in their hidden states. These constructions fur- thermore explain the success of recent methods that specify unitary initializations or constraints on the transition matrices.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/henaff16.html
  PDF: http://proceedings.mlr.press/v48/henaff16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-henaff16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Mikael
    family: Henaff
  - given: Arthur
    family: Szlam
  - given: Yann
    family: LeCun
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2034-2042
  id: henaff16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2034
  lastpage: 2042
  published: 2016-06-11 00:00:00 +0000
- title: 'The Arrow of Time in Multivariate Time Series'
  abstract: 'We prove that a time series satisfying a (linear) multivariate autoregressive moving average (VARMA) model satisfies the same model assumption in the reversed time direction, too, if all innovations are normally distributed. This reversibility breaks down if the innovations are non-Gaussian. This means that under the assumption of a VARMA process with non-Gaussian noise, the arrow of time becomes detectable. Our work thereby provides a theoretic justification of an algorithm that has been used for inferring the direction of video snippets. We present a slightly modified practical algorithm that estimates the time direction for a given sample and prove its consistency. We further investigate how the performance of the algorithm depends on sample size, number of dimensions of the time series and the order of the process. An application to real world data from economics shows that considering multivariate processes instead of univariate processes can be beneficial for estimating the time direction. Our result extends earlier work on univariate time series. It relates to the concept of causal inference, where recent methods exploit non-Gaussianity of the error terms for causal structure learning.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/bauer16.html
  PDF: http://proceedings.mlr.press/v48/bauer16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-bauer16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Stefan
    family: Bauer
  - given: Bernhard
    family: Schölkopf
  - given: Jonas
    family: Peters
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2043-2051
  id: bauer16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2043
  lastpage: 2051
  published: 2016-06-11 00:00:00 +0000
- title: 'Mixture Proportion Estimation via Kernel Embeddings of Distributions'
  abstract: 'Mixture proportion estimation (MPE) is the problem of estimating the weight of a component distribution in a mixture, given samples from the mixture and component. This problem constitutes a key part in many "weakly supervised learning" problems like learning with positive and unlabelled samples, learning with label noise, anomaly detection and crowdsourcing. While there have been several methods proposed to solve this problem, to the best of our knowledge no efficient algorithm with a proven convergence rate towards the true proportion exists for this problem. We fill this gap by constructing a provably correct algorithm for MPE, and derive convergence rates under certain assumptions on the distribution. Our method is based on embedding distributions onto an RKHS, and implementing it only requires solving a simple convex quadratic programming problem a few times. We run our algorithm on several standard classification datasets, and demonstrate that it performs comparably to or better than other algorithms on most datasets.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/ramaswamy16.html
  PDF: http://proceedings.mlr.press/v48/ramaswamy16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-ramaswamy16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Harish
    family: Ramaswamy
  - given: Clayton
    family: Scott
  - given: Ambuj
    family: Tewari
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2052-2060
  id: ramaswamy16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2052
  lastpage: 2060
  published: 2016-06-11 00:00:00 +0000
- title: 'Fast DPP Sampling for Nystrom with Application to Kernel Methods'
  abstract: 'The Nystrom method has long been popular for scaling up kernel methods. Its theoretical guarantees and empirical performance rely critically on the quality of the landmarks selected. We study landmark selection for Nystrom using Determinantal Point Processes (DPPs), discrete probability models that allow tractable generation of diverse samples. We prove that landmarks selected via DPPs guarantee bounds on approximation errors; subsequently, we analyze implications for kernel ridge regression. Contrary to prior reservations due to cubic complexity of DPP sampling, we show that (under certain conditions) Markov chain DPP sampling requires only linear time in the size of the data. We present several empirical results that support our theoretical analysis, and demonstrate the superior performance of DPP-based landmark selection compared with existing approaches.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/lih16.html
  PDF: http://proceedings.mlr.press/v48/lih16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-lih16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Chengtao
    family: Li
  - given: Stefanie
    family: Jegelka
  - given: Suvrit
    family: Sra
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2061-2070
  id: lih16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2061
  lastpage: 2070
  published: 2016-06-11 00:00:00 +0000
- title: 'Complex Embeddings for Simple Link Prediction'
  abstract: 'In statistical relational learning, the link prediction problem is key to automatically understand the structure of large knowledge bases. As in previous studies, we propose to solve this problem through latent factorization. However, here we make use of complex valued embeddings. The composition of complex embeddings can handle a large variety of binary relations, among them symmetric and antisymmetric relations. Compared to state-of-the-art models such as Neural Tensor Network and Holographic Embeddings, our approach based on complex embeddings is arguably simpler, as it only uses the Hermitian dot product, the complex counterpart of the standard dot product between real vectors. Our approach is scalable to large datasets as it remains linear in both space and time, while consistently outperforming alternative approaches on standard link prediction benchmarks.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/trouillon16.html
  PDF: http://proceedings.mlr.press/v48/trouillon16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-trouillon16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Théo
    family: Trouillon
  - given: Johannes
    family: Welbl
  - given: Sebastian
    family: Riedel
  - given: Eric
    family: Gaussier
  - given: Guillaume
    family: Bouchard
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2071-2080
  id: trouillon16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2071
  lastpage: 2080
  published: 2016-06-11 00:00:00 +0000
- title: 'Interactive Bayesian Hierarchical Clustering'
  abstract: 'Clustering is a powerful tool in data analysis, but it is often difficult to find a grouping that aligns with a user’s needs. To address this, several methods incorporate constraints obtained from users into clustering algorithms, but unfortunately do not apply to hierarchical clustering. We design an interactive Bayesian algorithm that incorporates user interaction into hierarchical clustering while still utilizing the geometry of the data by sampling a constrained posterior distribution over hierarchies. We also suggest several ways to intelligently query a user. The algorithm, along with the querying schemes, shows promising results on real data.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/vikram16.html
  PDF: http://proceedings.mlr.press/v48/vikram16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-vikram16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Sharad
    family: Vikram
  - given: Sanjoy
    family: Dasgupta
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2081-2090
  id: vikram16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2081
  lastpage: 2090
  published: 2016-06-11 00:00:00 +0000
- title: 'A Convolutional Attention Network for Extreme Summarization of Source Code'
  abstract: 'Attention mechanisms in neural networks have proved useful for problems in which the input and output do not have fixed dimension. Often there exist features that are locally translation invariant and would be valuable for directing the model’s attention, but previous attentional architectures are not constructed to learn such features specifically. We introduce an attentional neural network that employs convolution on the input tokens to detect local time-invariant and long-range topical attention features in a context-dependent way. We apply this architecture to the problem of extreme summarization of source code snippets into short, descriptive function name-like summaries. Using those features, the model sequentially generates a summary by marginalizing over two attention mechanisms: one that predicts the next summary token based on the attention weights of the input tokens and another that is able to copy a code token as-is directly into the summary. We demonstrate our convolutional attention neural network’s performance on 10 popular Java projects showing that it achieves better performance compared to previous attentional mechanisms.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/allamanis16.html
  PDF: http://proceedings.mlr.press/v48/allamanis16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-allamanis16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Miltiadis
    family: Allamanis
  - given: Hao
    family: Peng
  - given: Charles
    family: Sutton
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2091-2100
  id: allamanis16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2091
  lastpage: 2100
  published: 2016-06-11 00:00:00 +0000
- title: 'How to Fake Multiply by a Gaussian Matrix'
  abstract: 'Have you ever wanted to multiply an n \times d matrix X, with n ≫d, on the left by an m \times n matrix \tilde G of i.i.d. Gaussian random variables, but could not afford to do it because it was too slow? In this work we propose a new randomized m \times n matrix T, for which one can compute T ⋅X in only O(nnz(X)) + \tilde O(m^1.5 ⋅d^3) time, for which the total variation distance between the distributions T ⋅X and \tilde G ⋅X is as small as desired, i.e., less than any positive constant. Here nnz(X) denotes the number of non-zero entries of X. Assuming nnz(X) ≫m^1.5 ⋅d^3, this is a significant savings over the naïve O(nnz(X) m) time to compute \tilde G ⋅X. Moreover, since the total variation distance is small, we can provably use T ⋅X in place of \tilde G ⋅X in any application and have the same guarantees as if we were using \tilde G ⋅X, up to a small positive constant in error probability. We apply this transform to nonnegative matrix factorization (NMF) and support vector machines (SVM).'
  volume: 48
  URL: https://proceedings.mlr.press/v48/kapralov16.html
  PDF: http://proceedings.mlr.press/v48/kapralov16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-kapralov16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Michael
    family: Kapralov
  - given: Vamsi
    family: Potluru
  - given: David
    family: Woodruff
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2101-2110
  id: kapralov16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2101
  lastpage: 2110
  published: 2016-06-11 00:00:00 +0000
- title: 'Differentially Private Chi-Squared Hypothesis Testing: Goodness of Fit and Independence Testing'
  abstract: 'Hypothesis testing is a useful statistical tool in determining whether a given model should be rejected based on a sample from the population. Sample data may contain sensitive information about individuals, such as medical information. Thus it is important to design statistical tests that guarantee the privacy of subjects in the data. In this work, we study hypothesis testing subject to differential privacy, specifically chi-squared tests for goodness of fit for multinomial data and independence between two categorical variables.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/rogers16.html
  PDF: http://proceedings.mlr.press/v48/rogers16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-rogers16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Marco
    family: Gaboardi
  - given: Hyun
    family: Lim
  - given: Ryan
    family: Rogers
  - given: Salil
    family: Vadhan
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2111-2120
  id: rogers16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2111
  lastpage: 2120
  published: 2016-06-11 00:00:00 +0000
- title: 'Pliable Rejection Sampling'
  abstract: 'Rejection sampling is a technique for sampling from difficult distributions. However, its use is limited due to a high rejection rate. Common adaptive rejection sampling methods either work only for very specific distributions or without performance guarantees. In this paper, we present pliable rejection sampling (PRS), a new approach to rejection sampling, where we learn the sampling proposal using a kernel estimator. Since our method builds on rejection sampling, the samples obtained are with high probability i.i.d. and distributed according to f. Moreover, PRS comes with a guarantee on the number of accepted samples.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/erraqabi16.html
  PDF: http://proceedings.mlr.press/v48/erraqabi16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-erraqabi16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Akram
    family: Erraqabi
  - given: Michal
    family: Valko
  - given: Alexandra
    family: Carpentier
  - given: Odalric
    family: Maillard
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2121-2129
  id: erraqabi16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2121
  lastpage: 2129
  published: 2016-06-11 00:00:00 +0000
- title: 'Differentially Private Policy Evaluation'
  abstract: 'We present the first differentially private algorithms for reinforcement learning, which apply to the task of evaluating a fixed policy. We establish two approaches for achieving differential privacy, provide a theoretical analysis of the privacy and utility of the two algorithms, and show promising results on simple empirical examples.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/balle16.html
  PDF: http://proceedings.mlr.press/v48/balle16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-balle16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Borja
    family: Balle
  - given: Maziar
    family: Gomrokchi
  - given: Doina
    family: Precup
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2130-2138
  id: balle16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2130
  lastpage: 2138
  published: 2016-06-11 00:00:00 +0000
- title: 'Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning'
  abstract: 'In this paper we present a new way of predicting the performance of a reinforcement learning policy given historical data that may have been generated by a different policy. The ability to evaluate a policy from historical data is important for applications where the deployment of a bad policy can be dangerous or costly. We show empirically that our algorithm produces estimates that often have orders of magnitude lower mean squared error than existing methods—it makes more efficient use of the available data. Our new estimator is based on two advances: an extension of the doubly robust estimator (Jiang & Li, 2015), and a new way to mix between model based and importance sampling based estimates.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/thomasa16.html
  PDF: http://proceedings.mlr.press/v48/thomasa16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-thomasa16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Philip
    family: Thomas
  - given: Emma
    family: Brunskill
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2139-2148
  id: thomasa16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2139
  lastpage: 2148
  published: 2016-06-11 00:00:00 +0000
- title: 'Discrete Deep Feature Extraction: A Theory and New Architectures'
  abstract: 'First steps towards a mathematical theory of deep convolutional neural networks for feature extraction were made—for the continuous-time case—in Mallat, 2012, and Wiatowski and Bölcskei, 2015. This paper considers the discrete case, introduces new convolutional neural network architectures, and proposes a mathematical framework for their analysis. Specifically, we establish deformation and translation sensitivity results of local and global nature, and we investigate how certain structural properties of the input signal are reflected in the corresponding feature vectors. Our theory applies to general filters and general Lipschitz-continuous non-linearities and pooling operators. Experiments on handwritten digit classification and facial landmark detection—including feature importance evaluation—complement the theoretical findings.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/wiatowski16.html
  PDF: http://proceedings.mlr.press/v48/wiatowski16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-wiatowski16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Thomas
    family: Wiatowski
  - given: Michael
    family: Tschannen
  - given: Aleksandar
    family: Stanic
  - given: Philipp
    family: Grohs
  - given: Helmut
    family: Boelcskei
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2149-2158
  id: wiatowski16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2149
  lastpage: 2158
  published: 2016-06-11 00:00:00 +0000
- title: 'Efficient Algorithms for Adversarial Contextual Learning'
  abstract: 'We provide the first oracle efficient sublinear regret algorithms for adversarial versions of the contextual bandit problem. In this problem, the learner repeatedly makes an action on the basis of a context and receives reward for the chosen action, with the goal of achieving reward competitive with a large class of policies. We analyze two settings: i) in the transductive setting the learner knows the set of contexts a priori, ii) in the small separator setting, there exists a small set of contexts such that any two policies behave differently on one of the contexts in the set. Our algorithms fall into the Follow-The-Perturbed-Leader family (Kalai and Vempala, 2005) and achieve regret O(T^3/4\sqrtK\log(N)) in the transductive setting and O(T^2/3 d^3/4 K\sqrt\log(N)) in the separator setting, where T is the number of rounds, K is the number of actions, N is the number of baseline policies, and d is the size of the separator. We actually solve the more general adversarial contextual semi-bandit linear optimization problem, whilst in the full information setting we address the even more general contextual combinatorial optimization. We provide several extensions and implications of our algorithms, such as switching regret and efficient learning with predictable sequences.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/syrgkanis16.html
  PDF: http://proceedings.mlr.press/v48/syrgkanis16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-syrgkanis16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Vasilis
    family: Syrgkanis
  - given: Akshay
    family: Krishnamurthy
  - given: Robert
    family: Schapire
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2159-2168
  id: syrgkanis16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2159
  lastpage: 2168
  published: 2016-06-11 00:00:00 +0000
- title: 'Training Deep Neural Networks via Direct Loss Minimization'
  abstract: 'Supervised training of deep neural nets typically relies on minimizing cross-entropy. However, in many domains, we are interested in performing well on metrics specific to the application. In this paper we propose a direct loss minimization approach to train deep neural networks, which provably minimizes the application-specific loss function. This is often non-trivial, since these functions are neither smooth nor decomposable and thus are not amenable to optimization with standard gradient-based methods. We demonstrate the effectiveness of our approach in the context of maximizing average precision for ranking problems. Towards this goal, we develop a novel dynamic programming algorithm that can efficiently compute the weight updates. Our approach proves superior to a variety of baselines in the context of action classification and object detection, especially in the presence of label noise.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/songb16.html
  PDF: http://proceedings.mlr.press/v48/songb16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-songb16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Yang
    family: Song
  - given: Alexander
    family: Schwing
  - given: 
    family: Richard
  - given: Raquel
    family: Urtasun
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2169-2177
  id: songb16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2169
  lastpage: 2177
  published: 2016-06-11 00:00:00 +0000
- title: 'Sequence to Sequence Training of CTC-RNNs with Partial Windowing'
  abstract: 'Connectionist temporal classification (CTC) based supervised sequence training of recurrent neural networks (RNNs) has shown great success in many machine learning areas including end-to-end speech and handwritten character recognition. For the CTC training, however, it is required to unroll (or unfold) the RNN by the length of an input sequence. This unrolling requires a lot of memory and hinders a small footprint implementation of online learning or adaptation. Furthermore, the length of training sequences is usually not uniform, which makes parallel training with multiple sequences inefficient on shared memory models such as graphics processing units (GPUs). In this work, we introduce an expectation-maximization (EM) based online CTC algorithm that enables unidirectional RNNs to learn sequences that are longer than the amount of unrolling. The RNNs can also be trained to process an infinitely long input sequence without pre-segmentation or external reset. Moreover, the proposed approach allows efficient parallel training on GPUs. Our approach achieves 20.7% phoneme error rate (PER) on the very long input sequence that is generated by concatenating all 192 utterances in the TIMIT core test set. In the end-to-end speech recognition task on the Wall Street Journal corpus, a network can be trained with only 64 times of unrolling with little performance loss.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/hwanga16.html
  PDF: http://proceedings.mlr.press/v48/hwanga16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-hwanga16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Kyuyeon
    family: Hwang
  - given: Wonyong
    family: Sung
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2178-2187
  id: hwanga16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2178
  lastpage: 2187
  published: 2016-06-11 00:00:00 +0000
- title: 'Variational Inference for Monte Carlo Objectives'
  abstract: 'Recent progress in deep latent variable models has largely been driven by the development of flexible and scalable variational inference methods. Variational training of this type involves maximizing a lower bound on the log-likelihood, using samples from the variational posterior to compute the required gradients. Recently, Burda et al. (2016) have derived a tighter lower bound using a multi-sample importance sampling estimate of the likelihood and showed that optimizing it yields models that use more of their capacity and achieve higher likelihoods. This development showed the importance of such multi-sample objectives and explained the success of several related approaches. We extend the multi-sample approach to discrete latent variables and analyze the difficulty encountered when estimating the gradients involved. We then develop the first unbiased gradient estimator designed for importance-sampled objectives and evaluate it at training generative and structured output prediction models. The resulting estimator, which is based on low-variance per-sample learning signals, is both simpler and more effective than the NVIL estimator proposed for the single-sample variational objective, and is competitive with the currently used biased estimators.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/mnihb16.html
  PDF: http://proceedings.mlr.press/v48/mnihb16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-mnihb16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Andriy
    family: Mnih
  - given: Danilo
    family: Rezende
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2188-2196
  id: mnihb16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2188
  lastpage: 2196
  published: 2016-06-11 00:00:00 +0000
- title: 'Hierarchical Decision Making In Electricity Grid Management'
  abstract: 'The power grid is a complex and vital system that necessitates careful reliability management. Managing the grid is a difficult problem with multiple time scales of decision making and stochastic behavior due to renewable energy generations, variable demand and unplanned outages. Solving this problem in the face of uncertainty requires a new methodology with tractable algorithms. In this work, we introduce a new model for hierarchical decision making in complex systems. We apply reinforcement learning (RL) methods to learn a proxy, i.e., a level of abstraction, for real-time power grid reliability. We devise an algorithm that alternates between slow time-scale policy improvement, and fast time-scale value function approximation. We compare our results to prevailing heuristics, and show the strength of our method.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/dalal16.html
  PDF: http://proceedings.mlr.press/v48/dalal16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-dalal16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Gal
    family: Dalal
  - given: Elad
    family: Gilboa
  - given: Shie
    family: Mannor
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2197-2206
  id: dalal16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2197
  lastpage: 2206
  published: 2016-06-11 00:00:00 +0000
- title: 'Learning Sparse Combinatorial Representations via Two-stage Submodular Maximization'
  abstract: 'We consider the problem of learning sparse representations of data sets, where the goal is to reduce a data set in manner that optimizes multiple objectives. Motivated by applications of data summarization, we develop a new model which we refer to as the two-stage submodular maximization problem. This task can be viewed as a combinatorial analogue of representation learning problems such as dictionary learning and sparse regression. The two-stage problem strictly generalizes the problem of cardinality constrained submodular maximization, though the objective function is not submodular and the techniques for submodular maximization cannot be applied. We describe a continuous optimization method which achieves an approximation ratio which asymptotically approaches 1-1/e. For instances where the asymptotics do not kick in, we design a local-search algorithm whose approximation ratio is arbitrarily close to 1/2. We empirically demonstrate the effectiveness of our methods on two multi-objective data summarization tasks, where the goal is to construct summaries via sparse representative subsets w.r.t. to predefined objectives.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/balkanski16.html
  PDF: http://proceedings.mlr.press/v48/balkanski16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-balkanski16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Eric
    family: Balkanski
  - given: Baharan
    family: Mirzasoleiman
  - given: Andreas
    family: Krause
  - given: Yaron
    family: Singer
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2207-2216
  id: balkanski16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2207
  lastpage: 2216
  published: 2016-06-11 00:00:00 +0000
- title: 'Understanding and Improving Convolutional Neural Networks via Concatenated Rectified Linear Units'
  abstract: 'Recently, convolutional neural networks (CNNs) have been used as a powerful tool to solve many problems of machine learning and computer vision. In this paper, we aim to provide insight on the property of convolutional neural networks, as well as a generic method to improve the performance of many CNN architectures. Specifically, we first examine existing CNN models and observe an intriguing property that the filters in the lower layers form pairs (i.e., filters with opposite phase). Inspired by our observation, we propose a novel, simple yet effective activation scheme called concatenated ReLU (CReLU) and theoretically analyze its reconstruction property in CNNs. We integrate CReLU into several state-of-the-art CNN architectures and demonstrate improvement in their recognition performance on CIFAR-10/100 and ImageNet datasets with fewer trainable parameters. Our results suggest that better understanding of the properties of CNNs can lead to significant performance improvement with a simple modification.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/shang16.html
  PDF: http://proceedings.mlr.press/v48/shang16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-shang16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Wenling
    family: Shang
  - given: Kihyuk
    family: Sohn
  - given: Diogo
    family: Almeida
  - given: Honglak
    family: Lee
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2217-2225
  id: shang16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2217
  lastpage: 2225
  published: 2016-06-11 00:00:00 +0000
- title: 'Isotonic Hawkes Processes'
  abstract: 'Hawkes processes are powerful tools for modeling the mutual-excitation phenomena commonly observed in event data from a variety of domains, such as social networks, quantitative finance and healthcare records. The intensity function of a Hawkes process is typically assumed to be linear in the sum of triggering kernels, rendering it inadequate to capture nonlinear effects present in real-world data. To address this shortcoming, we propose an Isotonic-Hawkes process whose intensity function is modulated by an additional nonlinear link function. We also developed a novel iterative algorithm which learns both the nonlinear link function and other parameters provably. We showed that Isotonic-Hawkes processes can fit a variety of nonlinear patterns which cannot be captured by conventional Hawkes processes, and achieve superior empirical performance in real world applications.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/wangg16.html
  PDF: http://proceedings.mlr.press/v48/wangg16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-wangg16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Yichen
    family: Wang
  - given: Bo
    family: Xie
  - given: Nan
    family: Du
  - given: Le
    family: Song
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2226-2234
  id: wangg16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2226
  lastpage: 2234
  published: 2016-06-11 00:00:00 +0000
- title: 'Cross-Graph Learning of Multi-Relational Associations'
  abstract: 'Cross-graph Relational Learning (CGRL) refers to the problem of predicting the strengths or labels of multi-relational tuples of heterogeneous object types, through the joint inference over multiple graphs which specify the internal connections among each type of objects. CGRL is an open challenge in machine learning due to the daunting number of all possible tuples to deal with when the numbers of nodes in multiple graphs are large, and because the labeled training instances are extremely sparse as typical. Existing methods such as tensor factorization or tensor-kernel machines do not work well because of the lack of convex formulation for the optimization of CGRL models, the poor scalability of the algorithms in handling combinatorial numbers of tuples, and/or the non-transductive nature of the learning methods which limits their ability to leverage unlabeled data in training. This paper proposes a novel framework which formulates CGRL as a convex optimization problem, enables transductive learning using both labeled and unlabeled tuples, and offers a scalable algorithm that guarantees the optimal solution and enjoys a constant time complexity with respect to the sizes of input graphs. In our experiments with a subset of DBLP publication records and an Enzyme multi-source dataset, the proposed method successfully scaled to the large cross-graph inference problem, and outperformed other representative approaches significantly.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/liuf16.html
  PDF: http://proceedings.mlr.press/v48/liuf16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-liuf16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Hanxiao
    family: Liu
  - given: Yiming
    family: Yang
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2235-2243
  id: liuf16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2235
  lastpage: 2243
  published: 2016-06-11 00:00:00 +0000
- title: 'Markov-modulated Marked Poisson Processes for Check-in Data'
  abstract: 'We develop continuous-time probabilistic models to study trajectory data consisting of times and locations of user “check-ins”. We model the data as realizations of a marked point process, with intensity and mark-distribution modulated by a latent Markov jump process (MJP). We also include user-heterogeneity in our model by assigning each user a vector of “preferred locations”. Our model extends latent Dirichlet allocation by dropping the bag-of-words assumption and operating in continuous time. We show how an appropriate choice of priors allows efficient posterior inference. Our experiments demonstrate the usefulness of our approach by comparing with various baselines on a variety of tasks.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/pana16.html
  PDF: http://proceedings.mlr.press/v48/pana16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-pana16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Jiangwei
    family: Pan
  - given: Vinayak
    family: Rao
  - given: Pankaj
    family: Agarwal
  - given: Alan
    family: Gelfand
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2244-2253
  id: pana16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2244
  lastpage: 2253
  published: 2016-06-11 00:00:00 +0000
- title: 'Beyond Parity Constraints: Fourier Analysis of Hash Functions for Inference'
  abstract: 'Random projections have played an important role in scaling up machine learning and data mining algorithms. Recently they have also been applied to probabilistic inference to estimate properties of high-dimensional distributions; however, they all rely on the same class of projections based on universal hashing. We provide a general framework to analyze random projections which relates their statistical properties to their Fourier spectrum, which is a well-studied area of theoretical computer science. Using this framework we introduce two new classes of hash functions for probabilistic inference and model counting that show promising performance on synthetic and real-world benchmarks.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/achim16.html
  PDF: http://proceedings.mlr.press/v48/achim16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-achim16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Tudor
    family: Achim
  - given: Ashish
    family: Sabharwal
  - given: Stefano
    family: Ermon
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2254-2262
  id: achim16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2254
  lastpage: 2262
  published: 2016-06-11 00:00:00 +0000
- title: 'On the Power and Limits of Distance-Based Learning'
  abstract: 'We initiate the study of low-distortion finite metric embeddings in multi-class (and multi-label) classification where (i) both the space of input instances and the space of output classes have combinatorial metric structure and (ii) the concepts we wish to learn are low-distortion embeddings. We develop new geometric techniques and prove strong learning lower bounds. These provable limits hold even when we allow learners and classifiers to get advice by one or more experts. Our study overwhelmingly indicates that post-geometry assumptions are necessary in multi-class classification, as in natural language processing (NLP). Technically, the mathematical tools we developed in this work could be of independent interest to NLP. To the best of our knowledge, this is the first work which formally studies classification problems in combinatorial spaces. and where the concepts are low-distortion embeddings.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/papakonstantinou16.html
  PDF: http://proceedings.mlr.press/v48/papakonstantinou16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-papakonstantinou16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Periklis
    family: Papakonstantinou
  - given: Jia
    family: Xu
  - given: Guang
    family: Yang
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2263-2271
  id: papakonstantinou16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2263
  lastpage: 2271
  published: 2016-06-11 00:00:00 +0000
- title: 'A Convex Atomic-Norm Approach to Multiple Sequence Alignment and Motif Discovery'
  abstract: 'Multiple Sequence Alignment and Motif Discovery, known as NP-hard problems, are two fundamental tasks in Bioinformatics. Existing approaches to these two problems are based on either local search methods such as Expectation Maximization (EM), Gibbs Sampling or greedy heuristic methods. In this work, we develop a convex relaxation approach to both problems based on the recent concept of atomic norm and develop a new algorithm, termed Greedy Direction Method of Multiplier, for solving the convex relaxation with two convex atomic constraints. Experiments show that our convex relaxation approach produces solutions of higher quality than those standard tools widely-used in Bioinformatics community on the Multiple Sequence Alignment and Motif Discovery problems.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/yena16.html
  PDF: http://proceedings.mlr.press/v48/yena16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-yena16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Ian En-Hsu
    family: Yen
  - given: Xin
    family: Lin
  - given: Jiong
    family: Zhang
  - given: Pradeep
    family: Ravikumar
  - given: Inderjit
    family: Dhillon
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2272-2280
  id: yena16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2272
  lastpage: 2280
  published: 2016-06-11 00:00:00 +0000
- title: 'Generalized Direct Change Estimation in Ising Model Structure'
  abstract: 'We consider the problem of estimating change in the dependency structure of two p-dimensional Ising models, based on respectively n_1 and n_2 samples drawn from the models. The change is assumed to be structured, e.g., sparse, block sparse, node-perturbed sparse, etc., such that it can be characterized by a suitable (atomic) norm. We present and analyze a norm-regularized estimator for directly estimating the change in structure, without having to estimate the structures of the individual Ising models. The estimator can work with any norm, and can be generalized to other graphical models under mild assumptions. We show that only one set of samples, say n_2, needs to satisfy the sample complexity requirement for the estimator to work, and the estimation error decreases as \fracc\sqrt\min(n_1,n_2), where c depends on the Gaussian width of the unit norm ball. For example, for \ell_1 norm applied to s-sparse change, the change can be accurately estimated with \min(n_1,n_2)=O(s \log p) which is sharper than an existing result n_1= O(s^2 \log p) and n_2 = O(n_1^2). Experimental results illustrating the effectiveness of the proposed estimator are presented.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/fazayeli16.html
  PDF: http://proceedings.mlr.press/v48/fazayeli16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-fazayeli16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Farideh
    family: Fazayeli
  - given: Arindam
    family: Banerjee
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2281-2290
  id: fazayeli16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2281
  lastpage: 2290
  published: 2016-06-11 00:00:00 +0000
- title: 'Robust Principal Component Analysis with Side Information'
  abstract: 'The robust principal component analysis (robust PCA) problem has been considered in many machine learning applications, where the goal is to decompose the data matrix as a low rank part plus a sparse residual. While current approaches are developed by only considering the low rank plus sparse structure, in many applications, side information of row and/or column entities may also be given, and it is still unclear to what extent could such information help robust PCA. Thus, in this paper, we study the problem of robust PCA with side information, where both prior structure and features of entities are exploited for recovery. We propose a convex problem to incorporate side information in robust PCA and show that the low rank matrix can be exactly recovered via the proposed method under certain conditions. In particular, our guarantee suggests that a substantial amount of low rank matrices, which cannot be recovered by standard robust PCA, become recoverable by our proposed method. The result theoretically justifies the effectiveness of features in robust PCA. In addition, we conduct synthetic experiments as well as a real application on noisy image classification to show that our method also improves the performance in practice by exploiting side information.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/chiang16.html
  PDF: http://proceedings.mlr.press/v48/chiang16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-chiang16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Kai-Yang
    family: Chiang
  - given: Cho-Jui
    family: Hsieh
  - given: Inderjit
    family: Dhillon
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2291-2299
  id: chiang16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2291
  lastpage: 2299
  published: 2016-06-11 00:00:00 +0000
- title: 'Towards Faster Rates and Oracle Property for Low-Rank Matrix Estimation'
  abstract: 'We present a unified framework for low-rank matrix estimation with a nonconvex penalty. A proximal gradient homotopy algorithm is proposed to solve the proposed optimization problem. Theoretically, we first prove that the proposed estimator attains a faster statistical rate than the traditional low-rank matrix estimator with nuclear norm penalty. Moreover, we rigorously show that under a certain condition on the magnitude of the nonzero singular values, the proposed estimator enjoys oracle property (i.e., exactly recovers the true rank of the matrix), besides attaining a faster rate. Extensive numerical experiments on both synthetic and real world datasets corroborate our theoretical findings.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/gui16.html
  PDF: http://proceedings.mlr.press/v48/gui16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-gui16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Huan
    family: Gui
  - given: Jiawei
    family: Han
  - given: Quanquan
    family: Gu
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2300-2309
  id: gui16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2300
  lastpage: 2309
  published: 2016-06-11 00:00:00 +0000
- title: 'Early and Reliable Event Detection Using Proximity Space Representation'
  abstract: 'Let us consider a specific action or situation (called event) that takes place within a time series. The objective in early detection is to build a decision function that is able to go off as soon as possible from the onset of an occurrence of this event. This implies making a decision with an incomplete information. This paper proposes a novel framework that i) guarantees that a detection made with a partial observation will also occur at full observation of the time-series; ii) incorporates in a consistent manner the lack of knowledge about the minimal amount of information needed to make a decision. The proposed detector is based on mapping the temporal sequences to a landmarking space thanks to appropriately designed similarity functions. As a by-product, the framework benefits from a scalable training algorithm and a theoretical guarantee concerning its generalization ability. We also discuss an important improvement of our framework in which decision function can still be made reliable while being more expressive. Our experimental studies provide compelling results on toy data, presenting the trade-off that occurs when aiming at accuracy, earliness and reliability. Results on real physiological and video datasets show that our proposed approach is as accurate and early as state-of-the-art algorithm, while ensuring reliability and being far more efficient to learn.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/sangnier16.html
  PDF: http://proceedings.mlr.press/v48/sangnier16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-sangnier16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Maxime
    family: Sangnier
  - given: Jerome
    family: Gauthier
  - given: Alain
    family: Rakotomamonjy
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2310-2319
  id: sangnier16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2310
  lastpage: 2319
  published: 2016-06-11 00:00:00 +0000
- title: 'Stratified Sampling Meets Machine Learning'
  abstract: 'This paper solves a specialized regression problem to obtain sampling probabilities for records in databases. The goal is to sample a small set of records over which evaluating aggregate queries can be done both efficiently and accurately. We provide a principled and provable solution for this problem; it is parameterless and requires no data insights. Unlike standard regression problems, the loss is inversely proportional to the regressed-to values. Moreover, a cost zero solution always exists and can only be excluded by hard budget constraints. A unique form of regularization is also needed. We provide an efficient and simple regularized Empirical Risk Minimization (ERM) algorithm along with a theoretical generalization result. Our extensive experimental results significantly improve over both uniform sampling and standard stratified sampling which are de-facto the industry standards.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/liberty16.html
  PDF: http://proceedings.mlr.press/v48/liberty16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-liberty16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Edo
    family: Liberty
  - given: Kevin
    family: Lang
  - given: Konstantin
    family: Shmakov
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2320-2329
  id: liberty16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2320
  lastpage: 2329
  published: 2016-06-11 00:00:00 +0000
- title: 'Efficient Multi-Instance Learning for Activity Recognition from Time Series Data Using an Auto-Regressive Hidden Markov Model'
  abstract: 'Activity recognition from sensor data has spurred a great deal of interest due to its impact on health care. Prior work on activity recognition from multivariate time series data has mainly applied supervised learning techniques which require a high degree of annotation effort to produce training data with the start and end times of each activity. In order to reduce the annotation effort, we present a weakly supervised approach based on multi-instance learning. We introduce a generative graphical model for multi-instance learning on time series data based on an auto-regressive hidden Markov model. Our model has a number of advantages, including the ability to produce both bag and instance-level predictions as well as an efficient exact inference algorithm based on dynamic programming.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/guan16.html
  PDF: http://proceedings.mlr.press/v48/guan16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-guan16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Xinze
    family: Guan
  - given: Raviv
    family: Raich
  - given: Weng-Keen
    family: Wong
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2330-2339
  id: guan16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2330
  lastpage: 2339
  published: 2016-06-11 00:00:00 +0000
- title: 'Generalization Properties and Implicit Regularization for Multiple Passes SGM'
  abstract: 'We study the generalization properties of stochastic gradient methods for learning with convex loss functions and linearly parameterized functions. We show that, in the absence of penalizations or constraints, the stability and approximation properties of the algorithm can be controlled by tuning either the step-size or the number of passes over the data. In this view, these parameters can be seen to control a form of implicit regularization. Numerical results complement the theoretical findings.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/lina16.html
  PDF: http://proceedings.mlr.press/v48/lina16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-lina16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Junhong
    family: Lin
  - given: Raffaello
    family: Camoriano
  - given: Lorenzo
    family: Rosasco
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2340-2348
  id: lina16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2340
  lastpage: 2348
  published: 2016-06-11 00:00:00 +0000
- title: 'Principal Component Projection Without Principal Component Analysis'
  abstract: 'We show how to efficiently project a vector onto the top principal components of a matrix, *without explicitly computing these components*. Specifically, we introduce an iterative algorithm that provably computes the projection using few calls to any black-box routine for ridge regression. By avoiding explicit principal component analysis (PCA), our algorithm is the first with no runtime dependence on the number of top principal components. We show that it can be used to give a fast iterative method for the popular principal component regression problem, giving the first major runtime improvement over the naive method of combining PCA with regression. To achieve our results, we first observe that ridge regression can be used to obtain a "smooth projection" onto the top principal components. We then sharpen this approximation to true projection using a low-degree polynomial approximation to the matrix step function. Step function approximation is a topic of long-term interest in scientific computing. We extend prior theory by constructing polynomials with simple iterative structure and rigorously analyzing their behavior under limited precision.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/frostig16.html
  PDF: http://proceedings.mlr.press/v48/frostig16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-frostig16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Roy
    family: Frostig
  - given: Cameron
    family: Musco
  - given: Christopher
    family: Musco
  - given: Aaron
    family: Sidford
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2349-2357
  id: frostig16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2349
  lastpage: 2357
  published: 2016-06-11 00:00:00 +0000
- title: 'Recovery guarantee of weighted low-rank approximation via alternating minimization'
  abstract: 'Many applications require recovering a ground truth low-rank matrix from noisy observations of the entries, which in practice is typically formulated as a weighted low-rank approximation problem and solved by non-convex optimization heuristics such as alternating minimization. In this paper, we provide provable recovery guarantee of weighted low-rank via a simple alternating minimization algorithm. In particular, for a natural class of matrices and weights and without any assumption on the noise, we bound the spectral norm of the difference between the recovered matrix and the ground truth, by the spectral norm of the weighted noise plus an additive error term that decreases exponentially with the number of rounds of alternating minimization, from either initialization by SVD or, more importantly, random initialization. These provide the first theoretical results for weighted low-rank approximation via alternating minimization with non-binary deterministic weights, significantly generalizing those for matrix completion, the special case with binary weights, since our assumptions are similar or weaker than those made in existing works. Furthermore, this is achieved by a very simple algorithm that improves the vanilla alternating minimization with a simple clipping step.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/lii16.html
  PDF: http://proceedings.mlr.press/v48/lii16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-lii16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Yuanzhi
    family: Li
  - given: Yingyu
    family: Liang
  - given: Andrej
    family: Risteski
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2358-2367
  id: lii16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2358
  lastpage: 2367
  published: 2016-06-11 00:00:00 +0000
- title: 'Deconstructing the Ladder Network Architecture'
  abstract: 'The Ladder Network is a recent new approach to semi-supervised learning that turned out to be very successful. While showing impressive performance, the Ladder Network has many components intertwined, whose contributions are not obvious in such a complex architecture. This paper presents an extensive experimental investigation of variants of the Ladder Network in which we replaced or removed individual components to learn about their relative importance. For semi-supervised tasks, we conclude that the most important contribution is made by the lateral connections, followed by the application of noise, and the choice of what we refer to as the ‘combinator function’. As the number of labeled training examples increases, the lateral connections and the reconstruction criterion become less important, with most of the generalization improvement coming from the injection of noise in each layer. Finally, we introduce a combinator function that reduces test error rates on Permutation-Invariant MNIST to 0.57% for the supervised setting, and to 0.97% and 1.0% for semi-supervised settings with 1000 and 100 labeled examples, respectively.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/pezeshki16.html
  PDF: http://proceedings.mlr.press/v48/pezeshki16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-pezeshki16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Mohammad
    family: Pezeshki
  - given: Linxi
    family: Fan
  - given: Philemon
    family: Brakel
  - given: Aaron
    family: Courville
  - given: Yoshua
    family: Bengio
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2368-2376
  id: pezeshki16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2368
  lastpage: 2376
  published: 2016-06-11 00:00:00 +0000
- title: 'Generalization and Exploration via Randomized Value Functions'
  abstract: 'We propose randomized least-squares value iteration (RLSVI) – a new reinforcement learning algorithm designed to explore and generalize efficiently via linearly parameterized value functions. We explain why versions of least-squares value iteration that use Boltzmann or epsilon-greedy exploration can be highly inefficient, and we present computational results that demonstrate dramatic efficiency gains enjoyed by RLSVI. Further, we establish an upper bound on the expected regret of RLSVI that demonstrates near-optimality in a tabula rasa learning context. More broadly, our results suggest that randomized value functions offer a promising approach to tackling a critical challenge in reinforcement learning: synthesizing efficient exploration and effective generalization.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/osband16.html
  PDF: http://proceedings.mlr.press/v48/osband16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-osband16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Ian
    family: Osband
  - given: Benjamin Van
    family: Roy
  - given: Zheng
    family: Wen
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2377-2386
  id: osband16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2377
  lastpage: 2386
  published: 2016-06-11 00:00:00 +0000
- title: 'Evasion and Hardening of Tree Ensemble Classifiers'
  abstract: 'Classifier evasion consists in finding for a given instance x the “nearest” instance x’ such that the classifier predictions of x and x’ are different. We present two novel algorithms for systematically computing evasions for tree ensembles such as boosted trees and random forests. Our first algorithm uses a Mixed Integer Linear Program solver and finds the optimal evading instance under an expressive set of constraints. Our second algorithm trades off optimality for speed by using symbolic prediction, a novel algorithm for fast finite differences on tree ensembles. On a digit recognition task, we demonstrate that both gradient boosted trees and random forests are extremely susceptible to evasions. Finally, we harden a boosted tree model without loss of predictive accuracy by augmenting the training set of each boosting round with evading instances, a technique we call adversarial boosting.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/kantchelian16.html
  PDF: http://proceedings.mlr.press/v48/kantchelian16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-kantchelian16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Alex
    family: Kantchelian
  - given: J. D.
    family: Tygar
  - given: Anthony
    family: Joseph
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2387-2396
  id: kantchelian16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2387
  lastpage: 2396
  published: 2016-06-11 00:00:00 +0000
- title: 'Dynamic Memory Networks for Visual and Textual Question Answering'
  abstract: 'Neural network architectures with memory and attention mechanisms exhibit certain reason- ing capabilities required for question answering. One such architecture, the dynamic memory net- work (DMN), obtained high accuracy on a variety of language tasks. However, it was not shown whether the architecture achieves strong results for question answering when supporting facts are not marked during training or whether it could be applied to other modalities such as images. Based on an analysis of the DMN, we propose several improvements to its memory and input modules. Together with these changes we introduce a novel input module for images in order to be able to answer visual questions. Our new DMN+ model improves the state of the art on both the Visual Question Answering dataset and the bAbI-10k text question-answering dataset without supporting fact supervision.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/xiong16.html
  PDF: http://proceedings.mlr.press/v48/xiong16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-xiong16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Caiming
    family: Xiong
  - given: Stephen
    family: Merity
  - given: Richard
    family: Socher
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2397-2406
  id: xiong16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2397
  lastpage: 2406
  published: 2016-06-11 00:00:00 +0000
- title: 'Estimating Cosmological Parameters from the Dark Matter Distribution'
  abstract: 'A grand challenge of the 21st century cosmology is to accurately estimate the cosmological parameters of our Universe. A major approach in estimating the cosmological parameters is to use the large scale matter distribution of the Universe. Galaxy surveys provide the means to map out cosmic large-scale structure in three dimensions. Information about galaxy locations is typically summarized in a "single" function of scale, such as the galaxy correlation function or power-spectrum. We show that it is possible to estimate these cosmological parameters directly from the distribution of matter. This paper presents the application of deep 3D convolutional networks to volumetric representation of dark matter simulations as well as the results obtained using a recently proposed distribution regression framework, showing that machine learning techniques are comparable to, and can sometimes outperform, maximum-likelihood point estimates using "cosmological models". This opens the way to estimating the parameters of our Universe with higher accuracy.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/ravanbakhshb16.html
  PDF: http://proceedings.mlr.press/v48/ravanbakhshb16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-ravanbakhshb16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Siamak
    family: Ravanbakhsh
  - given: Junier
    family: Oliva
  - given: Sebastian
    family: Fromenteau
  - given: Layne
    family: Price
  - given: Shirley
    family: Ho
  - given: Jeff
    family: Schneider
  - given: Barnabas
    family: Poczos
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2407-2416
  id: ravanbakhshb16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2407
  lastpage: 2416
  published: 2016-06-11 00:00:00 +0000
- title: 'Learning Population-Level Diffusions with Generative RNNs'
  abstract: 'We estimate stochastic processes that govern the dynamics of evolving populations such as cell differentiation. The problem is challenging since longitudinal trajectory measurements of individuals in a population are rarely available due to experimental cost and/or privacy. We show that cross-sectional samples from an evolving population suffice for recovery within a class of processes even if samples are available only at a few distinct time points. We provide a stratified analysis of recoverability conditions, and establish that reversibility is sufficient for recoverability. For estimation, we derive a natural loss and regularization, and parameterize the processes as diffusive recurrent neural networks. We demonstrate the approach in the context of uncovering complex cellular dynamics known as the ‘epigenetic landscape’ from existing biological assays.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/hashimoto16.html
  PDF: http://proceedings.mlr.press/v48/hashimoto16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-hashimoto16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Tatsunori
    family: Hashimoto
  - given: David
    family: Gifford
  - given: Tommi
    family: Jaakkola
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2417-2426
  id: hashimoto16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2417
  lastpage: 2426
  published: 2016-06-11 00:00:00 +0000
- title: 'Expressiveness of Rectifier Networks'
  abstract: 'Rectified Linear Units (ReLUs) have been shown to ameliorate the vanishing gradient problem, allow for efficient backpropagation, and empirically promote sparsity in the learned parameters. They have led to state-of-the-art results in a variety of applications. However, unlike threshold and sigmoid networks, ReLU networks are less explored from the perspective of their expressiveness. This paper studies the expressiveness of ReLU networks. We characterize the decision boundary of two-layer ReLU networks by constructing functionally equivalent threshold networks. We show that while the decision boundary of a two-layer ReLU network can be captured by a threshold network, the latter may require an exponentially larger number of hidden units. We also formulate sufficient conditions for a corresponding logarithmic reduction in the number of hidden units to represent a sign network as a ReLU network. Finally, we experimentally compare threshold networks and their much smaller ReLU counterparts with respect to their ability to learn from synthetically generated data.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/panb16.html
  PDF: http://proceedings.mlr.press/v48/panb16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-panb16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Xingyuan
    family: Pan
  - given: Vivek
    family: Srikumar
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2427-2435
  id: panb16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2427
  lastpage: 2435
  published: 2016-06-11 00:00:00 +0000
- title: 'Discrete Distribution Estimation under Local Privacy'
  abstract: 'The collection and analysis of user data drives improvements in the app and web ecosystems, but comes with risks to privacy. This paper examines discrete distribution estimation under local privacy, a setting wherein service providers can learn the distribution of a categorical statistic of interest without collecting the underlying data. We present new mechanisms, including hashed k-ary Randomized Response (KRR), that empirically meet or exceed the utility of existing mechanisms at all privacy levels. New theoretical results demonstrate the order-optimality of KRR and the existing RAPPOR mechanism at different privacy regimes.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/kairouz16.html
  PDF: http://proceedings.mlr.press/v48/kairouz16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-kairouz16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Peter
    family: Kairouz
  - given: Keith
    family: Bonawitz
  - given: Daniel
    family: Ramage
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2436-2444
  id: kairouz16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2436
  lastpage: 2444
  published: 2016-06-11 00:00:00 +0000
- title: 'Square Root Graphical Models: Multivariate Generalizations of Univariate Exponential Families that Permit Positive Dependencies'
  abstract: 'We develop Square Root Graphical Models (SQR), a novel class of parametric graphical models that provides multivariate generalizations of univariate exponential family distributions. Previous multivariate graphical models [Yang et al. 2015] did not allow positive dependencies for the exponential and Poisson generalizations. However, in many real-world datasets, variables clearly have positive dependencies. For example, the airport delay time in New York—modeled as an exponential distribution—is positively related to the delay time in Boston. With this motivation, we give an example of our model class derived from the univariate exponential distribution that allows for almost arbitrary positive and negative dependencies with only a mild condition on the parameter matrix—a condition akin to the positive definiteness of the Gaussian covariance matrix. Our Poisson generalization allows for both positive and negative dependencies without any constraints on the parameter values. We also develop parameter estimation methods using node-wise regressions with \ell_1 regularization and likelihood approximation methods using sampling. Finally, we demonstrate our exponential generalization on a synthetic dataset and a real-world dataset of airport delay times.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/inouye16.html
  PDF: http://proceedings.mlr.press/v48/inouye16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-inouye16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: David
    family: Inouye
  - given: Pradeep
    family: Ravikumar
  - given: Inderjit
    family: Dhillon
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2445-2453
  id: inouye16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2445
  lastpage: 2453
  published: 2016-06-11 00:00:00 +0000
- title: 'A Box-Constrained Approach for Hard Permutation Problems'
  abstract: 'We describe the use of sorting networks to form relaxations of problems involving permutations of n objects. This approach is an alternative to relaxations based on the Birkhoff polytope (the set of n \times n doubly stochastic matrices), providing a more compact formulation in which the only constraints are box constraints. Using this approach, we form a variant of the relaxation of the quadratic assignment problem recently studied in Vogelstein et al. (2015), and show that the continuation method applied to this formulation can be quite effective. We develop a coordinate descent algorithm that achieves a per-cycle complexity of O(n^2 \log^2 n). We compare this method with Fast Approximate QAP (FAQ) algorithm introduced in Vogelstein et al. (2015), which uses a conditional-gradient method whose per-iteration complexity is O(n^3). We demonstrate that for most problems in QAPLIB and for a class of synthetic QAP problems, the sorting-network formulation returns solutions that are competitive with the FAQ algorithm, often in significantly less computing time.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/lim16.html
  PDF: http://proceedings.mlr.press/v48/lim16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-lim16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Cong Han
    family: Lim
  - given: Steve
    family: Wright
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2454-2463
  id: lim16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2454
  lastpage: 2463
  published: 2016-06-11 00:00:00 +0000
- title: 'Geometric Mean Metric Learning'
  abstract: 'We revisit the task of learning a Euclidean metric from data. We approach this problem from first principles and formulate it as a surprisingly simple optimization problem. Indeed, our formulation even admits a closed form solution. This solution possesses several very attractive properties: (i) an innate geometric appeal through the Riemannian geometry of positive definite matrices; (ii) ease of interpretability; and (iii) computational speed several orders of magnitude faster than the widely used LMNN and ITML methods. Furthermore, on standard benchmark datasets, our closed-form solution consistently attains higher classification accuracy.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/zadeh16.html
  PDF: http://proceedings.mlr.press/v48/zadeh16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-zadeh16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Pourya
    family: Zadeh
  - given: Reshad
    family: Hosseini
  - given: Suvrit
    family: Sra
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2464-2471
  id: zadeh16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2464
  lastpage: 2471
  published: 2016-06-11 00:00:00 +0000
- title: 'Sparse Nonlinear Regression: Parameter Estimation under Nonconvexity'
  abstract: 'We study parameter estimation for sparse nonlinear regression. More specifically, we assume the data are given by y = f( \bf x^T \bf β^* ) + ε, where f is nonlinear. To recover \bf βs, we propose an \ell_1-regularized least-squares estimator. Unlike classical linear regression, the corresponding optimization problem is nonconvex because of the nonlinearity of f. In spite of the nonconvexity, we prove that under mild conditions, every stationary point of the objective enjoys an optimal statistical rate of convergence. Detailed numerical results are provided to back up our theory.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/yangc16.html
  PDF: http://proceedings.mlr.press/v48/yangc16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-yangc16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Zhuoran
    family: Yang
  - given: Zhaoran
    family: Wang
  - given: Han
    family: Liu
  - given: Yonina
    family: Eldar
  - given: Tong
    family: Zhang
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2472-2481
  id: yangc16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2472
  lastpage: 2481
  published: 2016-06-11 00:00:00 +0000
- title: 'Conditional Bernoulli Mixtures for Multi-label Classification'
  abstract: 'Multi-label classification is an important machine learning task wherein one assigns a subset of candidate labels to an object. In this paper, we propose a new multi-label classification method based on Conditional Bernoulli Mixtures. Our proposed method has several attractive properties: it captures label dependencies; it reduces the multi-label problem to several standard binary and multi-class problems; it subsumes the classic independent binary prediction and power-set subset prediction methods as special cases; and it exhibits accuracy and/or computational complexity advantages over existing approaches. We demonstrate two implementations of our method using logistic regressions and gradient boosted trees, together with a simple training procedure based on Expectation Maximization. We further derive an efficient prediction procedure based on dynamic programming, thus avoiding the cost of examining an exponential number of potential label subsets. Experimental results show the effectiveness of the proposed method against competitive alternatives on benchmark datasets.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/lij16.html
  PDF: http://proceedings.mlr.press/v48/lij16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-lij16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Cheng
    family: Li
  - given: Bingyu
    family: Wang
  - given: Virgil
    family: Pavlu
  - given: Javed
    family: Aslam
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2482-2491
  id: lij16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2482
  lastpage: 2491
  published: 2016-06-11 00:00:00 +0000
- title: 'Scalable Discrete Sampling as a Multi-Armed Bandit Problem'
  abstract: 'Drawing a sample from a discrete distribution is one of the building components for Monte Carlo methods. Like other sampling algorithms, discrete sampling suffers from the high computational burden in large-scale inference problems. We study the problem of sampling a discrete random variable with a high degree of dependency that is typical in large-scale Bayesian inference and graphical models, and propose an efficient approximate solution with a subsampling approach. We make a novel connection between the discrete sampling and Multi-Armed Bandits problems with a finite reward population and provide three algorithms with theoretical guarantees. Empirical evaluations show the robustness and efficiency of the approximate algorithms in both synthetic and real-world large-scale problems.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/chenb16.html
  PDF: http://proceedings.mlr.press/v48/chenb16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-chenb16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Yutian
    family: Chen
  - given: Zoubin
    family: Ghahramani
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2492-2501
  id: chenb16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2492
  lastpage: 2501
  published: 2016-06-11 00:00:00 +0000
- title: 'Recycling Randomness with Structure for Sublinear time Kernel Expansions'
  abstract: 'We propose a scheme for recycling Gaussian random vectors into structured matrices to ap- proximate various kernel functions in sublin- ear time via random embeddings. Our frame- work includes the Fastfood construction of Le et al. (2013) as a special case, but also ex- tends to Circulant, Toeplitz and Hankel matri- ces, and the broader family of structured matri- ces that are characterized by the concept of low- displacement rank. We introduce notions of co- herence and graph-theoretic structural constants that control the approximation quality, and prove unbiasedness and low-variance properties of ran- dom feature maps that arise within our frame- work. For the case of low-displacement matri- ces, we show how the degree of structure and randomness can be controlled to reduce statis- tical variance at the cost of increased computa- tion and storage requirements. Empirical results strongly support our theory and justify the use of a broader family of structured matrices for scal- ing up kernel methods using random features.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/choromanski16.html
  PDF: http://proceedings.mlr.press/v48/choromanski16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-choromanski16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Krzysztof
    family: Choromanski
  - given: Vikas
    family: Sindhwani
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2502-2510
  id: choromanski16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2502
  lastpage: 2510
  published: 2016-06-11 00:00:00 +0000
- title: 'Bidirectional Helmholtz Machines'
  abstract: 'Efficient unsupervised training and inference in deep generative models remains a challenging problem. One basic approach, called Helmholtz machine or Variational Autoencoder, involves training a top-down directed generative model together with a bottom-up auxiliary model used for approximate inference. Recent results indicate that better generative models can be obtained with better approximate inference procedures. Instead of improving the inference procedure, we here propose a new model, the bidirectional Helmholtz machine, which guarantees that the top-down and bottom-up distributions can efficiently invert each other. We achieve this by interpreting both the top-down and the bottom-up directed models as approximate inference distributions and by defining the model distribution to be the geometric mean of these two. We present a lower-bound for the likelihood of this model and we show that optimizing this bound regularizes the model so that the Bhattacharyya distance between the bottom-up and top-down approximate distributions is minimized. This approach results in state of the art generative models which prefer significantly deeper architectures while it allows for orders of magnitude more efficient likelihood estimation.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/bornschein16.html
  PDF: http://proceedings.mlr.press/v48/bornschein16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-bornschein16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Jorg
    family: Bornschein
  - given: Samira
    family: Shabanian
  - given: Asja
    family: Fischer
  - given: Yoshua
    family: Bengio
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2511-2519
  id: bornschein16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2511
  lastpage: 2519
  published: 2016-06-11 00:00:00 +0000
- title: 'Faster Convex Optimization: Simulated Annealing with an Efficient Universal Barrier'
  abstract: 'This paper explores a surprising equivalence between two seemingly-distinct convex optimization methods. We show that simulated annealing, a well-studied random walk algorithms, is *directly equivalent*, in a certain sense, to the central path interior point algorithm for the the entropic universal barrier function. This connection exhibits several benefits. First, we are able improve the state of the art time complexity for convex optimization under the membership oracle model by devising a new temperature schedule for simulated annealing motivated by central path following interior point methods. Second, we get an efficient randomized interior point method with an efficiently computable universal barrier for any convex set described by a membership oracle. Previously, efficiently computable barriers were known only for particular convex sets.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/abernethy16.html
  PDF: http://proceedings.mlr.press/v48/abernethy16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-abernethy16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Jacob
    family: Abernethy
  - given: Elad
    family: Hazan
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2520-2528
  id: abernethy16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2520
  lastpage: 2528
  published: 2016-06-11 00:00:00 +0000
- title: 'Preconditioning Kernel Matrices'
  abstract: 'The computational and storage complexity of kernel machines presents the primary barrier to their scaling to large, modern, datasets. A common way to tackle the scalability issue is to use the conjugate gradient algorithm, which relieves the constraints on both storage (the kernel matrix need not be stored) and computation (both stochastic gradients and parallelization can be used). Even so, conjugate gradient is not without its own issues: the conditioning of kernel matrices is often such that conjugate gradients will have poor convergence in practice. Preconditioning is a common approach to alleviating this issue. Here we propose preconditioned conjugate gradients for kernel machines, and develop a broad range of preconditioners particularly useful for kernel matrices. We describe a scalable approach to both solving kernel machines and learning their hyperparameters. We show this approach is exact in the limit of iterations and outperforms state-of-the-art approximations for a given computational budget.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/cutajar16.html
  PDF: http://proceedings.mlr.press/v48/cutajar16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-cutajar16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Kurt
    family: Cutajar
  - given: Michael
    family: Osborne
  - given: John
    family: Cunningham
  - given: Maurizio
    family: Filippone
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2529-2538
  id: cutajar16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2529
  lastpage: 2538
  published: 2016-06-11 00:00:00 +0000
- title: 'Greedy Column Subset Selection: New Bounds and Distributed Algorithms'
  abstract: 'The problem of column subset selection has recently attracted a large body of research, with feature selection serving as one obvious and important application. Among the techniques that have been applied to solve this problem, the greedy algorithm has been shown to be quite effective in practice. However, theoretical guarantees on its performance have not been explored thoroughly, especially in a distributed setting. In this paper, we study the greedy algorithm for the column subset selection problem from a theoretical and empirical perspective and show its effectiveness in a distributed setting. In particular, we provide an improved approximation guarantee for the greedy algorithm which we show is tight up to a constant factor, and present the first distributed implementation with provable approximation factors. We use the idea of randomized composable core-sets, developed recently in the context of submodular maximization. Finally, we validate the effectiveness of this distributed algorithm via an empirical study.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/altschuler16.html
  PDF: http://proceedings.mlr.press/v48/altschuler16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-altschuler16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Jason
    family: Altschuler
  - given: Aditya
    family: Bhaskara
  - given: Gang
    family: Fu
  - given: Vahab
    family: Mirrokni
  - given: Afshin
    family: Rostamizadeh
  - given: Morteza
    family: Zadimoghaddam
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2539-2548
  id: altschuler16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2539
  lastpage: 2548
  published: 2016-06-11 00:00:00 +0000
- title: 'Dynamic Capacity Networks'
  abstract: 'We introduce the Dynamic Capacity Network (DCN), a neural network that can adaptively assign its capacity across different portions of the input data. This is achieved by combining modules of two types: low-capacity sub-networks and high-capacity sub-networks. The low-capacity sub-networks are applied across most of the input, but also provide a guide to select a few portions of the input on which to apply the high-capacity sub-networks. The selection is made using a novel gradient-based attention mechanism, that efficiently identifies input regions for which the DCN’s output is most sensitive and to which we should devote more capacity. We focus our empirical evaluation on the Cluttered MNIST and SVHN image datasets. Our findings indicate that DCNs are able to drastically reduce the number of computations, compared to traditional convolutional neural networks, while maintaining similar or even better performance.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/almahairi16.html
  PDF: http://proceedings.mlr.press/v48/almahairi16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-almahairi16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Amjad
    family: Almahairi
  - given: Nicolas
    family: Ballas
  - given: Tim
    family: Cooijmans
  - given: Yin
    family: Zheng
  - given: Hugo
    family: Larochelle
  - given: Aaron
    family: Courville
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2549-2558
  id: almahairi16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2549
  lastpage: 2558
  published: 2016-06-11 00:00:00 +0000
- title: 'Pricing a Low-regret Seller'
  abstract: 'As the number of ad exchanges has grown, publishers have turned to low regret learning algorithms to decide which exchange offers the best price for their inventory. This in turn opens the following question for the exchange: how to set prices to attract as many sellers as possible and maximize revenue. In this work we formulate this precisely as a learning problem, and present algorithms showing that by simply knowing that the counterparty is using a low regret algorithm is enough for the exchange to have its own low regret learning algorithm to find the optimal price.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/heidari16.html
  PDF: http://proceedings.mlr.press/v48/heidari16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-heidari16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Hoda
    family: Heidari
  - given: Mohammad
    family: Mahdian
  - given: Umar
    family: Syed
  - given: Sergei
    family: Vassilvitskii
  - given: Sadra
    family: Yazdanbod
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2559-2567
  id: heidari16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2559
  lastpage: 2567
  published: 2016-06-11 00:00:00 +0000
- title: 'Estimation from Indirect Supervision with Linear Moments'
  abstract: 'In structured prediction problems where we have indirect supervision of the output, maximum marginal likelihood faces two computational obstacles: non-convexity of the objective and intractability of even a single gradient computation. In this paper, we bypass both obstacles for a class of what we call linear indirectly-supervised problems. Our approach is simple: we solve a linear system to estimate sufficient statistics of the model, which we then use to estimate parameters via convex optimization. We analyze the statistical properties of our approach and show empirically that it is effective in two settings: learning with local privacy constraints and learning from low-cost count-based annotations.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/raghunathan16.html
  PDF: http://proceedings.mlr.press/v48/raghunathan16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-raghunathan16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Aditi
    family: Raghunathan
  - given: Roy
    family: Frostig
  - given: John
    family: Duchi
  - given: Percy
    family: Liang
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2568-2577
  id: raghunathan16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2568
  lastpage: 2577
  published: 2016-06-11 00:00:00 +0000
- title: 'Speeding up k-means by approximating Euclidean distances via block vectors'
  abstract: 'This paper introduces a new method to approximate Euclidean distances between points using block vectors in combination with the Hölder inequality. By defining lower bounds based on the proposed approximation, cluster algorithms can be considerably accelerated without loss of quality. In extensive experiments, we show a considerable reduction in terms of computational time in comparison to standard methods and the recently proposed Yinyang k-means. Additionally we show that the memory consumption of the presented clustering algorithm does not depend on the number of clusters, which makes the approach suitable for large scale problems.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/bottesch16.html
  PDF: http://proceedings.mlr.press/v48/bottesch16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-bottesch16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Thomas
    family: Bottesch
  - given: Thomas
    family: Bühler
  - given: Markus
    family: Kächele
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2578-2586
  id: bottesch16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2578
  lastpage: 2586
  published: 2016-06-11 00:00:00 +0000
- title: 'Learning and Inference via Maximum Inner Product Search'
  abstract: 'A large class of commonly used probabilistic models known as log-linear models are defined up to a normalization constant.Typical learning algorithms for such models require solving a sequence of probabilistic inference queries. These inferences are typically intractable, and are a major bottleneck for learning models with large output spaces. In this paper, we provide a new approach for amortizing the cost of a sequence of related inference queries, such as the ones arising during learning. Our technique relies on a surprising connection with algorithms developed in the past two decades for similarity search in large data bases. Our approach achieves improved running times with provable approximation guarantees. We show that it performs well both on synthetic data and neural language models with large output spaces.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/mussmann16.html
  PDF: http://proceedings.mlr.press/v48/mussmann16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-mussmann16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Stephen
    family: Mussmann
  - given: Stefano
    family: Ermon
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2587-2596
  id: mussmann16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2587
  lastpage: 2596
  published: 2016-06-11 00:00:00 +0000
- title: 'A Superlinearly-Convergent Proximal Newton-type Method for the Optimization of Finite Sums'
  abstract: 'We consider the problem of minimizing the strongly convex sum of a finite number of convex functions. Standard algorithms for solving this problem in the class of incremental/stochastic methods have at most a linear convergence rate. We propose a new incremental method whose convergence rate is superlinear – the Newton-type incremental method (NIM). The idea of the method is to introduce a model of the objective with the same sum-of-functions structure and further update a single component of the model per iteration. We prove that NIM has a superlinear local convergence rate and linear global convergence rate. Experiments show that the method is very effective for problems with a large number of functions and a small number of variables.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/rodomanov16.html
  PDF: http://proceedings.mlr.press/v48/rodomanov16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-rodomanov16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Anton
    family: Rodomanov
  - given: Dmitry
    family: Kropotov
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2597-2605
  id: rodomanov16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2597
  lastpage: 2605
  published: 2016-06-11 00:00:00 +0000
- title: 'A Kernel Test of Goodness of Fit'
  abstract: 'We propose a nonparametric statistical test for goodness-of-fit: given a set of samples, the test determines how likely it is that these were generated from a target density function. The measure of goodness-of-fit is a divergence constructed via Stein’s method using functions from a Reproducing Kernel Hilbert Space. Our test statistic is based on an empirical estimate of this divergence, taking the form of a V-statistic in terms of the log gradients of the target density and the kernel. We derive a statistical test, both for i.i.d. and non-i.i.d. samples, where we estimate the null distribution quantiles using a wild bootstrap procedure. We apply our test to quantifying convergence of approximate Markov Chain Monte Carlo methods, statistical model criticism, and evaluating quality of fit vs model complexity in nonparametric density estimation.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/chwialkowski16.html
  PDF: http://proceedings.mlr.press/v48/chwialkowski16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-chwialkowski16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Kacper
    family: Chwialkowski
  - given: Heiko
    family: Strathmann
  - given: Arthur
    family: Gretton
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2606-2615
  id: chwialkowski16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2606
  lastpage: 2615
  published: 2016-06-11 00:00:00 +0000
- title: 'Interacting Particle Markov Chain Monte Carlo'
  abstract: 'We introduce interacting particle Markov chain Monte Carlo (iPMCMC), a PMCMC method based on an interacting pool of standard and conditional sequential Monte Carlo samplers. Like related methods, iPMCMC is a Markov chain Monte Carlo sampler on an extended space. We present empirical results that show significant improvements in mixing rates relative to both non-interacting PMCMC samplers and a single PMCMC sampler with an equivalent memory and computational budget. An additional advantage of the iPMCMC method is that it is suitable for distributed and multi-core architectures.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/rainforth16.html
  PDF: http://proceedings.mlr.press/v48/rainforth16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-rainforth16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Tom
    family: Rainforth
  - given: Christian
    family: Naesseth
  - given: Fredrik
    family: Lindsten
  - given: Brooks
    family: Paige
  - given: Jan-Willem
    family: Vandemeent
  - given: Arnaud
    family: Doucet
  - given: Frank
    family: Wood
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2616-2625
  id: rainforth16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2616
  lastpage: 2625
  published: 2016-06-11 00:00:00 +0000
- title: 'Faster Eigenvector Computation via Shift-and-Invert Preconditioning'
  abstract: 'We give faster algorithms and improved sample complexities for the fundamental problem of estimating the top eigenvector. Given an explicit matrix $A \in \mathbb{R}^{n \times d}$, we show how to compute an $\epsilon$-approximate top eigenvector of $A^TA$ in time $\tilde O\left( \left[\text{nnz}(A) + \frac{d \text{sr}(A)}{\text{gap}^2} \right] \cdot \log 1/\epsilon\right)$. Here $\text{nnz}(A)$ is the number of nonzeros in $A$, $\text{sr}(A)$ is the stable rank, and gap is the relative eigengap. We also consider an online setting in which, given a stream of i.i.d. samples from a distribution D with covariance matrix $\Sigma$ and a vector $x_0$ which is an $O(\text{gap})$ approximate top eigenvector for $\Sigma$, we show how to refine $x_0$ to an $\epsilon$ approximation using  $O \left( \frac{\text{var}(\mathcal{D})}{\text{gap}-\epsilon}\right)$ samples from $\mathcal{D}$. Here $\text{var}(\mathcal{D})$ is a natural notion of variance. Combining our algorithm with previous work to initialize $x_0$, we obtain improved sample complexities and runtimes under a variety of assumptions on D. We achieve our results via a robust analysis of the classic shift-and-invert preconditioning method. This technique lets us reduce eigenvector computation to <i>approximately</i> solving a series of linear systems with fast stochastic gradient methods.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/garber16.html
  PDF: http://proceedings.mlr.press/v48/garber16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-garber16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Dan
    family: Garber
  - given: Elad
    family: Hazan
  - given: Chi
    family: Jin
  - given: 
    family: Sham
  - given: Cameron
    family: Musco
  - given: Praneeth
    family: Netrapalli
  - given: Aaron
    family: Sidford
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2626-2634
  id: garber16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2626
  lastpage: 2634
  published: 2016-06-11 00:00:00 +0000
- title: 'A Theory of Generative ConvNet'
  abstract: 'We show that a generative random field model, which we call generative ConvNet, can be derived from the commonly used discriminative ConvNet, by assuming a ConvNet for multi-category classification and assuming one of the category is a base category generated by a reference distribution. If we further assume that the non-linearity in the ConvNet is Rectified Linear Unit (ReLU) and the reference distribution is Gaussian white noise, then we obtain a generative ConvNet model that is unique among energy-based models: The model is piecewise Gaussian, and the means of the Gaussian pieces are defined by an auto-encoder, where the filters in the bottom-up encoding become the basis functions in the top-down decoding, and the binary activation variables detected by the filters in the bottom-up convolution process become the coefficients of the basis functions in the top-down deconvolution process. The Langevin dynamics for sampling the generative ConvNet is driven by the reconstruction error of this auto-encoder. The contrastive divergence learning of the generative ConvNet reconstructs the training images by the auto-encoder. The maximum likelihood learning algorithm can synthesize realistic natural image patterns.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/xiec16.html
  PDF: http://proceedings.mlr.press/v48/xiec16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-xiec16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Jianwen
    family: Xie
  - given: Yang
    family: Lu
  - given: Song-Chun
    family: Zhu
  - given: Yingnian
    family: Wu
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2635-2644
  id: xiec16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2635
  lastpage: 2644
  published: 2016-06-11 00:00:00 +0000
- title: 'Efficient Learning with a Family of Nonconvex Regularizers by Redistributing Nonconvexity'
  abstract: 'The use of convex regularizers allow for easy optimization, though they often produce biased estimation and inferior prediction performance. Recently, nonconvex regularizers have attracted a lot of attention and outperformed convex ones. However, the resultant optimization problem is much harder. In this paper, for a large class of nonconvex regularizers, we propose to move the nonconvexity from the regularizer to the loss. The nonconvex regularizer is then transformed to a familiar convex regularizer, while the resultant loss function can still be guaranteed to be smooth. Learning with the convexified regularizer can be performed by existing efficient algorithms originally designed for convex regularizers (such as the standard proximal algorithm and Frank-Wolfe algorithm). Moreover, it can be shown that critical points of the transformed problem are also critical points of the original problem. Extensive experiments on a number of nonconvex regularization problems show that the proposed procedure is much faster than the state-of-the-art nonconvex solvers.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/yao16.html
  PDF: http://proceedings.mlr.press/v48/yao16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-yao16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Quanming
    family: Yao
  - given: James
    family: Kwok
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2645-2654
  id: yao16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2645
  lastpage: 2654
  published: 2016-06-11 00:00:00 +0000
- title: 'Computationally Efficient Nyström Approximation using Fast Transforms'
  abstract: 'Our goal is to improve the \it training and \it prediction time of Nyström method, which is a widely-used technique for generating low-rank kernel matrix approximations. When applying the Nyström approximation for large-scale applications, both training and prediction time is dominated by computing kernel values between a data point and all landmark points. With m landmark points, this computation requires Θ(md) time (flops), where d is the input dimension. In this paper, we propose the use of a family of fast transforms to generate structured landmark points for Nyström approximation. By exploiting fast transforms, e.g., Haar transform and Hadamard transform, our modified Nyström method requires only Θ(m) or Θ(m\log d) time to compute the kernel values between a given data point and m landmark points. This improvement in time complexity can significantly speed up kernel approximation and benefit prediction speed in kernel machines. For instance, on the webspam data (more than 300,000 data points), our proposed algorithm enables kernel SVM prediction to deliver 98% accuracy and the resulting prediction time is 1000 times faster than LIBSVM and only 10 times slower than linear SVM prediction (which yields only 91% accuracy).'
  volume: 48
  URL: https://proceedings.mlr.press/v48/si16.html
  PDF: http://proceedings.mlr.press/v48/si16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-si16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Si
    family: Si
  - given: Cho-Jui
    family: Hsieh
  - given: Inderjit
    family: Dhillon
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2655-2663
  id: si16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2655
  lastpage: 2663
  published: 2016-06-11 00:00:00 +0000
- title: 'Gromov-Wasserstein Averaging of Kernel and Distance Matrices'
  abstract: 'This paper presents a new technique for computing the barycenter of a set of distance or kernel matrices. These matrices, which define the inter-relationships between points sampled from individual domains, are not required to have the same size or to be in row-by-row correspondence. We compare these matrices using the softassign criterion, which measures the minimum distortion induced by a probabilistic map from the rows of one similarity matrix to the rows of another; this criterion amounts to a regularized version of the Gromov-Wasserstein (GW) distance between metric-measure spaces. The barycenter is then defined as a Fréchet mean of the input matrices with respect to this criterion, minimizing a weighted sum of softassign values. We provide a fast iterative algorithm for the resulting nonconvex optimization problem, built upon state-of- the-art tools for regularized optimal transportation. We demonstrate its application to the computation of shape barycenters and to the prediction of energy levels from molecular configurations in quantum chemistry.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/peyre16.html
  PDF: http://proceedings.mlr.press/v48/peyre16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-peyre16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Gabriel
    family: Peyré
  - given: Marco
    family: Cuturi
  - given: Justin
    family: Solomon
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2664-2672
  id: peyre16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2664
  lastpage: 2672
  published: 2016-06-11 00:00:00 +0000
- title: 'Robust Monte Carlo Sampling using Riemannian Nosé-Poincaré Hamiltonian Dynamics'
  abstract: 'We present a Monte Carlo sampler using a modified Nosé-Poincaré Hamiltonian along with Riemannian preconditioning. Hamiltonian Monte Carlo samplers allow better exploration of the state space as opposed to random walk-based methods, but, from a molecular dynamics perspective, may not necessarily provide samples from the canonical ensemble. Nosé-Hoover samplers rectify that shortcoming, but the resultant dynamics are not Hamiltonian. Furthermore, usage of these algorithms on large real-life datasets necessitates the use of stochastic gradients, which acts as another potentially destabilizing source of noise. In this work, we propose dynamics based on a modified Nosé-Poincaré Hamiltonian augmented with Riemannian manifold corrections. The resultant symplectic sampling algorithm samples from the canonical ensemble while using structural cues from the Riemannian preconditioning matrices to efficiently traverse the parameter space. We also propose a stochastic variant using additional terms in the Hamiltonian to correct for the noise from the stochastic gradients. We show strong performance of our algorithms on synthetic datasets and high-dimensional Poisson factor analysis-based topic modeling scenarios.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/roychowdhury16.html
  PDF: http://proceedings.mlr.press/v48/roychowdhury16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-roychowdhury16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Anirban
    family: Roychowdhury
  - given: Brian
    family: Kulis
  - given: Srinivasan
    family: Parthasarathy
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2673-2681
  id: roychowdhury16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2673
  lastpage: 2681
  published: 2016-06-11 00:00:00 +0000
- title: 'The Segmented iHMM: A Simple, Efficient Hierarchical Infinite HMM'
  abstract: 'We propose the segmented iHMM (siHMM), a hierarchical infinite hidden Markov model (iHMM) that supports a simple, efficient inference scheme. The siHMM is well suited to segmentation problems, where the goal is to identify points at which a time series transitions from one relatively stable regime to a new regime. Conventional iHMMs often struggle with such problems, since they have no mechanism for distinguishing between high-and low-level dynamics. Hierarchical HMMs (HHMMs) can do better, but they require much more complex and expensive inference algorithms. The siHMM retains the simplicity and efficiency of the iHMM, but outperforms it on a variety of segmentation problems, achieving performance that matches or exceeds that of a more complicated HHMM.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/saeedi16.html
  PDF: http://proceedings.mlr.press/v48/saeedi16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-saeedi16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Ardavan
    family: Saeedi
  - given: Matthew
    family: Hoffman
  - given: Matthew
    family: Johnson
  - given: Ryan
    family: Adams
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2682-2691
  id: saeedi16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2682
  lastpage: 2691
  published: 2016-06-11 00:00:00 +0000
- title: 'Meta–Gradient Boosted Decision Tree Model for Weight and Target Learning'
  abstract: 'Labeled training data is an essential part of any supervised machine learning framework. In practice, there is a trade-off between the quality of a label and its cost. In this paper, we consider a problem of learning to rank on a large-scale dataset with low-quality relevance labels aiming at maximizing the quality of a trained ranker on a small validation dataset with high-quality ground truth relevance labels. Motivated by the classical Gauss-Markov theorem for the linear regression problem, we formulate the problems of (1) reweighting training instances and (2) remapping learning targets. We propose meta–gradient decision tree learning framework for optimizing weight and target functions by applying gradient-based hyperparameter optimization. Experiments on a large-scale real-world dataset demonstrate that we can significantly improve state-of-the-art machine-learning algorithms by incorporating our framework.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/ustinovskiy16.html
  PDF: http://proceedings.mlr.press/v48/ustinovskiy16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-ustinovskiy16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Yury
    family: Ustinovskiy
  - given: Valentina
    family: Fedorova
  - given: Gleb
    family: Gusev
  - given: Pavel
    family: Serdyukov
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2692-2701
  id: ustinovskiy16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2692
  lastpage: 2701
  published: 2016-06-11 00:00:00 +0000
- title: 'Discriminative Embeddings of Latent Variable Models for Structured Data'
  abstract: 'Kernel classifiers and regressors designed for structured data, such as sequences, trees and graphs, have significantly advanced a number of interdisciplinary areas such as computational biology and drug design. Typically, kernels are designed beforehand for a data type which either exploit statistics of the structures or make use of probabilistic generative models, and then a discriminative classifier is learned based on the kernels via convex optimization. However, such an elegant two-stage approach also limited kernel methods from scaling up to millions of data points, and exploiting discriminative information to learn feature representations. We propose, structure2vec, an effective and scalable approach for structured data representation based on the idea of embedding latent variable models into feature spaces, and learning such feature spaces using discriminative information. Interestingly, structure2vec extracts features by performing a sequence of function mappings in a way similar to graphical model inference procedures, such as mean field and belief propagation. In applications involving millions of data points, we showed that structure2vec runs 2 times faster, produces models which are 10,000 times smaller, while at the same time achieving the state-of-the-art predictive performance.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/daib16.html
  PDF: http://proceedings.mlr.press/v48/daib16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-daib16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Hanjun
    family: Dai
  - given: Bo
    family: Dai
  - given: Le
    family: Song
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2702-2711
  id: daib16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2702
  lastpage: 2711
  published: 2016-06-11 00:00:00 +0000
- title: 'Robust Random Cut Forest Based Anomaly Detection on Streams'
  abstract: 'In this paper we focus on the anomaly detection problem for dynamic data streams through the lens of random cut forests. We investigate a robust random cut data structure that can be used as a sketch or synopsis of the input stream. We provide a plausible definition of non-parametric anomalies based on the influence of an unseen point on the remainder of the data, i.e., the externality imposed by that point. We show how the sketch can be efficiently updated in a dynamic data stream. We demonstrate the viability of the algorithm on publicly available real data.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/guha16.html
  PDF: http://proceedings.mlr.press/v48/guha16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-guha16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Sudipto
    family: Guha
  - given: Nina
    family: Mishra
  - given: Gourav
    family: Roy
  - given: Okke
    family: Schrijvers
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2712-2721
  id: guha16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2712
  lastpage: 2721
  published: 2016-06-11 00:00:00 +0000
- title: 'Training Neural Networks Without Gradients: A Scalable ADMM Approach'
  abstract: 'With the growing importance of large network models and enormous training datasets, GPUs have become increasingly necessary to train neural networks. This is largely because conventional optimization algorithms rely on stochastic gradient methods that don’t scale well to large numbers of cores in a cluster setting. Furthermore, the convergence of all gradient methods, including batch methods, suffers from common problems like saturation effects, poor conditioning, and saddle points. This paper explores an unconventional training method that uses alternating direction methods and Bregman iteration to train networks without gradient descent steps. The proposed method reduces the network training problem to a sequence of minimization sub-steps that can each be solved globally in closed form. The proposed method is advantageous because it avoids many of the caveats that make gradient methods slow on highly non-convex problems. In addition, the method exhibits strong scaling in the distributed setting, yielding linear speedups even when split over thousands of cores.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/taylor16.html
  PDF: http://proceedings.mlr.press/v48/taylor16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-taylor16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Gavin
    family: Taylor
  - given: Ryan
    family: Burmeister
  - given: Zheng
    family: Xu
  - given: Bharat
    family: Singh
  - given: Ankit
    family: Patel
  - given: Tom
    family: Goldstein
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2722-2731
  id: taylor16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2722
  lastpage: 2731
  published: 2016-06-11 00:00:00 +0000
- title: 'Clustering High Dimensional Categorical Data via Topographical Features'
  abstract: 'Analysis of categorical data is a challenging task. In this paper, we propose to compute topographical features of high-dimensional categorical data. We propose an efficient algorithm to extract modes of the underlying distribution and their attractive basins. These topographical features provide a geometric view of the data and can be applied to visualization and clustering of real world challenging datasets. Experiments show that our principled method outperforms state-of-the-art clustering methods while also admits an embarrassingly parallel property.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/chenc16.html
  PDF: http://proceedings.mlr.press/v48/chenc16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-chenc16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Chao
    family: Chen
  - given: Novi
    family: Quadrianto
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2732-2740
  id: chenc16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2732
  lastpage: 2740
  published: 2016-06-11 00:00:00 +0000
- title: 'Efficient Algorithms for Large-scale Generalized Eigenvector Computation and Canonical Correlation Analysis'
  abstract: 'This paper considers the problem of canonical-correlation analysis (CCA) and, more broadly, the generalized eigenvector problem for a pair of symmetric matrices. These are two fundamental problems in data analysis and scientific computing with numerous applications in machine learning and statistics. We provide simple iterative algorithms, with improved runtimes, for solving these problems that are globally linearly convergent with moderate dependencies on the condition numbers and eigenvalue gaps of the matrices involved. We obtain our results by reducing CCA to the top-k generalized eigenvector problem. We solve this problem through a general framework that simply requires black box access to an approximate linear system solver. Instantiating this framework with accelerated gradient descent we obtain a running time of \order\fracz k \sqrtκρ \log(1/ε) \log \left(kκ/ρ\right) where z is the total number of nonzero entries, κis the condition number and ρis the relative eigenvalue gap of the appropriate matrices. Our algorithm is linear in the input size and the number of components k up to a \log(k) factor. This is essential for handling large-scale matrices that appear in practice. To the best of our knowledge this is the first such algorithm with global linear convergence. We hope that our results prompt further research and ultimately improve the practical running time for performing these important data analysis procedures on large data sets.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/geb16.html
  PDF: http://proceedings.mlr.press/v48/geb16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-geb16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Rong
    family: Ge
  - given: Chi
    family: Jin
  - given: 
    family: Sham
  - given: Praneeth
    family: Netrapalli
  - given: Aaron
    family: Sidford
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2741-2750
  id: geb16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2741
  lastpage: 2750
  published: 2016-06-11 00:00:00 +0000
- title: 'Algorithms for Optimizing the Ratio of Submodular Functions'
  abstract: 'We investigate a new optimization problem involving minimizing the Ratio of Submodular (RS) functions. We argue that this problem occurs naturally in several real world applications. We then show the connection between this problem and several related problems, including minimizing the difference of submodular functions, and to submodular optimization subject to submodular constraints. We show RS that optimization can be solved within bounded approximation factors. We also provide a hardness bound and show that our tightest algorithm matches the lower bound up to a \log factor. Finally, we empirically demonstrate the performance and good scalability properties of our algorithms.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/baib16.html
  PDF: http://proceedings.mlr.press/v48/baib16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-baib16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Wenruo
    family: Bai
  - given: Rishabh
    family: Iyer
  - given: Kai
    family: Wei
  - given: Jeff
    family: Bilmes
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2751-2759
  id: baib16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2751
  lastpage: 2759
  published: 2016-06-11 00:00:00 +0000
- title: 'Model-Free Imitation Learning with Policy Optimization'
  abstract: 'In imitation learning, an agent learns how to behave in an environment with an unknown cost function by mimicking expert demonstrations. Existing imitation learning algorithms typically involve solving a sequence of planning or reinforcement learning problems. Such algorithms are therefore not directly applicable to large, high-dimensional environments, and their performance can significantly degrade if the planning problems are not solved to optimality. Under the apprenticeship learning formalism, we develop alternative model-free algorithms for finding a parameterized stochastic policy that performs at least as well as an expert policy on an unknown cost function, based on sample trajectories from the expert. Our approach, based on policy gradients, scales to large continuous environments with guaranteed convergence to local minima.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/ho16.html
  PDF: http://proceedings.mlr.press/v48/ho16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-ho16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Jonathan
    family: Ho
  - given: Jayesh
    family: Gupta
  - given: Stefano
    family: Ermon
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2760-2769
  id: ho16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2760
  lastpage: 2769
  published: 2016-06-11 00:00:00 +0000
- title: 'ADIOS: Architectures Deep In Output Space'
  abstract: 'Multi-label classification is a generalization of binary classification where the task consists in predicting \emphsets of labels. With the availability of ever larger datasets, the multi-label setting has become a natural one in many applications, and the interest in solving multi-label problems has grown significantly. As expected, deep learning approaches are now yielding state-of-the-art performance for this class of problems. Unfortunately, they usually do not take into account the often unknown but nevertheless rich relationships between labels. In this paper, we propose to make use of this underlying structure by learning to partition the labels into a Markov Blanket Chain and then applying a novel deep architecture that exploits the partition. Experiments on several popular and large multi-label datasets demonstrate that our approach not only yields significant improvements, but also helps to overcome trade-offs specific to the multi-label classification setting.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/cisse16.html
  PDF: http://proceedings.mlr.press/v48/cisse16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-cisse16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Moustapha
    family: Cisse
  - given: Maruan
    family: Al-Shedivat
  - given: Samy
    family: Bengio
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2770-2779
  id: cisse16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2770
  lastpage: 2779
  published: 2016-06-11 00:00:00 +0000
- title: 'Conditional Dependence via Shannon Capacity: Axioms, Estimators and Applications'
  abstract: 'We consider axiomatically the problem of estimating the strength of a conditional dependence relationship P_Y|X from a random variables X to a random variable Y. This has applications in determining the strength of a known causal relationship, where the strength depends only on the conditional distribution of the effect given the cause (and not on the driving distribution of the cause). Shannon capacity, appropriately regularized, emerges as a natural measure under these axioms. We examine the problem of calculating Shannon capacity from the observed samples and propose a novel fixed-k nearest neighbor estimator, and demonstrate its consistency. Finally, we demonstrate an application to single-cell flow-cytometry, where the proposed estimators significantly reduce sample complexity.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/gaob16.html
  PDF: http://proceedings.mlr.press/v48/gaob16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-gaob16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Weihao
    family: Gao
  - given: Sreeram
    family: Kannan
  - given: Sewoong
    family: Oh
  - given: Pramod
    family: Viswanath
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2780-2789
  id: gaob16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2780
  lastpage: 2789
  published: 2016-06-11 00:00:00 +0000
- title: 'Control of Memory, Active Perception, and Action in Minecraft'
  abstract: 'In this paper, we introduce a new set of reinforcement learning (RL) tasks in Minecraft (a flexible 3D world). We then use these tasks to systematically compare and contrast existing deep reinforcement learning (DRL) architectures with our new memory-based DRL architectures. These tasks are designed to emphasize, in a controllable manner, issues that pose challenges for RL methods including partial observability (due to first-person visual observations), delayed rewards, high-dimensional visual observations, and the need to use active perception in a correct manner so as to perform well in the tasks. While these tasks are conceptually simple to describe, by virtue of having all of these challenges simultaneously they are difficult for current DRL architectures. Additionally, we evaluate the generalization performance of the architectures on environments not used during training. The experimental results show that our new architectures generalize to unseen environments better than existing DRL architectures.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/oh16.html
  PDF: http://proceedings.mlr.press/v48/oh16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-oh16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Junhyuk
    family: Oh
  - given: Valliappa
    family: Chockalingam
  - given: 
    family: Satinder
  - given: Honglak
    family: Lee
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2790-2799
  id: oh16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2790
  lastpage: 2799
  published: 2016-06-11 00:00:00 +0000
- title: 'The Label Complexity of Mixed-Initiative Classifier Training'
  abstract: 'Mixed-initiative classifier training, where the human teacher can choose which items to label or to label items chosen by the computer, has enjoyed empirical success but without a rigorous statistical learning theoretical justification. We analyze the label complexity of a simple mixed-initiative training mechanism using teach- ing dimension and active learning. We show that mixed-initiative training is advantageous com- pared to either computer-initiated (represented by active learning) or human-initiated classifier training. The advantage exists across all human teaching abilities, from optimal to completely unhelpful teachers. We further improve classifier training by educating the human teachers. This is done by showing, or explaining, optimal teaching sets to the human teachers. We conduct Mechanical Turk human experiments on two stylistic classifier training tasks to illustrate our approach.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/suh16.html
  PDF: http://proceedings.mlr.press/v48/suh16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-suh16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Jina
    family: Suh
  - given: Xiaojin
    family: Zhu
  - given: Saleema
    family: Amershi
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2800-2809
  id: suh16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2800
  lastpage: 2809
  published: 2016-06-11 00:00:00 +0000
- title: 'Bayesian Poisson Tucker Decomposition for Learning the Structure of International Relations'
  abstract: 'We introduce Bayesian Poisson Tucker decomposition (BPTD) for modeling country–country interaction event data. These data consist of interaction events of the form “country i took action a toward country j at time t.” BPTD discovers overlapping country–community memberships, including the number of latent communities. In addition, it discovers directed community–community interaction networks that are specific to “topics” of action types and temporal “regimes.” We show that BPTD yields an efficient MCMC inference algorithm and achieves better predictive performance than related models. We also demonstrate that it discovers interpretable latent structure that agrees with our knowledge of international relations.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/schein16.html
  PDF: http://proceedings.mlr.press/v48/schein16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-schein16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Aaron
    family: Schein
  - given: Mingyuan
    family: Zhou
  - given: David
    family: Blei
  - given: Hanna
    family: Wallach
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2810-2819
  id: schein16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2810
  lastpage: 2819
  published: 2016-06-11 00:00:00 +0000
- title: 'Tensor Decomposition via Joint Matrix Schur Decomposition'
  abstract: 'We describe an approach to tensor decomposition that involves extracting a set of observable matrices from the tensor and applying an approximate joint Schur decomposition on those matrices, and we establish the corresponding first-order perturbation bounds. We develop a novel iterative Gauss-Newton algorithm for joint matrix Schur decomposition, which minimizes a nonconvex objective over the manifold of orthogonal matrices, and which is guaranteed to converge to a global optimum under certain conditions. We empirically demonstrate that our algorithm is faster and at least as accurate and robust than state-of-the-art algorithms for this problem.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/colombo16.html
  PDF: http://proceedings.mlr.press/v48/colombo16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-colombo16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Nicolo
    family: Colombo
  - given: Nikos
    family: Vlassis
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2820-2828
  id: colombo16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2820
  lastpage: 2828
  published: 2016-06-11 00:00:00 +0000
- title: 'Continuous Deep Q-Learning with Model-based Acceleration'
  abstract: 'Model-free reinforcement learning has been successfully applied to a range of challenging problems, and has recently been extended to handle large neural network policies and value functions. However, the sample complexity of model-free algorithms, particularly when using high-dimensional function approximators, tends to limit their applicability to physical systems. In this paper, we explore algorithms and representations to reduce the sample complexity of deep reinforcement learning for continuous control tasks. We propose two complementary techniques for improving the efficiency of such algorithms. First, we derive a continuous variant of the Q-learning algorithm, which we call normalized advantage functions (NAF), as an alternative to the more commonly used policy gradient and actor-critic methods. NAF representation allows us to apply Q-learning with experience replay to continuous tasks, and substantially improves performance on a set of simulated robotic control tasks. To further improve the efficiency of our approach, we explore the use of learned models for accelerating model-free reinforcement learning. We show that iteratively refitted local linear models are especially effective for this, and demonstrate substantially faster learning on domains where such models are applicable.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/gu16.html
  PDF: http://proceedings.mlr.press/v48/gu16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-gu16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Shixiang
    family: Gu
  - given: Timothy
    family: Lillicrap
  - given: Ilya
    family: Sutskever
  - given: Sergey
    family: Levine
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2829-2838
  id: gu16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2829
  lastpage: 2838
  published: 2016-06-11 00:00:00 +0000
- title: 'Domain Adaptation with Conditional Transferable Components'
  abstract: 'Domain adaptation arises in supervised learning when the training (source domain) and test (target domain) data have different distributions. Let X and Y denote the features and target, respectively, previous work on domain adaptation considers the covariate shift situation where the distribution of the features P(X) changes across domains while the conditional distribution P(Y|X) stays the same. To reduce domain discrepancy, recent methods try to find invariant components \mathcalT(X) that have similar P(\mathcalT(X)) by explicitly minimizing a distribution discrepancy measure. However, it is not clear if P(Y|\mathcalT(X)) in different domains is also similar when P(Y|X) changes. Furthermore, transferable components do not necessarily have to be invariant. If the change in some components is identifiable, we can make use of such components for prediction in the target domain. In this paper, we focus on the case where P(X|Y) and P(Y) both change in a causal system in which Y is the cause for X. Under appropriate assumptions, we aim to extract conditional transferable components whose conditional distribution P(\mathcalT(X)|Y) is invariant after proper location-scale (LS) transformations, and identify how P(Y) changes between domains simultaneously. We provide theoretical analysis and empirical evaluation on both synthetic and real-world data to show the effectiveness of our method.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/gong16.html
  PDF: http://proceedings.mlr.press/v48/gong16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-gong16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Mingming
    family: Gong
  - given: Kun
    family: Zhang
  - given: Tongliang
    family: Liu
  - given: Dacheng
    family: Tao
  - given: Clark
    family: Glymour
  - given: Bernhard
    family: Schölkopf
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2839-2848
  id: gong16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2839
  lastpage: 2848
  published: 2016-06-11 00:00:00 +0000
- title: 'Fixed Point Quantization of Deep Convolutional Networks'
  abstract: 'In recent years increasingly complex architectures for deep convolution networks (DCNs) have been proposed to boost the performance on image recognition tasks. However, the gains in performance have come at a cost of substantial increase in computation and model storage resources. Fixed point implementation of DCNs has the potential to alleviate some of these complexities and facilitate potential deployment on embedded hardware. In this paper, we propose a quantizer design for fixed point implementation of DCNs. We formulate and solve an optimization problem to identify optimal fixed point bit-width allocation across DCN layers. Our experiments show that in comparison to equal bit-width settings, the fixed point DCNs with optimized bit width allocation offer >20% reduction in the model size without any loss in accuracy on CIFAR-10 benchmark. We also demonstrate that fine-tuning can further enhance the accuracy of fixed point DCNs beyond that of the original floating point model. In doing so, we report a new state-of-the-art fixed point performance of 6.78% error-rate on CIFAR-10 benchmark.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/linb16.html
  PDF: http://proceedings.mlr.press/v48/linb16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-linb16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Darryl
    family: Lin
  - given: Sachin
    family: Talathi
  - given: Sreekanth
    family: Annapureddy
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2849-2858
  id: linb16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2849
  lastpage: 2858
  published: 2016-06-11 00:00:00 +0000
- title: 'Provable Algorithms for Inference in Topic Models'
  abstract: 'Recently, there has been considerable progress on designing algorithms with provable guarantees —typically using linear algebraic methods—for parameter learning in latent variable models. Designing provable algorithms for inference has proved more difficult. Here we take a first step towards provable inference in topic models. We leverage a property of topic models that enables us to construct simple linear estimators for the unknown topic proportions that have small variance, and consequently can work with short documents. Our estimators also correspond to finding an estimate around which the posterior is well-concentrated. We show lower bounds that for shorter documents it can be information theoretically impossible to find the hidden topics. Finally, we give empirical results that demonstrate that our algorithm works on realistic topic models. It yields good solutions on synthetic data and runs in time comparable to a single iteration of Gibbs sampling.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/arorab16.html
  PDF: http://proceedings.mlr.press/v48/arorab16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-arorab16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Sanjeev
    family: Arora
  - given: Rong
    family: Ge
  - given: Frederic
    family: Koehler
  - given: Tengyu
    family: Ma
  - given: Ankur
    family: Moitra
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2859-2867
  id: arorab16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2859
  lastpage: 2867
  published: 2016-06-11 00:00:00 +0000
- title: 'Epigraph projections for fast general convex programming'
  abstract: 'This paper develops an approach for efficiently solving general convex optimization problems specified as disciplined convex programs (DCP), a common general-purpose modeling framework. Specifically we develop an algorithm based upon fast epigraph projections, projections onto the epigraph of a convex function, an approach closely linked to proximal operator methods. We show that by using these operators, we can solve any disciplined convex program without transforming the problem to a standard cone form, as is done by current DCP libraries. We then develop a large library of efficient epigraph projection operators, mirroring and extending work on fast proximal algorithms, for many common convex functions. Finally, we evaluate the performance of the algorithm, and show it often achieves order of magnitude speedups over existing general-purpose optimization solvers.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/wangh16.html
  PDF: http://proceedings.mlr.press/v48/wangh16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-wangh16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Po-Wei
    family: Wang
  - given: Matt
    family: Wytock
  - given: Zico
    family: Kolter
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2868-2877
  id: wangh16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2868
  lastpage: 2877
  published: 2016-06-11 00:00:00 +0000
- title: 'Fast Algorithms for Segmented Regression'
  abstract: 'We study the fixed design segmented regression problem: Given noisy samples from a piecewise linear function f, we want to recover f up to a desired accuracy in mean-squared error. Previous rigorous approaches for this problem rely on dynamic programming (DP) and, while sample efficient, have running time quadratic in the sample size. As our main contribution, we provide new sample near-linear time algorithms for the problem that - while not being minimax optimal - achieve a significantly better sample-time tradeoff on large datasets compared to the DP approach. Our experimental evaluation shows that, compared with the DP approach, our algorithms provide a convergence rate that is only off by a factor of 2 to 4, while achieving speedups of three orders of magnitude.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/acharya16.html
  PDF: http://proceedings.mlr.press/v48/acharya16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-acharya16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Jayadev
    family: Acharya
  - given: Ilias
    family: Diakonikolas
  - given: Jerry
    family: Li
  - given: Ludwig
    family: Schmidt
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2878-2886
  id: acharya16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2878
  lastpage: 2886
  published: 2016-06-11 00:00:00 +0000
- title: 'Energetic Natural Gradient Descent'
  abstract: 'We propose a new class of algorithms for minimizing or maximizing functions of parametric probabilistic models. These new algorithms are natural gradient algorithms that leverage more information than prior methods by using a new metric tensor in place of the commonly used Fisher information matrix. This new metric tensor is derived by computing directions of steepest ascent where the distance between distributions is measured using an approximation of energy distance (as opposed to Kullback-Leibler divergence, which produces the Fisher information matrix), and so we refer to our new ascent direction as the energetic natural gradient.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/thomasb16.html
  PDF: http://proceedings.mlr.press/v48/thomasb16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-thomasb16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Philip
    family: Thomas
  - given: Bruno Castro
    family: Silva
  - given: Christoph
    family: Dann
  - given: Emma
    family: Brunskill
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2887-2895
  id: thomasb16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2887
  lastpage: 2895
  published: 2016-06-11 00:00:00 +0000
- title: 'Partition Functions from Rao-Blackwellized Tempered Sampling'
  abstract: 'Partition functions of probability distributions are important quantities for model evaluation and comparisons. We present a new method to compute partition functions of complex and multimodal distributions. Such distributions are often sampled using simulated tempering, which augments the target space with an auxiliary inverse temperature variable. Our method exploits the multinomial probability law of the inverse temperatures, and provides estimates of the partition function in terms of a simple quotient of Rao-Blackwellized marginal inverse temperature probability estimates, which are updated while sampling. We show that the method has interesting connections with several alternative popular methods, and offers some significant advantages. In particular, we empirically find that the new method provides more accurate estimates than Annealed Importance Sampling when calculating partition functions of large Restricted Boltzmann Machines (RBM); moreover, the method is sufficiently accurate to track training and validation log-likelihoods during learning of RBMs, at minimal computational cost.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/carlson16.html
  PDF: http://proceedings.mlr.press/v48/carlson16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-carlson16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: David
    family: Carlson
  - given: Patrick
    family: Stinson
  - given: Ari
    family: Pakman
  - given: Liam
    family: Paninski
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2896-2905
  id: carlson16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2896
  lastpage: 2905
  published: 2016-06-11 00:00:00 +0000
- title: 'Learning Mixtures of Plackett-Luce Models'
  abstract: 'In this paper we address the identifiability and efficient learning problems of finite mixtures of Plackett-Luce models for rank data. We prove that for any k≥2, the mixture of k Plackett-Luce models for no more than 2k-1 alternatives is non-identifiable and this bound is tight for k=2. For generic identifiability, we prove that the mixture of k Plackett-Luce models over m alternatives is \em generically identifiable if k≤⌊\frac m-2 2⌋!. We also propose an efficient generalized method of moments (GMM) algorithm to learn the mixture of two Plackett-Luce models and show that the algorithm is consistent. Our experiments show that our GMM algorithm is significantly faster than the EMM algorithm by Gormley & Murphy (2008), while achieving competitive statistical efficiency.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/zhaob16.html
  PDF: http://proceedings.mlr.press/v48/zhaob16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-zhaob16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Zhibing
    family: Zhao
  - given: Peter
    family: Piech
  - given: Lirong
    family: Xia
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2906-2914
  id: zhaob16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2906
  lastpage: 2914
  published: 2016-06-11 00:00:00 +0000
- title: 'Near Optimal Behavior via Approximate State Abstraction'
  abstract: 'The combinatorial explosion that plagues planning and reinforcement learning (RL) algorithms can be moderated using state abstraction. Prohibitively large task representations can be condensed such that essential information is preserved, and consequently, solutions are tractably computable. However, exact abstractions, which treat only fully-identical situations as equivalent, fail to present opportunities for abstraction in environments where no two situations are exactly alike. In this work, we investigate approximate state abstractions, which treat nearly-identical situations as equivalent. We present theoretical guarantees of the quality of behaviors derived from four types of approximate abstractions. Additionally, we empirically demonstrate that approximate abstractions lead to reduction in task complexity and bounded loss of optimality of behavior in a variety of environments.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/abel16.html
  PDF: http://proceedings.mlr.press/v48/abel16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-abel16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: David
    family: Abel
  - given: David
    family: Hershkowitz
  - given: Michael
    family: Littman
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2915-2923
  id: abel16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2915
  lastpage: 2923
  published: 2016-06-11 00:00:00 +0000
- title: 'Power of Ordered Hypothesis Testing'
  abstract: 'Ordered testing procedures are multiple testing procedures that exploit a pre-specified ordering of the null hypotheses, from most to least promising. We analyze and compare the power of several recent proposals using the asymptotic framework of Li & Barber (2015). While accumulation tests including ForwardStop can be quite powerful when the ordering is very informative, they are asymptotically powerless when the ordering is weaker. By contrast, Selective SeqStep, proposed by Barber & Candes (2015), is much less sensitive to the quality of the ordering. We compare the power of these procedures in different regimes, concluding that Selective SeqStep dominates accumulation tests if either the ordering is weak or non-null hypotheses are sparse or weak. Motivated by our asymptotic analysis, we derive an improved version of Selective SeqStep which we call Adaptive SeqStep, analogous to Storey’s improvement on the Benjamini-Hochberg procedure. We compare these methods using the GEO-Query data set analyzed by (Li & Barber, 2015) and find Adaptive SeqStep has favorable performance for both good and bad prior orderings.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/lei16.html
  PDF: http://proceedings.mlr.press/v48/lei16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-lei16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Lihua
    family: Lei
  - given: William
    family: Fithian
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2924-2932
  id: lei16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2924
  lastpage: 2932
  published: 2016-06-11 00:00:00 +0000
- title: 'PHOG: Probabilistic Model for Code'
  abstract: 'We introduce a new generative model for code called probabilistic higher order grammar (PHOG). PHOG generalizes probabilistic context free grammars (PCFGs) by allowing conditioning of a production rule beyond the parent non-terminal, thus capturing rich contexts relevant to programs. Even though PHOG is more powerful than a PCFG, it can be learned from data just as efficiently. We trained a PHOG model on a large JavaScript code corpus and show that it is more precise than existing models, while similarly fast. As a result, PHOG can immediately benefit existing programming tools based on probabilistic models of code.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/bielik16.html
  PDF: http://proceedings.mlr.press/v48/bielik16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-bielik16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Pavol
    family: Bielik
  - given: Veselin
    family: Raychev
  - given: Martin
    family: Vechev
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2933-2942
  id: bielik16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2933
  lastpage: 2942
  published: 2016-06-11 00:00:00 +0000
- title: 'Shifting Regret, Mirror Descent, and Matrices'
  abstract: 'We consider the problem of online prediction in changing environments. In this framework the performance of a predictor is evaluated as the loss relative to an arbitrarily changing predictor, whose individual components come from a base class of predictors. Typical results in the literature consider different base classes (experts, linear predictors on the simplex, etc.) separately. Introducing an arbitrary mapping inside the mirror decent algorithm, we provide a framework that unifies and extends existing results. As an example, we prove new shifting regret bounds for matrix prediction problems.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/gyorgy16.html
  PDF: http://proceedings.mlr.press/v48/gyorgy16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-gyorgy16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Andras
    family: Gyorgy
  - given: Csaba
    family: Szepesvari
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2943-2951
  id: gyorgy16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2943
  lastpage: 2951
  published: 2016-06-11 00:00:00 +0000
- title: 'Scalable Gradient-Based Tuning of Continuous Regularization Hyperparameters'
  abstract: 'Hyperparameter selection generally relies on running multiple full training trials, with selection based on validation set performance. We propose a gradient-based approach for locally adjusting hyperparameters during training of the model. Hyperparameters are adjusted so as to make the model parameter gradients, and hence updates, more advantageous for the validation cost. We explore the approach for tuning regularization hyperparameters and find that in experiments on MNIST, SVHN and CIFAR-10, the resulting regularization levels are within the optimal regions. The additional computational cost depends on how frequently the hyperparameters are trained, but the tested scheme adds only 30% computational overhead regardless of the model size. Since the method is significantly less computationally demanding compared to similar gradient-based approaches to hyperparameter optimization, and consistently finds good hyperparameter values, it can be a useful tool for training neural network models.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/luketina16.html
  PDF: http://proceedings.mlr.press/v48/luketina16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-luketina16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Jelena
    family: Luketina
  - given: Mathias
    family: Berglund
  - given: Klaus
    family: Greff
  - given: Tapani
    family: Raiko
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2952-2960
  id: luketina16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2952
  lastpage: 2960
  published: 2016-06-11 00:00:00 +0000
- title: 'Model-Free Trajectory Optimization for Reinforcement Learning'
  abstract: 'Many of the recent Trajectory Optimization algorithms alternate between local approximation of the dynamics and conservative policy update. However, linearly approximating the dynamics in order to derive the new policy can bias the update and prevent convergence to the optimal policy. In this article, we propose a new model-free algorithm that backpropagates a local quadratic time-dependent Q-Function, allowing the derivation of the policy update in closed form. Our policy update ensures exact KL-constraint satisfaction without simplifying assumptions on the system dynamics demonstrating improved performance in comparison to related Trajectory Optimization algorithms linearizing the dynamics.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/akrour16.html
  PDF: http://proceedings.mlr.press/v48/akrour16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-akrour16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Riad
    family: Akrour
  - given: Gerhard
    family: Neumann
  - given: Hany
    family: Abdulsamad
  - given: Abbas
    family: Abdolmaleki
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2961-2970
  id: akrour16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2961
  lastpage: 2970
  published: 2016-06-11 00:00:00 +0000
- title: 'Controlling the distance to a Kemeny consensus without computing it'
  abstract: 'Due to its numerous applications, rank aggregation has become a problem of major interest across many fields of the computer science literature. In the vast majority of situations, Kemeny consensus(es) are considered as the ideal solutions. It is however well known that their computation is NP-hard. Many contributions have thus established various results to apprehend this complexity. In this paper we introduce a practical method to predict, for a ranking and a dataset, how close the Kemeny consensus(es) are to this ranking. A major strength of this method is its generality: it does not require any assumption on the dataset nor the ranking. Furthermore, it relies on a new geometric interpretation of Kemeny aggregation that, we believe, could lead to many other results.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/korba16.html
  PDF: http://proceedings.mlr.press/v48/korba16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-korba16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Yunlong
    family: Jiao
  - given: Anna
    family: Korba
  - given: Eric
    family: Sibony
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2971-2980
  id: korba16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2971
  lastpage: 2980
  published: 2016-06-11 00:00:00 +0000
- title: 'Horizontally Scalable Submodular Maximization'
  abstract: 'A variety of large-scale machine learning problems can be cast as instances of constrained submodular maximization. Existing approaches for distributed submodular maximization have a critical drawback: The capacity - number of instances that can fit in memory - must grow with the data set size. In practice, while one can provision many machines, the capacity of each machine is limited by physical constraints. We propose a truly scalable approach for distributed submodular maximization under fixed capacity. The proposed framework applies to a broad class of algorithms and constraints and provides theoretical guarantees on the approximation factor for any available capacity. We empirically evaluate the proposed algorithm on a variety of data sets and demonstrate that it achieves performance competitive with the centralized greedy solution.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/lucic16.html
  PDF: http://proceedings.mlr.press/v48/lucic16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-lucic16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Mario
    family: Lucic
  - given: Olivier
    family: Bachem
  - given: Morteza
    family: Zadimoghaddam
  - given: Andreas
    family: Krause
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2981-2989
  id: lucic16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2981
  lastpage: 2989
  published: 2016-06-11 00:00:00 +0000
- title: 'Group Equivariant Convolutional Networks'
  abstract: 'We introduce Group equivariant Convolutional Neural Networks (G-CNNs), a natural generalization of convolutional neural networks that reduces sample complexity by exploiting symmetries. G-CNNs use G-convolutions, a new type of layer that enjoys a substantially higher degree of weight sharing than regular convolution layers. G-convolutions increase the expressive capacity of the network without increasing the number of parameters. Group convolution layers are easy to use and can be implemented with negligible computational overhead for discrete groups generated by translations, reflections and rotations. G-CNNs achieve state of the art results on CIFAR10 and rotated MNIST.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/cohenc16.html
  PDF: http://proceedings.mlr.press/v48/cohenc16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-cohenc16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Taco
    family: Cohen
  - given: Max
    family: Welling
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 2990-2999
  id: cohenc16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 2990
  lastpage: 2999
  published: 2016-06-11 00:00:00 +0000
- title: 'Stochastic Discrete Clenshaw-Curtis Quadrature'
  abstract: 'The partition function is fundamental for probabilistic graphical models—it is required for inference, parameter estimation, and model selection. Evaluating this function corresponds to discrete integration, namely a weighted sum over an exponentially large set. This task quickly becomes intractable as the dimensionality of the problem increases. We propose an approximation scheme that, for any discrete graphical model whose parameter vector has bounded norm, estimates the partition function with arbitrarily small error. Our algorithm relies on a near minimax optimal polynomial approximation to the potential function and a Clenshaw-Curtis style quadrature. Furthermore, we show that this algorithm can be randomized to split the computation into a high-complexity part and a low-complexity part, where the latter may be carried out on small computational devices. Experiments confirm that the new randomized algorithm is highly accurate if the parameter norm is small, and is otherwise comparable to methods with unbounded error.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/piatkowski16.html
  PDF: http://proceedings.mlr.press/v48/piatkowski16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-piatkowski16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Nico
    family: Piatkowski
  - given: Katharina
    family: Morik
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 3000-3009
  id: piatkowski16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 3000
  lastpage: 3009
  published: 2016-06-11 00:00:00 +0000
- title: 'Correcting Forecasts with Multifactor Neural Attention'
  abstract: 'Automatic forecasting of time series data is a challenging problem in many industries. Current forecast models adopted by businesses do not provide adequate means for including data representing external factors that may have a significant impact on the time series, such as weather, national events, local events, social media trends, promotions, etc. This paper introduces a novel neural network attention mechanism that naturally incorporates data from multiple external sources without the feature engineering needed to get other techniques to work. We demonstrate empirically that the proposed model achieves superior performance for predicting the demand of 20 commodities across 107 stores of one of America’s largest retailers when compared to other baseline models, including neural networks, linear models, certain kernel methods, Bayesian regression, and decision trees. Our method ultimately accounts for a 23.9% relative improvement as a result of the incorporation of external data sources, and provides an unprecedented level of descriptive ability for a neural network forecasting model.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/riemer16.html
  PDF: http://proceedings.mlr.press/v48/riemer16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-riemer16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Matthew
    family: Riemer
  - given: Aditya
    family: Vempaty
  - given: Flavio
    family: Calmon
  - given: Fenno
    family: Heath
  - given: Richard
    family: Hull
  - given: Elham
    family: Khabiri
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 3010-3019
  id: riemer16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 3010
  lastpage: 3019
  published: 2016-06-11 00:00:00 +0000
- title: 'Learning Representations for Counterfactual Inference'
  abstract: 'Observational studies are rising in importance due to the widespread accumulation of data in fields such as healthcare, education, employment and ecology. We consider the task of answering counterfactual questions such as, “Would this patient have lower blood sugar had she received a different medication?". We propose a new algorithmic framework for counterfactual inference which brings together ideas from domain adaptation and representation learning. In addition to a theoretical justification, we perform an empirical comparison with previous approaches to causal inference from observational data. Our deep learning algorithm significantly outperforms the previous state-of-the-art.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/johansson16.html
  PDF: http://proceedings.mlr.press/v48/johansson16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-johansson16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Fredrik
    family: Johansson
  - given: Uri
    family: Shalit
  - given: David
    family: Sontag
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 3020-3029
  id: johansson16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 3020
  lastpage: 3029
  published: 2016-06-11 00:00:00 +0000
- title: 'Automatic Construction of Nonparametric Relational Regression Models for Multiple Time Series'
  abstract: 'Gaussian Processes (GPs) provide a general and analytically tractable way of modeling complex time-varying, nonparametric functions. The Automatic Bayesian Covariance Discovery (ABCD) system constructs natural-language description of time-series data by treating unknown time-series data nonparametrically using GP with a composite covariance kernel function. Unfortunately, learning a composite covariance kernel with a single time-series data set often results in less informative kernel that may not give qualitative, distinctive descriptions of data. We address this challenge by proposing two relational kernel learning methods which can model multiple time-series data sets by finding common, shared causes of changes. We show that the relational kernel learning methods find more accurate models for regression problems on several real-world data sets; US stock data, US house price index data and currency exchange rate data.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/hwangb16.html
  PDF: http://proceedings.mlr.press/v48/hwangb16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-hwangb16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Yunseong
    family: Hwang
  - given: Anh
    family: Tong
  - given: Jaesik
    family: Choi
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 3030-3039
  id: hwangb16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 3030
  lastpage: 3039
  published: 2016-06-11 00:00:00 +0000
- title: 'Inference Networks for Sequential Monte Carlo in Graphical Models'
  abstract: 'We introduce a new approach for amortizing inference in directed graphical models by learning heuristic approximations to stochastic inverses, designed specifically for use as proposal distributions in sequential Monte Carlo methods. We describe a procedure for constructing and learning a structured neural network which represents an inverse factorization of the graphical model, resulting in a conditional density estimator that takes as input particular values of the observed random variables, and returns an approximation to the distribution of the latent variables. This recognition model can be learned offline, independent from any particular dataset, prior to performing inference. The output of these networks can be used as automatically-learned high-quality proposal distributions to accelerate sequential Monte Carlo across a diverse range of problem settings.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/paige16.html
  PDF: http://proceedings.mlr.press/v48/paige16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-paige16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Brooks
    family: Paige
  - given: Frank
    family: Wood
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 3040-3049
  id: paige16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 3040
  lastpage: 3049
  published: 2016-06-11 00:00:00 +0000
- title: 'Slice Sampling on Hamiltonian Trajectories'
  abstract: 'Hamiltonian Monte Carlo and slice sampling are amongst the most widely used and studied classes of Markov Chain Monte Carlo samplers. We connect these two methods and present Hamiltonian slice sampling, which allows slice sampling to be carried out along Hamiltonian trajectories, or transformations thereof. Hamiltonian slice sampling clarifies a class of model priors that induce closed-form slice samplers. More pragmatically, inheriting properties of slice samplers, it offers advantages over Hamiltonian Monte Carlo, in that it has fewer tunable hyperparameters and does not require gradient information. We demonstrate the utility of Hamiltonian slice sampling out of the box on problems ranging from Gaussian process regression to Pitman-Yor based mixture models.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/bloem-reddy16.html
  PDF: http://proceedings.mlr.press/v48/bloem-reddy16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-bloem-reddy16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Benjamin
    family: Bloem-Reddy
  - given: John
    family: Cunningham
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 3050-3058
  id: bloem-reddy16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 3050
  lastpage: 3058
  published: 2016-06-11 00:00:00 +0000
- title: 'Noisy Activation Functions'
  abstract: 'Common nonlinear activation functions used in neural networks can cause training difficulties due to the saturation behavior of the activation function, which may hide dependencies that are not visible to vanilla-SGD (using first order gradients only). Gating mechanisms that use softly saturating activation functions to emulate the discrete switching of digital logic circuits are good examples of this. We propose to exploit the injection of appropriate noise so that the gradients may flow easily, even if the noiseless application of the activation function would yield zero gradients. Large noise will dominate the noise-free gradient and allow stochastic gradient descent to explore more. By adding noise only to the problematic parts of the activation function, we allow the optimization procedure to explore the boundary between the degenerate saturating) and the well-behaved parts of the activation function. We also establish connections to simulated annealing, when the amount of noise is annealed down, making it easier to optimize hard objective functions. We find experimentally that replacing such saturating activation functions by noisy variants helps optimization in many contexts, yielding state-of-the-art or competitive results on different datasets and task, especially when training seems to be the most difficult, e.g., when curriculum learning is necessary to obtain good results.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/gulcehre16.html
  PDF: http://proceedings.mlr.press/v48/gulcehre16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-gulcehre16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Caglar
    family: Gulcehre
  - given: Marcin
    family: Moczulski
  - given: Misha
    family: Denil
  - given: Yoshua
    family: Bengio
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 3059-3068
  id: gulcehre16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 3059
  lastpage: 3068
  published: 2016-06-11 00:00:00 +0000
- title: 'PD-Sparse : A Primal and Dual Sparse Approach to Extreme Multiclass and Multilabel Classification'
  abstract: 'We consider Multiclass and Multilabel classification with extremely large number of classes, of which only few are labeled to each instance. In such setting, standard methods that have training, prediction cost linear to the number of classes become intractable. State-of-the-art methods thus aim to reduce the complexity by exploiting correlation between labels under assumption that the similarity between labels can be captured by structures such as low-rank matrix or balanced tree. However, as the diversity of labels increases in the feature space, structural assumption can be easily violated, which leads to degrade in the testing performance. In this work, we show that a margin-maximizing loss with l1 penalty, in case of Extreme Classification, yields extremely sparse solution both in primal and in dual without sacrificing the expressive power of predictor. We thus propose a Fully-Corrective Block-Coordinate Frank-Wolfe (FC-BCFW) algorithm that exploits both primal and dual sparsity to achieve a complexity sublinear to the number of primal and dual variables. A bi-stochastic search method is proposed to further improve the efficiency. In our experiments on both Multiclass and Multilabel problems, the proposed method achieves significant higher accuracy than existing approaches of Extreme Classification with very competitive training and prediction time.'
  volume: 48
  URL: https://proceedings.mlr.press/v48/yenb16.html
  PDF: http://proceedings.mlr.press/v48/yenb16.pdf
  edit: https://github.com/mlresearch//v48/edit/gh-pages/_posts/2016-06-11-yenb16.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of The 33rd International Conference on Machine Learning'
  publisher: 'PMLR'
  author: 
  - given: Ian En-Hsu
    family: Yen
  - given: Xiangru
    family: Huang
  - given: Pradeep
    family: Ravikumar
  - given: Kai
    family: Zhong
  - given: Inderjit
    family: Dhillon
  editor: 
  - given: Maria Florina
    family: Balcan
  - given: Kilian Q.
    family: Weinberger
  address: New York, New York, USA
  page: 3069-3077
  id: yenb16
  issued:
    date-parts: 
      - 2016
      - 6
      - 11
  firstpage: 3069
  lastpage: 3077
  published: 2016-06-11 00:00:00 +0000