- title: 'Proximal Splitting Meets Variance Reduction'
abstract: 'Despite the rise to fame of stochastic variance-reduced methods like SAGA and ProxSVRG, their use in non-smooth optimization is still limited to a few simple cases. Existing methods require computing the proximal operator of the non-smooth term at each iteration, which, for complex penalties like the total variation, overlapping group lasso or trend filtering, is an iterative process that becomes infeasible for moderately large problems. In this work we propose and analyze VRTOS, a variance-reduced method to solve problems with an arbitrary number of non-smooth terms. Like other variance-reduced methods, it only requires evaluating one gradient per iteration and converges with a constant step size, and so is ideally suited for large-scale applications. Unlike existing variance-reduced methods, it admits multiple non-smooth terms whose proximal operators only need to be evaluated once per iteration. We provide a convergence rate analysis for the proposed methods that achieves the same asymptotic rate as their full gradient variants and illustrate their computational advantage on 4 different large-scale datasets.'
volume: 89
URL: http://proceedings.mlr.press/v89/pedregosa19a.html
PDF: http://proceedings.mlr.press/v89/pedregosa19a/pedregosa19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-pedregosa19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Pedregosa
given: Fabian
- family: Fatras
given: Kilian
- family: Casotto
given: Mattia
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1-10
id: pedregosa19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1
lastpage: 10
published: 2019-04-11 00:00:00 +0000
- title: 'Optimal Noise-Adding Mechanism in Additive Differential Privacy'
abstract: 'We derive the optimal $(0, \delta)$-differentially private query-output independent noise-adding mechanism for single real-valued query function under a general cost-minimization framework. Under a mild technical condition, we show that the optimal noise probability distribution is a uniform distribution with a probability mass at the origin. We explicitly derive the optimal noise distribution for general $\ell^p$ cost functions, including $\ell^1$ (for noise magnitude) and $\ell^2$ (for noise power) cost functions, and show that the probability concentration on the origin occurs when $\delta > \frac{p}{p+1}$. Our result demonstrates an improvement over the existing Gaussian mechanisms by a factor of two and three for $(0,\delta)$-differential privacy in the high privacy regime in the context of minimizing the noise magnitude and noise power, and the gain is more pronounced in the low privacy regime. Our result is consistent with the existing result for $(0,\delta)$-differential privacy in the discrete setting, and identifies a probability concentration phenomenon in the continuous setting.'
volume: 89
URL: http://proceedings.mlr.press/v89/geng19a.html
PDF: http://proceedings.mlr.press/v89/geng19a/geng19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-geng19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Geng
given: Quan
- family: Ding
given: Wei
- family: Guo
given: Ruiqi
- family: Kumar
given: Sanjiv
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 11-20
id: geng19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 11
lastpage: 20
published: 2019-04-11 00:00:00 +0000
- title: 'Tossing Coins Under Monotonicity'
abstract: 'This paper considers the following problem: we are given n coin tosses of coins with monotone increasing probability of getting heads (success). We study the performance of the monotone constrained likelihood estimate, which is equivalent to the estimate produced by isotonic regression. We derive adaptive and non-adaptive bounds on the performance of the isotonic estimate, i.e., we demonstrate that for some probability vectors the isotonic estimate converges much faster than in general. As an application of this framework we propose a two step procedure for the binary monotone single index model, which consists of running LASSO and consequently running an isotonic regression. We provide thorough numerical studies in support of our claims.'
volume: 89
URL: http://proceedings.mlr.press/v89/neykov19a.html
PDF: http://proceedings.mlr.press/v89/neykov19a/neykov19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-neykov19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Neykov
given: Matey
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 21-30
id: neykov19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 21
lastpage: 30
published: 2019-04-11 00:00:00 +0000
- title: 'Gaussian Regression with Convex Constraints'
abstract: 'The focus of this paper is the linear model with Gaussian design under convex constraints. Specifically, we study the performance of the constrained least squares estimate. We derive two general results characterizing its performance - one requiring a tangent cone structure, and one which holds in a general setting. We use our general results to analyze three functional shape constrained problems where the signal is generated from an underlying Lipschitz, monotone or convex function. In each of the examples we show specific classes of functions which achieve fast adaptive estimation rates, and we also provide non-adaptive estimation rates which hold for any function. Our results demonstrate that the Lipschitz, monotone and convex constraints allow one to analyze regression problems even in high-dimensional settings where the dimension may scale as the square or fourth degree of the sample size respectively.'
volume: 89
URL: http://proceedings.mlr.press/v89/neykov19b.html
PDF: http://proceedings.mlr.press/v89/neykov19b/neykov19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-neykov19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Neykov
given: Matey
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 31-38
id: neykov19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 31
lastpage: 38
published: 2019-04-11 00:00:00 +0000
- title: 'Risk-Averse Stochastic Convex Bandit'
abstract: 'Motivated by applications in clinical trials and finance, we study the problem of online convex optimization (with bandit feedback) where the decision maker is risk-averse. We provide two algorithms to solve this problem. The first one is a descent-type algorithm which is easy to implement. The second algorithm, which combines the ellipsoid method and a center point device, achieves (almost) optimal regret bounds with respect to the number of rounds. To the best of our knowledge this is the first attempt to address risk-aversion in the online convex bandit problem.'
volume: 89
URL: http://proceedings.mlr.press/v89/cardoso19a.html
PDF: http://proceedings.mlr.press/v89/cardoso19a/cardoso19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-cardoso19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Cardoso
given: Adrian Rivera
- family: Xu
given: Huan
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 39-47
id: cardoso19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 39
lastpage: 47
published: 2019-04-11 00:00:00 +0000
- title: 'Error bounds for sparse classifiers in high-dimensions'
abstract: 'We prove an L2 recovery bound for a family of sparse estimators defined as minimizers of some empirical loss functions, which include hinge loss and logistic loss. More precisely, we achieve an upper bound for coefficient estimation scaling as $(k^\ast/n)\log(p/k^\ast)$, where $n \times p$ is the size of the design matrix and $k^\ast$ is the dimension of the theoretical loss minimizer. This is done under standard assumptions, for which we derive stronger versions of a cone condition and a restricted strong convexity. Our bound holds with high probability and in expectation and applies to an L1-regularized estimator and to a recently introduced Slope estimator, which we generalize for classification problems. Slope presents the advantage of adapting to unknown sparsity. Thus, we propose a tractable proximal algorithm to compute it and assess its empirical performance. Our results match the best existing bounds for classification and regression problems.'
volume: 89
URL: http://proceedings.mlr.press/v89/dedieu19a.html
PDF: http://proceedings.mlr.press/v89/dedieu19a/dedieu19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-dedieu19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Dedieu
given: Antoine
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 48-56
id: dedieu19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 48
lastpage: 56
published: 2019-04-11 00:00:00 +0000
- title: 'Boosting Transfer Learning with Survival Data from Heterogeneous Domains'
abstract: 'Survival models derived from health care data are an important support to inform critical screening and therapeutic decisions. Most models however, do not generalize to populations outside the marginal and conditional distribution assumptions for which they were derived. This presents a significant barrier to the deployment of machine learning techniques into wider clinical practice as most medical studies are data scarce, especially for the analysis of time-to-event outcomes. In this work we propose a survival prediction model that is able to improve predictions on a small data domain of interest - such as a local hospital - by leveraging related data from other domains - such as data from other hospitals. We construct an ensemble of weak survival predictors which iteratively adapt the marginal distributions of the source and target data such that similar source patients contribute to the fit and ultimately improve predictions on target patients of interest. This represents the first boosting-based transfer learning algorithm in the survival analysis literature. We demonstrate the performance and utility of our algorithm on synthetic and real healthcare data collected at various locations.'
volume: 89
URL: http://proceedings.mlr.press/v89/bellot19a.html
PDF: http://proceedings.mlr.press/v89/bellot19a/bellot19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-bellot19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Bellot
given: Alexis
- family: Schaar
given: Mihaela
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 57-65
id: bellot19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 57
lastpage: 65
published: 2019-04-11 00:00:00 +0000
- title: 'Resampled Priors for Variational Autoencoders'
abstract: 'We propose Learned Accept/Reject Sampling (LARS), a method for constructing richer priors using rejection sampling with a learned acceptance function. This work is motivated by recent analyses of the VAE objective, which pointed out that commonly used simple priors can lead to underfitting. As the distribution induced by LARS involves an intractable normalizing constant, we show how to estimate it and its gradients efficiently. We demonstrate that LARS priors improve VAE performance on several standard datasets both when they are learned jointly with the rest of the model and when they are fitted to a pretrained model. Finally, we show that LARS can be combined with existing methods for defining flexible priors for an additional boost in performance.'
volume: 89
URL: http://proceedings.mlr.press/v89/bauer19a.html
PDF: http://proceedings.mlr.press/v89/bauer19a/bauer19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-bauer19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Bauer
given: Matthias
- family: Mnih
given: Andriy
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 66-75
id: bauer19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 66
lastpage: 75
published: 2019-04-11 00:00:00 +0000
- title: 'Scalable Bayesian Learning for State Space Models using Variational Inference with SMC Samplers'
abstract: 'We present a scalable approach to performing approximate fully Bayesian inference in generic state space models. The proposed method is an alternative to particle MCMC that provides fully Bayesian inference of both the dynamic latent states and the static parameters of the model. We build up on recent advances in computational statistics that combine variational methods with sequential Monte Carlo sampling and we demonstrate the advantages of performing full Bayesian inference over the static parameters rather than just performing variational EM approximations. We illustrate how our approach enables scalable inference in multivariate stochastic volatility models and self-exciting point process models that allow for flexible dynamics in the latent intensity function.'
volume: 89
URL: http://proceedings.mlr.press/v89/hirt19a.html
PDF: http://proceedings.mlr.press/v89/hirt19a/hirt19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-hirt19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Hirt
given: Marcel
- family: Dellaportas
given: Petros
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 76-86
id: hirt19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 76
lastpage: 86
published: 2019-04-11 00:00:00 +0000
- title: 'Scalable Thompson Sampling via Optimal Transport'
abstract: 'Thompson sampling (TS) is a class of algorithms for sequential decision-making, which requires maintaining a posterior distribution over a reward model. However, calculating exact posterior distributions is intractable for all but the simplest models. Consequently, how to computationally-efficiently approximate a posterior distribution is a crucial problem for scalable TS with complex models, such as neural networks. In this paper, we use distribution optimization techniques to approximate the posterior distribution, solved via Wasserstein gradient flows. Based on the framework, a principled particle-optimization algorithm is developed for TS to approximate the posterior efficiently. Our approach is scalable and does not make explicit distribution assumptions on posterior approximations. Extensive experiments on both synthetic data and large-scale real data demonstrate the superior performance of the proposed methods.'
volume: 89
URL: http://proceedings.mlr.press/v89/zhang19a.html
PDF: http://proceedings.mlr.press/v89/zhang19a/zhang19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-zhang19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Zhang
given: Ruiyi
- family: Wen
given: Zheng
- family: Chen
given: Changyou
- family: Fang
given: Chen
- family: Yu
given: Tong
- family: Carin
given: Lawrence
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 87-96
id: zhang19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 87
lastpage: 96
published: 2019-04-11 00:00:00 +0000
- title: 'Inferring Multidimensional Rates of Aging from Cross-Sectional Data'
abstract: 'Modeling how individuals evolve over time is a fundamental problem in the natural and social sciences. However, existing datasets are often cross-sectional with each individual observed only once, making it impossible to apply traditional time-series methods. Motivated by the study of human aging, we present an interpretable latent-variable model that learns temporal dynamics from cross-sectional data. Our model represents each individual’s features over time as a nonlinear function of a low-dimensional, linearly-evolving latent state. We prove that when this nonlinear function is constrained to be order-isomorphic, the model family is identifiable solely from cross-sectional data provided the distribution of time-independent variation is known. On the UK Biobank human health dataset, our model reconstructs the observed data while learning interpretable rates of aging associated with diseases, mortality, and aging risk factors.'
volume: 89
URL: http://proceedings.mlr.press/v89/pierson19a.html
PDF: http://proceedings.mlr.press/v89/pierson19a/pierson19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-pierson19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Pierson
given: Emma
- family: Koh
given: Pang Wei
- family: Hashimoto
given: Tatsunori
- family: Koller
given: Daphne
- family: Leskovec
given: Jure
- family: Eriksson
given: Nick
- family: Liang
given: Percy
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 97-107
id: pierson19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 97
lastpage: 107
published: 2019-04-11 00:00:00 +0000
- title: 'Interaction Detection with Bayesian Decision Tree Ensembles'
abstract: 'Methods based on Bayesian decision tree ensembles have proven valuable in constructing high-quality predictions, and are particularly attractive in certain settings because they encourage low-order interaction effects. Despite adapting to the presence of low-order interactions for prediction purpose, we show that Bayesian decision tree ensembles are generally anti-conservative for the purpose of conducting interaction detection. We address this problem by introducing Dirichlet process forests (DP-Forests), which leverage the presence of low-order interactions by clustering the trees so that trees within the same cluster focus on detecting a specific interaction. We show on both simulated and benchmark data that DP-Forests perform well relative to existing interaction detection techniques for detecting low-order interactions, attaining very low false-positive and false-negative rates while maintaining the same performance for prediction using a comparable computational budget.'
volume: 89
URL: http://proceedings.mlr.press/v89/du19a.html
PDF: http://proceedings.mlr.press/v89/du19a/du19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-du19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Du
given: Junliang
- family: Linero
given: Antonio R.
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 108-117
id: du19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 108
lastpage: 117
published: 2019-04-11 00:00:00 +0000
- title: 'On the Interaction Effects Between Prediction and Clustering'
abstract: 'Machine learning systems increasingly depend on pipelines of multiple algorithms to provide high quality and well structured predictions. This paper argues interaction effects between clustering and prediction (e.g. classification, regression) algorithms can cause subtle adverse behaviors during cross-validation that may not be initially apparent. In particular, we focus on the problem of estimating the out-of-cluster (OOC) prediction loss given an approximate clustering with probabilistic error rate p_0. Traditional cross-validation techniques exhibit significant empirical bias in this setting, and the few attempts to estimate and correct for these effects are intractable on larger datasets. Further, no previous work has been able to characterize the conditions under which these empirical effects occur, and if they do, what properties they have. We precisely answer these questions by providing theoretical properties which hold in various settings, and prove that expected out-of-cluster loss behavior rapidly decays with even minor clustering errors. Fortunately, we are able to leverage these same properties to construct hypothesis tests and scalable estimators necessary for correcting the problem. Empirical results on benchmark datasets validate our theoretical results and demonstrate how scaling techniques provide solutions to new classes of problems.'
volume: 89
URL: http://proceedings.mlr.press/v89/barnes19a.html
PDF: http://proceedings.mlr.press/v89/barnes19a/barnes19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-barnes19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Barnes
given: Matt
- family: Dubrawski
given: Artur
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 118-126
id: barnes19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 118
lastpage: 126
published: 2019-04-11 00:00:00 +0000
- title: 'Towards a Theoretical Understanding of Hashing-Based Neural Nets'
abstract: 'Parameter reduction has been a popular topic in deep learning due to the ever-increasing size of deep neural network models and the need to train and run deep neural nets on resource limited machines. Despite many efforts in this area, there were no rigorous theoretical guarantees on why existing neural net compression methods should work. In this paper, we provide provable guarantees on some hashing-based parameter reduction methods in neural nets. First, we introduce a neural net compression scheme based on random linear sketching (which is usually implemented efficiently via hashing), and show that the sketched (smaller) network is able to approximate the original network on all input data coming from any smooth well-conditioned low-dimensional manifold. The sketched network can also be trained directly via back-propagation. Next, we study the previously proposed HashedNets architecture and show that the optimization landscape of one-hidden-layer HashedNets has a local strong convexity property similar to a normal fully connected neural network. Together with the initialization algorithm developed in [51], this implies that the parameters in HashedNets can be provably recovered. We complement our theoretical results with some empirical verification.'
volume: 89
URL: http://proceedings.mlr.press/v89/lin19a.html
PDF: http://proceedings.mlr.press/v89/lin19a/lin19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-lin19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Lin
given: Yibo
- family: Song
given: Zhao
- family: Yang
given: Lin F.
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 127-137
id: lin19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 127
lastpage: 137
published: 2019-04-11 00:00:00 +0000
- title: 'Faster First-Order Methods for Stochastic Non-Convex Optimization on Riemannian Manifolds'
abstract: 'SPIDER (Stochastic Path Integrated Differential EstimatoR) is an efficient gradient estimation technique developed for non-convex stochastic optimization. Although having been shown to attain nearly optimal computational complexity bounds, the SPIDER-type methods are limited to linear metric spaces. In this paper, we introduce the Riemannian SPIDER (R-SPIDER) method as a novel nonlinear-metric extension of SPIDER for efficient non-convex optimization on Riemannian manifolds. We prove that for finite-sum problems with $n$ components, R-SPIDER converges to an $\epsilon$-accuracy stationary point within $\mathcal{O}\big(\min\big(n+\frac{\sqrt{n}}{\epsilon^2},\frac{1}{\epsilon^3}\big)\big)$ stochastic gradient evaluations, which is sharper in magnitude than the prior Riemannian first-order methods. For online optimization, R-SPIDER is shown to converge with $\mathcal{O}\big(\frac{1}{\epsilon^3}\big)$ complexity which is, to the best of our knowledge, the first non-asymptotic result for online Riemannian optimization. Especially, for gradient dominated functions, we further develop a variant of R-SPIDER and prove its linear convergence rate. Numerical results demonstrate the computational efficiency of the proposed methods.'
volume: 89
URL: http://proceedings.mlr.press/v89/zhou19a.html
PDF: http://proceedings.mlr.press/v89/zhou19a/zhou19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-zhou19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Zhou
given: Pan
- family: Yuan
given: Xiao-Tong
- family: Feng
given: Jiashi
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 138-147
id: zhou19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 138
lastpage: 147
published: 2019-04-11 00:00:00 +0000
- title: 'LF-PPL: A Low-Level First Order Probabilistic Programming Language for Non-Differentiable Models'
abstract: 'We develop a new Low-level, First-order Probabilistic Programming Language (LF-PPL) suited for models containing a mix of continuous, discrete, and/or piecewise-continuous variables. The key success of this language and its compilation scheme is in its ability to automatically distinguish parameters the density function is discontinuous with respect to, while further providing runtime checks for boundary crossings. This enables the introduction of new inference engines that are able to exploit gradient information, while remaining efficient for models which are not everywhere differentiable. We demonstrate this ability by incorporating a discontinuous Hamiltonian Monte Carlo (DHMC) inference engine that is able to deliver automated and efficient inference for non-differentiable models. Our system is backed up by a mathematical formalism that ensures that any model expressed in this language has a density with measure zero discontinuities to maintain the validity of the inference engine.'
volume: 89
URL: http://proceedings.mlr.press/v89/zhou19b.html
PDF: http://proceedings.mlr.press/v89/zhou19b/zhou19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-zhou19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Zhou
given: Yuan
- family: Gram-Hansen
given: Bradley J.
- family: Kohn
given: Tobias
- family: Rainforth
given: Tom
- family: Yang
given: Hongseok
- family: Wood
given: Frank
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 148-157
id: zhou19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 148
lastpage: 157
published: 2019-04-11 00:00:00 +0000
- title: 'Identifiability of Generalized Hypergeometric Distribution (GHD) Directed Acyclic Graphical Models'
abstract: 'We introduce a new class of identifiable DAG models where the conditional distribution of each node given its parents belongs to a family of generalized hypergeometric distributions (GHD). The family of generalized hypergeometric distributions includes many discrete distributions such as the binomial, Beta-binomial, negative binomial, Poisson, hyper-Poisson, and many more. We prove that if the data are drawn from the new class of DAG models, one can fully identify the graph structure. We further present a reliable and polynomial-time algorithm that recovers the graph from finitely many data. We show through theoretical results and numerical experiments that our algorithm is statistically consistent in high-dimensional settings (p > n) if the indegree of the graph is bounded, and outperforms state-of-the-art DAG learning algorithms.'
volume: 89
URL: http://proceedings.mlr.press/v89/park19a.html
PDF: http://proceedings.mlr.press/v89/park19a/park19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-park19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Park
given: Gunwoong
- family: Park
given: Hyewon
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 158-166
id: park19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 158
lastpage: 166
published: 2019-04-11 00:00:00 +0000
- title: 'Unbiased Implicit Variational Inference'
abstract: 'We develop unbiased implicit variational inference (UIVI), a method that expands the applicability of variational inference by defining an expressive variational family. UIVI considers an implicit variational distribution obtained in a hierarchical manner using a simple reparameterizable distribution whose variational parameters are defined by arbitrarily flexible deep neural networks. Unlike previous works, UIVI directly optimizes the evidence lower bound (ELBO) rather than an approximation to the ELBO. We demonstrate UIVI on several models, including Bayesian multinomial logistic regression and variational autoencoders, and show that UIVI achieves both tighter ELBO and better predictive performance than existing approaches at a similar computational cost.'
volume: 89
URL: http://proceedings.mlr.press/v89/titsias19a.html
PDF: http://proceedings.mlr.press/v89/titsias19a/titsias19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-titsias19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Titsias
given: Michalis K.
- family: Ruiz
given: Francisco
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 167-176
id: titsias19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 167
lastpage: 176
published: 2019-04-11 00:00:00 +0000
- title: 'Efficient Linear Bandits through Matrix Sketching'
abstract: 'We prove that two popular linear contextual bandit algorithms, OFUL and Thompson Sampling, can be made efficient using Frequent Directions, a deterministic online sketching technique. More precisely, we show that a sketch of size $m$ allows a $\mathcal{O}(md)$ update time for both algorithms, as opposed to $\Omega(d^2)$ required by their non-sketched versions in general (where $d$ is the dimension of context vectors). This computational speedup is accompanied by regret bounds of order $(1+\varepsilon_m)^{3/2}d\sqrt{T}$ for OFUL and of order $\big((1+\varepsilon_m)d\big)^{3/2}\sqrt{T}$ for Thompson Sampling, where $\varepsilon_m$ is bounded by the sum of the tail eigenvalues not covered by the sketch. In particular, when the selected contexts span a subspace of dimension at most $m$, our algorithms have a regret bound matching that of their slower, non-sketched counterparts. Experiments on real-world datasets corroborate our theoretical results.'
volume: 89
URL: http://proceedings.mlr.press/v89/kuzborskij19a.html
PDF: http://proceedings.mlr.press/v89/kuzborskij19a/kuzborskij19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-kuzborskij19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Kuzborskij
given: Ilja
- family: Cella
given: Leonardo
- family: Cesa-Bianchi
given: Nicolò
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 177-185
id: kuzborskij19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 177
lastpage: 185
published: 2019-04-11 00:00:00 +0000
- title: 'Orthogonal Estimation of Wasserstein Distances'
abstract: 'Wasserstein distances are increasingly used in a wide variety of applications in machine learning. Sliced Wasserstein distances form an important subclass which may be estimated efficiently through one-dimensional sorting operations. In this paper, we propose a new variant of sliced Wasserstein distance, study the use of orthogonal coupling in Monte Carlo estimation of Wasserstein distances and draw connections with stratified sampling, and evaluate our approaches experimentally in a range of large-scale experiments in generative modelling and reinforcement learning.'
volume: 89
URL: http://proceedings.mlr.press/v89/rowland19a.html
PDF: http://proceedings.mlr.press/v89/rowland19a/rowland19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-rowland19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Rowland
given: Mark
- family: Hron
given: Jiri
- family: Tang
given: Yunhao
- family: Choromanski
given: Krzysztof
- family: Sarlos
given: Tamas
- family: Weller
given: Adrian
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 186-195
id: rowland19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 186
lastpage: 195
published: 2019-04-11 00:00:00 +0000
- title: 'Linear Convergence of the Primal-Dual Gradient Method for Convex-Concave Saddle Point Problems without Strong Convexity'
abstract: 'We consider the convex-concave saddle point problem $\min_{x}\max_{y} f(x)+y^\top A x-g(y)$ where $f$ is smooth and convex and $g$ is smooth and strongly convex. We prove that if the coupling matrix $A$ has full column rank, the vanilla primal-dual gradient method can achieve linear convergence even if $f$ is not strongly convex. Our result generalizes previous work which either requires $f$ and $g$ to be quadratic functions or requires proximal mappings for both $f$ and $g$. We adopt a novel analysis technique that in each iteration uses a "ghost" update as a reference, and show that the iterates in the primal-dual gradient method converge to this "ghost" sequence. Using the same technique we further give an analysis for the primal-dual stochastic variance reduced gradient method for convex-concave saddle point problems with a finite-sum structure.'
volume: 89
URL: http://proceedings.mlr.press/v89/du19b.html
PDF: http://proceedings.mlr.press/v89/du19b/du19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-du19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Du
given: Simon S.
- family: Hu
given: Wei
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 196-205
id: du19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 196
lastpage: 205
published: 2019-04-11 00:00:00 +0000
- title: 'Greedy and IHT Algorithms for Non-convex Optimization with Monotone Costs of Non-zeros'
abstract: 'Non-convex optimization methods, such as greedy-style algorithms and iterative hard thresholding (IHT), for $\ell_0$-constrained minimization have been extensively studied thanks to their high empirical performances and strong guarantees. However, few works have considered non-convex optimization with general non-zero patterns; this is unfortunate since various non-zero patterns are quite common in practice. In this paper, we consider the case where non-zero patterns are specified by monotone set functions. We first prove an approximation guarantee of a cost-benefit greedy (CBG) algorithm by using the {\it weak submodularity} of the problem. We then consider an IHT-style algorithm, whose projection step uses CBG, and prove its convergence guarantee. We also provide many applications and experimental results that confirm the advantages of the algorithms introduced.'
volume: 89
URL: http://proceedings.mlr.press/v89/sakaue19a.html
PDF: http://proceedings.mlr.press/v89/sakaue19a/sakaue19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-sakaue19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Sakaue
given: Shinsaku
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 206-215
id: sakaue19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 206
lastpage: 215
published: 2019-04-11 00:00:00 +0000
- title: 'Block Stability for MAP Inference'
abstract: 'Recent work (Lang et al., 2018) has shown that some popular approximate MAP inference algorithms perform very well when the input instance is stable. The simplest stability condition assumes that the MAP solution does not change at all when some of the pairwise potentials are adversarially perturbed. Unfortunately, this strong condition does not seem to hold in practice. We introduce a significantly more relaxed condition that only requires portions of an input instance to be stable. Under this block stability condition, we prove that the pairwise LP relaxation is persistent on the stable blocks. We complement our theoretical results with an evaluation of real-world examples from computer vision, and we find that these instances have large stable regions.'
volume: 89
URL: http://proceedings.mlr.press/v89/lang19a.html
PDF: http://proceedings.mlr.press/v89/lang19a/lang19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-lang19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Lang
given: Hunter
- family: Sontag
given: David
- family: Vijayaraghavan
given: Aravindan
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 216-225
id: lang19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 216
lastpage: 225
published: 2019-04-11 00:00:00 +0000
- title: 'A Stein–Papangelou Goodness-of-Fit Test for Point Processes'
abstract: 'Point processes provide a powerful framework for modeling the distribution and interactions of events in time or space. Their flexibility has given rise to a variety of sophisticated models in statistics and machine learning, yet model diagnostic and criticism techniques remain underdeveloped. In this work, we propose a general Stein operator for point processes based on the Papangelou conditional intensity function. We then establish a kernel goodness-of-fit test by defining a Stein discrepancy measure for general point processes. Notably, our test also applies to non-Poisson point processes whose intensity functions contain intractable normalization constants due to the presence of complex interactions among points. We apply our proposed test to several point process models, and show that it outperforms a two-sample test based on the maximum mean discrepancy.'
volume: 89
URL: http://proceedings.mlr.press/v89/yang19a.html
PDF: http://proceedings.mlr.press/v89/yang19a/yang19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-yang19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Yang
given: Jiasen
- family: Rao
given: Vinayak
- family: Neville
given: Jennifer
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 226-235
id: yang19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 226
lastpage: 235
published: 2019-04-11 00:00:00 +0000
- title: 'KAMA-NNs: Low-dimensional Rotation Based Neural Networks'
abstract: 'We present new architectures for feedforward neural networks built from products of learned or random low-dimensional rotations that offer substantial space compression and computational speedups in comparison to the unstructured baselines. Models using them are also competitive with the baselines and often, due to imposed orthogonal structure, outperform baselines accuracy-wise. We propose to use our architectures in two settings. We show that in the non-adaptive scenario (random neural networks) they lead to asymptotically more accurate, space-efficient and faster estimators of the so-called PNG-kernels (for any activation function defining the PNG). This generalizes several recent theoretical results about orthogonal estimators (e.g. orthogonal JLTs, orthogonal estimators of angular kernels and more). In the adaptive setting we propose efficient algorithms for learning products of low-dimensional rotations and show how our architectures can be used to improve space and time complexity of state of the art reinforcement learning (RL) algorithms (e.g. PPO, TRPO). Here they offer up to 7x compression of the network in comparison to the unstructured baselines and outperform reward-wise state of the art structured neural networks offering similar computational gains and based on low displacement rank matrices.'
volume: 89
URL: http://proceedings.mlr.press/v89/choromanski19a.html
PDF: http://proceedings.mlr.press/v89/choromanski19a/choromanski19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-choromanski19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Choromanski
given: Krzysztof
- family: Pacchiano
given: Aldo
- family: Pennington
given: Jeffrey
- family: Tang
given: Yunhao
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 236-245
id: choromanski19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 236
lastpage: 245
published: 2019-04-11 00:00:00 +0000
- title: 'Statistical Windows in Testing for the Initial Distribution of a Reversible Markov Chain'
abstract: 'We study the problem of hypothesis testing between two discrete distributions, where we only have access to samples after the action of a known reversible Markov chain, playing the role of noise. We derive instance-dependent minimax rates for the sample complexity of this problem, and show how its dependence in time is related to the spectral properties of the Markov chain. We show that there exists a wide statistical window, in terms of sample complexity for hypothesis testing between different pairs of initial distributions. We illustrate these results in several concrete examples.'
volume: 89
URL: http://proceedings.mlr.press/v89/berthet19a.html
PDF: http://proceedings.mlr.press/v89/berthet19a/berthet19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-berthet19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Berthet
given: Quentin
- family: Kanade
given: Varun
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 246-255
id: berthet19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 246
lastpage: 255
published: 2019-04-11 00:00:00 +0000
- title: 'Sketching for Latent Dirichlet-Categorical Models'
abstract: 'Recent work has explored transforming data sets into smaller, approximate summaries in order to scale Bayesian inference. We examine a related problem in which the parameters of a Bayesian model are very large and expensive to store in memory, and propose more compact representations of parameter values that can be used during inference. We focus on a class of graphical models that we refer to as latent Dirichlet-Categorical models, and show how a combination of two sketching algorithms known as count-min sketch and approximate counters provide an efficient representation for them. We show that this sketch combination – which, despite having been used before in NLP applications, has not been previously analyzed – enjoys desirable properties. We prove that for this class of models, when the sketches are used during Markov Chain Monte Carlo inference, the equilibrium of sketched MCMC converges to that of the exact chain as sketch parameters are tuned to reduce the error rate.'
volume: 89
URL: http://proceedings.mlr.press/v89/tassarotti19a.html
PDF: http://proceedings.mlr.press/v89/tassarotti19a/tassarotti19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-tassarotti19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Tassarotti
given: Joseph
- family: Tristan
given: Jean-Baptiste
- family: Wick
given: Michael
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 256-265
id: tassarotti19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 256
lastpage: 265
published: 2019-04-11 00:00:00 +0000
- title: 'Adaptive Activity Monitoring with Uncertainty Quantification in Switching Gaussian Process Models'
abstract: 'Emerging wearable sensors have enabled the unprecedented ability to continuously monitor human activities for healthcare purposes. However, with so many ambient sensors collecting different measurements, it becomes important not only to maintain good monitoring accuracy, but also low power consumption to ensure sustainable monitoring. This power-efficient sensing scheme can be achieved by deciding which group of sensors to use at a given time, requiring an accurate characterization of the trade-off between sensor energy usage and the uncertainty in ignoring certain sensor signals while monitoring. To address this challenge in the context of activity monitoring, we have designed an adaptive activity monitoring framework. We first propose a switching Gaussian process to model the observed sensor signals emitting from the underlying activity states. To efficiently compute the Gaussian process model likelihood and quantify the context prediction uncertainty, we propose a block circulant embedding technique and use Fast Fourier Transforms (FFT) for inference. By computing the Bayesian loss function tailored to switching Gaussian processes, an adaptive monitoring procedure is developed to select features from available sensors that optimize the trade-off between sensor power consumption and the prediction performance quantified by state prediction entropy. We demonstrate the effectiveness of our framework on the popular benchmark of UCI Human Activity Recognition using Smartphones.'
volume: 89
URL: http://proceedings.mlr.press/v89/ardywibowo19a.html
PDF: http://proceedings.mlr.press/v89/ardywibowo19a/ardywibowo19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-ardywibowo19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Ardywibowo
given: Randy
- family: Zhao
given: Guang
- family: Wang
given: Zhangyang
- family: Mortazavi
given: Bobak
- family: Huang
given: Shuai
- family: Qian
given: Xiaoning
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 266-275
id: ardywibowo19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 266
lastpage: 275
published: 2019-04-11 00:00:00 +0000
- title: 'Near Optimal Algorithms for Hard Submodular Programs with Discounted Cooperative Costs'
abstract: 'In this paper, we investigate a class of submodular problems which in general are very hard. These include minimizing a submodular cost function under combinatorial constraints, which include cuts, matchings, paths, etc., optimizing a submodular function under submodular cover and submodular knapsack constraints, and minimizing a ratio of submodular functions. All these problems appear in several real world problems but have hardness factors of $\Omega(\sqrt{n})$ for general submodular cost functions. We show how we can achieve constant approximation factors when we restrict the cost functions to low rank sums of concave over modular functions. A wide variety of machine learning applications are very naturally modeled via this subclass of submodular functions. Our work therefore provides a tighter connection between theory and practice by enabling theoretically satisfying guarantees for a rich class of expressible, natural, and useful submodular cost models. We empirically demonstrate the utility of our models on real world problems of cooperative image matching and sensor placement with cooperative costs.'
volume: 89
URL: http://proceedings.mlr.press/v89/iyer19a.html
PDF: http://proceedings.mlr.press/v89/iyer19a/iyer19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-iyer19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Iyer
given: Rishabh
- family: Bilmes
given: Jeffrey
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 276-285
id: iyer19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 276
lastpage: 285
published: 2019-04-11 00:00:00 +0000
- title: 'Fast Stochastic Algorithms for Low-rank and Nonsmooth Matrix Problems'
abstract: 'Composite convex optimization problems which include both a nonsmooth term and a low-rank promoting term have important applications in machine learning and signal processing, such as when one wishes to recover an unknown matrix that is simultaneously low-rank and sparse. However, such problems are highly challenging to solve in large-scale: the low-rank promoting term prohibits efficient implementations of proximal methods for composite optimization and even simple subgradient methods. On the other hand, methods which are tailored for low-rank optimization, such as conditional gradient-type methods, which are often applied to a smooth approximation of the nonsmooth objective, are slow since their runtime scales with both the large Lipschitz parameter of the smoothed gradient vector and with $1/\epsilon$, where $\epsilon$ is the target accuracy. In this paper we develop efficient algorithms for \textit{stochastic} optimization of a strongly-convex objective which includes both a nonsmooth term and a low-rank promoting term. In particular, to the best of our knowledge, we present the first algorithm that enjoys all following critical properties for large-scale problems: i) (nearly) optimal sample complexity, ii) each iteration requires only a single \textit{low-rank} SVD computation, and iii) overall number of thin-SVD computations scales only with $\log{1/\epsilon}$ (as opposed to $\textrm{poly}(1/\epsilon)$ in previous methods). We also give an algorithm for the closely-related finite-sum setting. We empirically demonstrate our results on the problem of recovering a simultaneously low-rank and sparse matrix.'
volume: 89
URL: http://proceedings.mlr.press/v89/garber19a.html
PDF: http://proceedings.mlr.press/v89/garber19a/garber19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-garber19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Garber
given: Dan
- family: Kaplan
given: Atara
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 286-294
id: garber19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 286
lastpage: 294
published: 2019-04-11 00:00:00 +0000
- title: 'Logarithmic Regret for Online Gradient Descent Beyond Strong Convexity'
abstract: 'Hoffman’s classical result gives a bound on the distance of a point from a convex and compact polytope in terms of the magnitude of violation of the constraints. Recently, several results showed that Hoffman’s bound can be used to derive strongly-convex-like rates for first-order methods for \textit{offline} convex optimization of curved, though not strongly convex, functions, over polyhedral sets. In this work, we use this classical result for the first time to obtain faster rates for \textit{online convex optimization} over polyhedral sets with curved convex, though not strongly convex, loss functions. We show that under several reasonable assumptions on the data, the standard \textit{Online Gradient Descent} algorithm guarantees logarithmic regret. To the best of our knowledge, the only previous algorithm to achieve logarithmic regret in the considered settings is the \textit{Online Newton Step} algorithm which requires quadratic (in the dimension) memory and at least quadratic runtime per iteration, which greatly limits its applicability to large-scale problems. In particular, our results hold for \textit{semi-adversarial} settings in which the data is a combination of an arbitrary (adversarial) sequence and a stochastic sequence, which might provide reasonable approximation for many real-world sequences, or under a natural assumption that the data is low-rank. We demonstrate via experiments that the regret of OGD is indeed comparable to that of ONS (and even far better) on curved though not strongly-convex losses.'
volume: 89
URL: http://proceedings.mlr.press/v89/garber19b.html
PDF: http://proceedings.mlr.press/v89/garber19b/garber19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-garber19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Garber
given: Dan
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 295-303
id: garber19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 295
lastpage: 303
published: 2019-04-11 00:00:00 +0000
- title: 'Accelerated Coordinate Descent with Arbitrary Sampling and Best Rates for Minibatches'
abstract: 'Accelerated coordinate descent is a widely popular optimization algorithm due to its efficiency on large-dimensional problems. It achieves state-of-the-art complexity on an important class of empirical risk minimization problems. In this paper we design and analyze an accelerated coordinate descent (\texttt{ACD}) method which in each iteration updates a random subset of coordinates according to an arbitrary but fixed probability law, which is a parameter of the method. While mini-batch variants of \texttt{ACD} are more popular and relevant in practice, there is no importance sampling for \texttt{ACD} that outperforms the standard uniform mini-batch sampling. Through insights enabled by our general analysis, we design new importance sampling for mini-batch \texttt{ACD} which significantly outperforms previous state-of-the-art minibatch \texttt{ACD} in practice. We prove a rate that is at most $\mathcal{O}(\sqrt{\tau})$ times worse than the rate of minibatch \texttt{ACD} with uniform sampling, but can be $\mathcal{O}(n/\tau)$ times better, where $\tau$ is the minibatch size. Since in modern supervised learning training systems it is standard practice to choose $\tau \ll n$, and often $\tau=\mathcal{O}(1)$, our method can lead to dramatic speedups. Lastly, we obtain similar results for minibatch nonaccelerated \texttt{CD} as well, achieving improvements on previous best rates.'
volume: 89
URL: http://proceedings.mlr.press/v89/hanzely19a.html
PDF: http://proceedings.mlr.press/v89/hanzely19a/hanzely19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-hanzely19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Hanzely
given: Filip
- family: Richtarik
given: Peter
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 304-312
id: hanzely19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 304
lastpage: 312
published: 2019-04-11 00:00:00 +0000
- title: 'Globally-convergent Iteratively Reweighted Least Squares for Robust Regression Problems'
abstract: 'We provide the first global model recovery results for the IRLS (iteratively reweighted least squares) heuristic for robust regression problems. IRLS is known to offer excellent performance, despite bad initializations and data corruption, for several parameter estimation problems. Existing analyses of IRLS frequently require careful initialization, thus offering only local convergence guarantees. We remedy this by proposing augmentations to the basic IRLS routine that not only offer guaranteed global recovery, but in practice also outperform state-of-the-art algorithms for robust regression. Our routines are more immune to hyperparameter misspecification in basic regression tasks, as well as applied tasks such as linear-armed bandit problems. Our theoretical analyses rely on a novel extension of the notions of strong convexity and smoothness to weighted strong convexity and smoothness, and establishing that sub-Gaussian designs offer bounded weighted condition numbers. These notions may be useful in analyzing other algorithms as well.'
volume: 89
URL: http://proceedings.mlr.press/v89/mukhoty19a.html
PDF: http://proceedings.mlr.press/v89/mukhoty19a/mukhoty19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-mukhoty19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Mukhoty
given: Bhaskar
- family: Gopakumar
given: Govind
- family: Jain
given: Prateek
- family: Kar
given: Purushottam
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 313-322
id: mukhoty19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 313
lastpage: 322
published: 2019-04-11 00:00:00 +0000
- title: 'Modularity-based Sparse Soft Graph Clustering'
abstract: 'Clustering is a central problem in machine learning for which graph-based approaches have proven their efficiency. In this paper, we study a relaxation of the modularity maximization problem, well-known in the graph partitioning literature. A solution of this relaxation gives to each element of the dataset a probability to belong to a given cluster, whereas a solution of the standard modularity problem is a partition. We introduce an efficient optimization algorithm to solve this relaxation, that is both memory efficient and local. Furthermore, we prove that our method includes, as a special case, the Louvain optimization scheme, a state-of-the-art technique to solve the traditional modularity problem. Experiments on both synthetic and real-world data illustrate that our approach provides meaningful information on various types of data.'
volume: 89
URL: http://proceedings.mlr.press/v89/hollocou19a.html
PDF: http://proceedings.mlr.press/v89/hollocou19a/hollocou19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-hollocou19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Hollocou
given: Alexandre
- family: Bonald
given: Thomas
- family: Lelarge
given: Marc
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 323-332
id: hollocou19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 323
lastpage: 332
published: 2019-04-11 00:00:00 +0000
- title: 'Pathwise Derivatives for Multivariate Distributions'
abstract: 'We exploit the link between the transport equation and derivatives of expectations to construct efficient pathwise gradient estimators for multivariate distributions. We focus on two main threads. First, we use null solutions of the transport equation to construct adaptive control variates that can be used to construct gradient estimators with reduced variance. Second, we consider the case of multivariate mixture distributions. In particular we show how to compute pathwise derivatives for mixtures of multivariate Normal distributions with arbitrary means and diagonal covariances. We demonstrate in a variety of experiments in the context of variational inference that our gradient estimators can outperform other methods, especially in high dimensions.'
volume: 89
URL: http://proceedings.mlr.press/v89/jankowiak19a.html
PDF: http://proceedings.mlr.press/v89/jankowiak19a/jankowiak19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-jankowiak19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Jankowiak
given: Martin
- family: Karaletsos
given: Theofanis
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 333-342
id: jankowiak19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 333
lastpage: 342
published: 2019-04-11 00:00:00 +0000
- title: 'Distributed Inexact Newton-type Pursuit for Non-convex Sparse Learning'
abstract: 'In this paper, we present a sample distributed greedy pursuit method for non-convex sparse learning under cardinality constraint. Given the training samples uniformly randomly partitioned across multiple machines, the proposed method alternates between local inexact sparse minimization of a Newton-type approximation and centralized global results aggregation. Theoretical analysis shows that for a general class of convex functions with Lipschitz continuous Hessian, the method converges linearly with contraction factor scaling inversely to the local data size; whilst the communication complexity required to reach desirable statistical accuracy scales logarithmically with respect to the number of machines for some popular statistical learning models. For nonconvex objective functions, up to a local estimation error, our method can be shown to converge to a local stationary sparse solution with sub-linear communication complexity. Numerical results demonstrate the efficiency and accuracy of our method when applied to large-scale sparse learning tasks including deep neural nets pruning.'
volume: 89
URL: http://proceedings.mlr.press/v89/liu19a.html
PDF: http://proceedings.mlr.press/v89/liu19a/liu19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-liu19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Liu
given: Bo
- family: Yuan
given: Xiao-Tong
- family: Wang
given: Lezi
- family: Liu
given: Qingshan
- family: Huang
given: Junzhou
- family: Metaxas
given: Dimitris N.
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 343-352
id: liu19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 343
lastpage: 352
published: 2019-04-11 00:00:00 +0000
- title: 'Vine copula structure learning via Monte Carlo tree search'
abstract: 'Monte Carlo tree search (MCTS) has been widely adopted in various game and planning problems. It can efficiently explore a search space with guided random sampling. In statistics, vine copulas are flexible multivariate dependence models that adopt vine structures, which are based on a hierarchy of trees to express conditional dependence, and bivariate copulas on the edges of the trees. The vine structure learning problem has been challenging due to the large search space. To tackle this problem, we propose a novel approach to learning vine structures using MCTS. The proposed method has significantly better performance over the existing methods under various experimental setups.'
volume: 89
URL: http://proceedings.mlr.press/v89/chang19a.html
PDF: http://proceedings.mlr.press/v89/chang19a/chang19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-chang19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Chang
given: Bo
- family: Pan
given: Shenyi
- family: Joe
given: Harry
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 353-361
id: chang19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 353
lastpage: 361
published: 2019-04-11 00:00:00 +0000
- title: 'Blind Demixing via Wirtinger Flow with Random Initialization'
abstract: 'This paper concerns the problem of demixing a series of source signals from the sum of bilinear measurements. This problem spans diverse areas such as communication, image processing, machine learning, etc. However, semidefinite programming for blind demixing is prohibitive for large-scale problems due to high computational complexity and storage cost. Although several efficient algorithms have been developed recently that enjoy the benefits of fast convergence rates and are even regularization-free, they still call for spectral initialization. To find a simple initialization approach that works as well as spectral initialization, we propose to solve the blind demixing problem via Wirtinger flow with random initialization, which yields a natural implementation. To reveal the efficiency of this algorithm, we provide a global convergence guarantee for randomly initialized Wirtinger flow for blind demixing. Specifically, we show that with sufficient samples, the iterates of randomly initialized Wirtinger flow can enter a local region that enjoys strong convexity and strong smoothness within a few iterations at the first stage. At the second stage, iterates of randomly initialized Wirtinger flow further converge linearly to the ground truth.'
volume: 89
URL: http://proceedings.mlr.press/v89/dong19a.html
PDF: http://proceedings.mlr.press/v89/dong19a/dong19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-dong19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Dong
given: Jialin
- family: Shi
given: Yuanming
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 362-370
id: dong19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 362
lastpage: 370
published: 2019-04-11 00:00:00 +0000
- title: 'Performance Metric Elicitation from Pairwise Classifier Comparisons'
abstract: 'Given a binary prediction problem, which performance metric should the classifier optimize? We address this question by formalizing the problem of Metric Elicitation. The goal of metric elicitation is to discover the performance metric of a practitioner, which reflects her innate rewards (costs) for correct (incorrect) classification. In particular, we focus on eliciting binary classification performance metrics from pairwise feedback, where a practitioner is queried to provide relative preference between two classifiers. By exploiting key geometric properties of the space of confusion matrices, we obtain provably query efficient algorithms for eliciting linear and linear-fractional performance metrics. We further show that our method is robust to feedback and finite sample noise.'
volume: 89
URL: http://proceedings.mlr.press/v89/hiranandani19a.html
PDF: http://proceedings.mlr.press/v89/hiranandani19a/hiranandani19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-hiranandani19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Hiranandani
given: Gaurush
- family: Boodaghians
given: Shant
- family: Mehta
given: Ruta
- family: Koyejo
given: Oluwasanmi
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 371-379
id: hiranandani19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 371
lastpage: 379
published: 2019-04-11 00:00:00 +0000
- title: 'Analysis of Network Lasso for Semi-Supervised Regression'
abstract: 'We apply network Lasso to semi-supervised regression problems involving network-structured data. This approach lends itself quite naturally to highly scalable learning algorithms in the form of message passing over an empirical graph which represents the network structure of the data. By using a simple non-parametric regression model, which is motivated by a clustering hypothesis, we provide an analysis of the estimation error incurred by network Lasso. This analysis reveals conditions on the network structure and the available training data which guarantee network Lasso to be accurate. Remarkably, the accuracy of network Lasso is related to the existence of sufficiently large network flows over the empirical graph. Thus, our analysis reveals a connection between network Lasso and maximum network flow problems.'
volume: 89
URL: http://proceedings.mlr.press/v89/jung19a.html
PDF: http://proceedings.mlr.press/v89/jung19a/jung19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-jung19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Jung
given: Alexander
- family: Vesselinova
given: Natalia
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 380-387
id: jung19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 380
lastpage: 387
published: 2019-04-11 00:00:00 +0000
- title: 'Learning Mixtures of Smooth Product Distributions: Identifiability and Algorithm'
abstract: 'We study the problem of learning a mixture model of non-parametric product distributions. The problem of learning a mixture model is that of finding the component distributions along with the mixing weights using observed samples generated from the mixture. The problem is well-studied in the parametric setting, i.e., when the component distributions are members of a parametric family - such as Gaussian distributions. In this work, we focus on multivariate mixtures of non-parametric product distributions and propose a two-stage approach which recovers the component distributions of the mixture under a smoothness condition. Our approach builds upon the identifiability properties of the canonical polyadic (low-rank) decomposition of tensors, in tandem with Fourier and Shannon-Nyquist sampling staples from signal processing. We demonstrate the effectiveness of the approach on synthetic and real datasets.'
volume: 89
URL: http://proceedings.mlr.press/v89/kargas19a.html
PDF: http://proceedings.mlr.press/v89/kargas19a/kargas19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-kargas19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Kargas
given: Nikos
- family: Sidiropoulos
given: Nicholas D.
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 388-396
id: kargas19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 388
lastpage: 396
published: 2019-04-11 00:00:00 +0000
- title: 'Robust Matrix Completion from Quantized Observations'
abstract: '1-bit matrix completion refers to the problem of recovering a real-valued low-rank matrix from a small fraction of its sign patterns. In many real-world applications, however, the observations are not only highly quantized, but also grossly corrupted. In this work, we consider the noisy statistical model where each observed entry can be flipped with some probability after quantization. We propose a simple maximum likelihood estimator which is shown to be robust to the sign flipping noise. In particular, we prove an upper bound on the statistical error, showing that with overwhelming probability $n = O(poly(1-2E[\tau])^{-2} rd \log d)$ samples are sufficient for accurate recovery, where $r$ and $d$ are the rank and dimension of the underlying matrix respectively, and $\tau \in [0, 1/2)$ is a random variable that parameterizes the sign flipping noise. Furthermore, a lower bound is established showing that the obtained sample complexity is near-optimal for prevalent statistical models. Finally, we substantiate our theoretical findings with a comprehensive study on synthetic and realistic data sets, and demonstrate the state-of-the-art performance.'
volume: 89
URL: http://proceedings.mlr.press/v89/shen19a.html
PDF: http://proceedings.mlr.press/v89/shen19a/shen19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-shen19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Shen
given: Jie
- family: Awasthi
given: Pranjal
- family: Li
given: Ping
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 397-407
id: shen19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 397
lastpage: 407
published: 2019-04-11 00:00:00 +0000
- title: 'Foundations of Sequence-to-Sequence Modeling for Time Series'
abstract: 'The availability of large amounts of time series data, paired with the performance of deep-learning algorithms on a broad class of problems, has recently led to significant interest in the use of sequence-to-sequence models for time series forecasting. We provide the first theoretical analysis of this time series forecasting framework. We include a comparison of sequence-to-sequence modeling to classical time series models, and as such our theory can serve as a quantitative guide for practitioners choosing between different modeling methodologies.'
volume: 89
URL: http://proceedings.mlr.press/v89/mariet19a.html
PDF: http://proceedings.mlr.press/v89/mariet19a/mariet19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-mariet19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Mariet
given: Zelda
- family: Kuznetsov
given: Vitaly
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 408-417
id: mariet19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 408
lastpage: 417
published: 2019-04-11 00:00:00 +0000
- title: 'Nearly Optimal Adaptive Procedure with Change Detection for Piecewise-Stationary Bandit'
abstract: 'Multi-armed bandit (MAB) is a class of online learning problems where a learning agent aims to maximize its expected cumulative reward while repeatedly selecting to pull arms with unknown reward distributions. We consider a scenario where the reward distributions may change in a piecewise-stationary fashion at unknown time steps. We show that by incorporating a simple change-detection component with classic UCB algorithms to detect and adapt to changes, our so-called M-UCB algorithm can achieve a nearly optimal regret bound on the order of $O(\sqrt{MKT\log T})$, where $T$ is the number of time steps, $K$ is the number of arms, and $M$ is the number of stationary segments. Comparison with the best available lower bound shows that our M-UCB is nearly optimal in $T$ up to a logarithmic factor. We also compare M-UCB with the state-of-the-art algorithms in numerical experiments using a public Yahoo! dataset and a real-world digital marketing dataset to demonstrate its superior performance.'
volume: 89
URL: http://proceedings.mlr.press/v89/cao19a.html
PDF: http://proceedings.mlr.press/v89/cao19a/cao19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-cao19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Cao
given: Yang
- family: Wen
given: Zheng
- family: Kveton
given: Branislav
- family: Xie
given: Yao
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 418-427
id: cao19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 418
lastpage: 427
published: 2019-04-11 00:00:00 +0000
- title: 'An Optimal Algorithm for Stochastic Three-Composite Optimization'
abstract: 'We develop an optimal primal-dual first-order algorithm for a class of stochastic three-composite convex minimization problems. The convergence rate of our method not only improves upon the existing methods, but also matches a lower bound derived for all first-order methods that solve this problem. We extend our proposed algorithm to solve a composite stochastic program with any finite number of nonsmooth functions. In addition, we generalize an optimal stochastic alternating direction method of multipliers (SADMM) algorithm proposed for the two-composite case to solve this problem, and establish its connection to our optimal primal-dual algorithm. We perform extensive numerical experiments on a variety of machine learning applications to demonstrate the superiority of our method vis-à-vis the state-of-the-art.'
volume: 89
URL: http://proceedings.mlr.press/v89/zhao19a.html
PDF: http://proceedings.mlr.press/v89/zhao19a/zhao19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-zhao19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Zhao
given: Renbo
- family: Haskell
given: William B.
- family: Tan
given: Vincent Y. F.
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 428-437
id: zhao19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 428
lastpage: 437
published: 2019-04-11 00:00:00 +0000
- title: 'A Thompson Sampling Algorithm for Cascading Bandits'
abstract: 'We design and analyze TS-Cascade, a Thompson sampling algorithm for the cascading bandit problem. In TS-Cascade, Bayesian estimates of the click probability are constructed using a univariate Gaussian; this leads to a more efficient exploration procedure vis-à-vis existing UCB-based approaches. We also incorporate the empirical variance of each item’s click probability into the Bayesian updates. These two novel features allow us to prove an expected regret bound of the form $\tilde{O}(\sqrt{KLT})$ where $L$ and $K$ are the number of ground items and the number of items in the chosen list respectively and $T\ge L$ is the number of Thompson sampling update steps. This matches the state-of-the-art regret bounds for UCB-based algorithms. More importantly, it is the first theoretical guarantee on a Thompson sampling algorithm for any stochastic combinatorial bandit problem model with partial feedback. Empirical experiments demonstrate superiority of TS-Cascade compared to existing UCB-based procedures in terms of the expected cumulative regret and the time complexity.'
volume: 89
URL: http://proceedings.mlr.press/v89/cheung19a.html
PDF: http://proceedings.mlr.press/v89/cheung19a/cheung19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-cheung19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Cheung
given: Wang Chi
- family: Tan
given: Vincent
- family: Zhong
given: Zixin
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 438-447
id: cheung19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 438
lastpage: 447
published: 2019-04-11 00:00:00 +0000
- title: 'Lifelong Optimization with Low Regret'
abstract: 'In this work, we study a problem arising from two lines of work: online optimization and lifelong learning. In the problem, there is a sequence of tasks arriving sequentially, and within each task, we have to make decisions one after another and then suffer corresponding losses. The tasks are related as they share some common representation, but they are different as each requires a different predictor on top of the representation. As learning a representation is usually costly in lifelong learning scenarios, the goal is to learn it continuously through time across different tasks, making the learning of later tasks easier than previous ones. We provide such learning algorithms with good regret bounds which can be seen as natural generalization of prior works on online optimization.'
volume: 89
URL: http://proceedings.mlr.press/v89/wu19a.html
PDF: http://proceedings.mlr.press/v89/wu19a/wu19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-wu19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Wu
given: Yi-Shan
- family: Wang
given: Po-An
- family: Lu
given: Chi-Jen
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 448-456
id: wu19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 448
lastpage: 456
published: 2019-04-11 00:00:00 +0000
- title: 'Sparse Multivariate Bernoulli Processes in High Dimensions'
abstract: 'We consider the problem of estimating the parameters of a multivariate Bernoulli process with auto-regressive feedback in the high-dimensional setting where the number of samples available is much less than the number of parameters. This problem arises in learning interconnections of networks of dynamical systems with spiking or binary valued data. We also allow the process to depend on its past up to a lag $p$, for a general $p \geq 1$, allowing for more realistic modeling in many applications. We propose and analyze an $\ell_1$-regularized maximum likelihood (ML) estimator under the assumption that the parameter tensor is approximately sparse. Rigorous analysis of such estimators is made challenging by the dependent and non-Gaussian nature of the process as well as the presence of the nonlinearities and multi-level feedback. We derive precise upper bounds on the mean-squared estimation error in terms of the number of samples, dimensions of the process, the lag $p$ and other key statistical properties of the model. The ideas presented can be used in the rigorous high-dimensional analysis of regularized $M$-estimators for other sparse nonlinear and non-Gaussian processes with long-range dependence.'
volume: 89
URL: http://proceedings.mlr.press/v89/pandit19a.html
PDF: http://proceedings.mlr.press/v89/pandit19a/pandit19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-pandit19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Pandit
given: Parthe
- family: Sahraee-Ardakan
given: Mojtaba
- family: Amini
given: Arash
- family: Rangan
given: Sundeep
- family: Fletcher
given: Alyson K.
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 457-466
id: pandit19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 457
lastpage: 466
published: 2019-04-11 00:00:00 +0000
- title: 'An Optimal Algorithm for Stochastic and Adversarial Bandits'
abstract: 'We derive an algorithm that achieves the optimal (up to constants) pseudo-regret in both adversarial and stochastic multi-armed bandits without prior knowledge of the regime and time horizon. The algorithm is based on online mirror descent with Tsallis entropy regularizer. We provide a complete characterization of such algorithms and show that Tsallis entropy with power $\alpha = 1/2$ achieves the goal. In addition, the proposed algorithm enjoys improved regret guarantees in two intermediate regimes: the moderately contaminated stochastic regime defined by Seldin and Slivkins [22] and the stochastically constrained adversary studied by Wei and Luo [26]. The algorithm also obtains adversarial and stochastic optimality in the utility-based dueling bandit setting. We provide empirical evaluation of the algorithm demonstrating that it outperforms Ucb1 and Exp3 in stochastic environments. In certain adversarial regimes the algorithm significantly outperforms Ucb1 and Thompson Sampling, which exhibit close to linear regret.'
volume: 89
URL: http://proceedings.mlr.press/v89/zimmert19a.html
PDF: http://proceedings.mlr.press/v89/zimmert19a/zimmert19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-zimmert19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Zimmert
given: Julian
- family: Seldin
given: Yevgeny
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 467-475
id: zimmert19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 467
lastpage: 475
published: 2019-04-11 00:00:00 +0000
- title: 'Efficient Bayesian Experimental Design for Implicit Models'
abstract: 'Bayesian experimental design involves the optimal allocation of resources in an experiment, with the aim of optimising cost and performance. For implicit models, where the likelihood is intractable but sampling from the model is possible, this task is particularly difficult and therefore largely unexplored. This is mainly due to technical difficulties associated with approximating posterior distributions and utility functions. We devise a novel experimental design framework for implicit models that improves upon previous work in two ways. First, we use the mutual information between parameters and data as the utility function, which has previously not been feasible. We achieve this by utilising Likelihood-Free Inference by Ratio Estimation (LFIRE) to approximate posterior distributions, instead of the traditional approximate Bayesian computation or synthetic likelihood methods. Secondly, we use Bayesian optimisation in order to solve the optimal design problem, as opposed to the typically used grid search or sampling-based methods. We find that this increases efficiency and allows us to consider higher design dimensions.'
volume: 89
URL: http://proceedings.mlr.press/v89/kleinegesse19a.html
PDF: http://proceedings.mlr.press/v89/kleinegesse19a/kleinegesse19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-kleinegesse19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Kleinegesse
given: Steven
- family: Gutmann
given: Michael U.
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 476-485
id: kleinegesse19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 476
lastpage: 485
published: 2019-04-11 00:00:00 +0000
- title: 'Local Saddle Point Optimization: A Curvature Exploitation Approach'
abstract: 'Gradient-based optimization methods are the most popular choice for finding local optima for classical minimization and saddle point problems. Here, we highlight a systemic issue of gradient dynamics that arises for saddle point problems, namely the presence of undesired stable stationary points that are not local optima. We propose a novel optimization approach that exploits curvature information in order to escape from these undesired stationary points. We prove that different optimization methods, including gradient method and Adagrad, equipped with curvature exploitation can escape non-optimal stationary points. We also provide empirical results on common saddle point problems which confirm the advantage of using curvature exploitation.'
volume: 89
URL: http://proceedings.mlr.press/v89/adolphs19a.html
PDF: http://proceedings.mlr.press/v89/adolphs19a/adolphs19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-adolphs19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Adolphs
given: Leonard
- family: Daneshmand
given: Hadi
- family: Lucchi
given: Aurelien
- family: Hofmann
given: Thomas
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 486-495
id: adolphs19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 486
lastpage: 495
published: 2019-04-11 00:00:00 +0000
- title: 'Testing Conditional Independence on Discrete Data using Stochastic Complexity'
abstract: 'Testing for conditional independence is a core aspect of constraint-based causal discovery. Although commonly used tests are perfect in theory, they often fail to reject independence in practice—especially when conditioning on multiple variables. We focus on discrete data and propose a new test based on the notion of algorithmic independence that we instantiate using stochastic complexity. Amongst others, we show that our proposed test, SCI, is an asymptotically unbiased as well as L2 consistent estimator for conditional mutual information (CMI). Further, we show that SCI can be reformulated to find a sensible threshold for CMI that works well on limited samples. Empirical evaluation shows that SCI has a lower type II error than commonly used tests. As a result, we obtain a higher recall when we use SCI in causal discovery algorithms, without compromising the precision.'
volume: 89
URL: http://proceedings.mlr.press/v89/marx19a.html
PDF: http://proceedings.mlr.press/v89/marx19a/marx19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-marx19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Marx
given: Alexander
- family: Vreeken
given: Jilles
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 496-505
id: marx19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 496
lastpage: 505
published: 2019-04-11 00:00:00 +0000
- title: 'Distributionally Robust Submodular Maximization'
abstract: 'Submodular functions have applications throughout machine learning, but in many settings, we do not have direct access to the underlying function f. We focus on stochastic functions that are given as an expectation of functions over a distribution P. In practice, we often have only a limited set of samples f_i from P. The standard approach indirectly optimizes f by maximizing the sum of f_i. However, this ignores generalization to the true (unknown) distribution. In this paper, we achieve better performance on the actual underlying function f by directly optimizing a combination of bias and variance. Algorithmically, we accomplish this by showing how to carry out distributionally robust optimization (DRO) for submodular functions, providing efficient algorithms backed by theoretical guarantees which leverage several novel contributions to the general theory of DRO. We also show compelling empirical evidence that DRO improves generalization to the unknown stochastic submodular function.'
volume: 89
URL: http://proceedings.mlr.press/v89/staib19a.html
PDF: http://proceedings.mlr.press/v89/staib19a/staib19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-staib19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Staib
given: Matthew
- family: Wilder
given: Bryan
- family: Jegelka
given: Stefanie
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 506-516
id: staib19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 506
lastpage: 516
published: 2019-04-11 00:00:00 +0000
- title: 'A Robust Zero-Sum Game Framework for Pool-based Active Learning'
abstract: 'In this paper, we present a novel robust zero-sum game framework for pool-based active learning grounded on advanced statistical learning theory. Pool-based active learning usually consists of two components, namely, learning of a classifier given labeled data and querying of unlabeled data for labeling. Most previous studies on active learning consider these as two separate tasks and propose various heuristics for selecting important unlabeled data for labeling, which may render the selection of unlabeled examples sub-optimal for minimizing the classification error. In contrast, the present work formulates active learning as a unified optimization framework for learning the classifier, i.e., the querying of labels and the learning of models are unified to minimize a common objective for statistical learning. In addition, the proposed method avoids the issues of many previous algorithms such as inefficiency, sampling bias and sensitivity to imbalanced data distribution. Besides theoretical analysis, we conduct extensive experiments on benchmark datasets and demonstrate the superior performance of the proposed active learning method compared with the state-of-the-art methods.'
volume: 89
URL: http://proceedings.mlr.press/v89/zhu19a.html
PDF: http://proceedings.mlr.press/v89/zhu19a/zhu19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-zhu19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Zhu
given: Dixian
- family: Li
given: Zhe
- family: Wang
given: Xiaoyu
- family: Gong
given: Boqing
- family: Yang
given: Tianbao
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 517-526
id: zhu19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 517
lastpage: 526
published: 2019-04-11 00:00:00 +0000
- title: 'Support and Invertibility in Domain-Invariant Representations'
abstract: 'Learning domain-invariant representations has become a popular approach to unsupervised domain adaptation and is often justified by invoking a particular suite of theoretical results. We argue that there are two significant flaws in such arguments. First, the results in question hold only for a fixed representation and do not account for information lost in non-invertible transformations. Second, domain invariance is often a far too strict requirement and does not always lead to consistent estimation, even under strong and favorable assumptions. In this work, we give generalization bounds for unsupervised domain adaptation that hold for any representation function by acknowledging the cost of non-invertibility. In addition, we show that penalizing distance between densities is often wasteful and propose a bound based on measuring the extent to which the support of the source domain covers the target domain. We perform experiments on well-known benchmarks that illustrate the shortcomings of current standard practice.'
volume: 89
URL: http://proceedings.mlr.press/v89/johansson19a.html
PDF: http://proceedings.mlr.press/v89/johansson19a/johansson19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-johansson19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Johansson
given: Fredrik D.
- family: Sontag
given: David
- family: Ranganath
given: Rajesh
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 527-536
id: johansson19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 527
lastpage: 536
published: 2019-04-11 00:00:00 +0000
- title: 'Efficient Inference in Multi-task Cox Process Models'
abstract: 'We generalize the log Gaussian Cox process (LGCP) framework to model multiple correlated point data jointly. The observations are treated as realizations of multiple LGCPs, whose log intensities are given by linear combinations of latent functions drawn from Gaussian process priors. The combination coefficients are also drawn from Gaussian processes and can incorporate additional dependencies. We derive closed-form expressions for the moments of the intensity functions and develop an efficient variational inference algorithm that is orders of magnitude faster than competing deterministic and stochastic approximations of multivariate LGCPs, coregionalization models, and multi-task permanental processes. Our approach outperforms these benchmarks in multiple problems, offering the current state of the art in modeling multivariate point processes.'
volume: 89
URL: http://proceedings.mlr.press/v89/aglietti19a.html
PDF: http://proceedings.mlr.press/v89/aglietti19a/aglietti19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-aglietti19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Aglietti
given: Virginia
- family: Damoulas
given: Theodoros
- family: Bonilla
given: Edwin V.
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 537-546
id: aglietti19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 537
lastpage: 546
published: 2019-04-11 00:00:00 +0000
- title: 'Optimization of Inf-Convolution Regularized Nonconvex Composite Problems'
abstract: 'In this work, we consider nonconvex composite problems that involve inf-convolution with a Legendre function, which gives rise to an anisotropic generalization of the proximal mapping and Moreau-envelope. In a convex setting such problems can be solved via alternating minimization of a splitting formulation, where the consensus constraint is penalized with a Legendre function. In contrast, for nonconvex models it is in general unclear that this approach yields stationary points to the infimal convolution problem. To this end we analytically investigate local regularity properties of the Moreau-envelope function under prox-regularity, which allows us to establish the equivalence between stationary points of the splitting model and the original inf-convolution model. We apply our theory to characterize stationary points of the penalty objective, which is minimized by the elastic averaging SGD (EASGD) method for distributed training, showing that perfect consensus between the workers is attainable via a finite penalty parameter. Numerically, we demonstrate the practical relevance of the proposed approach on the important task of distributed training of deep neural networks.'
volume: 89
URL: http://proceedings.mlr.press/v89/laude19a.html
PDF: http://proceedings.mlr.press/v89/laude19a/laude19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-laude19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Laude
given: Emanuel
- family: Wu
given: Tao
- family: Cremers
given: Daniel
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 547-556
id: laude19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 547
lastpage: 556
published: 2019-04-11 00:00:00 +0000
- title: 'On Connecting Stochastic Gradient MCMC and Differential Privacy'
abstract: 'Concerns related to data security and confidentiality have been raised when applying machine learning to real-world applications. Differential privacy provides a principled and rigorous privacy guarantee for machine learning models. While it is common to inject noise to design a model satisfying a required differential-privacy property, it is generally hard to balance the trade-off between privacy and utility. We show that stochastic gradient Markov chain Monte Carlo (SG-MCMC) – a class of scalable Bayesian posterior sampling algorithms – satisfies strong differential privacy, when carefully chosen stepsizes are employed. We develop theory on the performance of the proposed differentially-private SG-MCMC method. We conduct experiments to support our analysis, and show that a standard SG-MCMC sampler with minor modification can reach state-of-the-art performance in terms of both privacy and utility on Bayesian learning.'
volume: 89
URL: http://proceedings.mlr.press/v89/li19a.html
PDF: http://proceedings.mlr.press/v89/li19a/li19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-li19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Li
given: Bai
- family: Chen
given: Changyou
- family: Liu
given: Hao
- family: Carin
given: Lawrence
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 557-566
id: li19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 557
lastpage: 566
published: 2019-04-11 00:00:00 +0000
- title: 'What made you do this? Understanding black-box decisions with sufficient input subsets'
abstract: 'Local explanation frameworks aim to rationalize particular decisions made by a black-box prediction model. Existing techniques are often restricted to a specific type of predictor or based on input saliency, which may be undesirably sensitive to factors unrelated to the model’s decision making process. We instead propose sufficient input subsets that identify minimal subsets of features whose observed values alone suffice for the same decision to be reached, even if all other input feature values are missing. General principles that globally govern a model’s decision-making can also be revealed by searching for clusters of such input patterns across many data points. Our approach is conceptually straightforward, entirely model-agnostic, simply implemented using instance-wise backward selection, and able to produce more concise rationales than existing techniques. We demonstrate the utility of our interpretation method on various neural network models trained on text, image, and genomic data.'
volume: 89
URL: http://proceedings.mlr.press/v89/carter19a.html
PDF: http://proceedings.mlr.press/v89/carter19a/carter19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-carter19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Carter
given: Brandon
- family: Mueller
given: Jonas
- family: Jain
given: Siddhartha
- family: Gifford
given: David
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 567-576
id: carter19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 567
lastpage: 576
published: 2019-04-11 00:00:00 +0000
- title: 'Computation Efficient Coded Linear Transform'
abstract: 'In large-scale distributed linear transform problems, coded computation plays an important role to reduce the delay caused by slow machines. However, existing coded schemes could end up destroying the significant sparsity that exists in large-scale machine learning problems, and in turn increase the computational delay. In this paper, we propose a coded computation strategy, referred to as diagonal code, that achieves the optimum recovery threshold and the optimum computation load. Furthermore, by leveraging the ideas from random proposal graph theory, we design a random code that achieves a constant computation load, which significantly outperforms the existing best known result. We apply our schemes to the distributed gradient descent problem and demonstrate the advantage of the approach over current fastest coded schemes.'
volume: 89
URL: http://proceedings.mlr.press/v89/wang19a.html
PDF: http://proceedings.mlr.press/v89/wang19a/wang19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-wang19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Wang
given: Sinong
- family: Liu
given: Jiashang
- family: Shroff
given: Ness
- family: Yang
given: Pengyu
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 577-585
id: wang19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 577
lastpage: 585
published: 2019-04-11 00:00:00 +0000
- title: 'Mixing of Hamiltonian Monte Carlo on strongly log-concave distributions 2: Numerical integrators'
abstract: 'We obtain quantitative bounds on the mixing properties of the Hamiltonian Monte Carlo (HMC) algorithm with target distribution in d-dimensional Euclidean space, showing that HMC mixes quickly whenever the target log-distribution is strongly concave and has Lipschitz gradients. We use a coupling argument to show that the popular leapfrog implementation of HMC can sample approximately from the target distribution in a number of gradient evaluations which grows like $d^{1/2}$ with the dimension and grows at most polynomially in the strong convexity and Lipschitz-gradient constants. Our results significantly extend and improve on the dimension dependence of previous quantitative bounds on the mixing of HMC and of the unadjusted Langevin algorithm in this setting.'
volume: 89
URL: http://proceedings.mlr.press/v89/mangoubi19a.html
PDF: http://proceedings.mlr.press/v89/mangoubi19a/mangoubi19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-mangoubi19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Mangoubi
given: Oren
- family: Smith
given: Aaron
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 586-595
id: mangoubi19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 586
lastpage: 595
published: 2019-04-11 00:00:00 +0000
- title: 'Temporal Quilting for Survival Analysis'
abstract: 'The importance of survival analysis in many disciplines (especially in medicine) has led to the development of a variety of approaches to modeling the survival function. Models constructed via various approaches offer different strengths and weaknesses in terms of discriminative performance and calibration, but no one model is best across all datasets or even across all time horizons within a single dataset. Because we require both good calibration and good discriminative performance over different time horizons, conventional model selection and ensemble approaches are not applicable. This paper develops a novel approach that combines the collective intelligence of different underlying survival models to produce a valid survival function that is well-calibrated and offers superior discriminative performance at different time horizons. Empirical results show that our approach provides significant gains over the benchmarks on a variety of real-world datasets.'
volume: 89
URL: http://proceedings.mlr.press/v89/lee19a.html
PDF: http://proceedings.mlr.press/v89/lee19a/lee19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-lee19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Lee
given: Changhee
- family: Zame
given: William
- family: Alaa
given: Ahmed
- family: Schaar
given: Mihaela
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 596-605
id: lee19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 596
lastpage: 605
published: 2019-04-11 00:00:00 +0000
- title: 'Learning Classifiers with Fenchel-Young Losses: Generalized Entropies, Margins, and Algorithms'
abstract: 'This paper studies Fenchel-Young losses, a generic way to construct convex loss functions from a regularization function. We analyze their properties in depth, showing that they unify many well-known loss functions and make it easy to create useful new ones. Fenchel-Young losses constructed from a generalized entropy, including the Shannon and Tsallis entropies, induce predictive probability distributions. We formulate conditions for a generalized entropy to yield losses with a separation margin, and probability distributions with sparse support. Finally, we derive efficient algorithms, making Fenchel-Young losses appealing both in theory and practice.'
volume: 89
URL: http://proceedings.mlr.press/v89/blondel19a.html
PDF: http://proceedings.mlr.press/v89/blondel19a/blondel19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-blondel19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Blondel
given: Mathieu
- family: Martins
given: Andre
- family: Niculae
given: Vlad
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 606-615
id: blondel19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 606
lastpage: 615
published: 2019-04-11 00:00:00 +0000
- title: 'On Target Shift in Adversarial Domain Adaptation'
abstract: 'Discrepancy between training and testing domains is a fundamental problem in the generalization of machine learning techniques. Recently, several approaches have been proposed to learn domain invariant feature representations through adversarial deep learning. However, label shift, where the percentage of data in each class is different between domains, has received less attention. Label shift naturally arises in many contexts, especially in behavioral studies where the behaviors are freely chosen. In this work, we propose a method called Domain Adversarial nets for Target Shift (DATS) to address label shift while learning a domain invariant representation. This is accomplished by using distribution matching to estimate label proportions in a blind test set. We extend this framework to handle multiple domains by developing a scheme to upweight source domains most similar to the target domain. Empirical results show that this framework performs well under large label shift in synthetic and real experiments, demonstrating the practical importance.'
volume: 89
URL: http://proceedings.mlr.press/v89/li19b.html
PDF: http://proceedings.mlr.press/v89/li19b/li19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-li19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Li
given: Yitong
- family: Murias
given: Michael
- family: Major
given: Samantha
- family: Dawson
given: Geraldine
- family: Carlson
given: David
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 616-625
id: li19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 616
lastpage: 625
published: 2019-04-11 00:00:00 +0000
- title: 'Optimal Testing in the Experiment-rich Regime'
abstract: 'Motivated by the widespread adoption of large-scale A/B testing in industry, we propose a new experimentation framework for the setting where potential experiments are abundant (i.e., many hypotheses are available to test), and observations are costly; we refer to this as the experiment-rich regime. Such scenarios require the experimenter to internalize the opportunity cost of assigning a sample to a particular experiment. We fully characterize the optimal policy and give an algorithm to compute it. Furthermore, we develop a simple heuristic that also provides intuition for the optimal policy. We use simulations based on real data to compare both the optimal algorithm and the heuristic to other natural alternative experimental design frameworks. In particular, we discuss the paradox of power: high-powered "classical" tests can lead to highly inefficient sampling in the experiment-rich regime.'
volume: 89
URL: http://proceedings.mlr.press/v89/schmit19a.html
PDF: http://proceedings.mlr.press/v89/schmit19a/schmit19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-schmit19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Schmit
given: Sven
- family: Shah
given: Virag
- family: Johari
given: Ramesh
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 626-633
id: schmit19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 626
lastpage: 633
published: 2019-04-11 00:00:00 +0000
- title: 'Reversible Jump Probabilistic Programming'
abstract: 'In this paper we present a method for automatically deriving a Reversible Jump Markov chain Monte Carlo sampler from probabilistic programs that specify the target and proposal distributions. The main challenge in automatically deriving such an inference procedure, in comparison to deriving a generic Metropolis-Hastings sampler, is in calculating the Jacobian adjustment to the proposal acceptance ratio. To achieve this, our approach relies on the interaction of several different components, including automatic differentiation, transformation inversion, and optimised code generation. We also present Stochaskell, a new probabilistic programming language embedded in Haskell, which provides an implementation of our method.'
volume: 89
URL: http://proceedings.mlr.press/v89/roberts19a.html
PDF: http://proceedings.mlr.press/v89/roberts19a/roberts19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-roberts19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Roberts
given: David A.
- family: Gallagher
given: Marcus
- family: Taimre
given: Thomas
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 634-643
id: roberts19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 634
lastpage: 643
published: 2019-04-11 00:00:00 +0000
- title: 'Graph Embedding with Shifted Inner Product Similarity and Its Improved Approximation Capability'
abstract: 'We propose shifted inner-product similarity (SIPS), which is a novel yet very simple extension of the ordinary inner-product similarity (IPS) for neural-network based graph embedding (GE). In contrast to IPS, which is limited to approximating positive-definite (PD) similarities, SIPS goes beyond this limitation by introducing bias terms in IPS; we theoretically prove that SIPS is capable of approximating not only PD but also conditionally PD (CPD) similarities with many examples such as cosine similarity, negative Poincare distance and negative Wasserstein distance. Since SIPS with sufficiently large neural networks learns a variety of similarities, SIPS alleviates the need for configuring the similarity function of GE. Approximation error rate is also evaluated, and experiments on two real-world datasets demonstrate that graph embedding using SIPS indeed outperforms existing methods.'
volume: 89
URL: http://proceedings.mlr.press/v89/okuno19a.html
PDF: http://proceedings.mlr.press/v89/okuno19a/okuno19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-okuno19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Okuno
given: Akifumi
- family: Kim
given: Geewook
- family: Shimodaira
given: Hidetoshi
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 644-653
id: okuno19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 644
lastpage: 653
published: 2019-04-11 00:00:00 +0000
- title: 'High-dimensional Mixed Graphical Model with Ordinal Data: Parameter Estimation and Statistical Inference'
abstract: 'We consider parameter estimation and statistical inference of high-dimensional undirected graphical models for mixed data comprising both ordinal and continuous variables. We propose a flexible model called Latent Mixed Gaussian Copula Model that simultaneously deals with such mixed data by assuming that the observed ordinal variables are generated by latent variables. For parameter estimation, we introduce a convenient rank-based ensemble approach to estimate the latent correlation matrix, which can be subsequently applied to recover the latent graph structure. In addition, based on the ensemble estimator, we develop test statistics via a pseudo-likelihood approach to quantify the uncertainty associated with the low dimensional components of high-dimensional parameters. Our theoretical analysis shows the consistency of the estimator and asymptotic normality of the test statistic. Experiments on simulated and real gene expression data are conducted to validate our approach.'
volume: 89
URL: http://proceedings.mlr.press/v89/feng19a.html
PDF: http://proceedings.mlr.press/v89/feng19a/feng19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-feng19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Feng
given: Huijie
- family: Ning
given: Yang
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 654-663
id: feng19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 654
lastpage: 663
published: 2019-04-11 00:00:00 +0000
- title: 'Robust Graph Embedding with Noisy Link Weights'
abstract: 'We propose $\beta$-graph embedding for robustly learning feature vectors from data vectors and noisy link weights. A newly introduced empirical moment $\beta$-score reduces the influence of contamination and robustly measures the difference between the underlying correct expected weights of links and the specified generative model. The proposed method is computationally tractable; we employ a minibatch-based efficient stochastic algorithm and prove that this algorithm locally minimizes the empirical moment $\beta$-score. We conduct numerical experiments on synthetic and real-world datasets.'
volume: 89
URL: http://proceedings.mlr.press/v89/okuno19b.html
PDF: http://proceedings.mlr.press/v89/okuno19b/okuno19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-okuno19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Okuno
given: Akifumi
- family: Shimodaira
given: Hidetoshi
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 664-673
id: okuno19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 664
lastpage: 673
published: 2019-04-11 00:00:00 +0000
- title: 'Exploring Fast and Communication-Efficient Algorithms in Large-Scale Distributed Networks'
abstract: 'Communication overhead has become a significant bottleneck in data-parallel networks as model sizes and the number of data samples increase. In this work, we propose a new algorithm LPC-SVRG with quantized gradients and its acceleration ALPC-SVRG to effectively reduce the communication complexity while maintaining the same convergence as the unquantized algorithms. Specifically, we formulate the heuristic gradient clipping technique within the quantization scheme and show that unbiased quantization methods in related works [3, 33, 38] are special cases of ours. We introduce double sampling in the accelerated algorithm ALPC-SVRG to fully combine full-precision and low-precision gradients, and then achieve acceleration with less communication overhead. Our analysis focuses on the nonsmooth composite problem, which makes our algorithms more general. The experiments on linear models and deep neural networks validate the effectiveness of our algorithms.'
volume: 89
URL: http://proceedings.mlr.press/v89/yu19a.html
PDF: http://proceedings.mlr.press/v89/yu19a/yu19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-yu19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Yu
given: Yue
- family: Wu
given: Jiaxiang
- family: Huang
given: Junzhou
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 674-683
id: yu19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 674
lastpage: 683
published: 2019-04-11 00:00:00 +0000
- title: 'Defending against Whitebox Adversarial Attacks via Randomized Discretization'
abstract: 'Adversarial perturbations dramatically decrease the accuracy of state-of-the-art image classifiers. In this paper, we propose and analyze a simple and computationally efficient defense strategy: inject random Gaussian noise, discretize each pixel, and then feed the result into any pre-trained classifier. Theoretically, we show that our randomized discretization strategy reduces the KL divergence between original and adversarial inputs, leading to a lower bound on the classification accuracy of any classifier against any (potentially whitebox) $L_{\infty}$-bounded adversarial attack. Empirically, we evaluate our defense on adversarial examples generated by a strong iterative PGD attack. On ImageNet, our defense is more robust than adversarially-trained networks and the winning defenses of the NIPS 2017 Adversarial Attacks & Defenses competition.'
volume: 89
URL: http://proceedings.mlr.press/v89/zhang19b.html
PDF: http://proceedings.mlr.press/v89/zhang19b/zhang19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-zhang19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Zhang
given: Yuchen
- family: Liang
given: Percy
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 684-693
id: zhang19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 684
lastpage: 693
published: 2019-04-11 00:00:00 +0000
- title: 'Fisher Information and Natural Gradient Learning in Random Deep Networks'
abstract: 'The parameter space of a deep neural network is a Riemannian manifold, where the metric is defined by the Fisher information matrix. The natural gradient method uses the steepest descent direction in a Riemannian manifold, but it requires inversion of the Fisher matrix, which is difficult in practice. The present paper uses a statistical neurodynamical method to reveal the properties of the Fisher information matrix in a net of random connections. We prove that the Fisher information matrix is unit-wise block diagonal supplemented by small order terms of off-block-diagonal elements. We further prove that the Fisher information matrix of a single unit has a simple reduced form, a sum of a diagonal matrix and a rank 2 matrix of weight-bias correlations. We obtain the inverse of Fisher information explicitly. We then have an explicit form of the approximate natural gradient, without relying on the matrix inversion.'
volume: 89
URL: http://proceedings.mlr.press/v89/amari19a.html
PDF: http://proceedings.mlr.press/v89/amari19a/amari19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-amari19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Amari
given: Shun-ichi
- family: Karakida
given: Ryo
- family: Oizumi
given: Masafumi
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 694-702
id: amari19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 694
lastpage: 702
published: 2019-04-11 00:00:00 +0000
- title: 'Robust descent using smoothed multiplicative noise'
abstract: 'In this work, we propose a novel robust gradient descent procedure which makes use of a smoothed multiplicative noise applied directly to observations before constructing a sum of soft-truncated gradient coordinates. We show that the procedure has competitive theoretical guarantees, with the major advantage of a simple implementation that does not require an iterative sub-routine for robustification. Empirical tests reinforce the theory, showing more efficient generalization over a much wider class of data distributions.'
volume: 89
URL: http://proceedings.mlr.press/v89/holland19a.html
PDF: http://proceedings.mlr.press/v89/holland19a/holland19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-holland19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Holland
given: Matthew J.
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 703-711
id: holland19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 703
lastpage: 711
published: 2019-04-11 00:00:00 +0000
- title: 'Classification using margin pursuit'
abstract: 'In this work, we study a new approach to optimizing the margin distribution realized by binary classifiers, in which the learner searches the hypothesis space in such a way that a pre-set margin level ends up being a distribution-robust estimator of the margin location. This procedure is easily implemented using gradient descent, and admits finite-sample bounds on the excess risk under unbounded inputs, yielding competitive rates under mild assumptions. Empirical tests on real-world benchmark data reinforce the basic principles highlighted by the theory.'
volume: 89
URL: http://proceedings.mlr.press/v89/holland19b.html
PDF: http://proceedings.mlr.press/v89/holland19b/holland19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-holland19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Holland
given: Matthew J.
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 712-720
id: holland19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 712
lastpage: 720
published: 2019-04-11 00:00:00 +0000
- title: 'Linear Queries Estimation with Local Differential Privacy'
abstract: 'We study the problem of estimating a set of d linear queries with respect to some unknown distribution p over a domain $[J]$ based on a sensitive data set of n individuals under the constraint of local differential privacy. This problem subsumes a wide range of estimation tasks, e.g., distribution estimation and d-dimensional mean estimation. We provide new algorithms for both the offline (non-adaptive) and adaptive versions of this problem. In the offline setting, the set of queries is fixed before the algorithm starts. In the regime where $n < d^2/\log(J)$, our algorithms attain $L_2$ estimation error that is independent of d. For the special case of distribution estimation, we show that projecting the output estimate of an algorithm due to [Acharya et al. 2018] on the probability simplex yields an $L_2$ error that depends only sub-logarithmically on $J$ in the regime where $n < J^2/\log(J)$. Our bounds are within a factor of at most $(\log(J))^{1/4}$ from the optimal $L_2$ error. These results show the possibility of accurate estimation of linear queries in high-dimensional settings under the $L_2$ error criterion. In the adaptive setting, the queries are generated over d rounds, one query at a time. In each round, a query can be chosen adaptively based on the full history of previous queries and answers. We give an algorithm for this problem with optimal $L_{\infty}$ estimation error (worst error in the estimated values for the queries w.r.t. the data distribution). Our bound matches a lower bound on the $L_{\infty}$ error in the offline version of this problem [Duchi et al. 2013].'
volume: 89
URL: http://proceedings.mlr.press/v89/bassily19a.html
PDF: http://proceedings.mlr.press/v89/bassily19a/bassily19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-bassily19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Bassily
given: Raef
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 721-729
id: bassily19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 721
lastpage: 729
published: 2019-04-11 00:00:00 +0000
- title: 'Bayesian Learning of Neural Network Architectures'
abstract: 'In this paper we propose a Bayesian method for estimating architectural parameters of neural networks, namely layer size and network depth. We do this by learning concrete distributions over these parameters. Our results show that regular networks with a learned structure can generalise better on small datasets, while fully stochastic networks can be more robust to parameter initialisation. The proposed method relies on standard neural variational learning and, unlike randomised architecture search, does not require retraining the model, thus keeping the computational overhead to a minimum.'
volume: 89
URL: http://proceedings.mlr.press/v89/dikov19a.html
PDF: http://proceedings.mlr.press/v89/dikov19a/dikov19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-dikov19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Dikov
given: Georgi
- family: Bayer
given: Justin
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 730-738
id: dikov19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 730
lastpage: 738
published: 2019-04-11 00:00:00 +0000
- title: 'Nonlinear Acceleration of Primal-Dual Algorithms'
abstract: 'We describe a convergence acceleration scheme for multi-step optimization algorithms. The extrapolated solution is written as a nonlinear average of the iterates produced by the original optimization algorithm. Our scheme does not need the underlying fixed-point operator to be symmetric, hence handles e.g. algorithms with momentum terms such as Nesterov’s accelerated method, or primal-dual methods such as Chambolle-Pock. The weights are computed via a simple linear system and we analyze performance in both online and offline modes. We use Crouzeix’s conjecture to show that acceleration is controlled by the solution of a Chebyshev problem on the numerical range of a nonsymmetric operator modelling the behavior of iterates near the optimum. Numerical experiments are detailed on image processing and logistic regression problems.'
volume: 89
URL: http://proceedings.mlr.press/v89/bollapragada19a.html
PDF: http://proceedings.mlr.press/v89/bollapragada19a/bollapragada19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-bollapragada19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Bollapragada
given: Raghu
- family: Scieur
given: Damien
- family: d’Aspremont
given: Alexandre
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 739-747
id: bollapragada19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 739
lastpage: 747
published: 2019-04-11 00:00:00 +0000
- title: 'Gaussian Process Latent Variable Alignment Learning'
abstract: 'We present a model that can automatically learn alignments between high-dimensional data in an unsupervised manner. Our proposed method casts alignment learning in a framework where both alignment and data are modelled simultaneously. Further, we automatically infer groupings of different types of sequences within the same dataset. We derive a probabilistic model built on non-parametric priors that allows for flexible warps while at the same time providing means to specify interpretable constraints. We demonstrate the efficacy of our approach with superior quantitative performance to the state-of-the-art approaches and provide examples to illustrate the versatility of our model in automatic inference of sequence groupings, absent from previous approaches, as well as easy specification of high level priors for different modalities of data.'
volume: 89
URL: http://proceedings.mlr.press/v89/kazlauskaite19a.html
PDF: http://proceedings.mlr.press/v89/kazlauskaite19a/kazlauskaite19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-kazlauskaite19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Kazlauskaite
given: Ieva
- family: Ek
given: Carl Henrik
- family: Campbell
given: Neill
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 748-757
id: kazlauskaite19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 748
lastpage: 757
published: 2019-04-11 00:00:00 +0000
- title: 'A Bayesian model for sparse graphs with flexible degree distribution and overlapping community structure'
abstract: 'We consider a non-projective class of inhomogeneous random graph models with interpretable parameters and a number of interesting asymptotic properties. Using the results of Bollobás et al. (2007), we show that i) the class of models is sparse and ii) depending on the choice of the parameters, the model is either scale-free, with power-law exponent greater than 2, or with an asymptotic degree distribution which is power-law with exponential cut-off. We propose an extension of the model that can accommodate an overlapping community structure. Scalable posterior inference can be performed due to the specific choice of the link probability. We present experiments on five different real world networks with up to 100,000 nodes and edges, showing that the model can provide a good fit to the degree distribution and recovers the latent community structure well.'
volume: 89
URL: http://proceedings.mlr.press/v89/lee19b.html
PDF: http://proceedings.mlr.press/v89/lee19b/lee19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-lee19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Lee
given: Juho
- family: James
given: Lancelot
- family: Choi
given: Seungjin
- family: Caron
given: Francois
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 758-767
id: lee19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 758
lastpage: 767
published: 2019-04-11 00:00:00 +0000
- title: 'Pseudo-Bayesian Learning with Kernel Fourier Transform as Prior'
abstract: 'We revisit Rahimi and Recht (2007)’s kernel random Fourier features (RFF) method through the lens of the PAC-Bayesian theory. While the primary goal of RFF is to approximate a kernel, we look at the Fourier transform as a prior distribution over trigonometric hypotheses. It naturally suggests learning a posterior on these hypotheses. We derive generalization bounds that are optimized by learning a pseudo-posterior obtained from a closed-form expression. Based on this study, we consider two learning strategies: The first one finds a compact landmarks-based representation of the data where each landmark is given by a distribution-tailored similarity measure, while the second one provides a PAC-Bayesian justification to the kernel alignment method of Sinha and Duchi (2016).'
volume: 89
URL: http://proceedings.mlr.press/v89/letarte19a.html
PDF: http://proceedings.mlr.press/v89/letarte19a/letarte19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-letarte19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Letarte
given: Gaël
- family: Morvant
given: Emilie
- family: Germain
given: Pascal
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 768-776
id: letarte19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 768
lastpage: 776
published: 2019-04-11 00:00:00 +0000
- title: 'Forward Amortized Inference for Likelihood-Free Variational Marginalization'
abstract: 'In this paper, we introduce a new form of amortized variational inference by using the forward KL divergence in a joint-contrastive variational loss. The resulting forward amortized variational inference is a likelihood-free method as its gradient can be sampled without bias and without requiring any evaluation of either the model joint distribution or its derivatives. We prove that our new variational loss is optimized by the exact posterior marginals in the fully factorized mean-field approximation, a property that is not shared with the more conventional reverse KL inference. Furthermore, we show that forward amortized inference can be easily marginalized over large families of latent variables in order to obtain a marginalized variational posterior. We consider two examples of variational marginalization. In our first example we train a Bayesian forecaster for predicting a simplified chaotic model of atmospheric convection. In the second example we train an amortized variational approximation of a Bayesian optimal classifier by marginalizing over the model space. The result is a powerful meta-classification network that can solve arbitrary classification problems without further training.'
volume: 89
URL: http://proceedings.mlr.press/v89/ambrogioni19a.html
PDF: http://proceedings.mlr.press/v89/ambrogioni19a/ambrogioni19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-ambrogioni19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Ambrogioni
given: Luca
- family: Güçlü
given: Umut
- family: Berezutskaya
given: Julia
- family: Borne
given: Eva
- family: Güçlütürk
given: Yaǧmur
- family: Hinne
given: Max
- family: Maris
given: Eric
- family: Gerven
given: Marcel
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 777-786
id: ambrogioni19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 777
lastpage: 786
published: 2019-04-11 00:00:00 +0000
- title: 'SpikeCaKe: Semi-Analytic Nonparametric Bayesian Inference for Spike-Spike Neuronal Connectivity'
abstract: 'In this paper we introduce a semi-analytic variational framework for approximating the posterior of Gaussian processes coupled through non-linear emission models. While the semi-analytic method can be applied to a large class of models, the present paper is devoted to the analysis of causal connectivity between biological spiking neurons. Estimating causal connectivity between spiking neurons from measured spike sequences is one of the main challenges of systems neuroscience. This semi-analytic method exploits the tractability of GP regression when the membrane potential is observed. The resulting posterior is then marginalized analytically in order to obtain the posterior of the response functions given the spike sequences alone. We validate our methods on both simulated data and real neuronal recordings.'
volume: 89
URL: http://proceedings.mlr.press/v89/ambrogioni19b.html
PDF: http://proceedings.mlr.press/v89/ambrogioni19b/ambrogioni19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-ambrogioni19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Ambrogioni
given: Luca
- family: Ebel
given: Patrick
- family: Hinne
given: Max
- family: Güçlü
given: Umut
- family: Gerven
given: Marcel
- family: Maris
given: Eric
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 787-795
id: ambrogioni19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 787
lastpage: 795
published: 2019-04-11 00:00:00 +0000
- title: 'Scalable Gaussian Process Inference with Finite-data Mean and Variance Guarantees'
abstract: 'Gaussian processes (GPs) offer a flexible class of priors for nonparametric Bayesian regression, but popular GP posterior inference methods are typically prohibitively slow or lack desirable finite-data guarantees on quality. We develop a scalable approach to approximate GP regression, with finite-data guarantees on the accuracy of our pointwise posterior mean and variance estimates. Our main contribution is a novel objective for approximate inference in the nonparametric setting: the preconditioned Fisher (pF) divergence. We show that unlike the Kullback–Leibler divergence (used in variational inference), the pF divergence bounds the 2-Wasserstein distance, which in turn provides tight bounds on the pointwise error of mean and variance estimates. We demonstrate that, for sparse GP likelihood approximations, we can minimize the pF divergence bounds efficiently. Our experiments show that optimizing the pF divergence bounds has the same computational requirements as variational sparse GPs while providing comparable empirical performance, in addition to our novel finite-data quality guarantees.'
volume: 89
URL: http://proceedings.mlr.press/v89/huggins19a.html
PDF: http://proceedings.mlr.press/v89/huggins19a/huggins19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-huggins19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Huggins
given: Jonathan H.
- family: Campbell
given: Trevor
- family: Kasprzak
given: Mikolaj
- family: Broderick
given: Tamara
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 796-805
id: huggins19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 796
lastpage: 805
published: 2019-04-11 00:00:00 +0000
- title: 'Exponential convergence rates for Batch Normalization: The power of length-direction decoupling in non-convex optimization'
abstract: 'Normalization techniques such as Batch Normalization have been applied very successfully for training deep neural networks. Yet, despite its apparent empirical benefits, the reasons behind the success of Batch Normalization are mostly hypothetical. We here aim to provide a more thorough theoretical understanding from a classical optimization perspective. Our main contribution towards this goal is the identification of various problem instances in the realm of machine learning where Batch Normalization can provably accelerate optimization. We argue that this acceleration is due to the fact that Batch Normalization splits the optimization task into optimizing length and direction of the parameters separately. This allows gradient-based methods to leverage a favourable global structure in the loss landscape that we prove to exist in Learning Halfspace problems and neural network training with Gaussian inputs. We thereby turn Batch Normalization from an effective practical heuristic into a provably converging algorithm for these settings. Furthermore, we substantiate our analysis with empirical evidence that suggests the validity of our theoretical results in a broader context.'
volume: 89
URL: http://proceedings.mlr.press/v89/kohler19a.html
PDF: http://proceedings.mlr.press/v89/kohler19a/kohler19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-kohler19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Kohler
given: Jonas
- family: Daneshmand
given: Hadi
- family: Lucchi
given: Aurelien
- family: Hofmann
given: Thomas
- family: Zhou
given: Ming
- family: Neymeyr
given: Klaus
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 806-815
id: kohler19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 806
lastpage: 815
published: 2019-04-11 00:00:00 +0000
- title: 'A new evaluation framework for topic modeling algorithms based on synthetic corpora'
abstract: 'Topic models are in widespread use in natural language processing and beyond. Here, we propose a new framework for the evaluation of topic modeling algorithms based on synthetic corpora containing an unambiguously defined ground truth topic structure. The major innovation of our approach is the ability to quantify the agreement between the planted and inferred topic structures by comparing the assigned topic labels at the level of the tokens. In experiments, our approach yields novel insights about the relative strengths of topic models as corpus characteristics vary, and the first evidence of an “undetectable phase” for topic models when the planted structure is weak. We also establish the practical relevance of the insights gained for synthetic corpora by predicting the performance of topic modeling algorithms in classification tasks in real-world corpora.'
volume: 89
URL: http://proceedings.mlr.press/v89/shi19a.html
PDF: http://proceedings.mlr.press/v89/shi19a/shi19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-shi19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Shi
given: Hanyu
- family: Gerlach
given: Martin
- family: Diersen
given: Isabel
- family: Downey
given: Doug
- family: Amaral
given: Luis
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 816-826
id: shi19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 816
lastpage: 826
published: 2019-04-11 00:00:00 +0000
- title: 'On Kernel Derivative Approximation with Random Fourier Features'
abstract: 'Random Fourier features (RFF) represent one of the most popular and widespread techniques in machine learning to scale up kernel algorithms. Despite the numerous successful applications of RFFs, unfortunately, little is understood theoretically about their optimality and the limitations of their performance. Only recently, precise statistical-computational trade-offs have been established for RFFs in the approximation of kernel values, kernel ridge regression, kernel PCA and SVM classification. Our goal is to spark the investigation of optimality of RFF-based approximations in tasks involving not only function values but derivatives, which naturally lead to optimization problems with kernel derivatives. Particularly, in this paper, we focus on the approximation quality of RFFs for kernel derivatives and prove that the existing finite-sample guarantees can be improved exponentially in terms of the domain where they hold, using recent tools from unbounded empirical process theory. Our result implies that the same approximation guarantee is attainable for kernel derivatives using RFF as achieved for kernel values.'
volume: 89
URL: http://proceedings.mlr.press/v89/szabo19a.html
PDF: http://proceedings.mlr.press/v89/szabo19a/szabo19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-szabo19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Szabo
given: Zoltan
- family: Sriperumbudur
given: Bharath
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 827-836
id: szabo19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 827
lastpage: 836
published: 2019-04-11 00:00:00 +0000
- title: 'Sequential Neural Likelihood: Fast Likelihood-free Inference with Autoregressive Flows'
abstract: 'We present Sequential Neural Likelihood (SNL), a new method for Bayesian inference in simulator models, where the likelihood is intractable but simulating data from the model is possible. SNL trains an autoregressive flow on simulated data in order to learn a model of the likelihood in the region of high posterior density. A sequential training procedure guides simulations and reduces simulation cost by orders of magnitude. We show that SNL is more robust, more accurate and requires less tuning than related neural-based methods, and we discuss diagnostics for assessing calibration, convergence and goodness-of-fit.'
volume: 89
URL: http://proceedings.mlr.press/v89/papamakarios19a.html
PDF: http://proceedings.mlr.press/v89/papamakarios19a/papamakarios19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-papamakarios19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Papamakarios
given: George
- family: Sterratt
given: David
- family: Murray
given: Iain
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 837-848
id: papamakarios19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 837
lastpage: 848
published: 2019-04-11 00:00:00 +0000
- title: 'Optimal Transport for Multi-source Domain Adaptation under Target Shift'
abstract: 'In this paper, we tackle the problem of reducing discrepancies between multiple domains, i.e. multi-source domain adaptation, and consider it under the target shift assumption: in all domains we aim to solve a classification problem with the same output classes, but with different label proportions. This problem, generally ignored in the vast majority of domain adaptation papers, is nevertheless critical in real-world applications, and we theoretically show its impact on the success of the adaptation. Our proposed method is based on optimal transport, a theory that has been successfully used to tackle adaptation problems in machine learning. The introduced approach, Joint Class Proportion and Optimal Transport (JCPOT), performs multi-source adaptation and target shift correction simultaneously by learning the class probabilities of the unlabeled target sample and the coupling that aligns two (or more) probability distributions. Experiments on both synthetic and real-world (satellite image pixel classification) tasks show the superiority of the proposed method over the state-of-the-art.'
volume: 89
URL: http://proceedings.mlr.press/v89/redko19a.html
PDF: http://proceedings.mlr.press/v89/redko19a/redko19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-redko19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Redko
given: Ievgen
- family: Courty
given: Nicolas
- family: Flamary
given: Rémi
- family: Tuia
given: Devis
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 849-858
id: redko19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 849
lastpage: 858
published: 2019-04-11 00:00:00 +0000
- title: 'Nonlinear ICA Using Auxiliary Variables and Generalized Contrastive Learning'
abstract: 'Nonlinear ICA is a fundamental problem for unsupervised representation learning, emphasizing the capacity to recover the underlying latent variables generating the data (i.e., identifiability). Recently, the very first identifiability proofs for nonlinear ICA have been proposed, leveraging the temporal structure of the independent components. Here, we propose a general framework for nonlinear ICA, which, as a special case, can make use of temporal structure. It is based on augmenting the data by an auxiliary variable, such as the time index, the history of the time series, or any other available information. We propose to learn nonlinear ICA by discriminating between true augmented data and data in which the auxiliary variable has been randomized. This enables the framework to be implemented algorithmically through logistic regression, possibly in a neural network. We provide a comprehensive proof of the identifiability of the model as well as the consistency of our estimation method. The approach not only provides a general theoretical framework combining and generalizing previously proposed nonlinear ICA models and algorithms, but also brings practical advantages.'
volume: 89
URL: http://proceedings.mlr.press/v89/hyvarinen19a.html
PDF: http://proceedings.mlr.press/v89/hyvarinen19a/hyvarinen19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-hyvarinen19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Hyvarinen
given: Aapo
- family: Sasaki
given: Hiroaki
- family: Turner
given: Richard
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 859-868
id: hyvarinen19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 859
lastpage: 868
published: 2019-04-11 00:00:00 +0000
- title: 'Deep Neural Networks Learn Non-Smooth Functions Effectively'
abstract: 'We elucidate a theoretical reason that deep neural networks (DNNs) perform better than other models in some cases from the viewpoint of their statistical properties for non-smooth functions. While DNNs have empirically shown higher performance than other standard methods, understanding their mechanism is still a challenging problem. From the viewpoint of statistical theory, it is known that many standard methods attain the optimal rate of generalization errors for smooth functions in large sample asymptotics, and thus it has not been straightforward to find theoretical advantages of DNNs. This paper fills this gap by considering learning of a certain class of non-smooth functions, which was not covered by the previous theory. We derive the generalization error of estimators by DNNs with a ReLU activation, and show that convergence rates of the generalization by DNNs are almost optimal to estimate the non-smooth functions, while some of the popular models do not attain the optimal rate. In addition, our theoretical result provides guidelines for selecting an appropriate number of layers and edges of DNNs. We provide numerical experiments to support the theoretical results.'
volume: 89
URL: http://proceedings.mlr.press/v89/imaizumi19a.html
PDF: http://proceedings.mlr.press/v89/imaizumi19a/imaizumi19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-imaizumi19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Imaizumi
given: Masaaki
- family: Fukumizu
given: Kenji
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 869-878
id: imaizumi19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 869
lastpage: 878
published: 2019-04-11 00:00:00 +0000
- title: 'Attenuating Bias in Word Vectors'
abstract: 'Word vector representations are well-developed tools for various NLP and Machine Learning tasks and are known to retain significant semantic and syntactic structure of languages. But they are prone to carrying and amplifying bias which can perpetuate discrimination in various applications. In this work, we explore new simple ways to detect the most stereotypically gendered words in an embedding and remove the bias from them. We verify how names are masked carriers of gender bias and then use that as a tool to attenuate bias in embeddings. Further, we extend this property of names to show how names can be used to detect other types of bias in the embeddings such as bias based on race, ethnicity, and age.'
volume: 89
URL: http://proceedings.mlr.press/v89/dev19a.html
PDF: http://proceedings.mlr.press/v89/dev19a/dev19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-dev19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Dev
given: Sunipa
- family: Phillips
given: Jeff
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 879-887
id: dev19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 879
lastpage: 887
published: 2019-04-11 00:00:00 +0000
- title: 'Fisher-Rao Metric, Geometry, and Complexity of Neural Networks'
abstract: 'We study the relationship between geometry and capacity measures for deep neural networks from an invariance viewpoint. We introduce a new notion of capacity — the Fisher-Rao norm — that possesses desirable invariance properties and is motivated by Information Geometry. We discover an analytical characterization of the new capacity measure, through which we establish norm-comparison inequalities and further show that the new measure serves as an umbrella for several existing norm-based complexity measures. We discuss upper bounds on the generalization error induced by the proposed measure. Extensive numerical experiments on CIFAR-10 support our theoretical findings. Our theoretical analysis rests on a key structural lemma about partial derivatives of multi-layer rectifier networks.'
volume: 89
URL: http://proceedings.mlr.press/v89/liang19a.html
PDF: http://proceedings.mlr.press/v89/liang19a/liang19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-liang19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Liang
given: Tengyuan
- family: Poggio
given: Tomaso
- family: Rakhlin
given: Alexander
- family: Stokes
given: James
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 888-896
id: liang19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 888
lastpage: 896
published: 2019-04-11 00:00:00 +0000
- title: 'Accelerated Decentralized Optimization with Local Updates for Smooth and Strongly Convex Objectives'
abstract: 'In this paper, we study the problem of minimizing a sum of smooth and strongly convex functions split over the nodes of a network in a decentralized fashion. We propose the algorithm ESDACD, a decentralized accelerated algorithm that only requires local synchrony. Its rate depends on the condition number $\kappa$ of the local functions as well as the network topology and delays. Under mild assumptions on the topology of the graph, ESDACD takes a time $O((\tau_{\max} + \Delta_{\max})\sqrt{{\kappa}/{\gamma}}\ln(\epsilon^{-1}))$ to reach a precision $\epsilon$ where $\gamma$ is the spectral gap of the graph, $\tau_{\max}$ the maximum communication delay and $\Delta_{\max}$ the maximum computation time. Therefore, it matches the rate of SSDA, which is optimal when $\tau_{\max} = \Omega\left(\Delta_{\max}\right)$. Applying ESDACD to quadratic local functions leads to an accelerated randomized gossip algorithm of rate $O( \sqrt{\theta_{\rm gossip}/n})$ where $\theta_{\rm gossip}$ is the rate of the standard randomized gossip. To the best of our knowledge, it is the first asynchronous algorithm with a provably improved rate of convergence of the second moment of the error. We illustrate these results with experiments in idealized settings.'
volume: 89
URL: http://proceedings.mlr.press/v89/hendrikx19a.html
PDF: http://proceedings.mlr.press/v89/hendrikx19a/hendrikx19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-hendrikx19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Hendrikx
given: Hadrien
- family: Bach
given: Francis
- family: Massoulie
given: Laurent
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 897-906
id: hendrikx19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 897
lastpage: 906
published: 2019-04-11 00:00:00 +0000
- title: 'Interaction Matters: A Note on Non-asymptotic Local Convergence of Generative Adversarial Networks'
abstract: 'Motivated by the pursuit of a systematic computational and algorithmic understanding of Generative Adversarial Networks (GANs), we present a simple yet unified non-asymptotic local convergence theory for smooth two-player games, which subsumes several discrete-time gradient-based saddle point dynamics. The analysis reveals the surprising nature of the off-diagonal interaction term as both a blessing and a curse. On the one hand, this interaction term explains the origin of the slow-down effect in the convergence of Simultaneous Gradient Ascent (SGA) to stable Nash equilibria. On the other hand, for the unstable equilibria, exponential convergence can be proved thanks to the interaction term, for four modified dynamics proposed to stabilize GAN training: Optimistic Mirror Descent (OMD), Consensus Optimization (CO), Implicit Updates (IU) and Predictive Method (PM). The analysis uncovers the intimate connections among these stabilizing techniques, and provides detailed characterization on the choice of learning rate. As a by-product, we present a new analysis for OMD proposed in Daskalakis, Ilyas, Syrgkanis, and Zeng [2017] with improved rates.'
volume: 89
URL: http://proceedings.mlr.press/v89/liang19b.html
PDF: http://proceedings.mlr.press/v89/liang19b/liang19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-liang19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Liang
given: Tengyuan
- family: Stokes
given: James
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 907-915
id: liang19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 907
lastpage: 915
published: 2019-04-11 00:00:00 +0000
- title: 'On Constrained Nonconvex Stochastic Optimization: A Case Study for Generalized Eigenvalue Decomposition'
abstract: 'We study constrained nonconvex optimization problems in machine learning and signal processing. It is well-known that these problems can be rewritten as a min-max problem in a Lagrangian form. However, due to the lack of convexity, their landscape is not well understood and how to find the stable equilibria of the Lagrangian function is still unknown. To bridge the gap, we study the landscape of the Lagrangian function. Further, we define a special class of Lagrangian functions. They enjoy the following two properties: 1. Equilibria are either stable or unstable (formal definition in Section 2); 2. Stable equilibria correspond to the global optima of the original problem. We show that a generalized eigenvalue (GEV) problem, including canonical correlation analysis and other problems as special examples, belongs to the class. Specifically, we characterize its stable and unstable equilibria by leveraging an invariant group and symmetric property (more details in Section 3). Motivated by these neat geometric structures, we propose a simple, efficient, and stochastic primal-dual algorithm solving the online GEV problem. Theoretically, under sufficient conditions, we establish an asymptotic rate of convergence and obtain the first sample complexity result for the online GEV problem by diffusion approximations, which are widely used in applied probability. Numerical results are also provided to support our theory.'
volume: 89
URL: http://proceedings.mlr.press/v89/chen19a.html
PDF: http://proceedings.mlr.press/v89/chen19a/chen19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-chen19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Chen
given: Zhehui
- family: Li
given: Xingguo
- family: Yang
given: Lin
- family: Haupt
given: Jarvis
- family: Zhao
given: Tuo
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 916-925
id: chen19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 916
lastpage: 925
published: 2019-04-11 00:00:00 +0000
- title: 'Generalized Boltzmann Machine with Deep Neural Structure'
abstract: 'Restricted Boltzmann Machine (RBM) is an essential component in many machine learning applications. As a probabilistic graphical model, RBM posits a shallow structure, which makes it less capable of modeling real-world applications. In this paper, to bridge the gap between RBM and artificial neural network, we propose an energy-based probabilistic model that is more flexible on modeling continuous data. By introducing the pair-wise inverse autoregressive flow into RBM, we propose two generalized continuous RBMs which contain deep neural network structure to more flexibly track the practical data distribution while still keeping the inference tractable. In addition, we extend the generalized RBM structures into sequential setting to better model the stochastic process of time series. Performance improvements on probabilistic modeling and representation learning are demonstrated by the experiments on diverse datasets.'
volume: 89
URL: http://proceedings.mlr.press/v89/liu19b.html
PDF: http://proceedings.mlr.press/v89/liu19b/liu19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-liu19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Liu
given: Yingru
- family: Xie
given: Dongliang
- family: Wang
given: Xin
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 926-934
id: liu19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 926
lastpage: 934
published: 2019-04-11 00:00:00 +0000
- title: 'Extreme Stochastic Variational Inference: Distributed Inference for Large Scale Mixture Models'
abstract: 'Mixtures of exponential family models are among the most fundamental and widely used statistical models. Stochastic variational inference (SVI), the state-of-the-art algorithm for parameter estimation in such models, is inherently serial. Moreover, it requires the parameters to fit in the memory of a single processor; this poses serious limitations on scalability when the number of parameters is in the billions. In this paper, we present extreme stochastic variational inference (ESVI), a distributed, asynchronous and lock-free algorithm to perform variational inference for mixture models on massive real world datasets. ESVI overcomes the limitations of SVI by requiring that each processor only access a subset of the data and a subset of the parameters, thus providing data and model parallelism simultaneously. Our empirical study demonstrates that ESVI not only outperforms VI and SVI in wallclock-time, but also achieves a better quality solution. To further speed up computation and save memory when fitting a large number of topics, we propose a variant ESVI-TOPK which maintains only the top-k important topics. Empirically, we found that using the top 25% of topics suffices to achieve the same accuracy as storing all the topics.'
volume: 89
URL: http://proceedings.mlr.press/v89/zhang19c.html
PDF: http://proceedings.mlr.press/v89/zhang19c/zhang19c.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-zhang19c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Zhang
given: Jiong
- family: Raman
given: Parameswaran
- family: Ji
given: Shihao
- family: Yu
given: Hsiang-Fu
- family: Vishwanathan
given: S.V.N.
- family: Dhillon
given: Inderjit
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 935-943
id: zhang19c
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 935
lastpage: 943
published: 2019-04-11 00:00:00 +0000
- title: 'Correcting the bias in least squares regression with volume-rescaled sampling'
abstract: 'Consider linear regression where the examples are generated by an unknown distribution on R^d x R. Without any assumptions on the noise, the linear least squares solution for any i.i.d. sample will typically be biased w.r.t. the least squares optimum over the entire distribution. However, we show that if an i.i.d. sample of any size k is augmented by a certain small additional sample, then the solution of the combined sample becomes unbiased. We show this when the additional sample consists of d points drawn jointly according to the input distribution rescaled by the squared volume spanned by the points. Furthermore, we propose algorithms to sample from this volume-rescaled distribution when the data distribution is only known through an i.i.d. sample.'
volume: 89
URL: http://proceedings.mlr.press/v89/derezinski19a.html
PDF: http://proceedings.mlr.press/v89/derezinski19a/derezinski19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-derezinski19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Derezinski
given: Michal
- family: Warmuth
given: Manfred K.
- family: Hsu
given: Daniel
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 944-953
id: derezinski19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 944
lastpage: 953
published: 2019-04-11 00:00:00 +0000
- title: 'Conservative Exploration using Interleaving'
abstract: 'In many practical problems, a learning agent may want to learn the best action in hindsight without ever taking a bad action, which is much worse than a default production action. In general, this is impossible because the agent has to explore unknown actions, some of which can be bad, to learn better actions. However, when the actions are structured, this is possible if the unknown action can be evaluated by interleaving it with the default action. We formalize this concept as learning in stochastic combinatorial semi-bandits with exchangeable actions. We design efficient learning algorithms for this problem, bound their n-step regret, and evaluate them on both synthetic and real-world problems. Our real-world experiments show that our algorithms can learn to recommend K most attractive movies without ever making disastrous recommendations, both overall and subject to a diversity constraint.'
volume: 89
URL: http://proceedings.mlr.press/v89/katariya19a.html
PDF: http://proceedings.mlr.press/v89/katariya19a/katariya19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-katariya19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Katariya
given: Sumeet
- family: Kveton
given: Branislav
- family: Wen
given: Zheng
- family: Potluru
given: Vamsi K.
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 954-963
id: katariya19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 954
lastpage: 963
published: 2019-04-11 00:00:00 +0000
- title: 'Conditionally Independent Multiresolution Gaussian Processes'
abstract: 'The multiresolution Gaussian process (GP) has gained increasing attention as a viable approach towards improving the quality of approximations in GPs that scale well to large-scale data. Most of the current constructions assume full independence across resolutions. This assumption simplifies the inference, but it underestimates the uncertainties in transitioning from one resolution to another. This in turn results in models which are prone to overfitting in the sense of excessive sensitivity to the chosen resolution, and predictions which are non-smooth at the boundaries. Our contribution is a new construction which instead assumes conditional independence among GPs across resolutions. We show that relaxing the full independence assumption enables robustness against overfitting, and that it delivers predictions that are smooth at the boundaries. Our new model is compared against current state of the art on 2 synthetic and 9 real-world datasets. In most cases, our new conditionally independent construction performed favorably when compared against models based on the full independence assumption. In particular, it exhibits little to no signs of overfitting.'
volume: 89
URL: http://proceedings.mlr.press/v89/taghia19a.html
PDF: http://proceedings.mlr.press/v89/taghia19a/taghia19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-taghia19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Taghia
given: Jalil
- family: Schön
given: Thomas
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 964-973
id: taghia19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 964
lastpage: 973
published: 2019-04-11 00:00:00 +0000
- title: 'Active Exploration in Markov Decision Processes'
abstract: 'We introduce the active exploration problem in Markov decision processes (MDPs). Each state of the MDP is characterized by a random value and the learner should gather samples to estimate the mean value of each state as accurately as possible. Similarly to active exploration in multi-armed bandit (MAB), states may have different levels of noise, so that the higher the noise, the more samples are needed. As the noise level is initially unknown, we need to trade off the exploration of the environment to estimate the noise and the exploitation of these estimates to compute a policy maximizing the accuracy of the mean predictions. We introduce a novel learning algorithm to solve this problem showing that active exploration in MDPs may be significantly more difficult than in MAB. We also derive a heuristic procedure to mitigate the negative effect of slowly mixing policies. Finally, we validate our findings on simple numerical simulations.'
volume: 89
URL: http://proceedings.mlr.press/v89/tarbouriech19a.html
PDF: http://proceedings.mlr.press/v89/tarbouriech19a/tarbouriech19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-tarbouriech19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Tarbouriech
given: Jean
- family: Lazaric
given: Alessandro
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 974-982
id: tarbouriech19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 974
lastpage: 982
published: 2019-04-11 00:00:00 +0000
- title: 'On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes'
abstract: 'Stochastic gradient descent is the method of choice for large scale optimization of machine learning objective functions. Yet, its performance is highly variable and heavily depends on the choice of the stepsizes. This has motivated a large body of research on adaptive stepsizes. However, there is currently a gap in our theoretical understanding of these methods, especially in the non-convex setting. In this paper, we start closing this gap: we theoretically analyze in the convex and non-convex settings a generalized version of the AdaGrad stepsizes. We show sufficient conditions for these stepsizes to achieve almost sure asymptotic convergence of the gradients to zero, proving the first guarantee for generalized AdaGrad stepsizes in the non-convex setting. Moreover, we show that these stepsizes automatically adapt to the level of noise of the stochastic gradients in both the convex and non-convex settings, interpolating between O(1/T) and O(1/sqrt(T)), up to logarithmic terms.'
volume: 89
URL: http://proceedings.mlr.press/v89/li19c.html
PDF: http://proceedings.mlr.press/v89/li19c/li19c.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-li19c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Li
given: Xiaoyu
- family: Orabona
given: Francesco
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 983-992
id: li19c
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 983
lastpage: 992
published: 2019-04-11 00:00:00 +0000
- title: 'Bandit Online Learning with Unknown Delays'
abstract: 'This paper deals with bandit online learning, where feedback of unknown delay can emerge in non-stochastic multi-armed bandit (MAB) and bandit convex optimization (BCO) settings. MAB and BCO require only values of the objective function to become available through feedback, and are used to estimate the gradient appearing in the corresponding iterative algorithms. Since the challenging case of feedback with unknown delays prevents one from constructing the sought gradient estimates, existing MAB and BCO algorithms become intractable. Delayed exploration, exploitation, and exponential (DEXP3) iterations, along with delayed bandit gradient descent (DBGD) iterations are developed for MAB and BCO with unknown delays, respectively. Based on a unifying analysis framework, it is established that both DEXP3 and DBGD guarantee an $\tilde{\cal O}\big( \sqrt{K(T+D)} \big)$ regret, where $D$ denotes the delay accumulated over $T$ slots, and $K$ represents the number of arms in MAB or the dimension of decision variables in BCO. Numerical tests using both synthetic and real data validate DEXP3 and DBGD.'
volume: 89
URL: http://proceedings.mlr.press/v89/li19d.html
PDF: http://proceedings.mlr.press/v89/li19d/li19d.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-li19d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Li
given: Bingcong
- family: Chen
given: Tianyi
- family: Giannakis
given: Georgios B.
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 993-1002
id: li19d
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 993
lastpage: 1002
published: 2019-04-11 00:00:00 +0000
- title: 'Learning Invariant Representations with Kernel Warping'
abstract: 'Invariance is an effective prior that has been extensively used to bias supervised learning with a \emph{given} representation of data. In order to learn invariant representations, wavelet and scattering based methods “hard code” invariance over the \emph{entire} sample space, hence restricted to a limited range of transformations. Kernels based on Haar integration also work only on a \emph{group} of transformations. In this work, we break this limitation by designing a new representation learning algorithm that incorporates invariances \emph{beyond transformation}. Our approach, which is based on warping the kernel in a data-dependent fashion, is computationally efficient using random features, and leads to a deep kernel through multiple layers. We apply it to convolutional kernel networks and demonstrate its stability.'
volume: 89
URL: http://proceedings.mlr.press/v89/ma19a.html
PDF: http://proceedings.mlr.press/v89/ma19a/ma19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-ma19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Ma
given: Yingyi
- family: Ganapathiraman
given: Vignesh
- family: Zhang
given: Xinhua
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1003-1012
id: ma19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1003
lastpage: 1012
published: 2019-04-11 00:00:00 +0000
- title: '$β^3$-IRT: A New Item Response Model and its Applications'
abstract: 'Item Response Theory (IRT) aims to assess latent abilities of respondents based on the correctness of their answers in aptitude test items with different difficulty levels. In this paper, we propose the $\beta^3$-IRT model, which models continuous responses and can generate a much enriched family of Item Characteristic Curves. In experiments we applied the proposed model to data from an online exam platform, and show our model outperforms a more standard 2PL-ND model on all datasets. Furthermore, we show how to apply $\beta^3$-IRT to assess the ability of machine learning classifiers. This novel application results in a new metric for evaluating the quality of the classifier’s probability estimates, based on the inferred difficulty and discrimination of data instances.'
volume: 89
URL: http://proceedings.mlr.press/v89/chen19b.html
PDF: http://proceedings.mlr.press/v89/chen19b/chen19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-chen19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Chen
given: Yu
- family: Filho
given: Telmo Silva
- family: Prudencio
given: Ricardo B.
- family: Diethe
given: Tom
- family: Flach
given: Peter
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1013-1021
id: chen19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1013
lastpage: 1021
published: 2019-04-11 00:00:00 +0000
- title: 'Can You Trust This Prediction? Auditing Pointwise Reliability After Learning'
abstract: 'To use machine learning in high stakes applications (e.g. medicine), we need tools for building confidence in the system and evaluating whether it is reliable. Methods to improve model reliability often require new learning algorithms (e.g. using Bayesian inference to obtain uncertainty estimates). An alternative is to audit a model after it is trained. In this paper, we describe resampling uncertainty estimation (RUE), an algorithm to audit the pointwise reliability of predictions. Intuitively, RUE estimates the amount that a prediction would change if the model had been fit on different training data. The algorithm uses the gradient and Hessian of the model’s loss function to create an ensemble of predictions. Experimentally, we show that RUE more effectively detects inaccurate predictions than existing tools for auditing reliability subsequent to training. We also show that RUE can create predictive distributions that are competitive with state-of-the-art methods like Monte Carlo dropout, probabilistic backpropagation, and deep ensembles, but does not depend on specific algorithms at train-time like these methods do.'
volume: 89
URL: http://proceedings.mlr.press/v89/schulam19a.html
PDF: http://proceedings.mlr.press/v89/schulam19a/schulam19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-schulam19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Schulam
given: Peter
- family: Saria
given: Suchi
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1022-1031
id: schulam19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1022
lastpage: 1031
published: 2019-04-11 00:00:00 +0000
- title: 'Universal Statistics of Fisher Information in Deep Neural Networks: Mean Field Approach'
abstract: 'The Fisher information matrix (FIM) is a fundamental quantity to represent the characteristics of a stochastic model, including deep neural networks (DNNs). The present study reveals novel statistics of FIM that are universal among a wide class of DNNs. To this end, we use random weights and large width limits, which enables us to utilize mean field theories. We investigate the asymptotic statistics of the FIM’s eigenvalues and reveal that most of them are close to zero while the maximum eigenvalue takes a huge value. Because the landscape of the parameter space is defined by the FIM, it is locally flat in most dimensions, but strongly distorted in others. Moreover, we demonstrate the potential usage of the derived statistics in learning strategies. First, small eigenvalues that induce flatness can be connected to a norm-based capacity measure of generalization ability. Second, the maximum eigenvalue that induces the distortion enables us to quantitatively estimate an appropriately sized learning rate for gradient methods to converge.'
volume: 89
URL: http://proceedings.mlr.press/v89/karakida19a.html
PDF: http://proceedings.mlr.press/v89/karakida19a/karakida19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-karakida19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Karakida
given: Ryo
- family: Akaho
given: Shotaro
- family: Amari
given: Shun-ichi
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1032-1041
id: karakida19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1032
lastpage: 1041
published: 2019-04-11 00:00:00 +0000
- title: 'Conditional Sparse $L_p$-norm Regression With Optimal Probability'
abstract: 'We consider the following conditional linear regression problem: the task is to identify both (i) a $k$-DNF condition $c$ and (ii) a linear rule $f$ such that the probability of $c$ is (approximately) at least some given bound $\mu$, and minimizing the $l_p$ loss of $f$ at predicting the target $z$ in the distribution conditioned on $c$. Thus, the task is to identify a portion of the distribution on which a linear rule can provide a good fit. Algorithms for this task are useful in cases where portions of the distribution are not modeled well by simple, learnable rules, but on other portions such rules perform well. The prior state-of-the-art for such algorithms could only guarantee finding a condition of probability $O(\mu/n^k )$ when a condition of probability $\mu$ exists, and achieved a $O(n^k)$-approximation to the target loss. Here, we give efficient algorithms for solving this task with a condition $c$ that nearly matches the probability of the ideal condition, while also improving the approximation to the target loss to a $O (n^{k/2})$ factor. We also give an algorithm for finding a k-DNF reference class for prediction at a given query point, that obtains a sparse regression fit that has loss within $O(n^k)$ of optimal among all sparse regression parameters and sufficiently large $k$-DNF reference classes containing the query point.'
volume: 89
URL: http://proceedings.mlr.press/v89/hainline19a.html
PDF: http://proceedings.mlr.press/v89/hainline19a/hainline19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-hainline19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Hainline
given: John
- family: Juba
given: Brendan
- family: Le
given: Hai S.
- family: Woodruff
given: David
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1042-1050
id: hainline19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1042
lastpage: 1050
published: 2019-04-11 00:00:00 +0000
- title: 'On the Connection Between Learning Two-Layer Neural Networks and Tensor Decomposition'
abstract: 'We establish connections between the problem of learning a two-layer neural network and tensor decomposition. We consider a model with feature vectors $x$, $r$ hidden units with weights $w_i$ and output $y$, i.e., $y=\sum_{i=1}^r \sigma(w_i^{T} x)$, with activation functions given by low-degree polynomials. In particular, if $\sigma(x) = a_0+a_1x+a_3x^3$, we prove that no polynomial-time algorithm can outperform the trivial predictor that assigns to each example the response variable $E(y)$, when $d^{3/2}<< r <2$ components is also provided.'
volume: 89
URL: http://proceedings.mlr.press/v89/kushnir19a.html
PDF: http://proceedings.mlr.press/v89/kushnir19a/kushnir19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-kushnir19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Kushnir
given: Dan
- family: Jalali
given: Shirin
- family: Saniee
given: Iraj
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1379-1387
id: kushnir19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1379
lastpage: 1387
published: 2019-04-11 00:00:00 +0000
- title: 'Classifying Signals on Irregular Domains via Convolutional Cluster Pooling'
abstract: 'We present a novel and hierarchical approach for supervised classification of signals spanning over a fixed graph, reflecting shared properties of the dataset. To this end, we introduce a Convolutional Cluster Pooling layer exploiting a multi-scale clustering in order to highlight, at different resolutions, locally connected regions on the input graph. Our proposal generalises well-established neural models such as Convolutional Neural Networks (CNNs) on irregular and complex domains, by means of the exploitation of the weight sharing property in a graph-oriented architecture. In this work, such property is based on the centrality of each vertex within its soft-assigned cluster. Extensive experiments on NTU RGB+D, CIFAR-10 and 20NEWS demonstrate the effectiveness of the proposed technique in capturing both local and global patterns in graph-structured data out of different domains.'
volume: 89
URL: http://proceedings.mlr.press/v89/porrello19a.html
PDF: http://proceedings.mlr.press/v89/porrello19a/porrello19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-porrello19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Porrello
given: Angelo
- family: Abati
given: Davide
- family: Calderara
given: Simone
- family: Cucchiara
given: Rita
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1388-1397
id: porrello19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1388
lastpage: 1397
published: 2019-04-11 00:00:00 +0000
- title: 'Learning Rules-First Classifiers'
abstract: 'Complex classifiers may exhibit “embarrassing” failures in cases where humans can easily provide a justified classification. Avoiding such failures is obviously of key importance. In this work, we focus on one such setting, where a label is perfectly predictable if the input contains certain features, or rules, and otherwise it is predictable by a linear classifier. We define a hypothesis class that captures this notion and determine its sample complexity. We also give evidence that efficient algorithms cannot achieve this sample complexity. We then derive a simple and efficient algorithm and show that its sample complexity is close to optimal, among efficient algorithms. Experiments on synthetic and sentiment analysis data demonstrate the efficacy of the method, both in terms of accuracy and interpretability.'
volume: 89
URL: http://proceedings.mlr.press/v89/cohen19a.html
PDF: http://proceedings.mlr.press/v89/cohen19a/cohen19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-cohen19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Cohen
given: Deborah
- family: Daniely
given: Amit
- family: Globerson
given: Amir
- family: Elidan
given: Gal
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1398-1406
id: cohen19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1398
lastpage: 1406
published: 2019-04-11 00:00:00 +0000
- title: 'Wasserstein regularization for sparse multi-task regression'
abstract: 'We focus in this paper on high-dimensional regression problems where each regressor can be associated with a location in a physical space, or more generally a generic geometric space. Such problems often employ sparse priors, which promote models using a small subset of regressors. To increase statistical power, the so-called multi-task techniques were proposed, which consist in the simultaneous estimation of several related models. Combined with sparsity assumptions, this leads to models that force the active regressors to be shared across models, thanks to, for instance, L1/Lq norms. We argue in this paper that these techniques fail to leverage the spatial information associated with regressors. Indeed, while sparse priors enforce that only a small subset of variables is used, the assumption that these regressors overlap across all tasks is overly simplistic given the spatial variability observed in real data. In this paper, we propose a convex regularizer for multi-task regression that encodes a more flexible geometry. Our regularizer is based on unbalanced optimal transport (OT) theory, and can take into account prior geometric knowledge on the regressor variables, without necessarily requiring overlapping supports. We derive an efficient algorithm based on a regularized formulation of OT, which iterates through applications of Sinkhorn’s algorithm along with coordinate descent iterations. The performance of our model is demonstrated on regular grids with both synthetic and real datasets as well as complex triangulated geometries of the cortex with an application in neuroimaging.'
volume: 89
URL: http://proceedings.mlr.press/v89/janati19a.html
PDF: http://proceedings.mlr.press/v89/janati19a/janati19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-janati19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Janati
given: Hicham
- family: Cuturi
given: Marco
- family: Gramfort
given: Alexandre
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1407-1416
id: janati19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1407
lastpage: 1416
published: 2019-04-11 00:00:00 +0000
- title: 'Stochastic Gradient Descent with Exponential Convergence Rates of Expected Classification Errors'
abstract: 'We consider stochastic gradient descent and its averaging variant for binary classification problems in a reproducing kernel Hilbert space. In traditional analysis using a consistency property of loss functions, it is known that the expected classification error converges more slowly than the expected risk even when assuming a low-noise condition on the conditional label probabilities. Consequently, the resulting rate is sublinear. Therefore, it is important to consider whether much faster convergence of the expected classification error can be achieved. In recent research, an exponential convergence rate for stochastic gradient descent was shown under a strong low-noise condition, but the theoretical analysis provided was limited to the square loss function, which is somewhat inadequate for binary classification tasks. In this paper, we show an exponential convergence of the expected classification error in the final phase of the stochastic gradient descent for a wide class of differentiable convex loss functions under similar assumptions. As for the averaged stochastic gradient descent, we show that the same convergence rate holds from the early phase of training. In experiments, we verify our analyses on the $L_2$-regularized logistic regression.'
volume: 89
URL: http://proceedings.mlr.press/v89/nitanda19a.html
PDF: http://proceedings.mlr.press/v89/nitanda19a/nitanda19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-nitanda19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Nitanda
given: Atsushi
- family: Suzuki
given: Taiji
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1417-1426
id: nitanda19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1417
lastpage: 1426
published: 2019-04-11 00:00:00 +0000
- title: 'Black Box Quantiles for Kernel Learning'
abstract: 'Kernel methods have been successfully used in various domains to model nonlinear patterns. However, the structure of the kernels is typically handcrafted for each dataset based on the experience of the data analyst. In this paper, we present a novel technique to learn kernels that best fit the data. We exploit the measure-theoretic view of a shift-invariant kernel given by Bochner’s theorem, and automatically learn the measure in terms of a parameterized quantile function. This flexible black box quantile function, evaluated on Quasi-Monte Carlo samples, builds up quasi-random Fourier feature maps that can approximate arbitrary kernels. The proposed method is not only general enough to be used in any kernel machine, but can also be combined with other kernel design techniques. We learn expressive kernels on a variety of datasets, verifying the method’s ability to automatically discover complex patterns without being guided by human expert knowledge.'
volume: 89
URL: http://proceedings.mlr.press/v89/tompkins19a.html
PDF: http://proceedings.mlr.press/v89/tompkins19a/tompkins19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-tompkins19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Tompkins
given: Anthony
- family: Senanayake
given: Ransalu
- family: Morere
given: Philippe
- family: Ramos
given: Fabio
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1427-1437
id: tompkins19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1427
lastpage: 1437
published: 2019-04-11 00:00:00 +0000
- title: 'Adversarial Variational Optimization of Non-Differentiable Simulators'
abstract: 'Complex computer simulators are increasingly used across fields of science as generative models tying parameters of an underlying theory to experimental observations. Inference in this setup is often difficult, as simulators rarely admit a tractable density or likelihood function. We introduce Adversarial Variational Optimization (AVO), a likelihood-free inference algorithm for fitting a non-differentiable generative model incorporating ideas from generative adversarial networks, variational optimization and empirical Bayes. We adapt the training procedure of generative adversarial networks by replacing the differentiable generative network with a domain-specific simulator. We solve the resulting non-differentiable minimax problem by minimizing variational upper bounds of the two adversarial objectives. Effectively, the procedure results in learning a proposal distribution over simulator parameters, such that the JS divergence between the marginal distribution of the synthetic data and the empirical distribution of observed data is minimized. We evaluate and compare the method with simulators producing both discrete and continuous data.'
volume: 89
URL: http://proceedings.mlr.press/v89/louppe19a.html
PDF: http://proceedings.mlr.press/v89/louppe19a/louppe19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-louppe19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Louppe
given: Gilles
- family: Hermans
given: Joeri
- family: Cranmer
given: Kyle
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1438-1447
id: louppe19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1438
lastpage: 1447
published: 2019-04-11 00:00:00 +0000
- title: 'Active Probabilistic Inference on Matrices for Pre-Conditioning in Stochastic Optimization'
abstract: 'Pre-conditioning is a well-known concept that can significantly improve the convergence of optimization algorithms. For noise-free problems, where good pre-conditioners are not known a priori, iterative linear algebra methods offer one way to efficiently construct them. For the stochastic optimization problems that dominate contemporary machine learning, however, this approach is not readily available. We propose an iterative algorithm inspired by classic iterative linear solvers that uses a probabilistic model to actively infer a pre-conditioner in situations where Hessian-projections can only be constructed with strong Gaussian noise. The algorithm is empirically demonstrated to efficiently construct effective pre-conditioners for stochastic gradient descent and its variants. Experiments on problems of comparably low dimensionality show improved convergence. In very high-dimensional problems, such as those encountered in deep learning, the pre-conditioner effectively becomes an automatic learning-rate adaptation scheme, which we also show to empirically work well.'
volume: 89
URL: http://proceedings.mlr.press/v89/roos19a.html
PDF: http://proceedings.mlr.press/v89/roos19a/roos19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-roos19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Roos
given: Filip
- family: Hennig
given: Philipp
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1448-1457
id: roos19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1448
lastpage: 1457
published: 2019-04-11 00:00:00 +0000
- title: 'Projection Free Online Learning over Smooth Sets'
abstract: 'The projection operation is a crucial step in applying Online Gradient Descent (OGD) and its stochastic version SGD. Unfortunately, in some cases, projection is computationally demanding and inhibits us from applying OGD. In this work we focus on the special case where the constraint set is smooth and we have access to gradient and value oracles of the constraint function. Under these assumptions we design a new approximate projection operation that necessitates only logarithmically many calls to these oracles. We further show that combining OGD with this new approximate projection results in a projection free variant which recovers the standard rates of the fully projected version. This applies to both convex and strongly-convex online settings.'
volume: 89
URL: http://proceedings.mlr.press/v89/levy19a.html
PDF: http://proceedings.mlr.press/v89/levy19a/levy19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-levy19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Levy
given: Kfir
- family: Krause
given: Andreas
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1458-1466
id: levy19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1458
lastpage: 1466
published: 2019-04-11 00:00:00 +0000
- title: 'Confidence Scoring Using Whitebox Meta-models with Linear Classifier Probes'
abstract: 'We propose a novel confidence scoring mechanism for deep neural networks based on a two-model paradigm involving a base model and a meta-model. The confidence score is learned by the meta-model observing the base model succeeding/failing at its task. As features to the meta-model, we investigate linear classifier probes inserted between the various layers of the base model. Our experiments demonstrate that this approach outperforms multiple baselines in a filtering task, i.e., the task of rejecting samples with low confidence. Experimental results are presented using the CIFAR-10 and CIFAR-100 datasets with and without added noise. We discuss the importance of confidence scoring to bridge the gap between experimental and real-world applications.'
volume: 89
URL: http://proceedings.mlr.press/v89/chen19c.html
PDF: http://proceedings.mlr.press/v89/chen19c/chen19c.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-chen19c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Chen
given: Tongfei
- family: Navratil
given: Jiri
- family: Iyengar
given: Vijay
- family: Shanmugam
given: Karthikeyan
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1467-1475
id: chen19c
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1467
lastpage: 1475
published: 2019-04-11 00:00:00 +0000
- title: 'Learning Influence-Receptivity Network Structure with Guarantee'
abstract: 'Traditional works on community detection from observations of information cascades assume that a single adjacency matrix parametrizes all the observed cascades. However, in reality the connection structure usually does not stay the same across cascades. For example, different people have different topics of interest, therefore the connection structure depends on the information/topic content of the cascade. In this paper we consider the case where we observe a sequence of noisy adjacency matrices triggered by information/events with different topic distributions. We propose a novel latent model using the intuition that a connection is more likely to exist between two nodes if they are interested in similar topics, which are common with the information/event. Specifically, we endow each node with two node-topic vectors: an influence vector that measures how influential/authoritative they are on each topic; and a receptivity vector that measures how receptive/susceptible they are to each topic. We show how these two node-topic structures can be estimated from observed adjacency matrices with a theoretical guarantee on estimation error, in cases where the topic distributions of the information/events are known, as well as when they are unknown. Experiments on synthetic and real data demonstrate the effectiveness of our model and superior performance compared to state-of-the-art methods.'
volume: 89
URL: http://proceedings.mlr.press/v89/yu19c.html
PDF: http://proceedings.mlr.press/v89/yu19c/yu19c.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-yu19c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Yu
given: Ming
- family: Gupta
given: Varun
- family: Kolar
given: Mladen
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1476-1485
id: yu19c
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1476
lastpage: 1485
published: 2019-04-11 00:00:00 +0000
- title: 'Iterative Bayesian Learning for Crowdsourced Regression'
abstract: 'Crowdsourcing platforms have emerged as popular venues for purchasing human intelligence at low cost for a large volume of tasks. As many low-paid workers are prone to give noisy answers, a common practice is to add redundancy by assigning multiple workers to each task and then simply average out these answers. However, to fully harness the wisdom of the crowd, one needs to learn the heterogeneous quality of each worker. We resolve this fundamental challenge in crowdsourced regression tasks, i.e., tasks where the answer takes continuous labels, where identifying good or bad workers becomes much harder than in a classification setting of discrete labels. In particular, we introduce a Bayesian iterative scheme and show that it provably achieves the optimal mean squared error. Our evaluations on synthetic and real-world datasets support our theoretical results and show the superiority of the proposed scheme.'
volume: 89
URL: http://proceedings.mlr.press/v89/ok19a.html
PDF: http://proceedings.mlr.press/v89/ok19a/ok19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-ok19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Ok
given: Jungseul
- family: Oh
given: Sewoong
- family: Jang
given: Yunhun
- family: Shin
given: Jinwoo
- family: Yi
given: Yung
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1486-1495
id: ok19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1486
lastpage: 1495
published: 2019-04-11 00:00:00 +0000
- title: 'Nonconvex Matrix Factorization from Rank-One Measurements'
abstract: 'We consider the problem of recovering low-rank matrices from random rank-one measurements, which spans numerous applications including phase retrieval, quantum state tomography, and learning shallow neural networks with quadratic activations, among others. Our approach is to directly estimate the low-rank factor by minimizing a nonconvex least-squares loss function via vanilla gradient descent, following a tailored spectral initialization. When the true rank is small, this algorithm is guaranteed to converge to the ground truth (up to global ambiguity) with near-optimal sample and computational complexities with respect to the problem size. To the best of our knowledge, this is the first theoretical guarantee that achieves near optimality in both metrics. In particular, the key enabler of near-optimal computational guarantees is an implicit regularization phenomenon: without explicit regularization, both spectral initialization and the gradient descent iterates automatically stay within a region incoherent with the measurement vectors. This feature allows one to employ much more aggressive step sizes compared with the ones suggested in prior literature, without the need of sample splitting.'
volume: 89
URL: http://proceedings.mlr.press/v89/li19e.html
PDF: http://proceedings.mlr.press/v89/li19e/li19e.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-li19e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Li
given: Yuanxin
- family: Ma
given: Cong
- family: Chen
given: Yuxin
- family: Chi
given: Yuejie
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1496-1505
id: li19e
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1496
lastpage: 1505
published: 2019-04-11 00:00:00 +0000
- title: 'Fast and Robust Shortest Paths on Manifolds Learned from Data'
abstract: 'We propose a fast, simple and robust algorithm for computing shortest paths and distances on Riemannian manifolds learned from data. This amounts to solving a system of ordinary differential equations (ODEs) subject to boundary conditions. Here standard solvers perform poorly because they require well-behaved Jacobians of the ODE, and usually, manifolds learned from data imply unstable and ill-conditioned Jacobians. Instead, we propose a fixed-point iteration scheme for solving the ODE that avoids Jacobians. This enhances the stability of the solver, while reducing the computational cost. In experiments involving both Riemannian metric learning and deep generative models we demonstrate significant improvements in speed and stability over both general-purpose state-of-the-art solvers as well as over specialized solvers.'
volume: 89
URL: http://proceedings.mlr.press/v89/arvanitidis19a.html
PDF: http://proceedings.mlr.press/v89/arvanitidis19a/arvanitidis19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-arvanitidis19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Arvanitidis
given: Georgios
- family: Hauberg
given: Soren
- family: Hennig
given: Philipp
- family: Schober
given: Michael
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1506-1515
id: arvanitidis19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1506
lastpage: 1515
published: 2019-04-11 00:00:00 +0000
- title: 'Training a Spiking Neural Network with Equilibrium Propagation'
abstract: 'Backpropagation is almost universally used to train artificial neural networks. However, there are several reasons that backpropagation could not be plausibly implemented by biological neurons. Among these are the facts that (1) biological neurons appear to lack any mechanism for sending gradients backwards across synapses, and (2) biological “spiking” neurons emit binary signals, whereas backpropagation requires that neurons communicate continuous values between one another. Recently, Scellier and Bengio [2017] demonstrated an alternative to backpropagation, called Equilibrium Propagation, wherein gradients are implicitly computed by the dynamics of the neural network, so that neurons do not need an internal mechanism for backpropagation of gradients. This provides an interesting solution to problem (1). In this paper, we address problem (2) by proposing a way in which Equilibrium Propagation can be implemented with neurons which are constrained to just communicate binary values at each time step. We show that with appropriate step-size annealing, we can converge to the same fixed-point as a real-valued neural network, and that with predictive coding, we can make this convergence much faster. We demonstrate that the resulting model can be used to train a spiking neural network using the update scheme from Equilibrium Propagation.'
volume: 89
URL: http://proceedings.mlr.press/v89/o-connor19a.html
PDF: http://proceedings.mlr.press/v89/o-connor19a/o-connor19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-o-connor19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: O’Connor
given: Peter
- family: Gavves
given: Efstratios
- family: Welling
given: Max
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1516-1523
id: o-connor19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1516
lastpage: 1523
published: 2019-04-11 00:00:00 +0000
- title: 'Learning One-hidden-layer ReLU Networks via Gradient Descent'
abstract: 'We study the problem of learning one-hidden-layer neural networks with the Rectified Linear Unit (ReLU) activation function, where the inputs are sampled from a standard Gaussian distribution and the outputs are generated from a noisy teacher network. We analyze the performance of gradient descent for training such neural networks based on empirical risk minimization, and provide algorithm-dependent guarantees. In particular, we prove that tensor initialization followed by gradient descent can converge to the ground-truth parameters at a linear rate up to some statistical error. To the best of our knowledge, this is the first work characterizing the recovery guarantee for practical learning of one-hidden-layer ReLU networks with multiple neurons. Numerical experiments verify our theoretical findings.'
volume: 89
URL: http://proceedings.mlr.press/v89/zhang19g.html
PDF: http://proceedings.mlr.press/v89/zhang19g/zhang19g.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-zhang19g.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Zhang
given: Xiao
- family: Yu
given: Yaodong
- family: Wang
given: Lingxiao
- family: Gu
given: Quanquan
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1524-1534
id: zhang19g
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1524
lastpage: 1534
published: 2019-04-11 00:00:00 +0000
- title: 'Gain estimation of linear dynamical systems using Thompson Sampling'
abstract: 'We present the gain estimation problem for linear dynamical systems as a multi-armed bandit. This is a particularly important engineering problem in control design, where performance guarantees are cast in terms of the largest gain of the frequency response of the system. The dynamical system is unknown and only noisy input-output data is available. In a more general setup, the noise perturbing the data is non-white and the variance at each frequency band is unknown, resulting in a two-dimensional Gaussian bandit model with unknown mean and scaled-identity covariance matrix. This model corresponds to a two-parameter exponential family. Within a bandit framework, the set of means is given by the frequency response of the system and, unlike traditional bandit problems, the goal here is to maximize the probability of choosing the arm drawing samples with the highest norm of its mean. A problem-dependent lower bound for the expected cumulative regret is derived and a matching upper bound is obtained for a Thompson-Sampling algorithm under a uniform prior over the variances and the two-dimensional means.'
volume: 89
URL: http://proceedings.mlr.press/v89/muller19a.html
PDF: http://proceedings.mlr.press/v89/muller19a/muller19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-muller19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Müller
given: Matias I.
- family: Rojas
given: Cristian R.
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1535-1543
id: muller19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1535
lastpage: 1543
published: 2019-04-11 00:00:00 +0000
- title: 'Universal Hypothesis Testing with Kernels: Asymptotically Optimal Tests for Goodness of Fit'
abstract: 'We characterize the asymptotic performance of nonparametric goodness of fit testing. The exponential decay rate of the type-II error probability is used as the asymptotic performance metric, and a test is optimal if it achieves the maximum rate subject to a constant level constraint on the type-I error probability. We show that two classes of Maximum Mean Discrepancy (MMD) based tests attain this optimality on $\mathbb R^d$, while the quadratic-time Kernel Stein Discrepancy (KSD) based tests achieve the maximum exponential decay rate under a relaxed level constraint. Under the same performance metric, we proceed to show that the quadratic-time MMD based two-sample tests are also optimal for general two-sample problems, provided that kernels are bounded continuous and characteristic. Key to our approach are Sanov’s theorem from large deviation theory and the weak metrizable properties of the MMD and KSD.'
volume: 89
URL: http://proceedings.mlr.press/v89/zhu19b.html
PDF: http://proceedings.mlr.press/v89/zhu19b/zhu19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-zhu19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Zhu
given: Shengyu
- family: Chen
given: Biao
- family: Yang
given: Pengfei
- family: Chen
given: Zhitang
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1544-1553
id: zhu19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1544
lastpage: 1553
published: 2019-04-11 00:00:00 +0000
- title: 'Calibrating Deep Convolutional Gaussian Processes'
abstract: 'The wide adoption of Convolutional Neural Networks (CNNs) in applications where decision-making under uncertainty is fundamental has brought a great deal of attention to the ability of these models to accurately quantify the uncertainty in their predictions. Previous work on combining CNNs with Gaussian processes (GPs) has been developed under the assumption that the predictive probabilities of these models are well-calibrated. In this paper we show that, in fact, current combinations of CNNs and GPs are miscalibrated. We propose a novel combination that considerably outperforms previous approaches on this aspect, while achieving state-of-the-art performance on image classification tasks.'
volume: 89
URL: http://proceedings.mlr.press/v89/tran19a.html
PDF: http://proceedings.mlr.press/v89/tran19a/tran19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-tran19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Tran
given: Gia-Lac
- family: Bonilla
given: Edwin V.
- family: Cunningham
given: John
- family: Michiardi
given: Pietro
- family: Filippone
given: Maurizio
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1554-1563
id: tran19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1554
lastpage: 1563
published: 2019-04-11 00:00:00 +0000
- title: 'Stochastic algorithms with descent guarantees for ICA'
abstract: 'Independent component analysis (ICA) is a widespread data exploration technique, where observed signals are modeled as linear mixtures of independent components. From a machine learning point of view, it amounts to a matrix factorization problem with a statistical independence criterion. Infomax is one of the most used ICA algorithms. It is based on a loss function which is a non-convex log-likelihood. We develop a new majorization-minimization framework adapted to this loss function. We derive an online algorithm for the streaming setting, and an incremental algorithm for the finite sum setting, with the following benefits. First, unlike most algorithms found in the literature, the proposed methods do not rely on any critical hyper-parameter like a step size, nor do they require a line-search technique. Second, the algorithm for the finite sum setting, although stochastic, guarantees a decrease of the loss function at each iteration. Experiments demonstrate progress on the state-of-the-art for large scale datasets, without the necessity for any manual parameter tuning.'
volume: 89
URL: http://proceedings.mlr.press/v89/ablin19a.html
PDF: http://proceedings.mlr.press/v89/ablin19a/ablin19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-ablin19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Ablin
given: Pierre
- family: Gramfort
given: Alexandre
- family: Cardoso
given: Jean-François
- family: Bach
given: Francis
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1564-1573
id: ablin19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1564
lastpage: 1573
published: 2019-04-11 00:00:00 +0000
- title: 'Sample Complexity of Sinkhorn Divergences'
abstract: 'Optimal transport (OT) and maximum mean discrepancies (MMD) are now routinely used in machine learning to compare probability measures. We focus in this paper on Sinkhorn divergences (SDs), a regularized variant of OT distances which can interpolate, depending on the regularization strength $\varepsilon$, between OT ($\varepsilon=0$) and MMD ($\varepsilon=\infty$). Although the tradeoff induced by that regularization is now well understood computationally (OT, SDs and MMD require respectively $O(n^3\log n)$, $O(n^2)$ and $n^2$ operations given a sample size $n$), much less is known in terms of their sample complexity, namely the gap between these quantities, when evaluated using finite samples vs. their respective densities. Indeed, while the sample complexities of OT and MMD stand at two extremes, $1/n^{1/d}$ for OT in dimension $d$ and $1/\sqrt{n}$ for MMD, that for SDs has only been studied empirically. In this paper, we (i) derive a bound on the approximation error made with SDs when approximating OT as a function of the regularizer $\varepsilon$, (ii) prove that the optimizers of regularized OT are bounded in a Sobolev (RKHS) ball independent of the two measures and (iii) provide the first sample complexity bound for SDs, obtained by reformulating SDs as a maximization problem in a RKHS. We thus obtain a scaling in $1/\sqrt{n}$ (as in MMD), with a constant that depends however on $\varepsilon$, making the bridge between OT and MMD complete.'
volume: 89
URL: http://proceedings.mlr.press/v89/genevay19a.html
PDF: http://proceedings.mlr.press/v89/genevay19a/genevay19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-genevay19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Genevay
given: Aude
- family: Chizat
given: Lénaïc
- family: Bach
given: Francis
- family: Cuturi
given: Marco
- family: Peyré
given: Gabriel
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1574-1583
id: genevay19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1574
lastpage: 1583
published: 2019-04-11 00:00:00 +0000
- title: 'Adaptive Gaussian Copula ABC'
abstract: 'Approximate Bayesian computation (ABC) is a set of techniques for Bayesian inference when the likelihood is intractable but sampling from the model is possible. This work presents a simple yet effective ABC algorithm based on the combination of two classical ABC approaches — regression ABC and sequential ABC. The key idea is that rather than learning the posterior directly, we first target another auxiliary distribution that can be learned accurately by existing methods, through which we then subsequently learn the desired posterior with the help of a Gaussian copula. During this process, the complexity of the model changes adaptively according to the data at hand. Experiments on a synthetic dataset as well as three real-world inference tasks demonstrate that the proposed method is fast, accurate, and easy to use.'
volume: 89
URL: http://proceedings.mlr.press/v89/chen19d.html
PDF: http://proceedings.mlr.press/v89/chen19d/chen19d.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-chen19d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Chen
given: Yanzhi
- family: Gutmann
given: Michael U.
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1584-1592
id: chen19d
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1584
lastpage: 1592
published: 2019-04-11 00:00:00 +0000
- title: 'Top Feasible Arm Identification'
abstract: 'We propose a new variant of the top arm identification problem, \emph{top feasible arm identification}, where there are $K$ arms associated with $D$-dimensional distributions and the goal is to find $m$ arms that maximize some known linear function of their means subject to the constraint that their means belong to a given set $P \subset R^D$. This problem has many applications since in many settings, feedback is multi-dimensional and it is of interest to perform \emph{constrained maximization}. We present problem-dependent lower bounds for top feasible arm identification and upper bounds for several algorithms. Our most broadly applicable algorithm, TF-LUCB-B, has an upper bound that is loose by a factor of $O(D \log(K))$. Many problems of practical interest are two-dimensional and, for these, it is loose by a factor of $O(\log(K))$. Finally, we conduct experiments on synthetic and real-world datasets that demonstrate the effectiveness of our algorithms. Our algorithms are superior both in theory and in practice to a naive two-stage algorithm that first identifies the feasible arms and then applies a best arm identification algorithm to the feasible arms.'
volume: 89
URL: http://proceedings.mlr.press/v89/katz-samuels19a.html
PDF: http://proceedings.mlr.press/v89/katz-samuels19a/katz-samuels19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-katz-samuels19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Katz-Samuels
given: Julian
- family: Scott
given: Clayton
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1593-1601
id: katz-samuels19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1593
lastpage: 1601
published: 2019-04-11 00:00:00 +0000
- title: 'Direct Acceleration of SAGA using Sampled Negative Momentum'
abstract: 'Variance reduction is a simple and effective technique that accelerates convex (or non-convex) stochastic optimization. Among existing variance reduction methods, SVRG and SAGA adopt unbiased gradient estimators and are the most popular variance reduction methods in recent years. Although various accelerated variants of SVRG (e.g., Katyusha and Acc-Prox-SVRG) have been proposed, the direct acceleration of SAGA still remains unknown. In this paper, we propose a directly accelerated variant of SAGA using a novel Sampled Negative Momentum (SSNM), which achieves the best known oracle complexity for strongly convex problems (with known strong convexity parameter). Consequently, our work fills the void of directly accelerated SAGA.'
volume: 89
URL: http://proceedings.mlr.press/v89/zhou19c.html
PDF: http://proceedings.mlr.press/v89/zhou19c/zhou19c.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-zhou19c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Zhou
given: Kaiwen
- family: Ding
given: Qinghua
- family: Shang
given: Fanhua
- family: Cheng
given: James
- family: Li
given: Danli
- family: Luo
given: Zhi-Quan
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1602-1610
id: zhou19c
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1602
lastpage: 1610
published: 2019-04-11 00:00:00 +0000
- title: 'Does data interpolation contradict statistical optimality?'
abstract: 'We show that classical learning methods interpolating the training data can achieve optimal rates for the problems of nonparametric regression and prediction with square loss.'
volume: 89
URL: http://proceedings.mlr.press/v89/belkin19a.html
PDF: http://proceedings.mlr.press/v89/belkin19a/belkin19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-belkin19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Belkin
given: Mikhail
- family: Rakhlin
given: Alexander
- family: Tsybakov
given: Alexandre B.
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1611-1619
id: belkin19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1611
lastpage: 1619
published: 2019-04-11 00:00:00 +0000
- title: 'Inverting Supervised Representations with Autoregressive Neural Density Models'
abstract: 'We present a method for feature interpretation that makes use of recent advances in autoregressive density estimation models to invert model representations. We train generative inversion models to express a distribution over input features conditioned on intermediate model representations. Insights into the invariances learned by supervised models can be gained by viewing samples from these inversion models. In addition, we can use these inversion models to estimate the mutual information between a model’s inputs and its intermediate representations, thus quantifying the amount of information preserved by the network at different stages. Using this method we examine the types of information preserved at different layers of convolutional neural networks, and explore the invariances induced by different architectural choices. Finally we show that the mutual information between inputs and network layers initially increases and then decreases over the course of training, supporting recent work by Shwartz-Ziv and Tishby (2017) on the information bottleneck theory of deep learning.'
volume: 89
URL: http://proceedings.mlr.press/v89/nash19a.html
PDF: http://proceedings.mlr.press/v89/nash19a/nash19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-nash19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Nash
given: Charlie
- family: Kushman
given: Nate
- family: Williams
given: Christopher K.I.
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1620-1629
id: nash19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1620
lastpage: 1629
published: 2019-04-11 00:00:00 +0000
- title: 'Connecting Weighted Automata and Recurrent Neural Networks through Spectral Learning'
abstract: 'In this paper, we unravel a fundamental connection between weighted finite automata (WFAs) and second-order recurrent neural networks (2-RNNs): in the case of sequences of discrete symbols, WFAs and 2-RNNs with linear activation functions are expressively equivalent. Motivated by this result, we build upon a recent extension of the spectral learning algorithm to vector-valued WFAs and propose the first provable learning algorithm for linear 2-RNNs defined over sequences of continuous input vectors. This algorithm relies on estimating low rank sub-blocks of the so-called Hankel tensor, from which the parameters of a linear 2-RNN can be provably recovered. The performance of the proposed method is assessed in a simulation study.'
volume: 89
URL: http://proceedings.mlr.press/v89/rabusseau19a.html
PDF: http://proceedings.mlr.press/v89/rabusseau19a/rabusseau19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-rabusseau19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Rabusseau
given: Guillaume
- family: Li
given: Tianyu
- family: Precup
given: Doina
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1630-1639
id: rabusseau19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1630
lastpage: 1639
published: 2019-04-11 00:00:00 +0000
- title: 'A Family of Exact Goodness-of-Fit Tests for High-Dimensional Discrete Distributions'
abstract: 'The objective of goodness-of-fit testing is to assess whether a dataset of observations is likely to have been drawn from a candidate probability distribution. This paper presents a rank-based family of goodness-of-fit tests that is specialized to discrete distributions on high-dimensional domains. The test is readily implemented using a simulation-based, linear-time procedure. The testing procedure can be customized by the practitioner using knowledge of the underlying data domain. Unlike most existing test statistics, the proposed test statistic is distribution-free and its exact (non-asymptotic) sampling distribution is known in closed form. We establish consistency of the test against all alternatives by showing that the test statistic is distributed as a discrete uniform if and only if the samples were drawn from the candidate distribution. We illustrate its efficacy for assessing the sample quality of approximate sampling algorithms over combinatorially large spaces with intractable probabilities, including random partitions in Dirichlet process mixture models and random lattices in Ising models.'
volume: 89
URL: http://proceedings.mlr.press/v89/saad19a.html
PDF: http://proceedings.mlr.press/v89/saad19a/saad19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-saad19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Saad
given: Feras A.
- family: Freer
given: Cameron E.
- family: Ackerman
given: Nathanael L.
- family: Mansinghka
given: Vikash K.
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1640-1649
id: saad19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1640
lastpage: 1649
published: 2019-04-11 00:00:00 +0000
- title: 'Differentially Private Online Submodular Minimization'
abstract: 'In this paper we develop the first algorithms for online submodular minimization that preserve differential privacy under full information feedback and bandit feedback. Our first result is in the full information setting, where the algorithm can observe the entire function after making its decision at each time step. We give an algorithm in this setting that is $\eps$-differentially private and achieves expected regret $\tilde{O}\left(\frac{n\sqrt{T}}{\eps}\right)$ over $T$ rounds for a collection of $n$ elements. Our second result is in the bandit setting, where the algorithm can only observe the cost incurred by its chosen set, and does not have access to the entire function. This setting is significantly more challenging due to the limited information. Our algorithm using bandit feedback is $\eps$-differentially private and achieves expected regret $\tilde{O}\left(\frac{n^{3/2}T^{2/3}}{\eps}\right)$.'
volume: 89
URL: http://proceedings.mlr.press/v89/cardoso19b.html
PDF: http://proceedings.mlr.press/v89/cardoso19b/cardoso19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-cardoso19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Cardoso
given: Adrian Rivera
- family: Cummings
given: Rachel
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1650-1658
id: cardoso19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1650
lastpage: 1658
published: 2019-04-11 00:00:00 +0000
- title: 'Semi-supervised clustering for de-duplication'
abstract: 'Data de-duplication is the task of detecting multiple records in a database that correspond to the same real-world entity. In this work, we view de-duplication as a clustering problem where the goal is to put records corresponding to the same physical entity in the same cluster and records corresponding to different physical entities into different clusters. We introduce a framework which we call promise correlation clustering. Given a complete graph G with the edges labelled 0 and 1, the goal is to find a clustering that minimizes the number of 0 edges within a cluster plus the number of 1 edges across different clusters (or correlation loss). The optimal clustering can also be viewed as a complete graph $G^*$ with edges corresponding to points in the same cluster being labelled 0 and other edges being labelled 1. Under the promise that the edge difference between G and $G^*$ is “small”, we prove that finding the optimal clustering (or $G^*$) is still NP-Hard. \cite{ashtiani2016clustering} introduced the framework of semi-supervised clustering, where the learning algorithm has access to an oracle, which answers whether two points belong to the same or different clusters. We further prove that even with access to a same-cluster oracle, the promise version is NP-Hard as long as the number of queries to the oracle is not too large (o(n) where n is the number of vertices). Given these negative results, we consider a restricted version of correlation clustering. As before, the goal is to find a clustering that minimizes the correlation loss. However, we restrict ourselves to a given class F of clusterings. We offer a semi-supervised algorithmic approach to solve the restricted variant with success guarantees.'
volume: 89
URL: http://proceedings.mlr.press/v89/kushagra19a.html
PDF: http://proceedings.mlr.press/v89/kushagra19a/kushagra19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-kushagra19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Kushagra
given: Shrinu
- family: Ben-David
given: Shai
- family: Ilyas
given: Ihab
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1659-1667
id: kushagra19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1659
lastpage: 1667
published: 2019-04-11 00:00:00 +0000
- title: 'Finding the bandit in a graph: Sequential search-and-stop'
abstract: 'We consider the problem where an agent wants to find a hidden object that is randomly located in some vertex of a directed acyclic graph (DAG) according to a fixed but possibly unknown distribution. The agent can only examine vertices whose in-neighbors have already been examined. In this paper, we address a learning setting where we allow the agent to stop before having found the object and restart searching on a new independent instance of the same problem. Our goal is to maximize the total number of hidden objects found given a time budget. The agent can thus skip an instance after realizing that it would spend too much time on it. Our contributions are to both search theory and multi-armed bandits. If the distribution is known, we provide a quasi-optimal and efficient stationary strategy. If the distribution is unknown, we additionally show how to sequentially approximate it and, at the same time, act near-optimally in order to collect as many hidden objects as possible.'
volume: 89
URL: http://proceedings.mlr.press/v89/perrault19a.html
PDF: http://proceedings.mlr.press/v89/perrault19a/perrault19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-perrault19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Perrault
given: Pierre
- family: Perchet
given: Vianney
- family: Valko
given: Michal
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1668-1677
id: perrault19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1668
lastpage: 1677
published: 2019-04-11 00:00:00 +0000
- title: 'Statistical Learning under Nonstationary Mixing Processes'
abstract: 'We study a special case of the problem of statistical learning without the i.i.d. assumption. Specifically, we suppose a learning method is presented with a sequence of data points, and required to make a prediction (e.g., a classification) for each one, and can then observe the loss incurred by this prediction. We go beyond traditional analyses, which have focused on stationary mixing processes or nonstationary product processes, by combining these two relaxations to allow nonstationary mixing processes. We are particularly interested in the case of $\beta$-mixing processes, with the sum of changes in marginal distributions growing sublinearly in the number of samples. Under these conditions, we propose a learning method, and establish that for bounded VC subgraph classes, the cumulative excess risk grows sublinearly in the number of predictions, at a quantified rate.'
volume: 89
URL: http://proceedings.mlr.press/v89/hanneke19a.html
PDF: http://proceedings.mlr.press/v89/hanneke19a/hanneke19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-hanneke19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Hanneke
given: Steve
- family: Yang
given: Liu
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1678-1686
id: hanneke19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1678
lastpage: 1686
published: 2019-04-11 00:00:00 +0000
- title: 'On Structure Priors for Learning Bayesian Networks'
abstract: 'To learn a Bayesian network structure from data, one popular approach is to maximize a decomposable likelihood-based score. While various scores have been proposed, they usually assume a uniform prior, or “penalty,” over the possible directed acyclic graphs (DAGs); relatively little attention has been paid to alternative priors. We investigate empirically several structure priors in combination with different scores, using benchmark data sets and data sets generated from benchmark networks. Our results suggest that, in practice, priors that strongly favor sparsity perform significantly better than the uniform prior or even the informed variant that is conditioned on the correct number of parents for each node. For an analytic comparison of different priors, we generalize a known recurrence equation for the number of DAGs to accommodate modular weightings of DAGs, a result that is also of independent interest.'
volume: 89
URL: http://proceedings.mlr.press/v89/eggeling19a.html
PDF: http://proceedings.mlr.press/v89/eggeling19a/eggeling19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-eggeling19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Eggeling
given: Ralf
- family: Viinikka
given: Jussi
- family: Vuoksenmaa
given: Aleksis
- family: Koivisto
given: Mikko
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1687-1695
id: eggeling19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1687
lastpage: 1695
published: 2019-04-11 00:00:00 +0000
- title: 'Partial Optimality of Dual Decomposition for MAP Inference in Pairwise MRFs'
abstract: 'Markov random fields (MRFs) are a powerful tool for modelling statistical dependencies for a set of random variables using a graphical representation. An important computational problem related to MRFs, called maximum a posteriori (MAP) inference, is finding a joint variable assignment with the maximal probability. It is well known that the two popular optimisation techniques for this task, linear programming (LP) relaxation and dual decomposition (DD), have a strong connection, both providing an optimal solution to the MAP problem when a corresponding LP relaxation is tight. However, less is known about their relationship in the opposite and more realistic case. In this paper, we explain how the fully integral assignments obtained via DD partially agree with the optimal fractional assignments via LP relaxation when the latter is not tight. In particular, for binary pairwise MRFs the corresponding result suggests that both methods share the partial optimality property of their solutions.'
volume: 89
URL: http://proceedings.mlr.press/v89/bauer19b.html
PDF: http://proceedings.mlr.press/v89/bauer19b/bauer19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-bauer19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Bauer
given: Alexander
- family: Nakajima
given: Shinichi
- family: Goernitz
given: Nico
- family: Müller
given: Klaus-Robert
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1696-1703
id: bauer19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1696
lastpage: 1703
published: 2019-04-11 00:00:00 +0000
- title: 'Sparse Feature Selection in Kernel Discriminant Analysis via Optimal Scoring'
abstract: 'We consider the two-group classification problem and propose a kernel classifier based on the optimal scoring framework. Unlike previous approaches, we provide theoretical guarantees on the expected risk consistency of the method. We also allow for feature selection by imposing structured sparsity using weighted kernels. We propose fully-automated methods for selection of all tuning parameters, and in particular adapt kernel shrinkage ideas for ridge parameter selection. Numerical studies demonstrate the superior classification performance of the proposed approach compared to existing nonparametric classifiers.'
volume: 89
URL: http://proceedings.mlr.press/v89/lapanowski19a.html
PDF: http://proceedings.mlr.press/v89/lapanowski19a/lapanowski19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-lapanowski19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Lapanowski
given: Alexander F.
- family: Gaynanova
given: Irina
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1704-1713
id: lapanowski19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1704
lastpage: 1713
published: 2019-04-11 00:00:00 +0000
- title: 'Learning Natural Programs from a Few Examples in Real-Time'
abstract: 'Programming by examples (PBE) is a rapidly growing subfield of AI that aims to synthesize user-intended programs using input-output examples from the task. As users can provide only a few I/O examples, capturing user-intent accurately and ranking user-intended programs over other programs is challenging even in the simplest of domains. Commercially deployed PBE systems often require years of engineering effort and domain expertise to devise ranking heuristics for real-time synthesis of accurate programs. But such heuristics may not cater to new domains, or even to a different segment of users from the same domain. In this work, we develop a novel, real-time, ML-based program ranking algorithm that enables synthesis of natural, user-intended, personalized programs. We make two key technical contributions: 1) a new technique to embed programs in a vector space making them amenable to ML-formulations, 2) a novel formulation that interleaves program search with ranking, enabling real-time synthesis of accurate user-intended programs. We implement our solution in the state-of-the-art PROSE framework. The proposed approach learns the intended program with just {\em one} I/O example in a variety of real-world string/date/number manipulation tasks, and outperforms state-of-the-art neural synthesis methods along multiple metrics.'
volume: 89
URL: http://proceedings.mlr.press/v89/natarajan19a.html
PDF: http://proceedings.mlr.press/v89/natarajan19a/natarajan19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-natarajan19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Natarajan
given: Nagarajan
- family: Simmons
given: Danny
- family: Datha
given: Naren
- family: Jain
given: Prateek
- family: Gulwani
given: Sumit
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1714-1722
id: natarajan19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1714
lastpage: 1722
published: 2019-04-11 00:00:00 +0000
- title: 'Truncated Back-propagation for Bilevel Optimization'
abstract: 'Bilevel optimization has been recently revisited for designing and analyzing algorithms in hyperparameter tuning and meta learning tasks. However, due to its nested structure, evaluating exact gradients for high-dimensional problems is computationally challenging. One heuristic to circumvent this difficulty is to use the approximate gradient given by performing truncated back-propagation through the iterative optimization procedure that solves the lower-level problem. Although promising empirical performance has been reported, its theoretical properties are still unclear. In this paper, we analyze the properties of this family of approximate gradients and establish sufficient conditions for convergence. We validate this on several hyperparameter tuning and meta learning tasks. We find that optimization with the approximate gradient computed using few-step back-propagation often performs comparably to optimization with the exact gradient, while requiring far less memory and half the computation time.'
volume: 89
URL: http://proceedings.mlr.press/v89/shaban19a.html
PDF: http://proceedings.mlr.press/v89/shaban19a/shaban19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-shaban19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Shaban
given: Amirreza
- family: Cheng
given: Ching-An
- family: Hatch
given: Nathan
- family: Boots
given: Byron
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1723-1732
id: shaban19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1723
lastpage: 1732
published: 2019-04-11 00:00:00 +0000
- title: 'Empirical Risk Minimization and Stochastic Gradient Descent for Relational Data'
abstract: 'Empirical risk minimization is the main tool for prediction problems, but its extension to relational data remains unsolved. We solve this problem using recent ideas from graph sampling theory to (i) define an empirical risk for relational data and (ii) obtain stochastic gradients for this empirical risk that are automatically unbiased. This is achieved by considering the method by which data is sampled from a graph as an explicit component of model design. By integrating fast implementations of graph sampling schemes with standard automatic differentiation tools, we provide an efficient turnkey solver for the risk minimization problem. We establish basic theoretical properties of the procedure. Finally, we demonstrate relational ERM with application to two non-standard problems: one-stage training for semi-supervised node classification, and learning embedding vectors for vertex attributes. Experiments confirm that the turnkey inference procedure is effective in practice, and that the sampling scheme used for model specification has a strong effect on model performance.'
volume: 89
URL: http://proceedings.mlr.press/v89/veitch19a.html
PDF: http://proceedings.mlr.press/v89/veitch19a/veitch19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-veitch19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Veitch
given: Victor
- family: Austern
given: Morgane
- family: Zhou
given: Wenda
- family: Blei
given: David M.
- family: Orbanz
given: Peter
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1733-1742
id: veitch19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1733
lastpage: 1742
published: 2019-04-11 00:00:00 +0000
- title: 'Variable selection for Gaussian processes via sensitivity analysis of the posterior predictive distribution'
abstract: 'Variable selection for Gaussian process models is often done using automatic relevance determination, which uses the inverse length-scale parameter of each input variable as a proxy for variable relevance. This implicitly determined relevance has several drawbacks that prevent the selection of optimal input variables in terms of predictive performance. To improve on this, we propose two novel variable selection methods for Gaussian process models that utilize the predictions of a full model in the vicinity of the training points and thereby rank the variables based on their predictive relevance. Our empirical results on synthetic and real world data sets demonstrate improved variable selection compared to automatic relevance determination in terms of variability and predictive performance.'
volume: 89
URL: http://proceedings.mlr.press/v89/paananen19a.html
PDF: http://proceedings.mlr.press/v89/paananen19a/paananen19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-paananen19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Paananen
given: Topi
- family: Piironen
given: Juho
- family: Andersen
given: Michael Riis
- family: Vehtari
given: Aki
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1743-1752
id: paananen19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1743
lastpage: 1752
published: 2019-04-11 00:00:00 +0000
- title: 'Lifted Weight Learning of Markov Logic Networks Revisited'
abstract: 'We study lifted weight learning of Markov logic networks. We show that there is an algorithm for maximum-likelihood learning of 2-variable Markov logic networks which runs in time polynomial in the domain size. Our results are based on existing lifted-inference algorithms and recent algorithmic results on computing maximum entropy distributions.'
volume: 89
URL: http://proceedings.mlr.press/v89/kuzelka19a.html
PDF: http://proceedings.mlr.press/v89/kuzelka19a/kuzelka19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-kuzelka19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Kuzelka
given: Ondrej
- family: Kungurtsev
given: Vyacheslav
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1753-1761
id: kuzelka19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1753
lastpage: 1761
published: 2019-04-11 00:00:00 +0000
- title: 'Causal Discovery in the Presence of Missing Data'
abstract: 'Missing data are ubiquitous in many domains such as healthcare. When these data entries are not missing completely at random, the (conditional) independence relations in the observed data may be different from those in the complete data generated by the underlying causal process. Consequently, simply applying existing causal discovery methods to the observed data may lead to wrong conclusions. In this paper, we aim at developing a causal discovery method to recover the underlying causal structure from observed data that are missing under different mechanisms, including missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). With missingness mechanisms represented by missingness graphs (m-graphs), we analyze conditions under which additional correction is needed to derive conditional independence/dependence relations in the complete data. Based on our analysis, we propose Missing Value PC (MVPC), which extends the PC algorithm to incorporate additional corrections. Our proposed MVPC is shown in theory to give asymptotically correct results even on data that are MAR or MNAR. Experimental results on both synthetic data and real healthcare applications illustrate that the proposed algorithm is able to find correct causal relations even in the general case of MNAR.'
volume: 89
URL: http://proceedings.mlr.press/v89/tu19a.html
PDF: http://proceedings.mlr.press/v89/tu19a/tu19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-tu19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Tu
given: Ruibo
- family: Zhang
given: Cheng
- family: Ackermann
given: Paul
- family: Mohan
given: Karthika
- family: Kjellström
given: Hedvig
- family: Zhang
given: Kun
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1762-1770
id: tu19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1762
lastpage: 1770
published: 2019-04-11 00:00:00 +0000
- title: 'Learning Tree Structures from Noisy Data'
abstract: 'We provide high-probability sample complexity guarantees for exact structure recovery of tree-structured graphical models, when only noisy observations of the respective vertex emissions are available. We assume that the hidden variables follow either an Ising model or a Gaussian graphical model, and the observables are noise-corrupted versions of the hidden variables: we consider multiplicative $\pm 1$ binary noise for Ising models, and additive Gaussian noise for Gaussian models. Such hidden models arise naturally in a variety of applications such as physics, biology, computer science, and finance. We study the impact of measurement noise on the task of learning the underlying tree structure via the well-known \textit{Chow-Liu algorithm} and provide formal sample complexity guarantees for exact recovery. In particular, for a tree with $p$ vertices and probability of failure $\delta>0$, we show that the number of necessary samples for exact structure recovery is of the order of $\mathcal{O}(\log(p/\delta))$ for Ising models (which remains the \textit{same as in the noiseless case}), and $\mathcal{O}(\mathrm{polylog}{(p/\delta)})$ for Gaussian models.'
volume: 89
URL: http://proceedings.mlr.press/v89/nikolakakis19a.html
PDF: http://proceedings.mlr.press/v89/nikolakakis19a/nikolakakis19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-nikolakakis19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Nikolakakis
given: Konstantinos E.
- family: Kalogerias
given: Dionysios S.
- family: Sarwate
given: Anand D.
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1771-1782
id: nikolakakis19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1771
lastpage: 1782
published: 2019-04-11 00:00:00 +0000
- title: 'Active multiple matrix completion with adaptive confidence sets'
abstract: 'We address an active setting for matrix completion, where the learner can choose from which matrix it receives a sample (drawn uniformly at random). Our main practical motivation is market segmentation, where the matrices are different regions with different customer preferences. The challenge in this setting is that each of the matrices can be of a different size and also of a different rank. We provide and analyze a new algorithm, MAlocate, that is able to adapt to the ranks of the different matrices. We also prove a lower bound showing that our strategy is minimax-optimal, and we demonstrate its performance with synthetic experiments.'
volume: 89
URL: http://proceedings.mlr.press/v89/locatelli19a.html
PDF: http://proceedings.mlr.press/v89/locatelli19a/locatelli19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-locatelli19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Locatelli
given: Andrea
- family: Carpentier
given: Alexandra
- family: Valko
given: Michal
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1783-1791
id: locatelli19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1783
lastpage: 1791
published: 2019-04-11 00:00:00 +0000
- title: 'Confidence-based Graph Convolutional Networks for Semi-Supervised Learning'
abstract: 'Predicting properties of nodes in a graph is an important problem with applications in a variety of domains. Graph-based Semi-Supervised Learning (SSL) methods aim to address this problem by labeling a small subset of the nodes as seeds, and then utilizing the graph structure to predict label scores for the rest of the nodes in the graph. Recently, Graph Convolutional Networks (GCNs) have achieved impressive performance on the graph-based SSL task. In addition to label scores, it is also desirable to have confidence scores associated with them. Unfortunately, confidence estimation in the context of GCNs has not been previously explored. We fill this important gap in this paper and propose ConfGCN, which estimates label scores along with their confidences jointly in a GCN-based setting. ConfGCN uses these estimated confidences to determine the influence of one node on another during neighborhood aggregation, thereby acquiring anisotropic capabilities. Through extensive analysis and experiments on standard benchmarks, we find that ConfGCN is able to outperform state-of-the-art baselines. We have made ConfGCN’s source code available to encourage reproducible research.'
volume: 89
URL: http://proceedings.mlr.press/v89/vashishth19a.html
PDF: http://proceedings.mlr.press/v89/vashishth19a/vashishth19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-vashishth19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Vashishth
given: Shikhar
- family: Yadav
given: Prateek
- family: Bhandari
given: Manik
- family: Talukdar
given: Partha
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1792-1801
id: vashishth19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1792
lastpage: 1801
published: 2019-04-11 00:00:00 +0000
- title: 'Negative Momentum for Improved Game Dynamics'
abstract: 'Games generalize the single-objective optimization paradigm by introducing different objective functions for different players. Differentiable games often proceed by simultaneous or alternating gradient updates. In machine learning, games are gaining new importance through formulations like generative adversarial networks (GANs) and actor-critic systems. However, compared to single-objective optimization, game dynamics are more complex and less understood. In this paper, we analyze gradient-based methods with momentum on simple games. We prove that alternating updates are more stable than simultaneous updates. Next, we show both theoretically and empirically that alternating gradient updates with a negative momentum term achieve convergence not only in a difficult toy adversarial problem, but also on the notoriously difficult-to-train saturating GANs.'
volume: 89
URL: http://proceedings.mlr.press/v89/gidel19a.html
PDF: http://proceedings.mlr.press/v89/gidel19a/gidel19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-gidel19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Gidel
given: Gauthier
- family: Hemmat
given: Reyhane Askari
- family: Pezeshki
given: Mohammad
- family: Priol
given: Rémi Le
- family: Huang
given: Gabriel
- family: Lacoste-Julien
given: Simon
- family: Mitliagkas
given: Ioannis
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1802-1811
id: gidel19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1802
lastpage: 1811
published: 2019-04-11 00:00:00 +0000
- title: 'Deep learning with differential Gaussian process flows'
abstract: 'We propose a novel deep learning paradigm of differential flows that learn stochastic differential equation transformations of inputs prior to a standard classification or regression function. The key property of differential Gaussian processes is the warping of inputs through infinitely deep, but infinitesimal, differential fields that generalise discrete layers into a dynamical system. We demonstrate excellent results as compared to deep Gaussian processes and Bayesian neural networks.'
volume: 89
URL: http://proceedings.mlr.press/v89/hegde19a.html
PDF: http://proceedings.mlr.press/v89/hegde19a/hegde19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-hegde19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Hegde
given: Pashupati
- family: Heinonen
given: Markus
- family: Lähdesmäki
given: Harri
- family: Kaski
given: Samuel
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1812-1821
id: hegde19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1812
lastpage: 1821
published: 2019-04-11 00:00:00 +0000
- title: 'Data-dependent compression of random features for large-scale kernel approximation'
abstract: 'Kernel methods offer the flexibility to learn complex relationships in modern, large data sets while enjoying strong theoretical guarantees on quality. Unfortunately, these methods typically require cubic running time in the data set size, a prohibitive cost in the large-data setting. Random feature maps (RFMs) and the Nyström method both consider low-rank approximations to the kernel matrix as a potential solution. But, in order to achieve desirable theoretical guarantees, the former may require a prohibitively large number of features $J_+$, and the latter may be prohibitively expensive for high-dimensional problems. We propose to combine the simplicity and generality of RFMs with a data-dependent feature selection scheme to achieve desirable theoretical approximation properties of Nyström with just $O(\log J_+)$ features. Our key insight is to begin with a large set of random features, then reduce them to a small number of weighted features in a data-dependent, computationally efficient way, while preserving the statistical guarantees of using the original large set of features. We demonstrate the efficacy of our method with theory and experiments, including on a data set with over 50 million observations. In particular, we show that our method achieves small kernel matrix approximation error and better test set accuracy with provably fewer random features than state-of-the-art methods.'
volume: 89
URL: http://proceedings.mlr.press/v89/agrawal19a.html
PDF: http://proceedings.mlr.press/v89/agrawal19a/agrawal19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-agrawal19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Agrawal
given: Raj
- family: Campbell
given: Trevor
- family: Huggins
given: Jonathan
- family: Broderick
given: Tamara
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1822-1831
id: agrawal19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1822
lastpage: 1831
published: 2019-04-11 00:00:00 +0000
- title: 'Large-Margin Classification in Hyperbolic Space'
abstract: 'Representing data in hyperbolic space can effectively capture latent hierarchical relationships. To enable accurate classification of points in hyperbolic space while respecting their hyperbolic geometry, we introduce hyperbolic SVM, a hyperbolic formulation of support vector machine classifiers, and describe its theoretical connection to the Euclidean counterpart. We also generalize Euclidean kernel SVM to hyperbolic space, allowing nonlinear hyperbolic decision boundaries and providing a geometric interpretation for a certain class of indefinite kernels. Hyperbolic SVM improves classification accuracy in simulation and in real-world problems involving complex networks and word embeddings. Our work enables end-to-end analyses based on the inherent hyperbolic geometry of the data without resorting to ill-fitting tools developed for Euclidean space.'
volume: 89
URL: http://proceedings.mlr.press/v89/cho19a.html
PDF: http://proceedings.mlr.press/v89/cho19a/cho19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-cho19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Cho
given: Hyunghoon
- family: DeMeo
given: Benjamin
- family: Peng
given: Jian
- family: Berger
given: Bonnie
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1832-1840
id: cho19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1832
lastpage: 1840
published: 2019-04-11 00:00:00 +0000
- title: 'Generalizing the theory of cooperative inference'
abstract: 'Cooperative information sharing is important to theories of human learning and has potential implications for machine learning. Prior work derived conditions for achieving optimal Cooperative Inference given strong, relatively restrictive assumptions. We relax these assumptions by demonstrating convergence for any discrete joint distribution, robustness through equivalence classes and stability under perturbation, and effectiveness by deriving bounds from structural properties of the original joint distribution. We provide geometric interpretations, connections to and implications for optimal transport, and connections to importance sampling, and conclude by outlining open questions and challenges to realizing the promise of Cooperative Inference.'
volume: 89
URL: http://proceedings.mlr.press/v89/wang19c.html
PDF: http://proceedings.mlr.press/v89/wang19c/wang19c.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-wang19c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Wang
given: Pei
- family: Paranamana
given: Pushpi
- family: Shafto
given: Patrick
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1841-1850
id: wang19c
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1841
lastpage: 1850
published: 2019-04-11 00:00:00 +0000
- title: 'MaxHedge: Maximizing a Maximum Online'
abstract: 'We introduce a new online learning framework where, at each trial, the learner is required to select a subset of actions from a given known action set. Each action is associated with an energy value, a reward and a cost. The sum of the energies of the actions selected cannot exceed a given energy budget. The goal is to maximise the cumulative profit, where the profit obtained on a single trial is defined as the difference between the maximum reward among the selected actions and the sum of their costs. Action energy values and the budget are known and fixed. All rewards and costs associated with each action change over time and are revealed at each trial only after the learner’s selection of actions. Our framework encompasses several online learning problems where the environment changes over time, and the solution trades off between minimising the costs and maximising the maximum reward of the selected subset of actions, while being constrained to an action energy budget. The algorithm that we propose is efficient and general, and may be specialised to multiple natural online combinatorial problems.'
volume: 89
URL: http://proceedings.mlr.press/v89/pasteris19a.html
PDF: http://proceedings.mlr.press/v89/pasteris19a/pasteris19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-pasteris19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Pasteris
given: Stephen
- family: Vitale
given: Fabio
- family: Chan
given: Kevin
- family: Wang
given: Shiqiang
- family: Herbster
given: Mark
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1851-1859
id: pasteris19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1851
lastpage: 1859
published: 2019-04-11 00:00:00 +0000
- title: 'The Gaussian Process Autoregressive Regression Model (GPAR)'
abstract: 'Multi-output regression models must exploit dependencies between outputs to maximise predictive performance. The application of Gaussian processes (GPs) to this setting typically yields models that are computationally demanding and have limited representational power. We present the Gaussian Process Autoregressive Regression (GPAR) model, a scalable multi-output GP model that is able to capture nonlinear, possibly input-varying, dependencies between outputs in a simple and tractable way: the product rule is used to decompose the joint distribution over the outputs into a set of conditionals, each of which is modelled by a standard GP. GPAR’s efficacy is demonstrated on a variety of synthetic and real-world problems, outperforming existing GP models and achieving state-of-the-art performance on established benchmarks.'
volume: 89
URL: http://proceedings.mlr.press/v89/requeima19a.html
PDF: http://proceedings.mlr.press/v89/requeima19a/requeima19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-requeima19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Requeima
given: James
- family: Tebbutt
given: William
- family: Bruinsma
given: Wessel
- family: Turner
given: Richard E.
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1860-1869
id: requeima19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1860
lastpage: 1869
published: 2019-04-11 00:00:00 +0000
- title: 'Towards Optimal Transport with Global Invariances'
abstract: 'Many problems in machine learning involve calculating correspondences between sets of objects, such as point clouds or images. Discrete optimal transport provides a natural and successful approach to such tasks whenever the two sets of objects can be represented in the same space, or at least distances between them can be directly evaluated. Unfortunately, neither requirement is likely to hold when object representations are learned from data. Indeed, automatically derived representations such as word embeddings are typically fixed only up to some global transformations, for example, reflection or rotation. As a result, pairwise distances across two such instances are ill-defined without specifying their relative transformation. In this work, we propose a general framework for optimal transport in the presence of latent global transformations. We cast the problem as a joint optimization over transport couplings and transformations chosen from a flexible class of invariances, propose algorithms to solve it, and show promising results in various tasks, including a popular unsupervised word translation benchmark.'
volume: 89
URL: http://proceedings.mlr.press/v89/alvarez-melis19a.html
PDF: http://proceedings.mlr.press/v89/alvarez-melis19a/alvarez-melis19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-alvarez-melis19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Alvarez-Melis
given: David
- family: Jegelka
given: Stefanie
- family: Jaakkola
given: Tommi S.
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1870-1879
id: alvarez-melis19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1870
lastpage: 1879
published: 2019-04-11 00:00:00 +0000
- title: 'Unsupervised Alignment of Embeddings with Wasserstein Procrustes'
abstract: 'We consider the task of aligning two sets of points in high dimension, which has many applications in natural language processing and computer vision. As an example, it was recently shown that it is possible to infer a bilingual lexicon, without supervised data, by aligning word embeddings trained on monolingual data. These recent advances are based on adversarial training to learn the mapping between the two embeddings. In this paper, we propose to use an alternative formulation, based on the joint estimation of an orthogonal matrix and a permutation matrix. While this problem is not convex, we propose to initialize our optimization algorithm by using a convex relaxation, traditionally considered for the graph isomorphism problem. We propose a stochastic algorithm to minimize our cost function on large-scale problems. Finally, we evaluate our method on the problem of unsupervised word translation, by aligning word embeddings trained on monolingual data. On this task, our method obtains state-of-the-art results, while requiring fewer computational resources than competing approaches.'
volume: 89
URL: http://proceedings.mlr.press/v89/grave19a.html
PDF: http://proceedings.mlr.press/v89/grave19a/grave19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-grave19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Grave
given: Edouard
- family: Joulin
given: Armand
- family: Berthet
given: Quentin
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1880-1890
id: grave19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1880
lastpage: 1890
published: 2019-04-11 00:00:00 +0000
- title: 'Sequential Patient Recruitment and Allocation for Adaptive Clinical Trials'
abstract: 'Randomized Controlled Trials (RCTs) are the gold standard for comparing the effectiveness of a new treatment to the current one (the control). Most RCTs allocate the patients to the treatment group and the control group by uniform randomization. We show that this procedure can be highly sub-optimal (in terms of learning) if – as is often the case – patients can be recruited in cohorts (rather than all at once), the effects on each cohort can be observed before recruiting the next cohort, and the effects are heterogeneous across identifiable subgroups of patients. We formulate the patient allocation problem as a finite stage Markov Decision Process in which the objective is to minimize a given weighted combination of type-I and type-II errors. Because finding the exact solution to this Markov Decision Process is computationally intractable, we propose an algorithm, Knowledge Gradient for Randomized Controlled Trials (RCT-KG), that yields an approximate solution. Our experiment on a synthetic dataset with Bernoulli outcomes shows that for a given size of trial our method achieves significant reduction in error, and to achieve a prescribed level of confidence (in identifying whether the treatment is superior to the control), our method requires many fewer patients.'
volume: 89
URL: http://proceedings.mlr.press/v89/atan19a.html
PDF: http://proceedings.mlr.press/v89/atan19a/atan19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-atan19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Atan
given: Onur
- family: Zame
given: William R.
- family: Schaar
given: Mihaela
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1891-1900
id: atan19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1891
lastpage: 1900
published: 2019-04-11 00:00:00 +0000
- title: 'Probabilistic Forecasting with Spline Quantile Function RNNs'
abstract: 'In this paper, we propose a flexible method for probabilistic modeling with conditional quantile functions using monotonic regression splines. The shape of the spline is parameterized by a neural network whose parameters are learned by minimizing the continuous ranked probability score. Within this framework, we propose a method for probabilistic time series forecasting, which combines the modeling capacity of recurrent neural networks with the flexibility of a spline-based representation of the output distribution. Unlike methods based on parametric probability density functions and maximum likelihood estimation, the proposed method can flexibly adapt to different output distributions without manual intervention. We empirically demonstrate the effectiveness of the approach on synthetic and real-world data sets.'
volume: 89
URL: http://proceedings.mlr.press/v89/gasthaus19a.html
PDF: http://proceedings.mlr.press/v89/gasthaus19a/gasthaus19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-gasthaus19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Gasthaus
given: Jan
- family: Benidis
given: Konstantinos
- family: Wang
given: Yuyang
- family: Rangapuram
given: Syama Sundar
- family: Salinas
given: David
- family: Flunkert
given: Valentin
- family: Januschowski
given: Tim
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1901-1910
id: gasthaus19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1901
lastpage: 1910
published: 2019-04-11 00:00:00 +0000
- title: 'Exponential Weights on the Hypercube in Polynomial Time'
abstract: 'We study a general online linear optimization problem (OLO). At each round, a subset of objects from a fixed universe of $n$ objects is chosen, and a linear cost associated with the chosen subset is incurred. To measure the performance of our algorithms, we use the notion of regret, which is the difference between the total cost incurred over all iterations and the cost of the best fixed subset in hindsight. We consider Full Information and Bandit feedback for this problem. This problem is equivalent to OLO on the $\{0,1\}^n$ hypercube. The Exp2 algorithm and its bandit variant are commonly used strategies for this problem. It was previously unknown if it is possible to run Exp2 on the hypercube in polynomial time. In this paper, we present a polynomial time algorithm called PolyExp for OLO on the hypercube. We show that our algorithm is equivalent to the Exp2 on $\{0,1\}^n$, Online Mirror Descent (OMD), Follow The Regularized Leader (FTRL) and Follow The Perturbed Leader (FTPL) algorithms. We show PolyExp achieves an expected regret bound that is a factor of $\sqrt{n}$ better than Exp2 in the full information setting under $L_\infty$ adversarial losses. Because of the equivalence of these algorithms, this implies an improvement on Exp2’s regret bound in full information. We also show matching regret lower bounds. Finally, we show how to use PolyExp on the $\{-1,+1\}^n$ hypercube, solving an open problem in Bubeck et al. (COLT 2012).'
volume: 89
URL: http://proceedings.mlr.press/v89/putta19a.html
PDF: http://proceedings.mlr.press/v89/putta19a/putta19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-putta19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Putta
given: Sudeep Raja
- family: Shetty
given: Abhishek
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1911-1919
id: putta19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1911
lastpage: 1919
published: 2019-04-11 00:00:00 +0000
- title: 'Sharp Analysis of Learning with Discrete Losses'
abstract: 'The problem of devising learning strategies for discrete losses (e.g., multilabeling, ranking) is currently addressed with methods and theoretical analyses that are ad hoc for each loss. In this paper we study a least-squares framework to systematically design learning algorithms for discrete losses, with quantitative characterizations in terms of statistical and computational complexity. In particular, we improve existing results by providing explicit dependence on the number of labels for a wide class of losses and faster learning rates under low-noise conditions. Theoretical results are complemented with experiments on real datasets, showing the effectiveness of the proposed general approach.'
volume: 89
URL: http://proceedings.mlr.press/v89/nowak19a.html
PDF: http://proceedings.mlr.press/v89/nowak19a/nowak19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-nowak19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Nowak
given: Alex
- family: Bach
given: Francis
- family: Rudi
given: Alessandro
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1920-1929
id: nowak19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1920
lastpage: 1929
published: 2019-04-11 00:00:00 +0000
- title: 'Designing Optimal Binary Rating Systems'
abstract: 'Modern online platforms rely on effective rating systems to learn about items. We consider the optimal design of rating systems that collect binary feedback after transactions. We make three contributions. First, we formalize the performance of a rating system as the speed with which it recovers the true underlying ranking on items (in a large deviations sense), accounting for both items’ underlying match rates and the platform’s preferences. Second, we provide an efficient algorithm to compute the binary feedback system that yields the highest such performance. Finally, we show how this theoretical perspective can be used to empirically design an implementable, approximately optimal rating system, and validate our approach using real-world experimental data collected on Amazon Mechanical Turk.'
volume: 89
URL: http://proceedings.mlr.press/v89/garg19a.html
PDF: http://proceedings.mlr.press/v89/garg19a/garg19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-garg19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Garg
given: Nikhil
- family: Johari
given: Ramesh
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1930-1939
id: garg19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1930
lastpage: 1939
published: 2019-04-11 00:00:00 +0000
- title: 'Stochastic Negative Mining for Learning with Large Output Spaces'
abstract: 'We consider the problem of retrieving the most relevant labels for a given input when the size of the output space is very large. Retrieval methods are modeled as set-valued classifiers which output a small set of classes for each input, and a mistake is made if the label is not in the output set. Despite its practical importance, a statistically principled, yet practical solution to this problem is largely missing. To this end, we first define a family of surrogate losses and show that they are calibrated and convex under certain conditions on the loss parameters and data distribution, thereby establishing a statistical and analytical basis for using these losses. Furthermore, we identify a particularly intuitive class of loss functions in the aforementioned family and show that they are amenable to practical implementation in the large output space setting (i.e. computation is possible without evaluating scores of all labels) by developing a technique called Stochastic Negative Mining. We also provide generalization error bounds for the losses in the family. Finally, we conduct experiments which demonstrate that Stochastic Negative Mining yields benefits over commonly used negative sampling approaches.'
volume: 89
URL: http://proceedings.mlr.press/v89/reddi19a.html
PDF: http://proceedings.mlr.press/v89/reddi19a/reddi19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-reddi19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Reddi
given: Sashank J.
- family: Kale
given: Satyen
- family: Yu
given: Felix
- family: Holtmann-Rice
given: Daniel
- family: Chen
given: Jiecao
- family: Kumar
given: Sanjiv
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1940-1949
id: reddi19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1940
lastpage: 1949
published: 2019-04-11 00:00:00 +0000
- title: 'Learning One-hidden-layer Neural Networks under General Input Distributions'
abstract: 'Significant advances have been made recently on training neural networks, where the main challenge is in solving an optimization problem with abundant critical points. However, existing approaches to address this issue crucially rely on a restrictive assumption: the training data is drawn from a Gaussian distribution. In this paper, we provide a novel unified framework to design loss functions with desirable landscape properties for a wide range of general input distributions. On these loss functions, remarkably, stochastic gradient descent theoretically recovers the true parameters with \emph{global} initializations and empirically outperforms the existing approaches. Our loss function design bridges the notion of score functions with the topic of neural network optimization. Central to our approach is the task of estimating the score function from samples, which is of basic and independent interest to theoretical statistics. Traditional estimation methods (example: kernel based) fail right at the outset; we bring statistical methods of local likelihood to design a novel estimator of score functions, that provably adapts to the local geometry of the unknown density.'
volume: 89
URL: http://proceedings.mlr.press/v89/gao19b.html
PDF: http://proceedings.mlr.press/v89/gao19b/gao19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-gao19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Gao
given: Weihao
- family: Makkuva
given: Ashok V.
- family: Oh
given: Sewoong
- family: Viswanath
given: Pramod
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1950-1959
id: gao19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1950
lastpage: 1959
published: 2019-04-11 00:00:00 +0000
- title: 'A Geometric Perspective on the Transferability of Adversarial Directions'
abstract: 'State-of-the-art machine learning models frequently misclassify inputs that have been perturbed in an adversarial manner. Adversarial perturbations generated for a given input and a specific classifier often seem to be effective on other inputs and even different classifiers. In other words, adversarial perturbations seem to transfer between different inputs, models, and even different neural network architectures. In this work, we show that in the context of linear classifiers and two-layer ReLU networks, there provably exist directions that give rise to adversarial perturbations for many classifiers and data points simultaneously. We show that these “transferable adversarial directions” are guaranteed to exist for linear separators of a given set, and will exist with high probability for linear classifiers trained on independent sets drawn from the same distribution. We extend our results to large classes of two-layer ReLU networks. We further show that adversarial directions for ReLU networks transfer to linear classifiers while the reverse need not hold, suggesting that adversarial perturbations for more complex models are more likely to transfer to other classifiers. We validate our findings empirically, even for deeper ReLU networks.'
volume: 89
URL: http://proceedings.mlr.press/v89/charles19a.html
PDF: http://proceedings.mlr.press/v89/charles19a/charles19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-charles19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Charles
given: Zachary
- family: Rosenberg
given: Harrison
- family: Papailiopoulos
given: Dimitris
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1960-1968
id: charles19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1960
lastpage: 1968
published: 2019-04-11 00:00:00 +0000
- title: 'Non-linear process convolutions for multi-output Gaussian processes'
abstract: 'The paper introduces a non-linear version of the process convolution formalism for building covariance functions for multi-output Gaussian processes. The non-linearity is introduced via Volterra series, one series per output. We provide closed-form expressions for the mean function and the covariance function of the approximated Gaussian process at the output of the Volterra series. The mean function and covariance function for the joint Gaussian process are derived using formulae for the product moments of Gaussian variables. We compare the performance of the non-linear model against the classical process convolution approach in one synthetic dataset and two real datasets.'
volume: 89
URL: http://proceedings.mlr.press/v89/alvarez19a.html
PDF: http://proceedings.mlr.press/v89/alvarez19a/alvarez19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-alvarez19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Alvarez
given: Mauricio A.
- family: Ward
given: Wil
- family: Guarnizo
given: Cristian
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1969-1977
id: alvarez19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1969
lastpage: 1977
published: 2019-04-11 00:00:00 +0000
- title: 'Lovasz Convolutional Networks'
abstract: 'Semi-supervised learning on graph structured data has received significant attention with the recent introduction of Graph Convolution Networks (GCNs). While traditional methods have focused on optimizing a loss augmented with Laplacian regularization, GCNs perform an implicit Laplacian-type regularization to capture local graph structure. In this work, we propose Lovasz Convolutional Networks (LCNs), which are capable of incorporating global graph properties. LCNs achieve this by utilizing Lovasz’s orthonormal embeddings of the nodes. We analyse local and global properties of graphs and demonstrate settings where LCNs tend to work better than GCNs. We validate the proposed method on standard random graph models such as stochastic block models (SBM) and certain community structure based graphs where LCNs outperform GCNs and learn more intuitive embeddings. We also perform extensive binary and multi-class classification experiments on real world datasets to demonstrate LCN’s effectiveness. In addition to simple graphs, we also demonstrate the use of LCNs on hyper-graphs by identifying settings where they are expected to work better than GCNs.'
volume: 89
URL: http://proceedings.mlr.press/v89/yadav19a.html
PDF: http://proceedings.mlr.press/v89/yadav19a/yadav19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-yadav19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Yadav
given: Prateek
- family: Nimishakavi
given: Madhav
- family: Yadati
given: Naganand
- family: Vashishth
given: Shikhar
- family: Rajkumar
given: Arun
- family: Talukdar
given: Partha
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1978-1987
id: yadav19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1978
lastpage: 1987
published: 2019-04-11 00:00:00 +0000
- title: 'Bridging the gap between regret minimization and best arm identification, with application to A/B tests'
abstract: 'State of the art online learning procedures focus either on selecting the best alternative (“best arm identification”) or on minimizing the cost (the “regret”). We merge these two objectives by providing the theoretical analysis of cost minimizing algorithms that are also $\delta$-PAC (with a proven guaranteed bound on the decision time), hence fulfilling at the same time regret minimization and best arm identification. This analysis sheds light on the common observation that ill-calibrated UCB-algorithms minimize regret while still quickly identifying the best arm. We also extend these results to the non-iid case faced by many practitioners. This provides a technique to make a cost versus decision time compromise when doing adaptive tests, with applications ranging from website A/B testing to clinical trials.'
volume: 89
URL: http://proceedings.mlr.press/v89/degenne19a.html
PDF: http://proceedings.mlr.press/v89/degenne19a/degenne19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-degenne19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Degenne
given: Rémy
- family: Nedelec
given: Thomas
- family: Calauzenes
given: Clement
- family: Perchet
given: Vianney
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1988-1996
id: degenne19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1988
lastpage: 1996
published: 2019-04-11 00:00:00 +0000
- title: 'Gaussian Process Modulated Cox Processes under Linear Inequality Constraints'
abstract: 'Gaussian process (GP) modulated Cox processes are widely used to model point patterns. Existing approaches require a mapping (link function) between the unconstrained GP and the positive intensity function. This commonly yields solutions that do not have a closed form or that are restricted to specific covariance functions. We introduce a novel finite approximation of GP-modulated Cox processes where positiveness conditions can be imposed directly on the GP, with no restrictions on the covariance function. Our approach can also ensure other types of inequality constraints (e.g. monotonicity, convexity), resulting in more versatile models that can be used for other classes of point processes (e.g. renewal processes). We demonstrate on both synthetic and real-world data that our framework accurately infers the intensity functions. Where monotonicity is a feature of the process, our ability to include this in the inference improves results.'
volume: 89
URL: http://proceedings.mlr.press/v89/lopez-lopera19a.html
PDF: http://proceedings.mlr.press/v89/lopez-lopera19a/lopez-lopera19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-lopez-lopera19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Lopez-Lopera
given: Andrés F.
- family: John
given: ST
- family: Durrande
given: Nicolas
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 1997-2006
id: lopez-lopera19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 1997
lastpage: 2006
published: 2019-04-11 00:00:00 +0000
- title: 'Implicit Kernel Learning'
abstract: 'Kernels are powerful and versatile tools in machine learning and statistics. Although the notion of universal kernels and characteristic kernels has been studied, kernel selection still greatly influences the empirical performance. While learning the kernel in a data driven way has been investigated, in this paper we explore learning the spectral distribution of the kernel via implicit generative models parametrized by deep neural networks. We call our method Implicit Kernel Learning (IKL). The proposed framework is simple to train and inference is performed via sampling random Fourier features. We investigate two applications of the proposed IKL as examples, including generative adversarial networks with MMD (MMD GAN) and standard supervised learning. Empirically, MMD GAN with IKL outperforms vanilla predefined kernels on both image and text generation benchmarks; using IKL with Random Kitchen Sinks also leads to substantial improvement over existing state-of-the-art kernel learning algorithms on popular supervised learning benchmarks. Theory and conditions for using IKL in both applications are also studied, as are connections to previous state-of-the-art methods.'
volume: 89
URL: http://proceedings.mlr.press/v89/li19f.html
PDF: http://proceedings.mlr.press/v89/li19f/li19f.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-li19f.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Li
given: Chun-Liang
- family: Chang
given: Wei-Cheng
- family: Mroueh
given: Youssef
- family: Yang
given: Yiming
- family: Poczos
given: Barnabas
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2007-2016
id: li19f
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2007
lastpage: 2016
published: 2019-04-11 00:00:00 +0000
- title: 'Bounding Inefficiency of Equilibria in Continuous Actions Games using Submodularity and Curvature'
abstract: 'Games with continuous strategy sets arise in several machine learning problems (e.g. adversarial learning). For such games, simple no-regret learning algorithms exist in several cases and ensure convergence to coarse correlated equilibria (CCE). The efficiency of such equilibria with respect to a social function, however, is not well understood. In this paper, we define the class of valid utility games with continuous strategies and provide efficiency bounds for their CCEs. Our bounds rely on the social function being a monotone DR-submodular function. We further refine our bounds based on the curvature of the social function. Furthermore, we extend our efficiency bounds to a class of non-submodular functions that satisfy approximate submodularity properties. Finally, we show that valid utility games with continuous strategies can be designed to maximize monotone DR-submodular functions subject to disjoint constraints with approximation guarantees. The approximation guarantees we derive are based on the efficiency of the equilibria of such games and can improve the existing ones in the literature. We illustrate and validate our results on a budget allocation game and a sensor coverage problem.'
volume: 89
URL: http://proceedings.mlr.press/v89/sessa19a.html
PDF: http://proceedings.mlr.press/v89/sessa19a/sessa19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-sessa19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Sessa
given: Pier Giuseppe
- family: Kamgarpour
given: Maryam
- family: Krause
given: Andreas
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2017-2027
id: sessa19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2017
lastpage: 2027
published: 2019-04-11 00:00:00 +0000
- title: 'Variational Information Planning for Sequential Decision Making'
abstract: 'We consider the setting of sequential decision making where, at each stage, potential actions are evaluated based on expected reduction in posterior uncertainty, given by mutual information (MI). As MI typically lacks a closed form, we propose an approach which maintains variational approximations of both the posterior and the MI utility. Our planning objective extends an established variational bound on MI to the setting of sequential planning. The result, variational information planning (VIP), is an efficient method for sequential decision making. We further establish convexity of the variational planning objective and, under conditional exponential family approximations, we show that the optimal MI bound arises from a relaxation of the well-known exponential family moment matching property. We demonstrate VIP for sensor selection, experiment design, and active learning, where it meets or exceeds methods requiring more computation, or those specialized to the task.'
volume: 89
URL: http://proceedings.mlr.press/v89/pacheco19a.html
PDF: http://proceedings.mlr.press/v89/pacheco19a/pacheco19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-pacheco19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Pacheco
given: Jason
- family: Fisher
given: John
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2028-2036
id: pacheco19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2028
lastpage: 2036
published: 2019-04-11 00:00:00 +0000
- title: 'Renyi Differentially Private ERM for Smooth Objectives'
abstract: 'In this paper, we present a Renyi Differentially Private stochastic gradient descent (SGD) algorithm for convex empirical risk minimization. The algorithm uses output perturbation and leverages randomness inside SGD, which creates a "randomized sensitivity", in order to reduce the amount of noise that is added. One of the benefits of output perturbation is that we can incorporate a periodic averaging step that serves to further reduce sensitivity while improving accuracy (reducing the well-known oscillating behavior of SGD near the optimum). Renyi Differential Privacy can be used to provide (epsilon, delta)-differential privacy guarantees and hence provide a comparison with prior work. An empirical evaluation demonstrates that the proposed method outperforms prior methods on differentially private ERM.'
volume: 89
URL: http://proceedings.mlr.press/v89/chen19e.html
PDF: http://proceedings.mlr.press/v89/chen19e/chen19e.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-chen19e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Chen
given: Chen
- family: Lee
given: Jaewoo
- family: Kifer
given: Dan
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2037-2046
id: chen19e
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2037
lastpage: 2046
published: 2019-04-11 00:00:00 +0000
- title: 'Projection-Free Bandit Convex Optimization'
abstract: 'In this paper, we propose the first computationally efficient projection-free algorithm for bandit convex optimization (BCO) with a general convex constraint. We show that our algorithm achieves a sublinear regret of $O(nT^{4/5})$ (where $T$ is the horizon and $n$ is the dimension) for any bounded convex functions with uniformly bounded gradients. We also evaluate the performance of our algorithm against baselines on both synthetic and real data sets for quadratic programming, portfolio selection and matrix completion problems.'
volume: 89
URL: http://proceedings.mlr.press/v89/chen19f.html
PDF: http://proceedings.mlr.press/v89/chen19f/chen19f.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-chen19f.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Chen
given: Lin
- family: Zhang
given: Mingrui
- family: Karbasi
given: Amin
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2047-2056
id: chen19f
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2047
lastpage: 2056
published: 2019-04-11 00:00:00 +0000
- title: 'Provable Robustness of ReLU networks via Maximization of Linear Regions'
abstract: 'It has been shown that neural network classifiers are not robust. This raises concerns about their usage in safety-critical systems. We propose in this paper a regularization scheme for ReLU networks which provably improves the robustness of the classifier by maximizing the linear regions of the classifier as well as the distance to the decision boundary. Using our regularization we can even find the minimal adversarial perturbation for a certain fraction of test points for large networks. In the experiments we show that our approach improves upon pure adversarial training both in terms of lower and upper bounds on the robustness and is comparable to or better than the state of the art in terms of test error and robustness.'
volume: 89
URL: http://proceedings.mlr.press/v89/croce19a.html
PDF: http://proceedings.mlr.press/v89/croce19a/croce19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-croce19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Croce
given: Francesco
- family: Andriushchenko
given: Maksym
- family: Hein
given: Matthias
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2057-2066
id: croce19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2057
lastpage: 2066
published: 2019-04-11 00:00:00 +0000
- title: 'Test without Trust: Optimal Locally Private Distribution Testing'
abstract: 'We study the problem of distribution testing when the samples can only be accessed using a locally differentially private mechanism and focus on two representative testing questions of identity (goodness-of-fit) and independence testing for discrete distributions. First, we construct tests that use existing, general-purpose locally differentially private mechanisms such as the popular RAPPOR or the recently introduced Hadamard Response for collecting data and show that our proposed tests are sample optimal, when we insist on using these mechanisms. Next, we allow bespoke mechanisms designed specifically for testing and introduce the Randomized Aggregated Private Testing Optimal Response (RAPTOR) mechanism which is remarkably simple and requires only one bit of communication per sample. We show that our proposed mechanism yields sample-optimal tests, and in particular outperforms any test based on RAPPOR or Hadamard Response. A distinguishing feature of our optimal mechanism is that, in contrast to existing mechanisms, it uses public randomness.'
volume: 89
URL: http://proceedings.mlr.press/v89/acharya19b.html
PDF: http://proceedings.mlr.press/v89/acharya19b/acharya19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-acharya19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Acharya
given: Jayadev
- family: Canonne
given: Clement
- family: Freitag
given: Cody
- family: Tyagi
given: Himanshu
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2067-2076
id: acharya19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2067
lastpage: 2076
published: 2019-04-11 00:00:00 +0000
- title: 'Distributed Maximization of "Submodular plus Diversity" Functions for Multi-label Feature Selection on Huge Datasets'
abstract: 'There are many problems in machine learning and data mining which are equivalent to selecting a non-redundant, high "quality" set of objects. Recommender systems, feature selection, and data summarization are among many applications of this. In this paper, we consider this problem as an optimization problem that seeks to maximize the sum of a sum-sum diversity function and a non-negative monotone submodular function. The diversity function addresses the redundancy, and the submodular function controls the predictive quality. We consider the problem in big data settings (in other words, distributed and streaming settings) where the data cannot be stored on a single machine or the processing time is too high for a single machine. We show that a greedy algorithm achieves a constant factor approximation of the optimal solution in these settings. Moreover, we formulate the multi-label feature selection problem as such an optimization problem. This formulation combined with our algorithm leads to the first distributed multi-label feature selection method. We compare the performance of this method with centralized multi-label feature selection methods in the literature, and we show that its performance is comparable to, or in some cases even better than, that of current centralized multi-label feature selection methods.'
volume: 89
URL: http://proceedings.mlr.press/v89/ghadiri19a.html
PDF: http://proceedings.mlr.press/v89/ghadiri19a/ghadiri19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-ghadiri19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Ghadiri
given: Mehrdad
- family: Schmidt
given: Mark
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2077-2086
id: ghadiri19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2077
lastpage: 2086
published: 2019-04-11 00:00:00 +0000
- title: 'On Euclidean k-Means Clustering with alpha-Center Proximity'
abstract: '$k$-means clustering is NP-hard in the worst case but previous work has shown efficient algorithms assuming the optimal $k$-means clusters are \emph{stable} under additive or multiplicative perturbation of data. This has two caveats. First, we do not know how to efficiently verify this property of optimal solutions that are NP-hard to compute in the first place. Second, the stability assumptions required for polynomial time $k$-means algorithms are often unreasonable when compared to the ground-truth clusters in real-world data. A consequence of multiplicative perturbation resilience is \emph{center proximity}, that is, every point is closer to the center of its own cluster than the center of any other cluster, by some multiplicative factor $\alpha > 1$. We study the problem of minimizing the Euclidean $k$-means objective only over clusterings that satisfy $\alpha$-center proximity. We give a simple algorithm to find the optimal $\alpha$-center-proximal $k$-means clustering in running time exponential in $k$ and $1/(\alpha - 1)$ but linear in the number of points and the dimension. We define an analogous $\alpha$-center proximity condition for outliers, and give similar algorithmic guarantees for $k$-means with outliers and $\alpha$-center proximity. On the hardness side we show that for any $\alpha^{\prime} > 1$, there exists an $\alpha \leq \alpha^{\prime}$, $(\alpha > 1)$, and an $\epsilon_0 > 0$ such that minimizing the $k$-means objective over clusterings that satisfy $\alpha$-center proximity is NP-hard to approximate within a multiplicative $(1+\epsilon_0)$ factor.'
volume: 89
URL: http://proceedings.mlr.press/v89/deshpande19a.html
PDF: http://proceedings.mlr.press/v89/deshpande19a/deshpande19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-deshpande19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Deshpande
given: Amit
- family: Louis
given: Anand
- family: Singh
given: Apoorv
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2087-2095
id: deshpande19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2087
lastpage: 2095
published: 2019-04-11 00:00:00 +0000
- title: 'Noisy Blackbox Optimization using Multi-fidelity Queries: A Tree Search Approach'
abstract: 'We study the problem of black-box optimization of a noisy function in the presence of low-cost approximations or fidelities, which is motivated by problems like hyper-parameter tuning. In hyper-parameter tuning, evaluating the black-box function at a point involves training a learning algorithm on a large data-set at a particular hyper-parameter and evaluating the validation error. Even a single such evaluation can be prohibitively expensive. Therefore, it is beneficial to use low-cost approximations, like training the learning algorithm on a sub-sampled version of the whole data-set. These low-cost approximations/fidelities can, however, provide a biased and noisy estimate of the function value. In this work, we combine structured state-space exploration through hierarchical partitioning with querying these partitions at multiple fidelities, and develop a multi-fidelity bandit based tree-search algorithm for noisy black-box optimization. We derive simple regret guarantees for our algorithm and validate its performance on real and synthetic datasets.'
volume: 89
URL: http://proceedings.mlr.press/v89/sen19a.html
PDF: http://proceedings.mlr.press/v89/sen19a/sen19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-sen19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Sen
given: Rajat
- family: Kandasamy
given: Kirthevasan
- family: Shakkottai
given: Sanjay
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2096-2105
id: sen19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2096
lastpage: 2105
published: 2019-04-11 00:00:00 +0000
- title: 'Safe Convex Learning under Uncertain Constraints'
abstract: 'We address the problem of minimizing a convex smooth function f(x) over a compact polyhedral set D given a stochastic zeroth-order constraint feedback model. This problem arises in safety-critical machine learning applications, such as personalized medicine and robotics. In such cases, one needs to ensure constraints are satisfied while exploring the decision space to find the optimum of the loss function. We propose a new variant of the Frank-Wolfe algorithm, which applies to the case of uncertain linear constraints. Using robust optimization, we provide the convergence rate of the algorithm while guaranteeing feasibility of all iterates, with high probability.'
volume: 89
URL: http://proceedings.mlr.press/v89/usmanova19a.html
PDF: http://proceedings.mlr.press/v89/usmanova19a/usmanova19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-usmanova19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Usmanova
given: Ilnura
- family: Krause
given: Andreas
- family: Kamgarpour
given: Maryam
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2106-2114
id: usmanova19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2106
lastpage: 2114
published: 2019-04-11 00:00:00 +0000
- title: 'The non-parametric bootstrap and spectral analysis in moderate and high-dimension'
abstract: 'We consider the properties of the bootstrap as a tool for inference concerning the eigenvalues of a sample covariance matrix computed from an n x p data matrix X. We focus on the modern framework where p/n is not close to 0 but remains bounded as n and p tend to infinity. Through a mix of numerical and theoretical considerations, we show that the non-parametric bootstrap is not in general a reliable inferential tool in the setting we consider. However, in the case where the population covariance matrix is well-approximated by a finite rank matrix, the non-parametric bootstrap performs as it does in finite dimension.'
volume: 89
URL: http://proceedings.mlr.press/v89/karoui19a.html
PDF: http://proceedings.mlr.press/v89/karoui19a/karoui19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-karoui19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: El Karoui
given: Noureddine
- family: Purdom
given: Elizabeth
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2115-2124
id: karoui19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2115
lastpage: 2124
published: 2019-04-11 00:00:00 +0000
- title: 'Knockoffs for the Mass: New Feature Importance Statistics with False Discovery Guarantees'
abstract: 'An important problem in machine learning and statistics is to identify features that causally affect the outcome. This is often impossible to do from purely observational data, and a natural relaxation is to identify features that are correlated with the outcome even conditioned on all other observed features. For example, we want to identify that smoking really is correlated with cancer conditioned on demographics. The knockoff procedure is a recent breakthrough in statistics that, in theory, can identify truly correlated features while guaranteeing that the false discovery rate is controlled. The idea is to create synthetic data, called knockoffs, that capture correlations among the features. However, there are substantial computational and practical challenges to generating and using knockoffs. This paper makes several key advances that enable knockoff application to be more efficient and powerful. We develop an efficient algorithm to generate valid knockoffs from Bayesian Networks. Then we systematically evaluate knockoff test statistics and develop new statistics with improved power. The paper combines new mathematical guarantees with systematic experiments on real and synthetic data.'
volume: 89
URL: http://proceedings.mlr.press/v89/gimenez19a.html
PDF: http://proceedings.mlr.press/v89/gimenez19a/gimenez19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-gimenez19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Gimenez
given: Jaime Roquero
- family: Ghorbani
given: Amirata
- family: Zou
given: James
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2125-2133
id: gimenez19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2125
lastpage: 2133
published: 2019-04-11 00:00:00 +0000
- title: 'Training Variational Autoencoders with Buffered Stochastic Variational Inference'
abstract: 'The recognition network in deep latent variable models such as variational autoencoders (VAEs) relies on amortized inference for efficient posterior approximation that can scale up to large datasets. However, this technique has also been demonstrated to select suboptimal variational parameters, often resulting in considerable additional error called the amortization gap. To close the amortization gap and improve the training of the generative model, recent works have introduced an additional refinement step that applies stochastic variational inference (SVI) to improve upon the variational parameters returned by the amortized inference model. In this paper, we propose Buffered Stochastic Variational Inference (BSVI), a new refinement procedure that makes use of SVI’s sequence of intermediate variational proposal distributions and their corresponding importance weights to construct a new generalized importance-weighted lower bound. We demonstrate empirically that training variational autoencoders with BSVI consistently outperforms SVI, yielding an improved training procedure for VAEs.'
volume: 89
URL: http://proceedings.mlr.press/v89/shu19a.html
PDF: http://proceedings.mlr.press/v89/shu19a/shu19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-shu19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Shu
given: Rui
- family: Bui
given: Hung
- family: Whang
given: Jay
- family: Ermon
given: Stefano
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2134-2143
id: shu19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2134
lastpage: 2143
published: 2019-04-11 00:00:00 +0000
- title: 'Regularized Contextual Bandits'
abstract: 'We consider the stochastic contextual bandit problem with additional regularization. The motivation comes from problems where the policy of the agent must be close to some baseline policy which is known to perform well on the task. To tackle this problem we use a nonparametric model and propose an algorithm that splits the context space into bins and solves regularized multi-armed bandit instances on each bin simultaneously and independently. We derive slow and fast rates of convergence, depending on the unknown complexity of the problem. We also consider a new relevant margin condition to get problem-independent convergence rates, ending up in intermediate convergence rates interpolating between the aforementioned slow and fast rates.'
volume: 89
URL: http://proceedings.mlr.press/v89/fontaine19a.html
PDF: http://proceedings.mlr.press/v89/fontaine19a/fontaine19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-fontaine19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Fontaine
given: Xavier
- family: Berthet
given: Quentin
- family: Perchet
given: Vianney
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2144-2153
id: fontaine19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2144
lastpage: 2153
published: 2019-04-11 00:00:00 +0000
- title: 'Risk-Sensitive Generative Adversarial Imitation Learning'
abstract: 'We study risk-sensitive imitation learning where the agent’s goal is to perform at least as well as the expert in terms of a risk profile. We first formulate our risk-sensitive imitation learning setting. We consider the generative adversarial approach to imitation learning (GAIL) and derive an optimization problem for our formulation, which we call risk-sensitive GAIL (RS-GAIL). We then derive two different versions of our RS-GAIL optimization problem that aim at matching the risk profiles of the agent and the expert w.r.t. Jensen-Shannon (JS) divergence and Wasserstein distance, and develop risk-sensitive generative adversarial imitation learning algorithms based on these optimization problems. We evaluate the performance of our algorithms and compare them with GAIL and the risk-averse imitation learning (RAIL) algorithms in two MuJoCo and two OpenAI classical control tasks.'
volume: 89
URL: http://proceedings.mlr.press/v89/lacotte19a.html
PDF: http://proceedings.mlr.press/v89/lacotte19a/lacotte19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-lacotte19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Lacotte
given: Jonathan
- family: Ghavamzadeh
given: Mohammad
- family: Chow
given: Yinlam
- family: Pavone
given: Marco
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2154-2163
id: lacotte19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2154
lastpage: 2163
published: 2019-04-11 00:00:00 +0000
- title: 'Learning Controllable Fair Representations'
abstract: 'Learning data representations that are transferable and are fair with respect to certain protected attributes is crucial to reducing unfair decisions while preserving the utility of the data. We propose an information-theoretically motivated objective for learning maximally expressive representations subject to fairness constraints. We demonstrate that a range of existing approaches optimize approximations to the Lagrangian dual of our objective. In contrast to these existing approaches, our objective allows the user to control the fairness of the representations by specifying limits on unfairness. Exploiting duality, we introduce a method that optimizes the model parameters as well as the expressiveness-fairness trade-off. Empirical evidence suggests that our proposed method can balance the trade-off between multiple notions of fairness and achieves higher expressiveness at a lower computational cost.'
volume: 89
URL: http://proceedings.mlr.press/v89/song19a.html
PDF: http://proceedings.mlr.press/v89/song19a/song19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-song19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Song
given: Jiaming
- family: Kalluri
given: Pratyusha
- family: Grover
given: Aditya
- family: Zhao
given: Shengjia
- family: Ermon
given: Stefano
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2164-2173
id: song19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2164
lastpage: 2173
published: 2019-04-11 00:00:00 +0000
- title: 'Multi-Task Time Series Analysis applied to Drug Response Modelling'
abstract: 'Time series models such as dynamical systems are frequently fitted to a cohort of data, ignoring variation between individual entities such as patients. In this paper we show how these models can be personalised to an individual level while retaining statistical power, via use of multi-task learning (MTL). To our knowledge this is a novel development of MTL which applies to time series both with and without control inputs. The modelling framework is demonstrated on a physiological drug response problem which results in improved predictive accuracy and uncertainty estimation over existing state-of-the-art models.'
volume: 89
URL: http://proceedings.mlr.press/v89/bird19a.html
PDF: http://proceedings.mlr.press/v89/bird19a/bird19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-bird19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Bird
given: Alex
- family: Williams
given: Chris
- family: Hawthorne
given: Christopher
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2174-2183
id: bird19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2174
lastpage: 2183
published: 2019-04-11 00:00:00 +0000
- title: 'Improving the Stability of the Knockoff Procedure: Multiple Simultaneous Knockoffs and Entropy Maximization'
abstract: 'The Model-X knockoff procedure has recently emerged as a powerful approach for feature selection with statistical guarantees. The advantage of knockoffs is that if we have a good model of the features X, then we can identify salient features without knowing anything about how the outcome Y depends on X. An important drawback of knockoffs is their instability: running the procedure twice can result in very different selected features, potentially leading to different conclusions. Addressing this instability is critical for obtaining reproducible and robust results. Here we present a generalization of the knockoff procedure that we call simultaneous multi-knockoffs. We show that multi-knockoffs guarantee false discovery rate (FDR) control, and are substantially more stable and powerful compared to the standard (single) knockoffs. Moreover, we propose a new algorithm based on entropy maximization for generating Gaussian multi-knockoffs. We validate the improved stability and power of multi-knockoffs in systematic experiments. We also illustrate how multi-knockoffs can improve the accuracy of detecting genetic mutations that are causally linked to phenotypes.'
volume: 89
URL: http://proceedings.mlr.press/v89/gimenez19b.html
PDF: http://proceedings.mlr.press/v89/gimenez19b/gimenez19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-gimenez19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Gimenez
given: Jaime Roquero
- family: Zou
given: James
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2184-2192
id: gimenez19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2184
lastpage: 2192
published: 2019-04-11 00:00:00 +0000
- title: 'Know Your Boundaries: Constraining Gaussian Processes by Variational Harmonic Features'
abstract: 'Gaussian processes (GPs) provide a powerful framework for extrapolation, interpolation, and noise removal in regression and classification. This paper considers constraining GPs to arbitrarily-shaped domains with boundary conditions. We solve a Fourier-like generalised harmonic feature representation of the GP prior in the domain of interest, which both constrains the GP and attains a low-rank representation that is used for speeding up inference. The method scales as O(nm^2) in prediction and O(m^3) in hyperparameter learning for regression, where n is the number of data points and m the number of features. Furthermore, we make use of the variational approach to allow the method to deal with non-Gaussian likelihoods. The experiments cover both simulated and empirical data in which the boundary conditions allow for inclusion of additional physical information.'
volume: 89
URL: http://proceedings.mlr.press/v89/solin19a.html
PDF: http://proceedings.mlr.press/v89/solin19a/solin19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-solin19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Solin
given: Arno
- family: Kok
given: Manon
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2193-2202
id: solin19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2193
lastpage: 2202
published: 2019-04-11 00:00:00 +0000
- title: 'Distributional reinforcement learning with linear function approximation'
abstract: 'Despite many algorithmic advances, our theoretical understanding of practical distributional reinforcement learning methods remains limited. One exception is Rowland et al. (2018)’s analysis of the C51 algorithm in terms of the Cramer distance, but their results only apply to the tabular setting and ignore C51’s use of a softmax to produce normalized distributions. In this paper we adapt the Cramer distance to deal with arbitrary vectors. From it we derive a new distributional algorithm which is fully Cramer-based and can be combined with linear function approximation, with formal guarantees in the context of policy evaluation. In allowing the model’s prediction to be any real vector, we lose the probabilistic interpretation behind the method, but otherwise maintain the appealing properties of distributional approaches. To the best of our knowledge, ours is the first proof of convergence of a distributional algorithm combined with function approximation. Perhaps surprisingly, our results provide evidence that Cramer-based distributional methods may perform worse than directly approximating the value function.'
volume: 89
URL: http://proceedings.mlr.press/v89/bellemare19a.html
PDF: http://proceedings.mlr.press/v89/bellemare19a/bellemare19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-bellemare19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Bellemare
given: Marc G.
- family: Roux
given: Nicolas Le
- family: Castro
given: Pablo Samuel
- family: Moitra
given: Subhodeep
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2203-2211
id: bellemare19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2203
lastpage: 2211
published: 2019-04-11 00:00:00 +0000
- title: 'Matroids, Matchings, and Fairness'
abstract: 'The need for fairness in machine learning algorithms is increasingly critical. A recent focus has been on developing fair versions of classical algorithms, such as those for bandit learning, regression, and clustering. In this work we extend this line of work to include algorithms for optimization subject to one or multiple matroid constraints. We map out a series of results, showing optimal solutions, approximation algorithms, and hardness proofs depending on the specific flavor of the problem. Our algorithms are efficient and empirical experiments demonstrate that fairness can be achieved with a modest compromise to the overall objective value.'
volume: 89
URL: http://proceedings.mlr.press/v89/chierichetti19a.html
PDF: http://proceedings.mlr.press/v89/chierichetti19a/chierichetti19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-chierichetti19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Chierichetti
given: Flavio
- family: Kumar
given: Ravi
- family: Lattanzi
given: Silvio
- family: Vassilvitskii
given: Sergei
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2212-2220
id: chierichetti19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2212
lastpage: 2220
published: 2019-04-11 00:00:00 +0000
- title: 'Dynamical Isometry is Achieved in Residual Networks in a Universal Way for any Activation Function'
abstract: 'We demonstrate that in residual neural networks (ResNets) dynamical isometry is achievable irrespective of the activation function used. We do that by deriving, with the help of Free Probability and Random Matrix Theories, a universal formula for the spectral density of the input-output Jacobian at initialization, in the large network width and depth limit. The resulting singular value spectrum depends on a single parameter, which we calculate for a variety of popular activation functions, by analyzing the signal propagation in the artificial neural network. We corroborate our results with numerical simulations of both random matrices and ResNets applied to the CIFAR-10 classification problem. Moreover, we study consequences of this universal behavior for the initial and late phases of the learning processes. We conclude by drawing attention to the simple fact that initialization acts as a confounding factor between the choice of activation function and the rate of learning. We propose that in ResNets this can be resolved based on our results by ensuring the same level of dynamical isometry at initialization.'
volume: 89
URL: http://proceedings.mlr.press/v89/tarnowski19a.html
PDF: http://proceedings.mlr.press/v89/tarnowski19a/tarnowski19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-tarnowski19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Tarnowski
given: Wojciech
- family: Warchoł
given: Piotr
- family: Jastrzębski
given: Stanisław
- family: Tabor
given: Jacek
- family: Nowak
given: Maciej
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2221-2230
id: tarnowski19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2221
lastpage: 2230
published: 2019-04-11 00:00:00 +0000
- title: 'The Termination Critic'
abstract: 'In this work, we consider the problem of autonomously discovering behavioral abstractions, or options, for reinforcement learning agents. We propose an algorithm that focuses on the termination function, as opposed to the policy, which is the more common focus. The termination function is usually trained to optimize a control objective: an option ought to terminate if another has better value. We offer a different, information-theoretic perspective, and propose that terminations should focus instead on the compressibility of the option’s encoding, arguably a key reason for using abstractions. To achieve this algorithmically, we leverage the classical options framework, and learn the option transition model as a "critic" for the termination function. Using this model, we derive gradients that optimize the desired criteria. We show that the resulting options are non-trivial, intuitively meaningful, and useful for learning.'
volume: 89
URL: http://proceedings.mlr.press/v89/harutyunyan19a.html
PDF: http://proceedings.mlr.press/v89/harutyunyan19a/harutyunyan19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-harutyunyan19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Harutyunyan
given: Anna
- family: Dabney
given: Will
- family: Borsa
given: Diana
- family: Heess
given: Nicolas
- family: Munos
given: Remi
- family: Precup
given: Doina
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2231-2240
id: harutyunyan19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2231
lastpage: 2240
published: 2019-04-11 00:00:00 +0000
- title: 'Consistent Online Optimization: Convex and Submodular'
abstract: 'Modern online learning algorithms achieve low (sublinear) regret in a variety of settings. These algorithms, however, update their solution at every time step. While these updates are computationally efficient, the very requirement of frequent updates makes the algorithms untenable in some practical applications. In this work we develop online learning algorithms that update a sublinear number of times. We give a meta-algorithm based on non-homogeneous Poisson processes that gives a smooth trade-off between regret and frequency of updates. Empirically, we show that in many cases, we can significantly reduce updates at a minimal increase in regret.'
volume: 89
URL: http://proceedings.mlr.press/v89/jaghargh19a.html
PDF: http://proceedings.mlr.press/v89/jaghargh19a/jaghargh19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-jaghargh19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Jaghargh
given: Mohammad Reza Karimi
- family: Krause
given: Andreas
- family: Lattanzi
given: Silvio
- family: Vassilvitskii
given: Sergei
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2241-2250
id: jaghargh19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2241
lastpage: 2250
published: 2019-04-11 00:00:00 +0000
- title: 'Learning Determinantal Point Processes by Corrective Negative Sampling'
abstract: 'Determinantal Point Processes (DPPs) have attracted significant interest from the machine-learning community due to their ability to elegantly and tractably model the delicate balance between quality and diversity of sets. DPPs are commonly learned from data using maximum likelihood estimation (MLE). While fitting observed sets well, MLE for DPPs may also assign high likelihoods to unobserved sets that are far from the true generative distribution of the data. To address this issue, which reduces the quality of the learned model, we introduce a novel optimization problem, Contrastive Estimation (CE), which encodes information about "negative" samples into the basic learning model. CE is grounded in the successful use of negative information in machine-vision and language modeling. Depending on the chosen negative distribution (which may be static or evolve during optimization), CE assumes two different forms, which we analyze theoretically and experimentally. We evaluate our new model on real-world datasets; on a challenging dataset, CE learning delivers a considerable improvement in predictive performance over a DPP learned without using contrastive information.'
volume: 89
URL: http://proceedings.mlr.press/v89/mariet19b.html
PDF: http://proceedings.mlr.press/v89/mariet19b/mariet19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-mariet19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Mariet
given: Zelda
- family: Gartrell
given: Mike
- family: Sra
given: Suvrit
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2251-2260
id: mariet19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2251
lastpage: 2260
published: 2019-04-11 00:00:00 +0000
- title: 'Probabilistic Semantic Inpainting with Pixel Constrained CNNs'
abstract: 'Semantic inpainting is the task of inferring missing pixels in an image given surrounding pixels and high level image semantics. Most semantic inpainting algorithms are deterministic: given an image with missing regions, a single inpainted image is generated. However, there are often several plausible inpaintings for a given missing region. In this paper, we propose a method to perform probabilistic semantic inpainting by building a model, based on PixelCNNs, that learns a distribution of images conditioned on a subset of visible pixels. Experiments on the MNIST and CelebA datasets show that our method produces diverse and realistic inpaintings.'
volume: 89
URL: http://proceedings.mlr.press/v89/dupont19a.html
PDF: http://proceedings.mlr.press/v89/dupont19a/dupont19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-dupont19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Dupont
given: Emilien
- family: Suresha
given: Suhas
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2261-2270
id: dupont19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2261
lastpage: 2270
published: 2019-04-11 00:00:00 +0000
- title: 'Least Squares Estimation of Weakly Convex Functions'
abstract: 'Function estimation under shape restrictions, such as convexity, has many practical applications and has drawn considerable recent interest. In this work we argue that convexity, as a global property, is too strict and prone to outliers. Instead, we propose to use weakly convex functions as a simple alternative to quantify “approximate convexity”, a notion that is perhaps more relevant in practice. We prove that, unlike convex functions, weakly convex functions can exactly interpolate any finite dataset and they are universal approximators. Through regularizing the modulus of convexity, we show that weakly convex functions can be efficiently estimated both statistically and algorithmically, requiring minimal modifications to existing algorithms and theory for estimating convex functions. Our numerical experiments confirm the class of weakly convex functions as another competitive alternative for nonparametric estimation.'
volume: 89
URL: http://proceedings.mlr.press/v89/sun19b.html
PDF: http://proceedings.mlr.press/v89/sun19b/sun19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-sun19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Sun
given: Sun
- family: Yu
given: Yaoliang
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2271-2280
id: sun19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2271
lastpage: 2280
published: 2019-04-11 00:00:00 +0000
- title: 'Interval Estimation of Individual-Level Causal Effects Under Unobserved Confounding'
abstract: 'We study the problem of learning conditional average treatment effects (CATE) from observational data with unobserved confounders. The CATE function maps baseline covariates to individual causal effect predictions and is key for personalized assessments. Recent work has focused on how to learn CATE under unconfoundedness, i.e., when there are no unobserved confounders. Since CATE may not be identified when unconfoundedness is violated, we develop a functional interval estimator that predicts bounds on the individual causal effects under realistic violations of unconfoundedness. Our estimator takes the form of a weighted kernel estimator with weights that vary adversarially. We prove that our estimator is sharp in that it converges exactly to the tightest bounds possible on CATE when there may be unobserved confounders. Further, we study personalized decision rules derived from our estimator and prove that they achieve optimal minimax regret asymptotically. We assess our approach in a simulation study as well as demonstrate its application in the case of hormone replacement therapy by comparing conclusions from a real observational study and clinical trial.'
volume: 89
URL: http://proceedings.mlr.press/v89/kallus19a.html
PDF: http://proceedings.mlr.press/v89/kallus19a/kallus19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-kallus19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Kallus
given: Nathan
- family: Mao
given: Xiaojie
- family: Zhou
given: Angela
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2281-2290
id: kallus19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2281
lastpage: 2290
published: 2019-04-11 00:00:00 +0000
- title: 'Amortized Variational Inference with Graph Convolutional Networks for Gaussian Processes'
abstract: 'GP inference on large datasets is computationally expensive, especially when the observation likelihood is non-Gaussian. To reduce the computation, many recent variational inference methods define the variational distribution based on a small number of inducing points. These methods have a hard tradeoff between distribution flexibility and computational efficiency. In this paper, we focus on the approximation of the GP posterior at a local level: we define a reusable template to approximate the posterior at neighborhoods while maintaining a global approximation. We first construct a variational distribution such that the inference for a data point considers only its neighborhood, thereby separating the calculation for each data point. We then train Graph Convolutional Networks as a reusable model to run inference for each data point. Compared to previous methods, our method greatly reduces the number of parameters and also the number of optimization iterations. In empirical evaluations, the proposed method significantly speeds up the inference and often gets more accurate results than competing methods.'
volume: 89
URL: http://proceedings.mlr.press/v89/liu19c.html
PDF: http://proceedings.mlr.press/v89/liu19c/liu19c.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-liu19c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Liu
given: Linfeng
- family: Liu
given: Liping
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2291-2300
id: liu19c
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2291
lastpage: 2300
published: 2019-04-11 00:00:00 +0000
- title: 'Online Decentralized Leverage Score Sampling for Streaming Multidimensional Time Series'
abstract: 'Estimating the dependence structure of multidimensional time series data in real-time is challenging. With large volumes of streaming data, the problem becomes more difficult when the multidimensional data are collected asynchronously across distributed nodes, which motivates us to sample representative data points from streams. We propose a leverage score sampling (LSS) method for efficient online inference of the streaming vector autoregressive (VAR) model. We define the leverage score for the streaming VAR model so that the LSS method selects informative data points in real-time with statistical guarantees of parameter estimation efficiency. Moreover, our LSS method can be directly deployed in an asynchronous decentralized environment, e.g., a sensor network without a fusion center, and produce asynchronous consensus online parameter estimation over time. By exploiting the temporal dependence structure of the VAR model, the LSS method selects samples independently on each dimension and thus is able to update the estimation asynchronously. We illustrate the effectiveness of the LSS method in synthetic, gas sensor and seismic datasets.'
volume: 89
URL: http://proceedings.mlr.press/v89/xie19a.html
PDF: http://proceedings.mlr.press/v89/xie19a/xie19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-xie19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Xie
given: Rui
- family: Wang
given: Zengyan
- family: Bai
given: Shuyang
- family: Ma
given: Ping
- family: Zhong
given: Wenxuan
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2301-2311
id: xie19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2301
lastpage: 2311
published: 2019-04-11 00:00:00 +0000
- title: 'Interpretable Cascade Classifiers with Abstention'
abstract: 'In many prediction tasks such as medical diagnostics, sequential decisions are crucial to provide optimal individual treatment. Budget in real-life applications is always limited, and it can represent any limited resource such as time, money, or side effects of medications. In this contribution, we develop a POMDP-based framework to learn cost-sensitive heterogeneous cascading systems. We provide both the theoretical support for the introduced approach and the intuition behind it. We evaluate our novel method on some standard benchmarks, and we discuss how the learned models can be interpreted by human experts.'
volume: 89
URL: http://proceedings.mlr.press/v89/clertant19a.html
PDF: http://proceedings.mlr.press/v89/clertant19a/clertant19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-clertant19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Clertant
given: Matthieu
- family: Sokolovska
given: Nataliya
- family: Chevaleyre
given: Yann
- family: Hanczar
given: Blaise
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2312-2320
id: clertant19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2312
lastpage: 2320
published: 2019-04-11 00:00:00 +0000
- title: 'Kernel Exponential Family Estimation via Doubly Dual Embedding'
abstract: 'We investigate penalized maximum log-likelihood estimation for exponential family distributions whose natural parameter resides in a reproducing kernel Hilbert space. Key to our approach is a novel technique, doubly dual embedding, that avoids computation of the partition function. This technique also allows the development of a flexible sampling strategy that amortizes the cost of Monte-Carlo sampling in the inference stage. The resulting estimator can be easily generalized to kernel conditional exponential families. We establish a connection between kernel exponential family estimation and MMD-GANs, revealing a new perspective for understanding GANs. Compared to the score matching based estimators, the proposed method improves both memory and time efficiency while enjoying stronger statistical properties, such as fully capturing smoothness in its statistical convergence rate while the score matching estimator appears to saturate. Finally, we show that the proposed estimator empirically outperforms state-of-the-art methods in both kernel exponential family estimation and its conditional extension.'
volume: 89
URL: http://proceedings.mlr.press/v89/dai19a.html
PDF: http://proceedings.mlr.press/v89/dai19a/dai19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-dai19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Dai
given: Bo
- family: Dai
given: Hanjun
- family: Gretton
given: Arthur
- family: Song
given: Le
- family: Schuurmans
given: Dale
- family: He
given: Niao
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2321-2330
id: dai19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2321
lastpage: 2330
published: 2019-04-11 00:00:00 +0000
- title: 'Revisiting Adversarial Risk'
abstract: 'Recent works on adversarial perturbations show that there is an inherent trade-off between standard test accuracy and adversarial accuracy. Specifically, they show that no classifier can simultaneously be robust to adversarial perturbations and achieve high standard test accuracy. However, this is contrary to the standard notion that on tasks such as image classification, humans are robust classifiers with low error rate. In this work, we show that the main reason behind this confusion is the inaccurate definition of adversarial perturbation that is used in the literature. To fix this issue, we propose a slight, yet important modification to the existing definition of adversarial perturbation. Based on the modified definition, we show that there is no trade-off between adversarial and standard accuracies; there exist classifiers that are robust and achieve high standard accuracy. We further study several properties of this new definition of adversarial risk and its relation to the existing definition.'
volume: 89
URL: http://proceedings.mlr.press/v89/suggala19a.html
PDF: http://proceedings.mlr.press/v89/suggala19a/suggala19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-suggala19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Suggala
given: Arun Sai
- family: Prasad
given: Adarsh
- family: Nagarajan
given: Vaishnavh
- family: Ravikumar
given: Pradeep
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2331-2339
id: suggala19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2331
lastpage: 2339
published: 2019-04-11 00:00:00 +0000
- title: 'A Memoization Framework for Scaling Submodular Optimization to Large Scale Problems'
abstract: 'We are motivated by large scale submodular optimization problems, where standard algorithms, which treat the submodular functions in the value oracle model, do not scale. In this paper, we present a new model called the pre-computational complexity model, along with a unifying memoization based framework, which looks at the specific form of the given submodular function. A key ingredient in this framework is the notion of a precomputed statistic, which is maintained in the course of the algorithms. We show that we can easily integrate this idea into a large class of submodular optimization problems including constrained and unconstrained submodular maximization, minimization, difference of submodular optimization, ratio of submodular optimization and several other related optimization problems. Moreover, memoization can be integrated in both discrete and continuous relaxation flavors of algorithms for these problems. We demonstrate this idea for several commonly occurring submodular functions, and show how the pre-computational model provides significant speedups compared to the value oracle model. Finally, we empirically demonstrate this for large scale machine learning problems of data subset selection and summarization.'
volume: 89
URL: http://proceedings.mlr.press/v89/iyer19b.html
PDF: http://proceedings.mlr.press/v89/iyer19b/iyer19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-iyer19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Iyer
given: Rishabh
- family: Bilmes
given: Jeffrey
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2340-2349
id: iyer19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2340
lastpage: 2349
published: 2019-04-11 00:00:00 +0000
- title: 'Bernoulli Race Particle Filters'
abstract: 'When the weights in a particle filter are not available analytically, standard resampling methods cannot be employed. To circumvent this problem, state-of-the-art algorithms replace the true weights with non-negative unbiased estimates. This approach is still valid, but at the cost of higher variance of the resulting filtering estimates in comparison to a particle filter using the true weights. We propose here a novel algorithm that allows for resampling according to the true intractable weights when only an unbiased estimator of the weights is available. We demonstrate our algorithm on several examples.'
volume: 89
URL: http://proceedings.mlr.press/v89/schmon19a.html
PDF: http://proceedings.mlr.press/v89/schmon19a/schmon19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-schmon19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Schmon
given: Sebastian M.
- family: Doucet
given: Arnaud
- family: Deligiannidis
given: George
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2350-2358
id: schmon19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2350
lastpage: 2358
published: 2019-04-11 00:00:00 +0000
- title: 'Augmented Ensemble MCMC sampling in Factorial Hidden Markov Models'
abstract: 'Bayesian inference for Factorial Hidden Markov Models is challenging due to the exponentially sized latent variable space. Standard Monte Carlo samplers can have difficulties effectively exploring the posterior landscape and are often restricted to exploration around localised regions that depend on initialisation. We introduce a general purpose ensemble Markov Chain Monte Carlo (MCMC) technique to improve on existing poorly mixing samplers. This is achieved by combining parallel tempering and an auxiliary variable scheme to exchange information between the chains in an efficient way. The latter exploits a genetic algorithm within an augmented Gibbs sampler. We compare our technique with various existing samplers in a simulation study as well as in a cancer genomics application, demonstrating the improvements obtained by our augmented ensemble approach.'
volume: 89
URL: http://proceedings.mlr.press/v89/martens19a.html
PDF: http://proceedings.mlr.press/v89/martens19a/martens19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-martens19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Märtens
given: Kaspar
- family: Titsias
given: Michalis
- family: Yau
given: Christopher
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2359-2367
id: martens19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2359
lastpage: 2367
published: 2019-04-11 00:00:00 +0000
- title: 'Probabilistic Riemannian submanifold learning with wrapped Gaussian process latent variable models'
abstract: 'Latent variable models (LVMs) learn probabilistic models of data manifolds lying in an ambient Euclidean space. In a number of applications, a priori known spatial constraints can shrink the ambient space into a considerably smaller manifold. Additionally, in these applications the Euclidean geometry might induce a suboptimal similarity measure, which could be improved by choosing a different metric. Euclidean models ignore such information and assign probability mass to data points that can never appear as data, and vastly different likelihoods to points that are similar under the desired metric. We propose the wrapped Gaussian process latent variable model (WGPLVM), which extends Gaussian process latent variable models to take values strictly on a given Riemannian manifold, making the model blind to impossible data points. This allows non-linear, probabilistic inference of low-dimensional Riemannian submanifolds from data. Our evaluation on diverse datasets shows that we improve performance on several tasks, including encoding, visualization and uncertainty quantification.'
volume: 89
URL: http://proceedings.mlr.press/v89/mallasto19a.html
PDF: http://proceedings.mlr.press/v89/mallasto19a/mallasto19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-mallasto19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Mallasto
given: Anton
- family: Hauberg
given: Søren
- family: Feragen
given: Aasa
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2368-2377
id: mallasto19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2368
lastpage: 2377
published: 2019-04-11 00:00:00 +0000
- title: 'Unbiased Smoothing using Particle Independent Metropolis-Hastings'
abstract: 'We consider the approximation of expectations with respect to the distribution of a latent Markov process given noisy measurements. This is known as the smoothing problem and is often approached with particle and Markov chain Monte Carlo (MCMC) methods. These methods provide consistent but biased estimators when run for a finite time. We propose a simple way of coupling two MCMC chains built using Particle Independent Metropolis-Hastings (PIMH) to produce unbiased smoothing estimators. Unbiased estimators are appealing in the context of parallel computing, and facilitate the construction of confidence intervals. The proposed scheme only requires access to off-the-shelf Particle Filters (PF) and is thus easier to implement than recently proposed unbiased smoothers. The approach is demonstrated on a Lévy-driven stochastic volatility model and a stochastic kinetic model.'
volume: 89
URL: http://proceedings.mlr.press/v89/middleton19a.html
PDF: http://proceedings.mlr.press/v89/middleton19a/middleton19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-middleton19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Middleton
given: Lawrence
- family: Deligiannidis
given: George
- family: Doucet
given: Arnaud
- family: Jacob
given: Pierre E.
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2378-2387
id: middleton19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2378
lastpage: 2387
published: 2019-04-11 00:00:00 +0000
- title: 'Two-temperature logistic regression based on the Tsallis divergence'
abstract: 'We develop a variant of multiclass logistic regression that is significantly more robust to noise. The algorithm has one weight vector per class and the surrogate loss is a function of the linear activations (one per class). The surrogate loss of an example with linear activation vector $\mathbf{a}$ and class $c$ has the form $-\log_{t_1} \exp_{t_2} (a_c - G_{t_2}(\mathbf{a}))$ where the two temperatures $t_1$ and $t_2$ “temper” the $\log$ and $\exp$, respectively, and $G_{t_2}(\mathbf{a})$ is a scalar value that generalizes the log-partition function. We motivate this loss using the Tsallis divergence. Our method allows transitioning between non-convex and convex losses by the choice of the temperature parameters. As the temperature $t_1$ of the logarithm becomes smaller than the temperature $t_2$ of the exponential, the surrogate loss becomes “quasi convex”. Various tunings of the temperatures recover previous methods and tuning the degree of non-convexity is crucial in the experiments. In particular, quasi-convexity and boundedness of the loss provide significant robustness to the outliers. We explain this by showing that $t_1 < 1$ caps the surrogate loss and $t_2 > 1$ makes the predictive distribution have a heavy tail. We show that the surrogate loss is Bayes-consistent, even in the non-convex case. Additionally, we provide efficient iterative algorithms for calculating the log-partition value in only a few iterations. Our compelling experimental results on large real-world datasets show the advantage of using the two-temperature variant in the noisy as well as the noise free case.'
volume: 89
URL: http://proceedings.mlr.press/v89/amid19a.html
PDF: http://proceedings.mlr.press/v89/amid19a/amid19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-amid19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Amid
given: Ehsan
- family: Warmuth
given: Manfred K.
- family: Srinivasan
given: Sriram
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2388-2396
id: amid19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2388
lastpage: 2396
published: 2019-04-11 00:00:00 +0000
- title: 'Avoiding Latent Variable Collapse with Generative Skip Models'
abstract: 'Variational autoencoders (VAEs) learn distributions of high-dimensional data. They model data with a deep latent-variable model and then fit the model by maximizing a lower bound of the log marginal likelihood. VAEs can capture complex distributions, but they can also suffer from an issue known as "latent variable collapse," especially if the likelihood model is powerful. Specifically, the lower bound involves an approximate posterior of the latent variables; this posterior "collapses" when it is set equal to the prior, i.e., when the approximate posterior is independent of the data. While VAEs learn good generative models, latent variable collapse prevents them from learning useful representations. In this paper, we propose a simple new way to avoid latent variable collapse by including skip connections in our generative model; these connections enforce strong links between the latent variables and the likelihood function. We study generative skip models both theoretically and empirically. Theoretically, we prove that skip models increase the mutual information between the observations and the inferred latent variables. Empirically, we study images (MNIST and Omniglot) and text (Yahoo). Compared to existing VAE architectures, we show that generative skip models maintain similar predictive performance but lead to less collapse and provide more meaningful representations of the data.'
volume: 89
URL: http://proceedings.mlr.press/v89/dieng19a.html
PDF: http://proceedings.mlr.press/v89/dieng19a/dieng19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-dieng19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Dieng
given: Adji B.
- family: Kim
given: Yoon
- family: Rush
given: Alexander M.
- family: Blei
given: David M.
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2397-2405
id: dieng19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2397
lastpage: 2405
published: 2019-04-11 00:00:00 +0000
- title: 'SMOGS: Social Network Metrics of Game Success'
abstract: 'In this paper we propose a novel metric of basketball game success, derived from a team’s dynamic social network of game play. We combine ideas from random effects models for network links with a multi-resolution stochastic process approach to model passes between teammates. These passes can be viewed as directed dynamic relational links in a network. Multiplicative latent factors are introduced to study higher-order patterns in players’ interactions that distinguish a successful game from a loss. Parameters are estimated using a Markov chain Monte Carlo sampler. Results in simulation experiments suggest that the sampling scheme is effective in recovering the parameters. We also apply the model to the first high-resolution optical tracking data set collected in college basketball games. The learned latent factors demonstrate significant differences between players’ passing and receiving patterns in a loss, as opposed to a win. Our model is applicable to team sports other than basketball, as well as other time-varying network observations.'
volume: 89
URL: http://proceedings.mlr.press/v89/bu19a.html
PDF: http://proceedings.mlr.press/v89/bu19a/bu19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-bu19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Bu
given: Fan
- family: Xu
given: Sonia
- family: Heller
given: Katherine
- family: Volfovsky
given: Alexander
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2406-2414
id: bu19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2406
lastpage: 2414
published: 2019-04-11 00:00:00 +0000
- title: 'Fast Algorithms for Sparse Reduced-Rank Regression'
abstract: 'We consider a reformulation of Reduced-Rank Regression (RRR) and Sparse Reduced-Rank Regression (SRRR) as a non-convex non-differentiable function of a single one of the two matrices usually introduced to parametrize low-rank matrix learning problems. We study the behavior of proximal gradient algorithms for the minimization of the objective. In particular, based on an analysis of the geometry of the problem, we establish that a proximal Polyak-Łojasiewicz inequality is satisfied in a neighborhood of the set of optima under a condition on the regularization parameter. We can consequently derive linear convergence rates for the proximal gradient descent with line search and for related algorithms in a neighborhood of the optima. Our experiments show that our formulation leads to much faster learning algorithms for RRR and especially for SRRR.'
volume: 89
URL: http://proceedings.mlr.press/v89/dubois19a.html
PDF: http://proceedings.mlr.press/v89/dubois19a/dubois19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-dubois19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Dubois
given: Benjamin
- family: Delmas
given: Jean-François
- family: Obozinski
given: Guillaume
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2415-2424
id: dubois19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2415
lastpage: 2424
published: 2019-04-11 00:00:00 +0000
- title: 'Modeling simple structures and geometry for better stochastic optimization algorithms'
abstract: 'We develop model-based methods for stochastic optimization problems, introducing the approximate-proximal point, or aProx, family, which includes stochastic subgradient, proximal point, and bundle methods. For appropriately accurate models, the methods enjoy stronger convergence and robustness guarantees than classical approaches and typically add little to no computational overhead over stochastic subgradient methods. For example, we show that methods using the improved models converge with probability 1; these methods are also adaptive to a natural class of what we term easy optimization problems, achieving linear convergence under appropriate strong growth conditions on the objective. Our experimental investigation shows the advantages of more accurate modeling over standard subgradient methods across many smooth and non-smooth, convex and non-convex optimization problems.'
volume: 89
URL: http://proceedings.mlr.press/v89/asi19a.html
PDF: http://proceedings.mlr.press/v89/asi19a/asi19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-asi19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Asi
given: Hilal
- family: Duchi
given: John C.
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2425-2434
id: asi19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2425
lastpage: 2434
published: 2019-04-11 00:00:00 +0000
- title: 'Online learning with feedback graphs and switching costs'
abstract: 'We study online learning when partial feedback information is provided following every action of the learning process, and the learner incurs switching costs for changing his actions. In this setting, the feedback information system can be represented by a graph, and previous works studied the expected regret of the learner in the case of a clique (Expert setup), or disconnected single loops (Multi-Armed Bandits (MAB)). This work provides a lower bound on the expected regret in the Partial Information (PI) setting, namely for general feedback graphs, excluding the clique. Additionally, it shows that all algorithms that are optimal without switching costs are necessarily sub-optimal in the presence of switching costs, which motivates the need to design new algorithms. We propose two new algorithms: Threshold Based EXP3 and EXP3.SC. For the two special cases of symmetric PI setting and MAB, the expected regret of both of these algorithms is order optimal in the duration of the learning process. Additionally, Threshold Based EXP3 is order optimal in the switching cost, whereas EXP3.SC is not. Finally, empirical evaluations show that Threshold Based EXP3 outperforms the previously proposed order-optimal algorithms EXP3 SET in the presence of switching costs, and Batch EXP3 in the MAB setting with switching costs.'
volume: 89
URL: http://proceedings.mlr.press/v89/rangi19a.html
PDF: http://proceedings.mlr.press/v89/rangi19a/rangi19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-rangi19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Rangi
given: Anshuka
- family: Franceschetti
given: Massimo
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2435-2444
id: rangi19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2435
lastpage: 2444
published: 2019-04-11 00:00:00 +0000
- title: 'Interpretable Almost-Exact Matching for Causal Inference'
abstract: 'Matching methods are heavily used in the social and health sciences due to their interpretability. We aim to create the highest possible quality of treatment-control matches for categorical data in the potential outcomes framework. The method proposed in this work aims to match units on a weighted Hamming distance, taking into account the relative importance of the covariates; the algorithm aims to match units on as many relevant variables as possible. To do this, the algorithm creates a hierarchy of covariate combinations on which to match (similar to downward closure), in the process solving an optimization problem for each unit in order to construct the optimal matches. The algorithm uses a single dynamic program to solve all of the units’ optimization problems simultaneously. Notable advantages of our method over existing matching procedures are its high-quality interpretable matches, versatility in handling different data distributions that may have irrelevant variables, and ability to handle missing data by matching on as many available covariates as possible.'
volume: 89
URL: http://proceedings.mlr.press/v89/dieng19b.html
PDF: http://proceedings.mlr.press/v89/dieng19b/dieng19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-dieng19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Dieng
given: Awa
- family: Liu
given: Yameng
- family: Roy
given: Sudeepa
- family: Rudin
given: Cynthia
- family: Volfovsky
given: Alexander
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2445-2453
id: dieng19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2445
lastpage: 2453
published: 2019-04-11 00:00:00 +0000
- title: 'Statistical Optimal Transport via Factored Couplings'
abstract: 'We propose a new method to estimate Wasserstein distances and optimal transport plans between two probability distributions from samples in high dimension. Unlike plug-in rules that simply replace the true distributions by their empirical counterparts, our method promotes couplings with low transport rank, a new structural assumption that is similar to the nonnegative rank of a matrix. Regularizing based on this assumption leads to drastic improvements on high-dimensional data for various tasks, including domain adaptation in single-cell RNA sequencing data. These findings are supported by a theoretical analysis that indicates that the transport rank is key in overcoming the curse of dimensionality inherent to data-driven optimal transport.'
volume: 89
URL: http://proceedings.mlr.press/v89/forrow19a.html
PDF: http://proceedings.mlr.press/v89/forrow19a/forrow19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-forrow19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Forrow
given: Aden
- family: Hütter
given: Jan-Christian
- family: Nitzan
given: Mor
- family: Rigollet
given: Philippe
- family: Schiebinger
given: Geoffrey
- family: Weed
given: Jonathan
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2454-2465
id: forrow19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2454
lastpage: 2465
published: 2019-04-11 00:00:00 +0000
- title: '$HS^2$: Active learning over hypergraphs with pointwise and pairwise queries'
abstract: 'We propose a hypergraph-based active learning scheme which we term $HS^2$; $HS^2$ generalizes the previously reported algorithm $S^2$ originally proposed for graph-based active learning with pointwise queries. Our $HS^2$ method can accommodate hypergraph structures and allows one to ask both pointwise queries and pairwise queries. Based on a novel parametric system particularly designed for hypergraphs, we derive theoretical results on the query complexity of $HS^2$ for the above described generalized settings. Both the theoretical and empirical results show that $HS^2$ requires significantly fewer queries than $S^2$ when one uses $S^2$ over a graph obtained from the corresponding hypergraph via clique expansion.'
volume: 89
URL: http://proceedings.mlr.press/v89/chien19a.html
PDF: http://proceedings.mlr.press/v89/chien19a/chien19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-chien19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Chien
given: I (Eli)
- family: Zhou
given: Huozhi
- family: Li
given: Pan
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2466-2475
id: chien19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2466
lastpage: 2475
published: 2019-04-11 00:00:00 +0000
- title: 'Clustering Time Series with Nonlinear Dynamics: A Bayesian Non-Parametric and Particle-Based Approach'
abstract: 'We propose a general statistical framework for clustering multiple time series that exhibit nonlinear dynamics into an a-priori-unknown number of sub-groups. Our motivation comes from neuroscience, where an important problem is to identify, within a large assembly of neurons, subsets that respond similarly to a stimulus or contingency. Upon modeling the multiple time series as the output of a Dirichlet process mixture of nonlinear state-space models, we derive a Metropolis-within-Gibbs algorithm for full Bayesian inference that alternates between sampling cluster assignments and sampling parameter values that form the basis of the clustering. The Metropolis step employs recent innovations in particle-based methods. We apply the framework to clustering time series acquired from the prefrontal cortex of mice in an experiment designed to characterize the neural underpinnings of fear.'
volume: 89
URL: http://proceedings.mlr.press/v89/lin19b.html
PDF: http://proceedings.mlr.press/v89/lin19b/lin19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-lin19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Lin
given: Alexander
- family: Zhang
given: Yingzhuo
- family: Heng
given: Jeremy
- family: Allsop
given: Stephen A.
- family: Tye
given: Kay M.
- family: Jacob
given: Pierre E.
- family: Ba
given: Demba
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2476-2484
id: lin19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2476
lastpage: 2484
published: 2019-04-11 00:00:00 +0000
- title: 'Efficient Nonconvex Empirical Risk Minimization via Adaptive Sample Size Methods'
abstract: 'In this paper, we are interested in finding a local minimizer of an empirical risk minimization (ERM) problem where the loss associated with each sample is possibly a nonconvex function. Unlike traditional deterministic and stochastic algorithms that attempt to solve the ERM problem for the full training set, we propose an adaptive sample size scheme to reduce the overall computational complexity of finding a local minimum. To be more precise, we first find an approximate local minimum of the ERM problem corresponding to a small number of samples and use the uniform convergence theory to show that if the population risk is a Morse function, by properly increasing the size of the training set the iterates generated by the proposed procedure always stay close to a local minimum of the corresponding ERM problem. Therefore, eventually, the proposed procedure finds a local minimum of the ERM corresponding to the full training set which happens to also be close to a local minimum of the expected risk minimization problem with high probability. We formally state the conditions on the size of the initial sample set and characterize the required accuracy for obtaining an approximate local minimum to ensure that the iterates always stay in a neighborhood of a local minimum and do not get attracted to saddle points.'
volume: 89
URL: http://proceedings.mlr.press/v89/mokhtari19a.html
PDF: http://proceedings.mlr.press/v89/mokhtari19a/mokhtari19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-mokhtari19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Mokhtari
given: Aryan
- family: Ozdaglar
given: Asuman
- family: Jadbabaie
given: Ali
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2485-2494
id: mokhtari19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2485
lastpage: 2494
published: 2019-04-11 00:00:00 +0000
- title: 'An Optimal Control Approach to Sequential Machine Teaching'
abstract: 'Given a sequential learning algorithm and a target model, sequential machine teaching aims to find the shortest training sequence to drive the learning algorithm to the target model. We present the first principled way to find such shortest training sequences. Our key insight is to formulate sequential machine teaching as a time-optimal control problem. This allows us to solve sequential teaching by leveraging key theoretical and computational tools developed over the past 60 years in the optimal control community. Specifically, we study the Pontryagin Maximum Principle, which yields a necessary condition for optimality of a training sequence. We present analytic, structural, and numerical implications of this approach on a case study with a least-squares loss function and gradient descent learner. We compute optimal training sequences for this problem, and although the sequences seem circuitous, we find that they can vastly outperform the best available heuristics for generating training sequences.'
volume: 89
URL: http://proceedings.mlr.press/v89/lessard19a.html
PDF: http://proceedings.mlr.press/v89/lessard19a/lessard19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-lessard19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Lessard
given: Laurent
- family: Zhang
given: Xuezhou
- family: Zhu
given: Xiaojin
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2495-2503
id: lessard19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2495
lastpage: 2503
published: 2019-04-11 00:00:00 +0000
- title: 'An Online Algorithm for Smoothed Regression and LQR Control'
abstract: 'We consider Online Convex Optimization (OCO) in the setting where the costs are $m$-strongly convex and the online learner pays a switching cost for changing decisions between rounds. We show that the recently proposed Online Balanced Descent (OBD) algorithm is constant competitive in this setting, with competitive ratio $3 + O(1/m)$, irrespective of the ambient dimension. Additionally, we show that when the sequence of cost functions is $\epsilon$-smooth, OBD has near-optimal dynamic regret and maintains strong per-round accuracy. We demonstrate the generality of our approach by showing that the OBD framework can be used to construct competitive algorithms for a variety of online problems across learning and control, including online variants of ridge regression, logistic regression, maximum likelihood estimation, and LQR control.'
volume: 89
URL: http://proceedings.mlr.press/v89/goel19a.html
PDF: http://proceedings.mlr.press/v89/goel19a/goel19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-goel19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Goel
given: Gautam
- family: Wierman
given: Adam
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2504-2513
id: goel19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2504
lastpage: 2513
published: 2019-04-11 00:00:00 +0000
- title: 'Uncertainty Autoencoders: Learning Compressed Representations via Variational Information Maximization'
abstract: 'Compressed sensing techniques enable efficient acquisition and recovery of sparse, high-dimensional data signals via low-dimensional projections. In this work, we propose Uncertainty Autoencoders, a learning framework for unsupervised representation learning inspired by compressed sensing. We treat the low-dimensional projections as noisy latent representations of an autoencoder and directly learn both the acquisition (i.e., encoding) and amortized recovery (i.e., decoding) procedures. Our learning objective optimizes for a tractable variational lower bound to the mutual information between the datapoints and the latent representations. We show how our framework provides a unified treatment to several lines of research in dimensionality reduction, compressed sensing, and generative modeling. Empirically, we demonstrate a 32% improvement on average over competing approaches for the task of statistical compressed sensing of high-dimensional datasets.'
volume: 89
URL: http://proceedings.mlr.press/v89/grover19a.html
PDF: http://proceedings.mlr.press/v89/grover19a/grover19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-grover19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Grover
given: Aditya
- family: Ermon
given: Stefano
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2514-2524
id: grover19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2514
lastpage: 2524
published: 2019-04-11 00:00:00 +0000
- title: 'Structured Disentangled Representations'
abstract: 'Deep latent-variable models learn representations of high-dimensional data in an unsupervised manner. A number of recent efforts have focused on learning representations that disentangle statistically independent axes of variation by introducing modifications to the standard objective function. These approaches generally assume a simple diagonal Gaussian prior and as a result are not able to reliably disentangle discrete factors of variation. We propose a two-level hierarchical objective to control relative degree of statistical independence between blocks of variables and individual variables within blocks. We derive this objective as a generalization of the evidence lower bound, which allows us to explicitly represent the trade-offs between mutual information between data and representation, KL divergence between representation and prior, and coverage of the support of the empirical data distribution. Experiments on a variety of datasets demonstrate that our objective can not only disentangle discrete variables, but that doing so also improves disentanglement of other variables and, importantly, generalization even to unseen combinations of factors.'
volume: 89
URL: http://proceedings.mlr.press/v89/esmaeili19a.html
PDF: http://proceedings.mlr.press/v89/esmaeili19a/esmaeili19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-esmaeili19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Esmaeili
given: Babak
- family: Wu
given: Hao
- family: Jain
given: Sarthak
- family: Bozkurt
given: Alican
- family: Siddharth
given: N
- family: Paige
given: Brooks
- family: Brooks
given: Dana H.
- family: Dy
given: Jennifer
- family: Meent
given: Jan-Willem
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2525-2534
id: esmaeili19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2525
lastpage: 2534
published: 2019-04-11 00:00:00 +0000
- title: 'Estimating Network Structure from Incomplete Event Data'
abstract: 'Multivariate Bernoulli autoregressive (BAR) processes model time series of events in which the likelihood of current events is determined by the times and locations of past events. These processes can be used to model nonlinear dynamical systems corresponding to criminal activity, responses of patients to different medical treatment plans, opinion dynamics across social networks, epidemic spread, and more. Past work examines this problem under the assumption that the event data is complete, but in many cases only a fraction of events are observed. Incomplete observations pose a significant challenge in this setting because the unobserved events still govern the underlying dynamical system. In this work, we develop a novel approach to estimating the parameters of a BAR process in the presence of unobserved events via an unbiased estimator of the complete data log-likelihood function. We propose a computationally efficient estimation algorithm which approximates this estimator via Taylor series truncation and establish theoretical results for both the statistical error and optimization error of our algorithm. We further justify our approach by testing our method on both simulated data and a real data set consisting of crimes recorded by the city of Chicago.'
volume: 89
URL: http://proceedings.mlr.press/v89/mark19a.html
PDF: http://proceedings.mlr.press/v89/mark19a/mark19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-mark19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Mark
given: Benjamin
- family: Raskutti
given: Garvesh
- family: Willett
given: Rebecca
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2535-2544
id: mark19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2535
lastpage: 2544
published: 2019-04-11 00:00:00 +0000
- title: 'Locally Private Mean Estimation: $Z$-test and Tight Confidence Intervals'
abstract: 'This work provides tight upper and lower bounds for the problem of mean estimation under differential privacy in the local model, when the input is composed of $n$ i.i.d. samples drawn from a Gaussian. Our algorithms result in a $(1-\beta)$-confidence interval for the underlying distribution’s mean of length $O(\sigma\sqrt{\log(n/\beta)\log(1/\beta)}/(\epsilon\sqrt{n}))$. In addition, our algorithms leverage binary search using local differential privacy for quantile estimation, a result which may be of separate interest. Moreover, our algorithms have a matching lower bound: we prove that any one-shot (each individual is presented with a single query) local differentially private algorithm must return an interval of length $\Omega(\sigma\sqrt{\log(1/\beta)}/(\epsilon\sqrt{n}))$.'
volume: 89
URL: http://proceedings.mlr.press/v89/gaboardi19a.html
PDF: http://proceedings.mlr.press/v89/gaboardi19a/gaboardi19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-gaboardi19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Gaboardi
given: Marco
- family: Rogers
given: Ryan
- family: Sheffet
given: Or
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2545-2554
id: gaboardi19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2545
lastpage: 2554
published: 2019-04-11 00:00:00 +0000
- title: 'Estimation of Non-Normalized Mixture Models'
abstract: 'We develop a general method for estimating a finite mixture of non-normalized models. A non-normalized model is defined to be a parametric distribution with an intractable normalization constant. Existing methods for estimating non-normalized models without computing the normalization constant are not applicable to mixture models because they contain more than one intractable normalization constant. The proposed method is derived by extending noise contrastive estimation (NCE), which estimates non-normalized models by discriminating between the observed data and some artificially generated noise. In particular, the proposed method provides a probabilistically principled clustering method that is able to utilize a deep representation. Applications to clustering of natural images and neuroimaging data give promising results.'
volume: 89
URL: http://proceedings.mlr.press/v89/matsuda19a.html
PDF: http://proceedings.mlr.press/v89/matsuda19a/matsuda19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-matsuda19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Matsuda
given: Takeru
- family: Hyvärinen
given: Aapo
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2555-2563
id: matsuda19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2555
lastpage: 2563
published: 2019-04-11 00:00:00 +0000
- title: 'Rotting bandits are no harder than stochastic ones'
abstract: 'In stochastic multi-armed bandits, the reward distribution of each arm is assumed to be stationary. This assumption is often violated in practice (e.g., in recommendation systems), where the reward of an arm may change whenever it is selected, i.e., the rested bandit setting. In this paper, we consider the non-parametric rotting bandit setting, where rewards can only decrease. We introduce the filtering on expanding window average (FEWA) algorithm that constructs moving averages of increasing windows to identify arms that are more likely to return high rewards when pulled once more. We prove that for an unknown horizon $T$, and without any knowledge of the decreasing behavior of the $K$ arms, FEWA achieves a problem-dependent regret bound of $O(\log(KT))$, and a problem-independent one of $O(\sqrt{KT})$. Our result substantially improves over the algorithm of Levine et al. (2017), which suffers regret $O(K^{1/3} T^{2/3})$. FEWA also matches known bounds for the stochastic bandit setting, thus showing that the rotting bandits are not harder. Finally, we report simulations confirming the theoretical improvements of FEWA.'
volume: 89
URL: http://proceedings.mlr.press/v89/seznec19a.html
PDF: http://proceedings.mlr.press/v89/seznec19a/seznec19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-seznec19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Seznec
given: Julien
- family: Locatelli
given: Andrea
- family: Carpentier
given: Alexandra
- family: Lazaric
given: Alessandro
- family: Valko
given: Michal
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2564-2572
id: seznec19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2564
lastpage: 2572
published: 2019-04-11 00:00:00 +0000
- title: 'A Topological Regularizer for Classifiers via Persistent Homology'
abstract: 'Regularization plays a crucial role in supervised learning. Most existing methods enforce a global regularization in a structure agnostic manner. In this paper, we initiate a new direction and propose to enforce the structural simplicity of the classification boundary by regularizing over its topological complexity. In particular, our measurement of topological complexity incorporates the importance of topological features (e.g., connected components, handles, and so on) in a meaningful manner, and provides a direct control over spurious topological structures. We incorporate the new measurement as a topological penalty in training classifiers. We also propose an efficient algorithm to compute the gradient of such penalty. Our method provides a novel way to topologically simplify the global structure of the model, without having to sacrifice too much of the flexibility of the model. We demonstrate the effectiveness of our new topological regularizer on a range of synthetic and real-world datasets.'
volume: 89
URL: http://proceedings.mlr.press/v89/chen19g.html
PDF: http://proceedings.mlr.press/v89/chen19g/chen19g.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-chen19g.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Chen
given: Chao
- family: Ni
given: Xiuyan
- family: Bai
given: Qinxun
- family: Wang
given: Yusu
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2573-2582
id: chen19g
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2573
lastpage: 2582
published: 2019-04-11 00:00:00 +0000
- title: 'Overcomplete Independent Component Analysis via SDP'
abstract: 'We present a novel algorithm for overcomplete independent components analysis (ICA), where the number of latent sources $k$ exceeds the dimension $p$ of observed variables. Previous algorithms either suffer from high computational complexity or make strong assumptions about the form of the mixing matrix. Our algorithm does not make any sparsity assumption yet enjoys favorable computational and theoretical properties. Our algorithm consists of two main steps: (a) estimation of the Hessians of the cumulant generating function (as opposed to the fourth and higher order cumulants used by most algorithms) and (b) a novel semi-definite programming (SDP) relaxation for recovering a mixing component. We show that this relaxation can be efficiently solved with a projected accelerated gradient descent method, which makes the whole algorithm computationally practical. Moreover, we conjecture that the proposed program recovers a mixing component at the rate $k < p^2/4$ and prove that a mixing component can be recovered with high probability when $k < (2 - \epsilon)p\log p$, provided the original components are sampled uniformly at random on the hypersphere. Experiments are provided on synthetic data and the CIFAR-10 dataset of real images.'
volume: 89
URL: http://proceedings.mlr.press/v89/podosinnikova19a.html
PDF: http://proceedings.mlr.press/v89/podosinnikova19a/podosinnikova19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-podosinnikova19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Podosinnikova
given: Anastasia
- family: Perry
given: Amelia
- family: Wein
given: Alexander S.
- family: Bach
given: Francis
- family: d’Aspremont
given: Alexandre
- family: Sontag
given: David
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2583-2592
id: podosinnikova19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2583
lastpage: 2592
published: 2019-04-11 00:00:00 +0000
- title: 'Doubly Semi-Implicit Variational Inference'
abstract: 'We extend the existing framework of semi-implicit variational inference (SIVI) and introduce doubly semi-implicit variational inference (DSIVI), a way to perform variational inference and learning when both the approximate posterior and the prior distribution are semi-implicit. In other words, DSIVI performs inference in models where the prior and the posterior can be expressed as an intractable infinite mixture of some analytic density with a highly flexible implicit mixing distribution. We provide a sandwich bound on the evidence lower bound (ELBO) objective that can be made arbitrarily tight. Unlike discriminator-based and kernel-based approaches to implicit variational inference, DSIVI optimizes a proper lower bound on ELBO that is asymptotically exact. We evaluate DSIVI on a set of problems that benefit from implicit priors. In particular, we show that DSIVI gives rise to a simple modification of VampPrior, the current state-of-the-art prior for variational autoencoders, which improves its performance.'
volume: 89
URL: http://proceedings.mlr.press/v89/molchanov19a.html
PDF: http://proceedings.mlr.press/v89/molchanov19a/molchanov19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-molchanov19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Molchanov
given: Dmitry
- family: Kharitonov
given: Valery
- family: Sobolev
given: Artem
- family: Vetrov
given: Dmitry
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2593-2602
id: molchanov19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2593
lastpage: 2602
published: 2019-04-11 00:00:00 +0000
- title: 'Reducing training time by efficient localized kernel regression'
abstract: 'We study generalization properties of kernel regularized least squares regression based on a partitioning approach. We show that optimal rates of convergence are preserved if the number of local sets grows sufficiently slowly with the sample size. Moreover, the partitioning approach can be efficiently combined with local Nyström subsampling, improving computational cost twofold.'
volume: 89
URL: http://proceedings.mlr.press/v89/muecke19a.html
PDF: http://proceedings.mlr.press/v89/muecke19a/muecke19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-muecke19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Müecke
given: Nicole
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2603-2610
id: muecke19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2603
lastpage: 2610
published: 2019-04-11 00:00:00 +0000
- title: 'Scalable High-Order Gaussian Process Regression'
abstract: 'While most Gaussian process (GP) work focuses on learning single-output functions, many applications, such as physical simulations and gene expression prediction, require estimations of functions with many outputs. The number of outputs can be much larger than or comparable to the size of training samples. Existing multi-output GP models either are limited to low-dimensional outputs and restricted kernel choices, or assume oversimplified low-rank structures within the outputs. To address these issues, we propose HOGPR, a High-Order Gaussian Process Regression model, which can flexibly capture complex correlations among the outputs and scale up to a large number of outputs. Specifically, we tensorize the high-dimensional outputs, introducing latent coordinate features to index each tensor element (i.e., output) and to capture their correlations. We then generalize a multilinear model to a hybrid of a GP and latent GP model. The model is endowed with a Kronecker product structure over the inputs and the latent features. Using the Kronecker product properties and tensor algebra, we are able to perform exact inference over millions of outputs. We show the advantage of the proposed model on several real-world applications.'
volume: 89
URL: http://proceedings.mlr.press/v89/zhe19a.html
PDF: http://proceedings.mlr.press/v89/zhe19a/zhe19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-zhe19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Zhe
given: Shandian
- family: Xing
given: Wei
- family: Kirby
given: Robert M.
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2611-2620
id: zhe19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2611
lastpage: 2620
published: 2019-04-11 00:00:00 +0000
- title: 'A Higher-Order Kolmogorov-Smirnov Test'
abstract: 'We present an extension of the Kolmogorov-Smirnov (KS) two-sample test, which can be more sensitive to differences in the tails. Our test statistic is an integral probability metric (IPM) defined over a higher-order total variation ball, recovering the original KS test as its simplest case. We give an exact representer result for our IPM, which generalizes the fact that the original KS test statistic can be expressed in equivalent variational and CDF forms. For small enough orders $(k \le 5)$, we develop a linear-time algorithm for computing our higher-order KS test statistic; for all others $(k \ge 6)$, we give a nearly linear-time approximation. We derive the asymptotic null distribution for our test, and show that our nearly linear-time approximation shares the same asymptotic null. Lastly, we complement our theory with numerical studies.'
volume: 89
URL: http://proceedings.mlr.press/v89/sadhanala19a.html
PDF: http://proceedings.mlr.press/v89/sadhanala19a/sadhanala19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-sadhanala19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Sadhanala
given: Veeranjaneyulu
- family: Wang
given: Yu-Xiang
- family: Ramdas
given: Aaditya
- family: Tibshirani
given: Ryan J.
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2621-2630
id: sadhanala19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2621
lastpage: 2630
published: 2019-04-11 00:00:00 +0000
- title: 'Bayesian Learning of Conditional Kernel Mean Embeddings for Automatic Likelihood-Free Inference'
abstract: 'In likelihood-free settings where likelihood evaluations are intractable, approximate Bayesian computation (ABC) addresses the formidable inference task of discovering plausible parameters of simulation programs that explain the observations. However, ABC methods demand large quantities of simulation calls. Critically, hyperparameters that determine measures of simulation discrepancy crucially balance inference accuracy and sample efficiency, yet are difficult to tune. In this paper, we present kernel embedding likelihood-free inference (KELFI), a holistic framework that automatically learns model hyperparameters to improve inference accuracy given a limited simulation budget. By leveraging likelihood smoothness with conditional mean embeddings, we nonparametrically approximate likelihoods and posteriors as surrogate densities and sample from closed-form posterior mean embeddings, whose hyperparameters are learned under its approximate marginal likelihood. Our modular framework demonstrates improved accuracy and efficiency on challenging inference problems in ecology.'
volume: 89
URL: http://proceedings.mlr.press/v89/hsu19a.html
PDF: http://proceedings.mlr.press/v89/hsu19a/hsu19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-hsu19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Hsu
given: Kelvin
- family: Ramos
given: Fabio
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2631-2640
id: hsu19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2631
lastpage: 2640
published: 2019-04-11 00:00:00 +0000
- title: 'Parallel Asynchronous Stochastic Coordinate Descent with Auxiliary Variables'
abstract: 'The key to the recent success of coordinate descent (CD) in many applications is to maintain a set of auxiliary variables to facilitate efficient single variable updates. For example, the vector of residual/primal variables has to be maintained when CD is applied for Lasso/linear SVM, respectively. An implementation without maintenance is $O(n)$ times slower than the one with maintenance, where $n$ is the number of variables. In serial implementations, maintaining auxiliary variables is only a computing trick without changing the behavior of coordinate descent. However, maintenance of auxiliary variables is non-trivial when there are multiple threads/workers which read/write the auxiliary variables concurrently. Thus, most existing theoretical analysis of parallel CD either assumes vanilla CD without auxiliary variables (which ends up being extremely slow in practice) or is limited to a small class of problems. In this paper, we consider a rich family of objective functions where AUX-PCD can be applied. We also establish global linear convergence for AUX-PCD with atomic operations for a general family of functions and perform a complete backward error analysis of AUX-PCD with wild updates, where some updates are not just delayed but lost because of memory conflicts. Our results enable us to provide theoretical guarantees for many practical parallel coordinate descent implementations, which currently lack guarantees (such as the implementation of Shotgun by Bradley et al. 2011, which uses auxiliary variables).'
volume: 89
URL: http://proceedings.mlr.press/v89/yu19d.html
PDF: http://proceedings.mlr.press/v89/yu19d/yu19d.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-yu19d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Yu
given: Hsiang-Fu
- family: Hsieh
given: Cho-Jui
- family: Dhillon
given: Inderjit S.
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2641-2649
id: yu19d
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2641
lastpage: 2649
published: 2019-04-11 00:00:00 +0000
- title: 'Credit Assignment Techniques in Stochastic Computation Graphs'
abstract: 'Stochastic computation graphs (SCGs) provide a formalism to represent structured optimization problems arising in artificial intelligence, including supervised, unsupervised, and reinforcement learning. Previous work has shown that an unbiased estimator of the gradient of the expected loss of SCGs can be derived from a single principle. However, this estimator often has high variance and requires a full model evaluation per data point, making this algorithm costly in large graphs. In this work, we address these problems by generalizing concepts from the reinforcement learning literature. We introduce the concepts of value functions, baselines and critics for arbitrary SCGs, and show how to use them to derive lower-variance gradient estimates from partial model evaluations, paving the way towards general and efficient credit assignment for gradient-based optimization. In doing so, we demonstrate how our results unify recent advances in the probabilistic inference and reinforcement learning literature.'
volume: 89
URL: http://proceedings.mlr.press/v89/weber19a.html
PDF: http://proceedings.mlr.press/v89/weber19a/weber19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-weber19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Weber
given: Théophane
- family: Heess
given: Nicolas
- family: Buesing
given: Lars
- family: Silver
given: David
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2650-2660
id: weber19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2650
lastpage: 2660
published: 2019-04-11 00:00:00 +0000
- title: 'Efficient Bayesian Optimization for Target Vector Estimation'
abstract: 'We consider the problem of estimating a target vector by querying an unknown multi-output function which is stochastic and expensive to evaluate. Through sequential experimental design the aim is to minimize the squared Euclidean distance between the output of the function and the target vector. Applying standard single-objective Bayesian optimization to this problem is both wasteful, since individual output components are never observed, and imprecise since the predictive distribution for new inputs will be symmetric and have negative support. We address this issue by proposing a Gaussian process model that considers the individual function outputs and derive a distribution over the resulting 2-norm. Furthermore we derive computationally efficient acquisition functions and evaluate the resulting optimization framework on several synthetic problems and a real-world problem. The results demonstrate a significant improvement over Bayesian optimization based on both standard and warped Gaussian processes.'
volume: 89
URL: http://proceedings.mlr.press/v89/uhrenholt19a.html
PDF: http://proceedings.mlr.press/v89/uhrenholt19a/uhrenholt19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-uhrenholt19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Uhrenholt
given: Anders Kirk
- family: Jensen
given: Bjøern Sand
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2661-2670
id: uhrenholt19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2661
lastpage: 2670
published: 2019-04-11 00:00:00 +0000
- title: 'Correspondence Analysis Using Neural Networks'
abstract: 'Correspondence analysis (CA) is a multivariate statistical tool used to visualize and interpret data dependencies. CA has found applications in fields ranging from epidemiology to social sciences. However, current methods used to perform CA do not scale to large, high-dimensional datasets. By re-interpreting the objective in CA using an information-theoretic tool called the principal inertia components, we demonstrate that performing CA is equivalent to solving a functional optimization problem over the space of finite variance functions of two random variables. We show that this optimization problem, in turn, can be efficiently approximated by neural networks. The resulting formulation, called the correspondence analysis neural network (CA-NN), enables CA to be performed at an unprecedented scale. We validate the CA-NN on synthetic data, and demonstrate how it can be used to perform CA on a variety of datasets, including food recipes, wine compositions, and images. Our results outperform traditional methods used in CA, indicating that CA-NN can serve as a new, scalable tool for interpretability and visualization of complex dependencies between random variables.'
volume: 89
URL: http://proceedings.mlr.press/v89/hsu19b.html
PDF: http://proceedings.mlr.press/v89/hsu19b/hsu19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-hsu19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Hsu
given: Hsiang
- family: Salamatian
given: Salman
- family: Calmon
given: Flavio P.
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2671-2680
id: hsu19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2671
lastpage: 2680
published: 2019-04-11 00:00:00 +0000
- title: 'Interpolating between Optimal Transport and MMD using Sinkhorn Divergences'
abstract: 'Comparing probability distributions is a fundamental problem in data sciences. Simple norms and divergences such as the total variation and the relative entropy only compare densities in a point-wise manner and fail to capture the geometric nature of the problem. In sharp contrast, Maximum Mean Discrepancies (MMD) and Optimal Transport distances (OT) are two classes of distances between measures that take into account the geometry of the underlying space and metrize the convergence in law. This paper studies the Sinkhorn divergences, a family of geometric divergences that interpolates between MMD and OT. Relying on a new notion of geometric entropy, we provide theoretical guarantees for these divergences: positivity, convexity and metrization of the convergence in law. On the practical side, we detail a numerical scheme that enables the large scale application of these divergences for machine learning: on the GPU, gradients of the Sinkhorn loss can be computed for batches of a million samples.'
volume: 89
URL: http://proceedings.mlr.press/v89/feydy19a.html
PDF: http://proceedings.mlr.press/v89/feydy19a/feydy19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-feydy19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Feydy
given: Jean
- family: Séjourné
given: Thibault
- family: Vialard
given: François-Xavier
- family: Amari
given: Shun-ichi
- family: Trouve
given: Alain
- family: Peyré
given: Gabriel
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2681-2690
id: feydy19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2681
lastpage: 2690
published: 2019-04-11 00:00:00 +0000
- title: 'Multi-Observation Regression'
abstract: 'Given a data set of $(x,y)$ pairs, a common learning task is to fit a model predicting $y$ (a label or dependent variable) conditioned on $x$. This paper considers the similar but much less-understood problem of modeling “higher-order” statistics of $y$’s distribution conditioned on $x$. Such statistics are often challenging to estimate using traditional empirical risk minimization (ERM) approaches. We develop and theoretically analyze an ERM-like approach with multi-observation loss functions. We propose four algorithms formalizing the concept of ERM for this problem, two of which have statistical guarantees in settings allowing both slow and fast convergence rates, but which are out-performed empirically by the other two. Empirical results illustrate potential practicality of these algorithms in low dimensions and significant improvement over standard approaches in some settings.'
volume: 89
URL: http://proceedings.mlr.press/v89/frongillo19a.html
PDF: http://proceedings.mlr.press/v89/frongillo19a/frongillo19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-frongillo19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Frongillo
given: Rafael
- family: Mehta
given: Nishant A.
- family: Morgan
given: Tom
- family: Waggoner
given: Bo
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2691-2700
id: frongillo19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2691
lastpage: 2700
published: 2019-04-11 00:00:00 +0000
- title: 'Adaptive MCMC via Combining Local Samplers'
abstract: 'Markov chain Monte Carlo (MCMC) methods are widely used in machine learning. One of the major problems with MCMC is the question of how to design chains that mix fast over the whole state space; in particular, how to select the parameters of an MCMC algorithm. Here we take a different approach and, similarly to parallel MCMC methods, instead of trying to find a single chain that samples from the whole distribution, we combine samples from several chains run in parallel, each exploring only parts of the state space (e.g., a few modes only). The chains are prioritized based on the kernel Stein discrepancy, which provides a good measure of performance locally. The samples from the independent chains are combined using a novel technique for estimating the probability of different regions of the sample space. Experimental results demonstrate that the proposed algorithm may provide significant speedups in different sampling problems. Most importantly, when combined with the state-of-the-art NUTS algorithm as the base MCMC sampler, our method remained competitive with NUTS on sampling from unimodal distributions, and significantly outperformed state-of-the-art competitors on synthetic multimodal problems as well as on a challenging sensor localization task.'
volume: 89
URL: http://proceedings.mlr.press/v89/shaloudegi19a.html
PDF: http://proceedings.mlr.press/v89/shaloudegi19a/shaloudegi19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-shaloudegi19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Shaloudegi
given: Kiárash
- family: György
given: András
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2701-2710
id: shaloudegi19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2701
lastpage: 2710
published: 2019-04-11 00:00:00 +0000
- title: 'Variance reduction properties of the reparameterization trick'
abstract: 'The reparameterization trick is widely used in variational inference as it yields more accurate estimates of the gradient of the variational objective than alternative approaches such as the score function method. Although there is overwhelming empirical evidence in the literature showing its success, there is relatively little research exploring why the reparameterization trick is so effective. We explore this under the idealized assumptions that the variational approximation is a mean-field Gaussian density and that the log of the joint density of the model parameters and the data is a quadratic function that depends on the variational mean. From this, we show that the marginal variances of the reparameterization gradient estimator are smaller than those of the score function gradient estimator. We apply the result of our idealized analysis to real-world examples.'
volume: 89
URL: http://proceedings.mlr.press/v89/xu19a.html
PDF: http://proceedings.mlr.press/v89/xu19a/xu19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-xu19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Xu
given: Ming
- family: Quiroz
given: Matias
- family: Kohn
given: Robert
- family: Sisson
given: Scott A.
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2711-2720
id: xu19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2711
lastpage: 2720
published: 2019-04-11 00:00:00 +0000
- title: 'Hierarchical Clustering for Euclidean Data'
abstract: 'Recent works on Hierarchical Clustering (HC), a well-studied problem in exploratory data analysis, have focused on optimizing various objective functions for this problem under arbitrary similarity measures. In this paper we take the first step and give novel scalable algorithms for this problem tailored to Euclidean data in $\mathbb{R}^d$ and under vector-based similarity measures, a prevalent model in several typical machine learning applications. We focus primarily on the popular Gaussian kernel and present our results through the lens of the objective introduced recently by [MW’17]. We show the approximation factor in [MW’17] can be improved for Euclidean data. We further demonstrate both theoretically and experimentally that our algorithms scale to very high dimension $d$, while outperforming average-linkage and showing competitive results against other less scalable approaches.'
volume: 89
URL: http://proceedings.mlr.press/v89/charikar19a.html
PDF: http://proceedings.mlr.press/v89/charikar19a/charikar19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-charikar19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Charikar
given: Moses
- family: Chatziafratis
given: Vaggos
- family: Niazadeh
given: Rad
- family: Yaroslavtsev
given: Grigory
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2721-2730
id: charikar19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2721
lastpage: 2730
published: 2019-04-11 00:00:00 +0000
- title: 'Stochastic Variance-Reduced Cubic Regularization for Nonconvex Optimization'
abstract: 'Cubic regularization (CR) is an optimization method with emerging popularity due to its capability to escape saddle points and converge to second-order stationary solutions for nonconvex optimization. However, CR encounters a high sample complexity issue for finite-sum problems with a large data size. Various inexact variants of CR have been proposed to improve the sample complexity. In this paper, we propose a stochastic variance-reduced cubic-regularization (SVRC) method under random sampling, and study its convergence guarantee as well as sample complexity. We show that the iteration complexity of SVRC for achieving a second-order stationary solution within $\epsilon$ accuracy is $O(\epsilon^{-3/2})$, which matches the state-of-the-art result on CR-type methods. Moreover, our proposed variance reduction scheme significantly reduces the per-iteration sample complexity. The resulting total Hessian sample complexity of our SVRC is $O(N^{2/3} \epsilon^{-3/2})$, which outperforms the state-of-the-art result by a factor of $O(N^{2/15})$. We also study our SVRC under a random sampling without replacement scheme, which yields a lower per-iteration sample complexity, and hence justifies its practical applicability.'
volume: 89
URL: http://proceedings.mlr.press/v89/wang19d.html
PDF: http://proceedings.mlr.press/v89/wang19d/wang19d.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-wang19d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Wang
given: Zhe
- family: Zhou
given: Yi
- family: Liang
given: Yingbin
- family: Lan
given: Guanghui
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2731-2740
id: wang19d
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2731
lastpage: 2740
published: 2019-04-11 00:00:00 +0000
- title: 'Variational Noise-Contrastive Estimation'
abstract: 'Unnormalised latent variable models are a broad and flexible class of statistical models. However, learning their parameters from data is intractable, and few estimation techniques are currently available for such models. To increase the number of techniques in our arsenal, we propose variational noise-contrastive estimation (VNCE), building on NCE which is a method that only applies to unnormalised models. The core idea is to use a variational lower bound to the NCE objective function, which can be optimised in the same fashion as the evidence lower bound (ELBO) in standard variational inference (VI). We prove that VNCE can be used for both parameter estimation of unnormalised models and posterior inference of latent variables. The developed theory shows that VNCE has the same level of generality as standard VI, meaning that advances made there can be directly imported to the unnormalised setting. We validate VNCE on toy models and apply it to a realistic problem of estimating an undirected graphical model from incomplete data.'
volume: 89
URL: http://proceedings.mlr.press/v89/rhodes19a.html
PDF: http://proceedings.mlr.press/v89/rhodes19a/rhodes19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-rhodes19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Rhodes
given: Benjamin
- family: Gutmann
given: Michael U.
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2741-2750
id: rhodes19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2741
lastpage: 2750
published: 2019-04-11 00:00:00 +0000
- title: 'Improving Quadrature for Constrained Integrands'
abstract: 'We present an improved Bayesian framework for performing inference of affine transformations of constrained functions. We focus on quadrature with nonnegative functions, a common task in Bayesian inference. We consider constraints on the range of the function of interest, such as nonnegativity or boundedness. Although our framework is general, we derive explicit approximation schemes for these constraints, and argue for the use of a log transformation for functions with high dynamic range such as likelihood surfaces. We propose a novel method for optimizing hyperparameters in this framework: we optimize the marginal likelihood in the original space, as opposed to in the transformed space. The result is a model that better explains the actual data. Experiments on synthetic and real-world data demonstrate our framework achieves superior estimates using less wall-clock time than existing Bayesian quadrature procedures.'
volume: 89
URL: http://proceedings.mlr.press/v89/chai19a.html
PDF: http://proceedings.mlr.press/v89/chai19a/chai19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-chai19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Chai
given: Henry R.
- family: Garnett
given: Roman
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2751-2759
id: chai19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2751
lastpage: 2759
published: 2019-04-11 00:00:00 +0000
- title: 'High Dimensional Inference in Partially Linear Models'
abstract: 'We propose two semiparametric versions of the debiased Lasso procedure for the model $Y_{i}=X_{i}\beta_{0}+g_{0}(Z_{i})+\varepsilon_{i}$, where the parameter vector of interest $\beta_{0}$ is high dimensional but sparse (exactly or approximately) and $g_{0}$ is an unknown nuisance function. Both versions are shown to have the same asymptotic normal distribution and do not require the minimal signal condition for statistical inference of any component in $\beta_{0}$. We further develop a simultaneous hypothesis testing procedure based on multiplier bootstrap. Our testing method takes into account the dependence structure within the debiased estimates, and allows the number of tested components to be exponentially high.'
volume: 89
URL: http://proceedings.mlr.press/v89/zhu19c.html
PDF: http://proceedings.mlr.press/v89/zhu19c/zhu19c.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-zhu19c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Zhu
given: Ying
- family: Yu
given: Zhuqing
- family: Cheng
given: Guang
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2760-2769
id: zhu19c
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2760
lastpage: 2769
published: 2019-04-11 00:00:00 +0000
- title: 'Cost aware Inference for IoT Devices'
abstract: 'Networked embedded devices (IoTs) of limited CPU, memory and power resources are revolutionizing data gathering, remote monitoring and planning in many consumer and business applications. Nevertheless, resource limitations place a significant burden on their service life and operation, warranting cost-aware methods that are capable of distributively screening redundancies in device information and transmitting informative data. We propose to train a decentralized gated network that, given an observed instance at test-time, allows for activation of select devices to transmit information to a central node, which then performs inference. We analyze our proposed gradient descent algorithm for Gaussian features and establish convergence guarantees under good initialization. We conduct experiments on a number of real-world datasets arising in IoT applications and show that our model results in over 1.5X service life with negligible accuracy degradation relative to a performance achievable by a neural network.'
volume: 89
URL: http://proceedings.mlr.press/v89/zhu19d.html
PDF: http://proceedings.mlr.press/v89/zhu19d/zhu19d.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-zhu19d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Zhu
given: Pengkai
- family: Acar
given: Durmus Alp Emre
- family: Feng
given: Nan
- family: Jain
given: Prateek
- family: Saligrama
given: Venkatesh
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2770-2779
id: zhu19d
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2770
lastpage: 2779
published: 2019-04-11 00:00:00 +0000
- title: 'Banded Matrix Operators for Gaussian Markov Models in the Automatic Differentiation Era'
abstract: 'Banded matrices can be used as precision matrices in several models including linear state-space models, some Gaussian processes, and Gaussian Markov random fields. The aim of the paper is to make modern inference methods (such as variational inference or gradient-based sampling) available for Gaussian models with banded precision. We show that this can efficiently be achieved by equipping an automatic differentiation framework, such as TensorFlow or PyTorch, with some linear algebra operators dedicated to banded matrices. This paper studies the algorithmic aspects of the required operators, details their reverse-mode derivatives, and shows that their complexity is linear in the number of observations.'
volume: 89
URL: http://proceedings.mlr.press/v89/durrande19a.html
PDF: http://proceedings.mlr.press/v89/durrande19a/durrande19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-durrande19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Durrande
given: Nicolas
- family: Adam
given: Vincent
- family: Bordeaux
given: Lucas
- family: Eleftheriadis
given: Stefanos
- family: Hensman
given: James
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2780-2789
id: durrande19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2780
lastpage: 2789
published: 2019-04-11 00:00:00 +0000
- title: 'A Unified Weight Learning Paradigm for Multi-view Learning'
abstract: 'Learning a set of weights to combine views linearly forms a series of popular schemes in multi-view learning. Three weight learning paradigms, i.e., Norm Regularization (NR), Exponential Decay (ED), and p-th Root Loss (pRL), are widely used in the literature, while their relations and limiting behaviors are not yet well understood. In this paper, we present a Unified Paradigm (UP) that contains the aforementioned three popular paradigms as special cases. Specifically, we extend the domain of hyper-parameters of NR from positive to real numbers and show this extension bridges NR, ED, and pRL. Besides, we provide a detailed discussion of the weight sparsity, hyper-parameter setting, and counterintuitive limiting behavior of these paradigms. Furthermore, we show the generality of our technique with examples in Multi-Task Learning and Fuzzy Clustering. Our results may provide insights to understand existing algorithms better and inspire research on new weight learning schemes. Numerical results support our theoretical analysis.'
volume: 89
URL: http://proceedings.mlr.press/v89/tian19a.html
PDF: http://proceedings.mlr.press/v89/tian19a/tian19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-tian19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Tian
given: Lai
- family: Nie
given: Feiping
- family: Li
given: Xuelong
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2790-2800
id: tian19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2790
lastpage: 2800
published: 2019-04-11 00:00:00 +0000
- title: 'Region-Based Active Learning'
abstract: 'We study a scenario of active learning where the input space is partitioned into different regions and where a distinct hypothesis is learned for each region. We first introduce a new active learning algorithm (EIWAL), which is an enhanced version of the IWAL algorithm, based on a finer analysis that results in more favorable learning guarantees. Then, we present a new learning algorithm for region-based active learning, ORIWAL, in which either IWAL or EIWAL serves as a subroutine. ORIWAL optimally allocates points to the subroutine algorithm for each region. We give a detailed theoretical analysis of ORIWAL, including generalization error guarantees and bounds on the number of points labeled, in terms of both the hypothesis set used in each region and the probability mass of that region. We also report the results of several experiments for our algorithm which demonstrate substantial benefits over existing non-region-based active learning algorithms, such as IWAL, and over passive learning.'
volume: 89
URL: http://proceedings.mlr.press/v89/cortes19a.html
PDF: http://proceedings.mlr.press/v89/cortes19a/cortes19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-cortes19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Cortes
given: Corinna
- family: DeSalvo
given: Giulia
- family: Gentile
given: Claudio
- family: Mohri
given: Mehryar
- family: Zhang
given: Ningshan
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2801-2809
id: cortes19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2801
lastpage: 2809
published: 2019-04-11 00:00:00 +0000
- title: 'Precision Matrix Estimation with Noisy and Missing Data'
abstract: 'Estimating conditional dependence graphs and precision matrices is among the most common problems in modern statistics and machine learning. When data are fully observed, penalized maximum likelihood-type estimators have become standard tools for estimating graphical models under sparsity conditions. Extensions of these methods to more complex settings where data are contaminated with additive or multiplicative noise have been developed in recent years. In these settings, however, the relative performance of different methods is not well understood and algorithmic gaps still exist. In particular, in high-dimensional settings these methods require using non-positive semidefinite matrices as inputs, presenting novel optimization challenges. We develop an alternating direction method of multipliers (ADMM) algorithm for these problems, providing a feasible algorithm to estimate precision matrices with indefinite input and potentially nonconvex penalties. We compare this method with existing alternative solutions and empirically characterize the tradeoffs between them. Finally, we use this method to explore the networks among US senators estimated from voting records data.'
volume: 89
URL: http://proceedings.mlr.press/v89/fan19a.html
PDF: http://proceedings.mlr.press/v89/fan19a/fan19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-fan19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Fan
given: Roger
- family: Jang
given: Byoungwook
- family: Sun
given: Yuekai
- family: Zhou
given: Shuheng
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2810-2819
id: fan19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2810
lastpage: 2819
published: 2019-04-11 00:00:00 +0000
- title: 'Exploring $k$ out of Top $\rho$ Fraction of Arms in Stochastic Bandits'
abstract: 'This paper studies the problem of identifying any $k$ distinct arms among the top $\rho$ fraction (e.g., top 5%) of arms from a finite or infinite set with a probably approximately correct (PAC) tolerance $\epsilon$. We consider two cases: (i) when the threshold of the top arms’ expected rewards is known and (ii) when it is unknown. We prove lower bounds for the four variants (finite or infinite arms, and known or unknown threshold), and propose algorithms for each. Two of these algorithms are shown to be sample complexity optimal (up to constant factors) and the other two are optimal up to a log factor. Results in this paper provide up to $\rho n/k$ reductions compared with the “$k$-exploration” algorithms that focus on finding the (PAC) best $k$ arms out of $n$ arms. We also numerically show improvements over the state-of-the-art.'
volume: 89
URL: http://proceedings.mlr.press/v89/ren19a.html
PDF: http://proceedings.mlr.press/v89/ren19a/ren19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-ren19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Ren
given: Wenbo
- family: Liu
given: Jia
- family: Shroff
given: Ness B.
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2820-2828
id: ren19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2820
lastpage: 2828
published: 2019-04-11 00:00:00 +0000
- title: 'AutoML from Service Provider’s Perspective: Multi-device, Multi-tenant Model Selection with GP-EI'
abstract: 'AutoML has become a popular service that is provided by most leading cloud service providers today. In this paper, we focus on the AutoML problem from the service provider’s perspective, motivated by the following practical consideration: When an AutoML service needs to serve multiple users with multiple devices at the same time, how can we allocate these devices to users in an efficient way? We focus on GP-EI, one of the most popular algorithms for automatic model selection and hyperparameter tuning, used by systems such as Google Vizier. The technical contribution of this paper is the first multi-device, multi-tenant algorithm for GP-EI that is aware of multiple computation devices and multiple users sharing the same set of computation devices. Theoretically, given $N$ users and $M$ devices, we obtain a regret bound of $O((\mathbf{MIU}(T,K) + M)\frac{N^2}{M})$, where $\mathbf{MIU}(T,K)$ refers to the maximal incremental uncertainty up to time $T$ for the covariance matrix $K$. Empirically, we evaluate our algorithm on two applications of automatic model selection, and show that our algorithm significantly outperforms the strategy of serving users independently. Moreover, when multiple computation devices are available, we achieve near-linear speedup when the number of users is much larger than the number of devices.'
volume: 89
URL: http://proceedings.mlr.press/v89/yu19e.html
PDF: http://proceedings.mlr.press/v89/yu19e/yu19e.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-yu19e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Yu
given: Chen
- family: Karlaš
given: Bojan
- family: Zhong
given: Jie
- family: Zhang
given: Ce
- family: Liu
given: Ji
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2829-2838
id: yu19e
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2829
lastpage: 2838
published: 2019-04-11 00:00:00 +0000
- title: 'On Theory for BART'
abstract: 'Ensemble learning is a statistical paradigm built on the premise that many weak learners can perform exceptionally well when deployed collectively. The BART method of Chipman et al. (2010) is a prominent example of Bayesian ensemble learning, where each learner is a tree. Due to its impressive performance, BART has received a lot of attention from practitioners. Despite its wide popularity, however, theoretical studies of BART have begun emerging only very recently. Laying down a foundation for the theoretical analysis of Bayesian forests, Rockova and van der Pas (2017) showed optimal posterior concentration under conditionally uniform tree priors. These priors deviate from the actual priors implemented in BART. Here, we study the exact BART prior and propose a simple modification so that it also enjoys optimality properties. To this end, we dive into branching process theory. We obtain tail bounds for the distribution of total progeny under heterogeneous Galton-Watson (GW) processes using their connection to random walks. We conclude with a result stating the optimal rate of convergence for BART.'
volume: 89
URL: http://proceedings.mlr.press/v89/rockova19a.html
PDF: http://proceedings.mlr.press/v89/rockova19a/rockova19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-rockova19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Ročková
given: Veronika
- family: Saha
given: Enakshi
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2839-2848
id: rockova19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2839
lastpage: 2848
published: 2019-04-11 00:00:00 +0000
- title: 'Deep Topic Models for Multi-label Learning'
abstract: 'We present a probabilistic framework for multi-label learning based on a deep generative model for the binary label vector associated with each observation. Our generative model learns deep multi-layer latent embeddings of the binary label vector, which are conditioned on the input features of the observation. The model also has an interesting interpretation in terms of a deep topic model, with each label vector representing a bag-of-words document, with the input features being its meta-data. In addition to capturing the structural properties of the label space (e.g., a near-low-rank label matrix), the model also offers a clean, geometric interpretation. In particular, the nonlinear classification boundaries learned by the model can be seen as the union of multiple convex polytopes. Our model admits a simple and scalable inference via efficient Gibbs sampling or EM algorithm. We compare our model with state-of-the-art baselines for multi-label learning on benchmark data sets, and also report some interesting qualitative results.'
volume: 89
URL: http://proceedings.mlr.press/v89/panda19a.html
PDF: http://proceedings.mlr.press/v89/panda19a/panda19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-panda19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Panda
given: Rajat
- family: Pensia
given: Ankit
- family: Mehta
given: Nikhil
- family: Zhou
given: Mingyuan
- family: Rai
given: Piyush
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2849-2857
id: panda19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2849
lastpage: 2857
published: 2019-04-11 00:00:00 +0000
- title: 'On the Dynamics of Gradient Descent for Autoencoders'
abstract: 'We provide a series of results for unsupervised learning with autoencoders. Specifically, we study shallow two-layer autoencoder architectures with shared weights. We focus on three generative models for data that are common in statistical machine learning: (i) the mixture-of-gaussians model, (ii) the sparse coding model, and (iii) the sparsity model with non-negative coefficients. For each of these models, we prove that under suitable choices of hyperparameters, architectures, and initialization, autoencoders learned by gradient descent can successfully recover the parameters of the corresponding model. To our knowledge, this is the first result that rigorously studies the dynamics of gradient descent for weight-sharing autoencoders. Our analysis can be viewed as theoretical evidence that shallow autoencoder modules indeed can be used as feature learning mechanisms for a variety of data models, and may shed insight on how to train larger stacked architectures with autoencoders as basic building blocks.'
volume: 89
URL: http://proceedings.mlr.press/v89/nguyen19a.html
PDF: http://proceedings.mlr.press/v89/nguyen19a/nguyen19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-nguyen19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Nguyen
given: Thanh V.
- family: Wong
given: Raymond K. W.
- family: Hegde
given: Chinmay
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2858-2867
id: nguyen19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2858
lastpage: 2867
published: 2019-04-11 00:00:00 +0000
- title: 'Complexities in Projection-Free Stochastic Non-convex Minimization'
abstract: 'For constrained nonconvex minimization problems, we propose a meta stochastic projection-free optimization algorithm, named Normalized Frank Wolfe Updating, that can take any Gradient Estimator (GE) as input. For this algorithm, we prove its convergence rate, regardless of the choice of GE. Using a sophisticated GE, this algorithm can significantly improve the Stochastic First order Oracle (SFO) complexity. Further, a new second order GE strategy is proposed to incorporate curvature information, which enjoys theoretical advantage over the first order ones. Besides, this paper also provides a lower bound of Linear-optimization Oracle (LO) queried to achieve an approximate stationary point. Simulation studies validate our analysis under various parameter settings.'
volume: 89
URL: http://proceedings.mlr.press/v89/shen19b.html
PDF: http://proceedings.mlr.press/v89/shen19b/shen19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-shen19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Shen
given: Zebang
- family: Fang
given: Cong
- family: Zhao
given: Peilin
- family: Huang
given: Junzhou
- family: Qian
given: Hui
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2868-2876
id: shen19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2868
lastpage: 2876
published: 2019-04-11 00:00:00 +0000
- title: 'Differentiable Antithetic Sampling for Variance Reduction in Stochastic Variational Inference'
abstract: 'Stochastic optimization techniques are standard in variational inference algorithms. These methods estimate gradients by approximating expectations with independent Monte Carlo samples. In this paper, we explore a technique that uses correlated, but more representative, samples to reduce estimator variance. Specifically, we show how to generate antithetic samples that match sample moments with the true moments of an underlying importance distribution. Combining a differentiable antithetic sampler with modern stochastic variational inference, we showcase the effectiveness of this approach for learning a deep generative model. An implementation is available at https://github.com/mhw32/antithetic-vae-public.'
volume: 89
URL: http://proceedings.mlr.press/v89/wu19c.html
PDF: http://proceedings.mlr.press/v89/wu19c/wu19c.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-wu19c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Wu
given: Mike
- family: Goodman
given: Noah
- family: Ermon
given: Stefano
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2877-2886
id: wu19c
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2877
lastpage: 2886
published: 2019-04-11 00:00:00 +0000
- title: 'Efficient Greedy Coordinate Descent for Composite Problems'
abstract: 'Coordinate descent with random coordinate selection is the current state of the art for many large scale optimization problems. However, greedy selection of the steepest coordinate on smooth problems can yield convergence rates independent of the dimension $n$, requiring $n$ times fewer iterations. In this paper, we consider greedy updates that are based on subgradients for a class of non-smooth composite problems, including $L1$-regularized problems, SVMs and related applications. For these problems we provide (i) the first linear rates of convergence independent of $n$, and show that our greedy update rule provides speedups similar to those obtained in the smooth case. This was previously conjectured to be true for a stronger greedy coordinate selection strategy. Furthermore, we show that (ii) our new selection rule can be mapped to instances of maximum inner product search, allowing to leverage standard nearest neighbor algorithms to speed up the implementation. We demonstrate the validity of the approach through extensive numerical experiments.'
volume: 89
URL: http://proceedings.mlr.press/v89/karimireddy19a.html
PDF: http://proceedings.mlr.press/v89/karimireddy19a/karimireddy19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-karimireddy19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Karimireddy
given: Sai Praneeth
- family: Koloskova
given: Anastasia
- family: Stich
given: Sebastian U.
- family: Jaggi
given: Martin
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2887-2896
id: karimireddy19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2887
lastpage: 2896
published: 2019-04-11 00:00:00 +0000
- title: 'Decentralized Gradient Tracking for Continuous DR-Submodular Maximization'
abstract: 'In this paper, we focus on the continuous DR-submodular maximization over a network. By using the gradient tracking technique, two decentralized algorithms are proposed for deterministic and stochastic settings, respectively. The proposed methods attain the $\epsilon$-accuracy tight approximation ratio for monotone continuous DR-submodular functions in only $O(1/\epsilon)$ and $\tilde{O}(1/\epsilon)$ rounds of communication, respectively, which are superior to the state-of-the-art. Our numerical results show that the proposed methods outperform existing decentralized methods in terms of both computation and communication complexity.'
volume: 89
URL: http://proceedings.mlr.press/v89/xie19b.html
PDF: http://proceedings.mlr.press/v89/xie19b/xie19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-xie19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Xie
given: Jiahao
- family: Zhang
given: Chao
- family: Shen
given: Zebang
- family: Mi
given: Chao
- family: Qian
given: Hui
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2897-2906
id: xie19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2897
lastpage: 2906
published: 2019-04-11 00:00:00 +0000
- title: 'Adaptive Rao-Blackwellisation in Gibbs Sampling for Probabilistic Graphical Models'
abstract: 'Rao-Blackwellisation is a technique that provably improves the performance of Gibbs sampling by summing-out variables from the PGM. However, collapsing variables is computationally expensive, since it changes the PGM structure introducing factors whose size is dependent upon the Markov blanket of the variable. Therefore, collapsing out several variables jointly is typically intractable in arbitrary PGM structures. In this paper, we propose an adaptive approach for Rao-Blackwellisation, where we add parallel Markov chains defined over different collapsed PGM structures. The collapsed variables are chosen based on their convergence diagnostics. However, adding a new chain requires burn-in, thus wasting samples. To address this, we initialize the new chains from a mean field approximation for the distribution, that improves over time, thus reducing the burn-in period. Our experiments on several UAI benchmarks show that our approach is more accurate than state-of-the-art inference systems such as Merlin that implements algorithms that have previously won the UAI inference challenge.'
volume: 89
URL: http://proceedings.mlr.press/v89/kelly19a.html
PDF: http://proceedings.mlr.press/v89/kelly19a/kelly19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-kelly19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Kelly
given: Craig
- family: Sarkhel
given: Somdeb
- family: Venugopal
given: Deepak
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2907-2915
id: kelly19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2907
lastpage: 2915
published: 2019-04-11 00:00:00 +0000
- title: 'Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems'
abstract: 'We study derivative-free methods for policy optimization over the class of linear policies. We focus on characterizing the convergence rate of a canonical stochastic, two-point, derivative-free method for linear-quadratic systems in which the initial state of the system is drawn at random. In particular, we show that for problems with effective dimension $D$, such a method converges to an $\epsilon$-approximate solution within $\widetilde{\mathcal{O}}(D/\epsilon)$ steps, with multiplicative pre-factors that are explicit lower-order polynomial terms in the curvature parameters of the problem. Along the way, we also derive stochastic zero-order rates for a class of non-convex optimization problems.'
volume: 89
URL: http://proceedings.mlr.press/v89/malik19a.html
PDF: http://proceedings.mlr.press/v89/malik19a/malik19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-malik19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Malik
given: Dhruv
- family: Pananjady
given: Ashwin
- family: Bhatia
given: Kush
- family: Khamaru
given: Koulik
- family: Bartlett
given: Peter
- family: Wainwright
given: Martin
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2916-2925
id: malik19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2916
lastpage: 2925
published: 2019-04-11 00:00:00 +0000
- title: 'Contrasting Exploration in Parameter and Action Space: A Zeroth-Order Optimization Perspective'
abstract: 'Black-box optimizers that explore in parameter space have often been shown to outperform more sophisticated action space exploration methods developed specifically for the reinforcement learning problem. We examine these black-box methods closely to identify situations in which they are worse than action space exploration methods and those in which they are superior. Through simple theoretical analyses, we prove that complexity of exploration in parameter space depends on the dimensionality of parameter space, while complexity of exploration in action space depends on both the dimensionality of action space and horizon length. This is also demonstrated empirically by comparing simple exploration methods on several model problems, including Contextual Bandit, Linear Regression and Reinforcement Learning in continuous control.'
volume: 89
URL: http://proceedings.mlr.press/v89/vemula19a.html
PDF: http://proceedings.mlr.press/v89/vemula19a/vemula19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-vemula19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Vemula
given: Anirudh
- family: Sun
given: Wen
- family: Bagnell
given: J.
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2926-2935
id: vemula19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2926
lastpage: 2935
published: 2019-04-11 00:00:00 +0000
- title: 'Sampling from Non-Log-Concave Distributions via Variance-Reduced Gradient Langevin Dynamics'
abstract: 'We study stochastic variance reduction-based Langevin dynamic algorithms, SVRG-LD and SAGA-LD \citep{dubey2016variance}, for sampling from non-log-concave distributions. Under certain assumptions on the log density function, we establish the convergence guarantees of SVRG-LD and SAGA-LD in $2$-Wasserstein distance. More specifically, we show that both SVRG-LD and SAGA-LD require $ \tilde O\big(n+n^{3/4}/\epsilon^2 + n^{1/2}/\epsilon^4\big)\cdot \exp\big(\tilde O(d+\gamma)\big)$ stochastic gradient evaluations to achieve $\epsilon$-accuracy in $2$-Wasserstein distance, which outperforms the $ \tilde O\big(n/\epsilon^4\big)\cdot \exp\big(\tilde O(d+\gamma)\big)$ gradient complexity achieved by Langevin Monte Carlo Method \citep{raginsky2017non}. Experiments on synthetic data and real data back up our theory.'
volume: 89
URL: http://proceedings.mlr.press/v89/zou19a.html
PDF: http://proceedings.mlr.press/v89/zou19a/zou19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-zou19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Zou
given: Difan
- family: Xu
given: Pan
- family: Gu
given: Quanquan
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2936-2945
id: zou19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2936
lastpage: 2945
published: 2019-04-11 00:00:00 +0000
- title: 'Graph to Graph: a Topology Aware Approach for Graph Structures Learning and Generation'
abstract: 'This paper is concerned with the problem of learning the mapping from one graph to another graph. Primarily, we focus on the issue of how to effectively learn the topology of the source graph and then decode it to form the topology of the target graph. We embed the topology of the graph into the states of nodes by exerting a topology constraint, which results in our Topology-Flow encoder. To decode the encoded topology, we design a conditioned graph generation model with two edge generation options, which result in the Edge-Bernoulli decoder and the Edge-Connect decoder. Experimental results on the 10-node simple graph dataset illustrate the substantial progress of the proposed method. The MNIST digits skeleton mapping experiment also reveals the ability of our approach to discover different topologies.'
volume: 89
URL: http://proceedings.mlr.press/v89/sun19c.html
PDF: http://proceedings.mlr.press/v89/sun19c/sun19c.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-sun19c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Sun
given: Mingming
- family: Li
given: Ping
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2946-2955
id: sun19c
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2946
lastpage: 2955
published: 2019-04-11 00:00:00 +0000
- title: 'Imitation-Regularized Offline Learning'
abstract: 'We study the problem of offline learning in automated decision systems under the contextual bandits model. We are given logged historical data consisting of contexts, (randomized) actions, and (nonnegative) rewards. A common goal is to evaluate what would happen if different actions were taken in the same contexts, so as to optimize the action policies accordingly. The typical approach to this problem, inverse probability weighted estimation (IPWE), requires logged action probabilities, which may be missing in practice due to engineering complications. Even when available, small action probabilities cause large uncertainty in IPWE, rendering the corresponding results insignificant. To solve both problems, we show how one can use policy improvement (PIL) objectives, regularized by policy imitation (IML). We motivate and analyze PIL as an extension to Clipped-IPWE, by showing that both are lower-bound surrogates to the vanilla IPWE. We also formally connect IML to IPWE variance estimation and natural policy gradients. Without probability logging, our PIL-IML interpretations justify and improve, by reward-weighting, the state-of-the-art cross-entropy (CE) loss that predicts the action items among all action candidates available in the same contexts. With probability logging, our main theoretical contribution connects IML-underfitting to the existence of either confounding variables or model misspecification. We show the value and accuracy of our insights by simulations based on Simpson’s paradox, standard UCI multiclass-to-bandit conversions and on the Criteo counterfactual analysis challenge dataset.'
volume: 89
URL: http://proceedings.mlr.press/v89/ma19b.html
PDF: http://proceedings.mlr.press/v89/ma19b/ma19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-ma19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Ma
given: Yifei
- family: Wang
given: Yu-Xiang
- family: Narayanaswamy
given: Balakrishnan
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2956-2965
id: ma19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2956
lastpage: 2965
published: 2019-04-11 00:00:00 +0000
- title: 'A maximum-mean-discrepancy goodness-of-fit test for censored data'
abstract: 'We introduce a kernel-based goodness-of-fit test for censored data, where observations may be missing in random time intervals: a common occurrence in clinical trials and industrial life-testing. The test statistic is straightforward to compute, as is the test threshold, and we establish consistency under the null. Unlike earlier approaches such as the Log-rank test, we make no assumptions as to how the data distribution might differ from the null, and our test has power against a very rich class of alternatives. In experiments, our test outperforms competing approaches for periodic and Weibull hazard functions (where risks are time dependent), and does not show the failure modes of tests that rely on user defined features. Moreover, in cases where classical tests are provably most powerful, our test performs almost as well, while being more general.'
volume: 89
URL: http://proceedings.mlr.press/v89/fernandez19a.html
PDF: http://proceedings.mlr.press/v89/fernandez19a/fernandez19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-fernandez19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Fernandez
given: Tamara
- family: Gretton
given: Arthur
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2966-2975
id: fernandez19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2966
lastpage: 2975
published: 2019-04-11 00:00:00 +0000
- title: 'Sobolev Descent'
abstract: 'We study a simplification of GAN training: the problem of transporting particles from a source to a target distribution. Starting from the Sobolev GAN critic, part of the gradient regularized GAN family, we show a strong relation with Optimal Transport (OT). Specifically with the less popular *dynamic* formulation of OT that finds a path of distributions from source to target minimizing a "kinetic energy". We introduce Sobolev descent that constructs similar paths by following gradient flows of a critic function in a kernel space or parametrized by a neural network. In the kernel version, we show convergence to the target distribution in the MMD sense. We show in theory and experiments that regularization has an important role in favoring smooth transitions between distributions, avoiding large gradients from the critic. This analysis in a simplified particle setting provides insight in paths to equilibrium in GANs.'
volume: 89
URL: http://proceedings.mlr.press/v89/mroueh19a.html
PDF: http://proceedings.mlr.press/v89/mroueh19a/mroueh19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-mroueh19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Mroueh
given: Youssef
- family: Sercu
given: Tom
- family: Raj
given: Anant
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2976-2985
id: mroueh19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2976
lastpage: 2985
published: 2019-04-11 00:00:00 +0000
- title: 'Learning the Structure of a Nonstationary Vector Autoregression'
abstract: 'We adapt graphical causal structure learning methods to apply to nonstationary time series data, specifically to processes that exhibit stochastic trends. We modify the likelihood component of the BIC score used by score-based search algorithms, such that it remains a consistent selection criterion for integrated or cointegrated processes. We use this modified score in conjunction with the SVAR-GFCI algorithm, which allows us to recover qualitative structural information about the underlying data-generating process even in the presence of latent (unmeasured) factors. We demonstrate our approach on both simulated and real macroeconomic data.'
volume: 89
URL: http://proceedings.mlr.press/v89/malinsky19a.html
PDF: http://proceedings.mlr.press/v89/malinsky19a/malinsky19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-malinsky19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Malinsky
given: Daniel
- family: Spirtes
given: Peter
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2986-2994
id: malinsky19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2986
lastpage: 2994
published: 2019-04-11 00:00:00 +0000
- title: 'Theoretical Analysis of Efficiency and Robustness of Softmax and Gap-Increasing Operators in Reinforcement Learning'
abstract: 'In this paper, we propose and analyze conservative value iteration, which unifies value iteration, soft value iteration, advantage learning, and dynamic policy programming. Our analysis shows that algorithms using a combination of gap-increasing and max operators are resilient to stochastic errors, but not to non-stochastic errors. In contrast, algorithms using a softmax operator without a gap-increasing operator are less susceptible to all types of errors, but may display poor asymptotic performance. Algorithms using a combination of gap-increasing and softmax operators are much more effective and may asymptotically outperform algorithms with the max operator. Not only do these theoretical results provide a deep understanding of various reinforcement learning algorithms, but they also highlight the effectiveness of gap-increasing operators, as well as the limitations of traditional greedy value updates by the max operator.'
volume: 89
URL: http://proceedings.mlr.press/v89/kozuno19a.html
PDF: http://proceedings.mlr.press/v89/kozuno19a/kozuno19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-kozuno19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Kozuno
given: Tadashi
- family: Uchibe
given: Eiji
- family: Doya
given: Kenji
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 2995-3003
id: kozuno19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 2995
lastpage: 3003
published: 2019-04-11 00:00:00 +0000
- title: 'A Fast Sampling Algorithm for Maximum Inner Product Search'
abstract: 'Maximum Inner Product Search (MIPS) has been recognized as an important operation for the inference phase of many machine learning algorithms, including matrix factorization, multi-class/multi-label prediction and neural networks. In this paper, we propose Sampling-MIPS, which is the first sampling based algorithm that can be applied to the MIPS problem on a set of general vectors with both positive and negative values. Our Sampling-MIPS algorithm is efficient in terms of both time and sample complexity. In particular, by designing a two-step sampling with alias table, Sampling-MIPS only requires constant time to draw a candidate. In addition, we show that the probability of candidate generation in our algorithm is consistent with the true ranking induced by the value of the corresponding inner products, and derive the sample complexity of Sampling-MIPS to obtain the true candidate. Furthermore, the algorithm can be easily extended to large problems with sparse candidate vectors. Experimental results on real and synthetic datasets show that Sampling-MIPS is consistently better than other previous approaches such as LSH-MIPS, PCA-MIPS and Diamond sampling approach.'
volume: 89
URL: http://proceedings.mlr.press/v89/ding19a.html
PDF: http://proceedings.mlr.press/v89/ding19a/ding19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-ding19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: DING
given: QIN
- family: Yu
given: Hsiang-Fu
- family: Hsieh
given: Cho-Jui
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3004-3012
id: ding19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3004
lastpage: 3012
published: 2019-04-11 00:00:00 +0000
- title: 'Minimum Volume Topic Modeling'
abstract: 'We propose a new topic modeling procedure that takes advantage of the fact that the Latent Dirichlet Allocation (LDA) log-likelihood function is asymptotically equivalent to the logarithm of the volume of the topic simplex. This allows topic modeling to be reformulated as finding the probability simplex that minimizes its volume and encloses the documents that are represented as distributions over words. A convex relaxation of the minimum volume topic model optimization is proposed, and it is shown that the relaxed problem has the same global minimum as the original problem under the separability assumption and the sufficiently scattered assumption introduced by Arora et al. (2013) and Huang et al. (2016). A locally convergent alternating direction method of multipliers (ADMM) approach is introduced for solving the relaxed minimum volume problem. Numerical experiments illustrate the benefits of our approach in terms of computation time and topic recovery performance.'
volume: 89
URL: http://proceedings.mlr.press/v89/jang19a.html
PDF: http://proceedings.mlr.press/v89/jang19a/jang19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-jang19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Jang
given: Byoungwook
- family: Hero
given: Alfred
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3013-3021
id: jang19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3013
lastpage: 3021
published: 2019-04-11 00:00:00 +0000
- title: 'Binary Space Partitioning Forest'
abstract: 'The Binary Space Partitioning (BSP)-Tree process is proposed to produce flexible 2-D partition structures which are originally used as a Bayesian nonparametric prior for relational modelling. It can hardly be applied to other learning tasks such as regression trees because extending the BSP-Tree process to a higher dimensional space is nontrivial. This paper is the first attempt to extend the BSP-Tree process to a d-dimensional ($d>2$) space. We propose to generate a cutting hyperplane, which is assumed to be parallel to $d-2$ dimensions, to cut each node in the d-dimensional BSP-tree. By designing a subtle strategy to sample two free dimensions from d dimensions, the extended BSP-Tree process can inherit the essential self-consistency property from the original version. Based on the extended BSP-Tree process, an ensemble model, which is named the BSP-Forest, is further developed for regression tasks. Thanks to the retained self-consistency property, we can thus significantly reduce the geometric calculations in the inference stage. Compared to its counterpart, the Mondrian Forest, the BSP-Forest can achieve similar performance with fewer cuts due to its flexibility. The BSP-Forest also outperforms other (Bayesian) regression forests on a number of real-world data sets.'
volume: 89
URL: http://proceedings.mlr.press/v89/fan19b.html
PDF: http://proceedings.mlr.press/v89/fan19b/fan19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-fan19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Fan
given: Xuhui
- family: Li
given: Bin
- family: Sisson
given: Scott
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3022-3031
id: fan19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3022
lastpage: 3031
published: 2019-04-11 00:00:00 +0000
- title: 'Improved Semi-Supervised Learning with Multiple Graphs'
abstract: 'We present a new approach for graph based semi-supervised learning based on a multi-component extension to the Gaussian MRF model. This approach models the observations on the vertices as jointly Gaussian with an inverse covariance matrix that is a weighted linear combination of multiple matrices. Building on randomized matrix trace estimation and fast Laplacian solvers, we develop fast and efficient algorithms for computing the best-fit (maximum likelihood) model and the predicted labels using gradient descent. Our model is considerably simpler, with just tens of parameters, and a single hyperparameter, in contrast with state-of-the-art approaches using deep learning techniques. Our experiments on benchmark citation networks show that the best-fit model estimated by our algorithm leads to significant improvements on all datasets compared to baseline models. Further, our performance compares favorably with several state-of-the-art methods on these datasets, and is comparable with the best performances.'
volume: 89
URL: http://proceedings.mlr.press/v89/viswanathan19a.html
PDF: http://proceedings.mlr.press/v89/viswanathan19a/viswanathan19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-viswanathan19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Viswanathan
given: Krishnamurthy
- family: Sachdeva
given: Sushant
- family: Tomkins
given: Andrew
- family: Ravi
given: Sujith
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3032-3041
id: viswanathan19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3032
lastpage: 3041
published: 2019-04-11 00:00:00 +0000
- title: 'Optimizing over a Restricted Policy Class in MDPs'
abstract: 'We address the problem of finding an optimal policy in a Markov decision process (MDP) under a restricted policy class defined by the convex hull of a set of base policies. This problem is of great interest in applications in which a number of reasonably good (or safe) policies are already known and we are interested in optimizing in their convex hull. We first prove that solving this problem is NP-hard. We then propose an efficient algorithm that finds a policy whose performance is almost as good as that of the best convex combination of the base policies, under the assumption that the occupancy measures of the base policies have a large overlap. The running time of the proposed algorithm is linear in the number of states and polynomial in the number of base policies. A distinct advantage of the proposed algorithm is that, apart from the computation of the occupancy measures of the base policies, it does not need to interact with the environment during the optimization process. This is especially important (i) in problems that due to concerns such as safety, we are restricted in interacting with the environment only through the (safe) base policies, and (ii) in complex systems where estimating the value of a policy can be a time consuming process.'
volume: 89
URL: http://proceedings.mlr.press/v89/banijamali19a.html
PDF: http://proceedings.mlr.press/v89/banijamali19a/banijamali19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-banijamali19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Banijamali
given: Ershad
- family: Abbasi-Yadkori
given: Yasin
- family: Ghavamzadeh
given: Mohammad
- family: Vlassis
given: Nikos
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3042-3050
id: banijamali19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3042
lastpage: 3050
published: 2019-04-11 00:00:00 +0000
- title: 'Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate'
abstract: 'Stochastic Gradient Descent (SGD) is a central tool in machine learning. We prove that SGD converges to zero loss, even with a fixed (non-vanishing) learning rate — in the special case of homogeneous linear classifiers with smooth monotone loss functions, optimized on linearly separable data. Previous works assumed either a vanishing learning rate, iterate averaging, or loss assumptions that do not hold for monotone loss functions used for classification, such as the logistic loss. We prove our result on a fixed dataset, both for sampling with or without replacement. Furthermore, for logistic loss (and similar exponentially-tailed losses), we prove that with SGD the weight vector converges in direction to the $L_2$ max margin vector as $O(1/\log(t))$ for almost all separable datasets, and the loss converges as $O(1/t)$ — similarly to gradient descent. Lastly, we examine the case of a fixed learning rate proportional to the minibatch size. We prove that in this case, the asymptotic convergence rate of SGD (with replacement) does not depend on the minibatch size in terms of epochs, if the support vectors span the data. These results may suggest an explanation to similar behaviors observed in deep networks, when trained with SGD.'
volume: 89
URL: http://proceedings.mlr.press/v89/nacson19a.html
PDF: http://proceedings.mlr.press/v89/nacson19a/nacson19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-nacson19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Nacson
given: Mor Shpigel
- family: Srebro
given: Nathan
- family: Soudry
given: Daniel
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3051-3059
id: nacson19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3051
lastpage: 3059
published: 2019-04-11 00:00:00 +0000
- title: 'Deep Switch Networks for Generating Discrete Data and Language'
abstract: 'Multilayer switch networks are proposed as artificial generators of high-dimensional discrete data (e.g., binary vectors, categorical data, natural language, network log files, and discrete-valued time series). Unlike deconvolution networks which generate continuous-valued data and which consist of upsampling filters and reverse pooling layers, multilayer switch networks are composed of adaptive switches which model conditional distributions of discrete random variables. An interpretable, statistical framework is introduced for training these nonlinear networks based on a maximum-likelihood objective function. To learn network parameters, stochastic gradient descent is applied to the objective, and is stable until convergence. This direct optimization does not involve back-propagation over separate encoder and decoder networks, or adversarial training of dueling networks. While training remains tractable for moderately sized networks, Markov-chain Monte Carlo (MCMC) approximations of gradients are derived for deep networks which contain latent variables. The statistical framework is evaluated on synthetic data, high-dimensional binary data of handwritten digits, and web-crawled natural language data. Aspects of the model’s framework such as interpretability, computational complexity, and generalization ability are discussed.'
volume: 89
URL: http://proceedings.mlr.press/v89/delgosha19a.html
PDF: http://proceedings.mlr.press/v89/delgosha19a/delgosha19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-delgosha19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Delgosha
given: Payam
- family: Goela
given: Naveen
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3060-3069
id: delgosha19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3060
lastpage: 3069
published: 2019-04-11 00:00:00 +0000
- title: 'A recurrent Markov state-space generative model for sequences'
abstract: 'While the Hidden Markov Model (HMM) is a versatile generative model of sequences capable of performing many exact inferences efficiently, it is not suited for capturing complex long-term structure in the data. Advanced state-space models based on Deep Neural Networks (DNN) overcome this limitation but cannot perform exact inferences. In this article, we present a new generative model for sequences that combines both aspects, the ability to perform exact inferences and the ability to model long-term structure, by augmenting the HMM with a deterministic, continuous state variable modeled through a Recurrent Neural Network. We empirically study the performance of the model on (i) synthetic data comparing it to the HMM, (ii) a supervised learning task in bioinformatics where it outperforms two DNN-based regressors and (iii) in the generative modeling of music where it outperforms many prominent DNN-based generative models.'
volume: 89
URL: http://proceedings.mlr.press/v89/ramachandran19a.html
PDF: http://proceedings.mlr.press/v89/ramachandran19a/ramachandran19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-ramachandran19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Ramachandran
given: Anand
- family: Lumetta
given: Steve
- family: Klee
given: Eric
- family: Chen
given: Deming
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3070-3079
id: ramachandran19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3070
lastpage: 3079
published: 2019-04-11 00:00:00 +0000
- title: 'A Potential Outcomes Calculus for Identifying Conditional Path-Specific Effects'
abstract: 'The do-calculus is a well-known deductive system for deriving connections between interventional and observed distributions, and has been proven complete for a number of important identifiability problems in causal inference. Nevertheless, as it is currently defined, the do-calculus is inapplicable to causal problems that involve complex nested counterfactuals which cannot be expressed in terms of the "do" operator. Such problems include analyses of path-specific effects and dynamic treatment regimes. In this paper we present the potential outcome calculus (po-calculus), a natural generalization of do-calculus for arbitrary potential outcomes. We thereby provide a bridge between identification approaches which have their origins in artificial intelligence and statistics, respectively. We use po-calculus to give a complete identification algorithm for conditional path-specific effects with applications to problems in mediation analysis and algorithmic fairness.'
volume: 89
URL: http://proceedings.mlr.press/v89/malinsky19b.html
PDF: http://proceedings.mlr.press/v89/malinsky19b/malinsky19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-malinsky19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Malinsky
given: Daniel
- family: Shpitser
given: Ilya
- family: Richardson
given: Thomas
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3080-3088
id: malinsky19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3080
lastpage: 3088
published: 2019-04-11 00:00:00 +0000
- title: 'Adversarial Discrete Sequence Generation without Explicit Neural Networks as Discriminators'
abstract: 'This paper presents a novel approach to train GANs for discrete sequence generation without resorting to an explicit neural network as the discriminator. We show that when an alternative mini-max optimization procedure is performed for the value function where a closed form solution for the discriminator exists in the maximization step, it is equivalent to directly optimizing the Jensen-Shannon divergence (JSD) between the generator’s distribution and the empirical distribution over the training data without sampling from the generator. Optimizing the JSD thus becomes computationally tractable for training a generator that produces sequences of discrete data. Extensive experiments on synthetic data and real-world tasks demonstrate significant improvements over existing methods to train GANs that generate discrete sequences.'
volume: 89
URL: http://proceedings.mlr.press/v89/li19g.html
PDF: http://proceedings.mlr.press/v89/li19g/li19g.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-li19g.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Li
given: Zhongliang
- family: Xia
given: Tian
- family: Lou
given: Xingyu
- family: Xu
given: Kaihe
- family: Wang
given: Shaojun
- family: Xiao
given: Jing
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3089-3098
id: li19g
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3089
lastpage: 3098
published: 2019-04-11 00:00:00 +0000
- title: 'Adaptive Estimation for Approximate $k$-Nearest-Neighbor Computations'
abstract: 'Algorithms often carry out equally many computations for "easy" and "hard" problem instances. In particular, algorithms for finding nearest neighbors typically have the same running time regardless of the particular problem instance. In this paper, we consider the approximate $k$-nearest-neighbor problem, which is the problem of finding a subset of $O(k)$ points in a given set of points that contains the set of $k$ nearest neighbors of a given query point. We propose an algorithm based on adaptively estimating the distances, and show that it is essentially optimal out of algorithms that are only allowed to adaptively estimate distances. We then demonstrate both theoretically and experimentally that the algorithm can achieve significant speedups relative to the naive method.'
volume: 89
URL: http://proceedings.mlr.press/v89/lejeune19a.html
PDF: http://proceedings.mlr.press/v89/lejeune19a/lejeune19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-lejeune19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: LeJeune
given: Daniel
- family: Heckel
given: Reinhard
- family: Baraniuk
given: Richard
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3099-3107
id: lejeune19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3099
lastpage: 3107
published: 2019-04-11 00:00:00 +0000
- title: 'Model-Free Linear Quadratic Control via Reduction to Expert Prediction'
abstract: 'Model-free approaches for reinforcement learning (RL) and continuous control find policies based only on past states and rewards, without fitting a model of the system dynamics. They are appealing as they are general purpose and easy to implement; however, they also come with fewer theoretical guarantees than model-based RL. In this work, we present a new model-free algorithm for controlling linear quadratic (LQ) systems, and show that its regret scales as $O(T^{\xi+2/3})$ for any small $\xi>0$ if the time horizon satisfies $T>C^{1/\xi}$ for a constant $C$. The algorithm is based on a reduction of control of Markov decision processes to an expert prediction problem. In practice, it corresponds to a variant of policy iteration with forced exploration, where the policy in each phase is greedy with respect to the average of all previous value functions. This is the first model-free algorithm for adaptive control of LQ systems that provably achieves sublinear regret and has a polynomial computation cost. Empirically, our algorithm dramatically outperforms standard policy iteration, but performs worse than a model-based approach.'
volume: 89
URL: http://proceedings.mlr.press/v89/abbasi-yadkori19a.html
PDF: http://proceedings.mlr.press/v89/abbasi-yadkori19a/abbasi-yadkori19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-abbasi-yadkori19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Abbasi-Yadkori
given: Yasin
- family: Lazic
given: Nevena
- family: Szepesvari
given: Csaba
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3108-3117
id: abbasi-yadkori19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3108
lastpage: 3117
published: 2019-04-11 00:00:00 +0000
- title: 'Preventing Failures Due to Dataset Shift: Learning Predictive Models That Transport'
abstract: 'Classical supervised learning produces unreliable models when training and target distributions differ, with most existing solutions requiring samples from the target domain. We propose a proactive approach which learns a relationship in the training domain that will generalize to the target domain by incorporating prior knowledge of aspects of the data generating process that are expected to differ as expressed in a causal selection diagram. Specifically, we remove variables generated by unstable mechanisms from the joint factorization to yield the Surgery Estimator—an interventional distribution that is invariant to the differences across environments. We prove that the surgery estimator finds stable relationships in strictly more scenarios than previous approaches which only consider conditional relationships, and demonstrate this in simulated experiments. We also evaluate on real world data for which the true causal diagram is unknown, performing competitively against entirely data-driven approaches.'
volume: 89
URL: http://proceedings.mlr.press/v89/subbaswamy19a.html
PDF: http://proceedings.mlr.press/v89/subbaswamy19a/subbaswamy19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-subbaswamy19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Subbaswamy
given: Adarsh
- family: Schulam
given: Peter
- family: Saria
given: Suchi
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3118-3127
id: subbaswamy19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3118
lastpage: 3127
published: 2019-04-11 00:00:00 +0000
- title: 'Structured Robust Submodular Maximization: Offline and Online Algorithms'
abstract: 'Constrained submodular function maximization has been used in subset selection problems such as selection of most informative sensor locations. While these models have been quite popular, the solutions obtained via this approach are unstable to perturbations in data defining the submodular functions. Robust submodular maximization has been proposed as a richer model that aims to overcome this discrepancy as well as increase the modeling scope of submodular optimization. In this work, we consider robust submodular maximization with structured combinatorial constraints and give efficient algorithms with provable guarantees. Our approach is applicable to constraints defined by single or multiple matroids, knapsack as well as distributionally robust criteria. We consider both the offline setting where the data defining the problem is known in advance as well as the online setting where the input data is revealed over time. For the offline setting, we give a nearly optimal bi-criteria approximation algorithm that relies on new extensions of the classical greedy algorithm. For the online version of the problem, we give an algorithm that returns a bi-criteria solution with sub-linear regret.'
volume: 89
URL: http://proceedings.mlr.press/v89/anari19a.html
PDF: http://proceedings.mlr.press/v89/anari19a/anari19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-anari19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Anari
given: Nima
- family: Haghtalab
given: Nika
- family: Naor
given: Seffi
- family: Pokutta
given: Sebastian
- family: Singh
given: Mohit
- family: Torrico
given: Alfredo
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3128-3137
id: anari19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3128
lastpage: 3137
published: 2019-04-11 00:00:00 +0000
- title: 'Sample-Efficient Imitation Learning via Generative Adversarial Nets'
abstract: 'GAIL is a recent successful imitation learning architecture that exploits the adversarial training procedure introduced in GANs. Albeit successful at generating behaviours similar to those demonstrated to the agent, GAIL suffers from a high sample complexity in the number of interactions it has to carry out in the environment in order to achieve satisfactory performance. We dramatically shrink the number of interactions with the environment necessary to learn well-behaved imitation policies, by up to several orders of magnitude. Our framework, operating in the model-free regime, exhibits a significant increase in sample-efficiency over previous methods by simultaneously a) learning a self-tuned adversarially-trained surrogate reward and b) leveraging an off-policy actor-critic architecture. We show that our approach is simple to implement and that the learned agents remain remarkably stable, as shown in our experiments that span a variety of continuous control tasks. Video visualisations available at: https://youtu.be/-nCsqUJnRKU.'
volume: 89
URL: http://proceedings.mlr.press/v89/blonde19a.html
PDF: http://proceedings.mlr.press/v89/blonde19a/blonde19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-blonde19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Blondé
given: Lionel
- family: Kalousis
given: Alexandros
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3138-3148
id: blonde19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3138
lastpage: 3148
published: 2019-04-11 00:00:00 +0000
- title: 'Probabilistic Multilevel Clustering via Composite Transportation Distance'
abstract: 'We propose a novel probabilistic approach to multilevel clustering problems based on composite transportation distance, which is a variant of transportation distance where the underlying metric is Kullback-Leibler divergence. Our method involves solving a joint optimization problem over spaces of probability measures to simultaneously discover grouping structures within groups and among groups. By exploiting the connection of our method to the problem of finding composite transportation barycenters, we develop fast and efficient optimization algorithms even for potentially large-scale multilevel datasets. Finally, we present experimental results with both synthetic and real data to demonstrate the efficiency and scalability of the proposed approach.'
volume: 89
URL: http://proceedings.mlr.press/v89/ho19a.html
PDF: http://proceedings.mlr.press/v89/ho19a/ho19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-ho19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Ho
given: Nhat
- family: Huynh
given: Viet
- family: Phung
given: Dinh
- family: Jordan
given: Michael
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3149-3157
id: ho19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3149
lastpage: 3157
published: 2019-04-11 00:00:00 +0000
- title: 'A General Framework for Multi-fidelity Bayesian Optimization with Gaussian Processes'
abstract: 'How can we efficiently gather information to optimize an unknown function, when presented with multiple, mutually dependent information sources with different costs? For example, when optimizing a physical system, intelligently trading off computer simulations and real-world tests can lead to significant savings. Existing multi-fidelity Bayesian optimization methods, such as multi-fidelity GP-UCB or Entropy Search-based approaches, either make simplistic assumptions on the interaction among different fidelities or use simple heuristics that lack theoretical guarantees. In this paper, we study multi-fidelity Bayesian optimization with complex structural dependencies among multiple outputs, and propose MF-MI-Greedy, a principled algorithmic framework for addressing this problem. In particular, we model different fidelities using additive Gaussian processes based on shared latent relationships with the target function. Then we use cost-sensitive mutual information gain for efficient Bayesian optimization. We propose a simple notion of regret which incorporates the varying cost of different fidelities, and prove that MF-MI-Greedy achieves low regret. We demonstrate the strong empirical performance of our algorithm on both synthetic and real-world datasets.'
volume: 89
URL: http://proceedings.mlr.press/v89/song19b.html
PDF: http://proceedings.mlr.press/v89/song19b/song19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-song19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Song
given: Jialin
- family: Chen
given: Yuxin
- family: Yue
given: Yisong
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3158-3167
id: song19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3158
lastpage: 3167
published: 2019-04-11 00:00:00 +0000
- title: 'Online Algorithm for Unsupervised Sensor Selection'
abstract: 'In many security and healthcare systems, the detection and diagnosis systems use a sequence of sensors/tests. Each test outputs a prediction of the latent state and carries an inherent cost. However, the correctness of the predictions cannot be evaluated due to unavailability of the ground-truth annotations. Our objective is to learn strategies for selecting a test that gives the best trade-off between accuracy and costs in such unsupervised sensor selection (USS) problems. Clearly, learning is feasible only if ground truth can be inferred (explicitly or implicitly) from the problem structure. It is observed that this happens if the problem satisfies the ‘Weak Dominance’ (WD) property. We set up the USS problem as a stochastic partial monitoring problem and develop an algorithm with sub-linear regret under the WD property. We argue that our algorithm is optimal and evaluate its performance on problem instances generated from synthetic and real-world datasets.'
volume: 89
URL: http://proceedings.mlr.press/v89/verma19a.html
PDF: http://proceedings.mlr.press/v89/verma19a/verma19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-verma19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Verma
given: Arun
- family: Hanawal
given: Manjesh
- family: Szepesvari
given: Csaba
- family: Saligrama
given: Venkatesh
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3168-3176
id: verma19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3168
lastpage: 3176
published: 2019-04-11 00:00:00 +0000
- title: 'Best of many worlds: Robust model selection for online supervised learning'
abstract: 'We introduce algorithms for online, full-information prediction that are computationally efficient and competitive with contextual tree experts of unknown complexity, in both probabilistic and adversarial settings. We incorporate a novel probabilistic framework of structural risk minimization into existing adaptive algorithms and show that we can robustly learn not only the presence of stochastic structure when it exists, but also the correct model order. When the stochastic data is actually realized from a predictor in the model class considered, we obtain regret bounds that are competitive with the regret of an optimal algorithm that possesses strong side information about both the true model order and whether the process generating the data is stochastic or adversarial. In cases where the data does not arise from any of the models, our algorithm selects models of higher order as we play more rounds. We display empirically improved overall prediction error over other adversarially robust approaches.'
volume: 89
URL: http://proceedings.mlr.press/v89/muthukumar19a.html
PDF: http://proceedings.mlr.press/v89/muthukumar19a/muthukumar19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-muthukumar19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Muthukumar
given: Vidya
- family: Ray
given: Mitas
- family: Sahai
given: Anant
- family: Bartlett
given: Peter
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3177-3186
id: muthukumar19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3177
lastpage: 3186
published: 2019-04-11 00:00:00 +0000
- title: 'Accelerating Imitation Learning with Predictive Models'
abstract: 'Sample efficiency is critical in solving real-world reinforcement learning problems where agent-environment interactions can be costly. Imitation learning from expert advice has proved to be an effective strategy for reducing the number of interactions required to train a policy. Online imitation learning, which interleaves policy evaluation and policy optimization, is a particularly effective technique with provable performance guarantees. In this work, we seek to further accelerate the convergence rate of online imitation learning, thereby making it more sample efficient. We propose two model-based algorithms inspired by Follow-the-Leader (FTL) with prediction: MoBIL-VI based on solving variational inequalities and MoBIL-Prox based on stochastic first-order updates. These two methods leverage a model to predict future gradients to speed up policy learning. When the model oracle is learned online, these algorithms can provably accelerate the best known convergence rate up to an order. Our algorithms can be viewed as a generalization of stochastic Mirror-Prox (Juditsky et al., 2011), and admit a simple constructive FTL-style analysis of performance.'
volume: 89
URL: http://proceedings.mlr.press/v89/cheng19a.html
PDF: http://proceedings.mlr.press/v89/cheng19a/cheng19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-cheng19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Cheng
given: Ching-An
- family: Yan
given: Xinyan
- family: Theodorou
given: Evangelos
- family: Boots
given: Byron
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3187-3196
id: cheng19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3187
lastpage: 3196
published: 2019-04-11 00:00:00 +0000
- title: 'Online Learning in Kernelized Markov Decision Processes'
abstract: 'We consider online learning for minimizing regret in unknown, episodic Markov decision processes (MDPs) with continuous states and actions. We develop variants of the UCRL and posterior sampling algorithms that employ non-parametric Gaussian process priors to generalize across the state and action spaces. When the transition and reward functions of the true MDP are members of the associated Reproducing Kernel Hilbert Spaces of functions induced by symmetric psd kernels, we show that the algorithms enjoy sublinear regret bounds. The bounds are in terms of explicit structural parameters of the kernels, namely a novel generalization of the information gain metric from kernelized bandit, and highlight the influence of transition and reward function structure on the learning performance. Our results are applicable to multi-dimensional state and action spaces with composite kernel structures, and generalize results from the literature on kernelized bandits, and the adaptive control of parametric linear dynamical systems with quadratic costs.'
volume: 89
URL: http://proceedings.mlr.press/v89/chowdhury19a.html
PDF: http://proceedings.mlr.press/v89/chowdhury19a/chowdhury19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-chowdhury19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Chowdhury
given: Sayak Ray
- family: Gopalan
given: Aditya
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3197-3205
id: chowdhury19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3197
lastpage: 3205
published: 2019-04-11 00:00:00 +0000
- title: 'Lifting high-dimensional non-linear models with Gaussian regressors'
abstract: 'We study the problem of recovering a structured signal $\mathbf{x}_0$ from high-dimensional data $\mathbf{y}_i=f(\mathbf{a}_i^T\mathbf{x}_0)$ for some nonlinear (and potentially unknown) link function $f$, when the regressors $\mathbf{a}_i$ are iid Gaussian. Brillinger (1982) showed that ordinary least-squares estimates $\mathbf{x}_0$ up to a constant of proportionality $\mu_\ell$, which depends on $f$. Recently, Plan & Vershynin (2015) extended this result to the high-dimensional setting deriving sharp error bounds for the generalized Lasso. Unfortunately, both least-squares and the Lasso fail to recover $\mathbf{x}_0$ when $\mu_\ell=0$. For example, this includes all even link functions. We resolve this issue by proposing and analyzing an alternative convex recovery method. In a nutshell, our method treats such link functions as if they were linear in a lifted space of higher-dimension. Interestingly, our error analysis captures the effect of both the nonlinearity and the problem’s geometry in a few simple summary parameters.'
volume: 89
URL: http://proceedings.mlr.press/v89/thrampoulidis19a.html
PDF: http://proceedings.mlr.press/v89/thrampoulidis19a/thrampoulidis19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-thrampoulidis19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Thrampoulidis
given: Christos
- family: Rawat
given: Ankit Singh
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3206-3215
id: thrampoulidis19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3206
lastpage: 3215
published: 2019-04-11 00:00:00 +0000
- title: 'Domain-Size Aware Markov Logic Networks'
abstract: 'Several domains in AI need to represent the relational structure as well as model uncertainty. Markov Logic is a powerful formalism which achieves this by attaching weights to formulas in finite first-order logic. Though Markov Logic Networks (MLNs) have been used for a wide variety of applications, a significant challenge remains that weights do not generalize well when training domain sizes are different from those seen during testing. In particular, it has been observed that marginal probabilities tend to extremes in the limit of increasing domain sizes. As the first contribution of our work, we further characterize the distribution and show that marginal probabilities tend to a constant independent of weights and not always to extremes as was previously observed. As our second contribution, we present a principled solution to this problem by defining Domain-size Aware Markov Logic Networks (DA-MLNs) which can be seen as re-parameterizing the MLNs after taking domain size into consideration. For some simple but representative MLN formulas, we formally prove that probabilities defined by DA-MLNs are well behaved. On a practical side, DA-MLNs allow us to generalize the weights learned over small-sized training data to much larger domains. Experiments on three different benchmark MLNs show that our approach results in significant performance gains compared to existing methods.'
volume: 89
URL: http://proceedings.mlr.press/v89/mittal19a.html
PDF: http://proceedings.mlr.press/v89/mittal19a/mittal19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-mittal19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Mittal
given: Happy
- family: Bhardwaj
given: Ayush
- family: Gogate
given: Vibhav
- family: Singla
given: Parag
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3216-3224
id: mittal19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3216
lastpage: 3224
published: 2019-04-11 00:00:00 +0000
- title: 'Database Alignment with Gaussian Features'
abstract: 'We consider the problem of aligning a pair of databases with jointly Gaussian features. We consider two algorithms, complete database alignment via MAP estimation among all possible database alignments, and partial alignment via a thresholding approach of log likelihood ratios. We derive conditions on mutual information between feature pairs, identifying the regimes where the algorithms are guaranteed to perform reliably and those where they cannot be expected to succeed.'
volume: 89
URL: http://proceedings.mlr.press/v89/dai19b.html
PDF: http://proceedings.mlr.press/v89/dai19b/dai19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-dai19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Dai
given: Osman E.
- family: Cullina
given: Daniel
- family: Kiyavash
given: Negar
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3225-3233
id: dai19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3225
lastpage: 3233
published: 2019-04-11 00:00:00 +0000
- title: 'Size of Interventional Markov Equivalence Classes in random DAG models'
abstract: 'Directed acyclic graph (DAG) models are popular for capturing causal relationships. From observational and interventional data, a DAG model can only be determined up to its \emph{interventional Markov equivalence class} (I-MEC). We investigate the size of MECs for random DAG models generated by uniformly sampling and ordering an Erdős-Rényi graph. For constant density, we show that the expected $\log$ observational MEC size asymptotically (in the number of vertices) approaches a constant. We characterize I-MEC size in a similar fashion in the above settings with high precision. We show that the asymptotic expected number of interventions required to fully identify a DAG is a constant. These results are obtained by exploiting Meek rules and coupling arguments to provide sharp upper and lower bounds on the asymptotic quantities, which are then calculated numerically up to high precision. Our results have important consequences for experimental design of interventions and the development of algorithms for causal inference.'
volume: 89
URL: http://proceedings.mlr.press/v89/katz19a.html
PDF: http://proceedings.mlr.press/v89/katz19a/katz19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-katz19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Katz
given: Dmitriy
- family: Shanmugam
given: Karthikeyan
- family: Squires
given: Chandler
- family: Uhler
given: Caroline
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3234-3243
id: katz19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3234
lastpage: 3243
published: 2019-04-11 00:00:00 +0000
- title: 'Reparameterizing Distributions on Lie Groups'
abstract: 'Reparameterizable densities are an important way to learn probability distributions in a deep learning setting. For many distributions it is possible to create low-variance gradient estimators by utilizing a ‘reparameterization trick’. Due to the absence of a general reparameterization trick, much research has recently been devoted to extending the number of reparameterizable distributional families. Unfortunately, this research has primarily focused on distributions defined in Euclidean space, ruling out the usage of one of the most influential classes of spaces with non-trivial topologies: Lie groups. In this work we define a general framework to create reparameterizable densities on arbitrary Lie groups, and provide a detailed practitioner’s guide to further the ease of usage. We demonstrate how to create complex and multimodal distributions on the well known oriented group of 3D rotations, SO(3), using normalizing flows. Our experiments on applying such distributions in a Bayesian setting for pose estimation on objects with discrete and continuous symmetries, showcase their necessity in achieving realistic uncertainty estimates.'
volume: 89
URL: http://proceedings.mlr.press/v89/falorsi19a.html
PDF: http://proceedings.mlr.press/v89/falorsi19a/falorsi19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-falorsi19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Falorsi
given: Luca
- family: de Haan
given: Pim
- family: Davidson
given: Tim R.
- family: Forré
given: Patrick
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3244-3253
id: falorsi19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3244
lastpage: 3253
published: 2019-04-11 00:00:00 +0000
- title: 'Revisit Batch Normalization: New Understanding and Refinement via Composition Optimization'
abstract: 'Batch Normalization (BN) has been used extensively in deep learning to achieve faster training process and better resulting models. However, whether BN works strongly depends on how the batches are constructed during training, and it may not converge to a desired solution if the statistics on the batch are not close to the statistics over the whole dataset. In this paper, we try to understand BN from an optimization perspective by providing an explicit objective function associated with BN. This explicit objective function reveals that: 1) BN, rather than being a new optimization algorithm or trick, is creating a different objective function instead of the one in our common sense; and 2) why BN may not work well in some scenarios. We then propose a refinement of BN based on the compositional optimization technique called Full Normalization (FN) to alleviate the issues of BN when the batches are not constructed ideally. The convergence analysis and empirical study for FN are also included in this paper.'
volume: 89
URL: http://proceedings.mlr.press/v89/lian19a.html
PDF: http://proceedings.mlr.press/v89/lian19a/lian19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-lian19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Lian
given: Xiangru
- family: Liu
given: Ji
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3254-3263
id: lian19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3254
lastpage: 3263
published: 2019-04-11 00:00:00 +0000
- title: 'Multi-Order Information for Working Set Selection of Sequential Minimal Optimization'
abstract: 'A new working set selection method for sequential minimal optimization (SMO) is proposed in this paper. Instead of the method adopted in the current version of LIBSVM, which uses the second order information of the objective function to choose the violating pairs, we suggest a new method where higher order information is considered. It includes the descent degree of the objective function and the stride of variables update. Experimental results show that, in contrast to LIBSVM, the proposed method needs fewer iterations in the vast majority of cases and the training of support vector machines (SVMs) is sped up. Meanwhile, the convergence of the proposed approach can be guaranteed and its accuracy is at the same level as LIBSVM’s.'
volume: 89
URL: http://proceedings.mlr.press/v89/yang19b.html
PDF: http://proceedings.mlr.press/v89/yang19b/yang19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-yang19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Yang
given: Qimao
- family: Li
given: Changrong
- family: Guo
given: Jun
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3264-3272
id: yang19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3264
lastpage: 3272
published: 2019-04-11 00:00:00 +0000
- title: 'Harmonizable mixture kernels with variational Fourier features'
abstract: 'The expressive power of Gaussian processes depends heavily on the choice of kernel. In this work we propose the novel harmonizable mixture kernel (HMK), a family of expressive, interpretable, non-stationary kernels derived from mixture models on the generalized spectral representation. As a theoretically sound treatment of non-stationary kernels, HMK supports harmonizable covariances, a wide subset of kernels including all stationary and many non-stationary covariances. We also propose variational Fourier features, an inter-domain sparse GP inference framework that offers a representative set of ’inducing frequencies’. We show that harmonizable mixture kernels interpolate between local patterns, and that variational Fourier features offers a robust kernel learning framework for the new kernel family.'
volume: 89
URL: http://proceedings.mlr.press/v89/shen19c.html
PDF: http://proceedings.mlr.press/v89/shen19c/shen19c.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-shen19c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Shen
given: Zheyang
- family: Heinonen
given: Markus
- family: Kaski
given: Samuel
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3273-3282
id: shen19c
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3273
lastpage: 3282
published: 2019-04-11 00:00:00 +0000
- title: 'Multiscale Gaussian Process Level Set Estimation'
abstract: 'In this paper, the problem of estimating the level set of a black-box function from noisy and expensive evaluation queries is considered. A new algorithm for this problem in the Bayesian framework with a Gaussian Process (GP) prior is proposed. The proposed algorithm employs a hierarchical sequence of partitions to explore different regions of the search space at varying levels of detail depending upon their proximity to the level set boundary. It is shown that this approach results in the algorithm having a low complexity implementation whose computational cost is significantly smaller than the existing algorithms for higher dimensional search space $\X$. Furthermore, high probability bounds on a measure of discrepancy between the estimated level set and the true level set for the proposed algorithm are obtained, which are shown to be strictly better than the existing guarantees for a large class of GPs. In the process, a tighter characterization of the information gain of the proposed algorithm is obtained which takes into account the structured nature of the evaluation points. This approach improves upon the existing technique of bounding the information gain with maximum information gain.'
volume: 89
URL: http://proceedings.mlr.press/v89/shekhar19a.html
PDF: http://proceedings.mlr.press/v89/shekhar19a/shekhar19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-shekhar19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Shekhar
given: Shubhanshu
- family: Javidi
given: Tara
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3283-3291
id: shekhar19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3283
lastpage: 3291
published: 2019-04-11 00:00:00 +0000
- title: 'The LORACs Prior for VAEs: Letting the Trees Speak for the Data'
abstract: 'In variational autoencoders, the prior on the latent codes $z$ is often treated as an afterthought, but the prior shapes the kind of latent representation that the model learns. If the goal is to learn a representation that is interpretable and useful, then the prior should reflect the ways in which the high-level factors that describe the data vary. The “default” prior is a standard normal, but if the natural factors of variation in the dataset exhibit discrete structure or are not independent, then the isotropic-normal prior will actually encourage learning representations that \emph{mask} this structure. To alleviate this problem, we propose using a flexible Bayesian nonparametric hierarchical clustering prior based on the time-marginalized coalescent (TMC). To scale learning to large datasets, we develop a new inducing-point approximation and inference algorithm. We then apply the method without supervision to several datasets and examine the interpretability and practical performance of the inferred hierarchies and learned latent space.'
volume: 89
URL: http://proceedings.mlr.press/v89/vikram19a.html
PDF: http://proceedings.mlr.press/v89/vikram19a/vikram19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-vikram19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Vikram
given: Sharad
- family: Hoffman
given: Matthew D.
- family: Johnson
given: Matthew J.
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3292-3301
id: vikram19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3292
lastpage: 3301
published: 2019-04-11 00:00:00 +0000
- title: 'Adversarial Learning of a Sampler Based on an Unnormalized Distribution'
abstract: 'Fundamental aspects of adversarial learning are investigated, with learning based on samples from the target distribution (conventional GAN setup). With insights so garnered, adversarial learning is extended to the case for which one has access to an unnormalized form $u(x)$ of the target density function, but no samples. Further, new concepts in GAN regularization are developed, based on learning from samples or from $u(x)$. The proposed method is compared to alternative approaches, with encouraging results demonstrated across a range of applications, including deep soft Q-learning.'
volume: 89
URL: http://proceedings.mlr.press/v89/li19h.html
PDF: http://proceedings.mlr.press/v89/li19h/li19h.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-li19h.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Li
given: Chunyuan
- family: Bai
given: Ke
- family: Li
given: Jianqiao
- family: Wang
given: Guoyin
- family: Chen
given: Changyou
- family: Carin
given: Lawrence
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3302-3311
id: li19h
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3302
lastpage: 3311
published: 2019-04-11 00:00:00 +0000
- title: 'Active Ranking with Subset-wise Preferences'
abstract: 'We consider the problem of probably approximately correct (PAC) ranking $n$ items by adaptively eliciting subset-wise preference feedback. At each round, the learner chooses a subset of $k$ items and observes stochastic feedback indicating preference information of the winner (most preferred) item of the chosen subset drawn according to a Plackett-Luce (PL) subset choice model unknown a priori. The objective is to identify an $\epsilon$-optimal ranking of the $n$ items with probability at least $1 - \delta$. When the feedback in each subset round is a single Plackett-Luce-sampled item, we show $(\epsilon, \delta)$-PAC algorithms with a sample complexity of $O\left(\frac{n}{\epsilon^2} \ln \frac{n}{\delta} \right)$ rounds, which we establish as being order-optimal by exhibiting a matching sample complexity lower bound of $\Omega\left(\frac{n}{\epsilon^2} \ln \frac{n}{\delta} \right)$—this shows that there is essentially no improvement possible from the pairwise comparisons setting ($k = 2$). When, however, it is possible to elicit top-$m$ ($\leq k$) ranking feedback according to the PL model from each adaptively chosen subset of size $k$, we show that an $(\epsilon, \delta)$-PAC ranking sample complexity of $O\left(\frac{n}{m \epsilon^2} \ln \frac{n}{\delta} \right)$ is achievable with explicit algorithms, which represents an $m$-wise reduction in sample complexity compared to the pairwise case. This again turns out to be order-wise unimprovable across the class of symmetric ranking algorithms. Our algorithms rely on a novel pivot trick to maintain only $n$ itemwise score estimates, unlike the $O(n^2)$ pairwise score estimates that have been used in prior work. We report results of numerical experiments that corroborate our findings.'
volume: 89
URL: http://proceedings.mlr.press/v89/saha19a.html
PDF: http://proceedings.mlr.press/v89/saha19a/saha19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-saha19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Saha
given: Aadirupa
- family: Gopalan
given: Aditya
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3312-3321
id: saha19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3312
lastpage: 3321
published: 2019-04-11 00:00:00 +0000
- title: 'Recovery Guarantees For Quadratic Tensors With Sparse Observations'
abstract: 'We consider the tensor completion problem of predicting the missing entries of a tensor. The commonly used CP model has a triple product form, but an alternate family of quadratic models which are the sum of pairwise products instead of a triple product have emerged from applications such as recommendation systems. Non-convex methods are the method of choice for learning quadratic models, and this work examines their sample complexity and error guarantee. Our main result is that with the number of samples being only linear in the dimension, all local minima of the mean squared error objective are global minima and recover the original tensor. We substantiate our theoretical results with experiments on synthetic and real-world data.'
volume: 89
URL: http://proceedings.mlr.press/v89/zhang19h.html
PDF: http://proceedings.mlr.press/v89/zhang19h/zhang19h.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-zhang19h.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Zhang
given: Hongyang
- family: Sharan
given: Vatsal
- family: Charikar
given: Moses
- family: Liang
given: Yingyu
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3322-3332
id: zhang19h
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3322
lastpage: 3332
published: 2019-04-11 00:00:00 +0000
- title: 'Sample Efficient Graph-Based Optimization with Noisy Observations'
abstract: 'We study sample complexity of optimizing “hill-climbing friendly” functions defined on a graph under noisy observations. We define a notion of convexity, and we show that a variant of best-arm identification can find a near-optimal solution after a small number of queries that is independent of the size of the graph. For functions that have local minima and are nearly convex, we show a sample complexity for the classical simulated annealing under noisy observations. We show effectiveness of the greedy algorithm with restarts and the simulated annealing on problems of graph-based nearest neighbor classification as well as a web advertising application.'
volume: 89
URL: http://proceedings.mlr.press/v89/nguyen19b.html
PDF: http://proceedings.mlr.press/v89/nguyen19b/nguyen19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-nguyen19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Nguyen
given: Thanh Tan
- family: Shameli
given: Ali
- family: Abbasi-Yadkori
given: Yasin
- family: Rao
given: Anup
- family: Kveton
given: Branislav
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3333-3341
id: nguyen19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3333
lastpage: 3341
published: 2019-04-11 00:00:00 +0000
- title: 'Robustness Guarantees for Density Clustering'
abstract: 'Despite the practical relevance of density-based clustering algorithms, there is little understanding of their statistical robustness properties under possibly adversarial contamination of the input data. We show both robustness and consistency guarantees for a simple modification of the popular DBSCAN algorithm. We then give experimental results which suggest that this method may be relevant in practice.'
volume: 89
URL: http://proceedings.mlr.press/v89/jiang19a.html
PDF: http://proceedings.mlr.press/v89/jiang19a/jiang19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-jiang19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Jiang
given: Heinrich
- family: Jang
given: Jennifer
- family: Nachum
given: Ofir
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3342-3351
id: jiang19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3342
lastpage: 3351
published: 2019-04-11 00:00:00 +0000
- title: 'Fixing Mini-batch Sequences with Hierarchical Robust Partitioning'
abstract: 'We propose a general and efficient hierarchical robust partitioning framework to generate a deterministic sequence of mini-batches, one that offers assurances of being high quality, unlike a randomly drawn sequence. We compare our deterministically generated mini-batch sequences to randomly generated sequences; we show that, on a variety of deep learning tasks, the deterministic sequences significantly beat the mean and worst case performance of the random sequences, and often outperform the best of the random sequences. Our theoretical contributions include a new algorithm for the robust submodular partition problem subject to cardinality constraints (which is used to construct mini-batch sequences), and we show in general that the algorithm is fast and has good theoretical guarantees; we also show a more efficient hierarchical variant of the algorithm with similar guarantees under mild assumptions.'
volume: 89
URL: http://proceedings.mlr.press/v89/wang19e.html
PDF: http://proceedings.mlr.press/v89/wang19e/wang19e.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-wang19e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Wang
given: Shengjie
- family: Bai
given: Wenruo
- family: Lavania
given: Chandrashekhar
- family: Bilmes
given: Jeff
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3352-3361
id: wang19e
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3352
lastpage: 3361
published: 2019-04-11 00:00:00 +0000
- title: 'Multitask Metric Learning: Theory and Algorithm'
abstract: 'In this paper, we study the problem of multitask metric learning (mtML). We first examine the generalization bound of the regularized mtML formulation based on the notion of algorithmic stability, proving the convergence rate of mtML and revealing the trade-off between the tasks. Moreover, we also establish the theoretical connection between the mtML, single-task learning and pooling-task learning approaches. In addition, we present a novel boosting-based mtML (mt-BML) algorithm, which scales well with the feature dimension of the data. Finally, we also devise an efficient second-order Riemannian retraction operator which is tailored specifically to our mt-BML algorithm. It produces a low-rank solution of mtML to reduce the model complexity, and may also improve generalization performances. Extensive evaluations on several benchmark data sets verify the effectiveness of our learning algorithm.'
volume: 89
URL: http://proceedings.mlr.press/v89/wang19f.html
PDF: http://proceedings.mlr.press/v89/wang19f/wang19f.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-wang19f.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Wang
given: Boyu
- family: Zhang
given: Hejia
- family: Liu
given: Peng
- family: Shen
given: Zebang
- family: Pineau
given: Joelle
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3362-3371
id: wang19f
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3362
lastpage: 3371
published: 2019-04-11 00:00:00 +0000
- title: 'Efficient Bayes Risk Estimation for Cost-Sensitive Classification'
abstract: 'In some real world applications, acquiring covariates for classification can be cost-intensive and should be limited as much as possible. For example, in the medical setting, a doctor cannot just perform all possible types of tests to classify whether the patient has diabetes or not. The decision of classifying or acquiring more covariates before classifying is dependent on the costs of new covariates and the expected optimal cost of misclassification (Bayes risk). However, estimating the latter is a formidable task due to the estimation of a high dimensional probability density and intractable integrals. In this work, we show that for linear classifiers this task can be considerably simplified, leading to a one dimensional integral for which we propose an efficient approximation. Experimental results on three datasets show consistent improvements over previously proposed methods for cost-sensitive classification. We also demonstrate that our proposed Bayes risk estimation procedure can benefit from additional unlabeled data, which can be helpful when only a small amount of labeled data is available.'
volume: 89
URL: http://proceedings.mlr.press/v89/andrade19a.html
PDF: http://proceedings.mlr.press/v89/andrade19a/andrade19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-andrade19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Andrade
given: Daniel
- family: Okajima
given: Yuzuru
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3372-3381
id: andrade19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3372
lastpage: 3381
published: 2019-04-11 00:00:00 +0000
- title: 'Interpreting Black Box Predictions using Fisher Kernels'
abstract: 'Research in both machine learning and psychology suggests that salient examples can help humans to interpret learning models. To this end, we take a novel look at black box interpretation of test predictions in terms of training examples. Our goal is to ask “which training examples are most responsible for a given set of predictions?” To answer this question, we make use of Fisher kernels as the defining feature embedding of each data point, combined with Sequential Bayesian Quadrature (SBQ) for efficient selection of examples. In contrast to prior work, our method is able to seamlessly handle any sized subset of test predictions in a principled way. We theoretically analyze our approach, providing novel convergence bounds for SBQ over discrete candidate atoms. Our approach recovers the application of influence functions for interpretability as a special case, yielding novel insights from this connection. We also present applications of the proposed approach to three use cases: cleaning training data, fixing mislabeled examples and data summarization.'
volume: 89
URL: http://proceedings.mlr.press/v89/khanna19a.html
PDF: http://proceedings.mlr.press/v89/khanna19a/khanna19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-khanna19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Khanna
given: Rajiv
- family: Kim
given: Been
- family: Ghosh
given: Joydeep
- family: Koyejo
given: Sanmi
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3382-3390
id: khanna19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3382
lastpage: 3390
published: 2019-04-11 00:00:00 +0000
- title: 'Representation Learning on Graphs: A Reinforcement Learning Application'
abstract: 'In this work, we study value function approximation in reinforcement learning (RL) problems with high dimensional state or action spaces via a generalized version of representation policy iteration (RPI). We consider the limitations of proto-value functions (PVFs) in accurately approximating the value function in low dimensions and we highlight the importance of feature learning for an improved low-dimensional value function approximation. Then, we adopt different representation learning algorithms on graphs to learn the basis functions that best represent the value function. We empirically show that node2vec, an algorithm for scalable feature learning in networks, and Graph Auto-Encoder consistently outperform the commonly used smooth proto-value functions in a low-dimensional feature space.'
volume: 89
URL: http://proceedings.mlr.press/v89/madjiheurem19a.html
PDF: http://proceedings.mlr.press/v89/madjiheurem19a/madjiheurem19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-madjiheurem19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Madjiheurem
given: Sephora
- family: Toni
given: Laura
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3391-3399
id: madjiheurem19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3391
lastpage: 3399
published: 2019-04-11 00:00:00 +0000
- title: 'ABCD-Strategy: Budgeted Experimental Design for Targeted Causal Structure Discovery'
abstract: 'Determining the causal structure of a set of variables is critical for both scientific inquiry and decision-making. However, this is often challenging in practice due to limited interventional data. Given that randomized experiments are usually expensive to perform, we propose a general framework and theory based on optimal Bayesian experimental design to select experiments for targeted causal discovery. That is, we assume the experimenter is interested in learning some function of the unknown graph (e.g., all descendants of a target node) subject to design constraints such as limits on the number of samples and rounds of experimentation. While it is in general computationally intractable to select an optimal experimental design strategy, we provide a tractable implementation with provable guarantees on both approximation and optimization quality based on submodularity. We evaluate the efficacy of our proposed method on both synthetic and real datasets, thereby demonstrating that our method realizes considerable performance gains over baseline strategies such as random sampling.'
volume: 89
URL: http://proceedings.mlr.press/v89/agrawal19b.html
PDF: http://proceedings.mlr.press/v89/agrawal19b/agrawal19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-agrawal19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Agrawal
given: Raj
- family: Squires
given: Chandler
- family: Yang
given: Karren
- family: Shanmugam
given: Karthikeyan
- family: Uhler
given: Caroline
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3400-3409
id: agrawal19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3400
lastpage: 3409
published: 2019-04-11 00:00:00 +0000
- title: 'Batched Stochastic Bayesian Optimization via Combinatorial Constraints Design'
abstract: 'In many high-throughput experimental design settings, such as those common in biochemical engineering, batched queries are often more cost effective than one-by-one sequential queries. Furthermore, it is often not possible to directly choose items to query. Instead, the experimenter specifies a set of constraints that generates a library of possible items, which are then selected stochastically. Motivated by these considerations, we investigate \emph{Batched Stochastic Bayesian Optimization} (BSBO), a novel Bayesian optimization scheme for choosing the constraints in order to guide exploration towards items with greater utility. We focus on \emph{site-saturation mutagenesis}, a prototypical setting of BSBO in biochemical engineering, and propose a natural objective function for this problem. Importantly, we show that our objective function can be efficiently decomposed as a difference of submodular functions (DS), which allows us to employ DS optimization tools to greedily identify sets of constraints that increase the likelihood of finding items with high utility. Our experimental results show that our algorithm outperforms common heuristics on both synthetic and two real protein datasets.'
volume: 89
URL: http://proceedings.mlr.press/v89/yang19c.html
PDF: http://proceedings.mlr.press/v89/yang19c/yang19c.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-yang19c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Yang
given: Kevin K.
- family: Chen
given: Yuxin
- family: Lee
given: Alycia
- family: Yue
given: Yisong
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3410-3419
id: yang19c
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3410
lastpage: 3419
published: 2019-04-11 00:00:00 +0000
- title: 'Convergence of Gradient Descent on Separable Data'
abstract: 'We provide a detailed study on the implicit bias of gradient descent when optimizing loss functions with strictly monotone tails, such as the logistic loss, over separable datasets. We look at two basic questions: (a) what are the conditions on the tail of the loss function under which gradient descent converges in the direction of the $L_2$ maximum-margin separator? (b) how does the rate of margin convergence depend on the tail of the loss function and the choice of the step size? We show that for a large family of super-polynomial tailed losses, gradient descent iterates on linear networks of any depth converge in the direction of the $L_2$ maximum-margin solution, while this does not hold for losses with heavier tails. Within this family, for simple linear models we show that the optimal rate with a fixed step size is indeed obtained for the commonly used exponentially tailed losses such as the logistic loss. However, with a fixed step size the optimal convergence rate is extremely slow, namely $1/\log(t)$, as also proved in Soudry et al. (2018). For linear models with exponential loss, we further prove that the convergence rate could be improved to $\log (t) /\sqrt{t}$ by using aggressive step sizes that compensate for the rapidly vanishing gradients. Numerical results suggest this method might be useful for deep networks.'
volume: 89
URL: http://proceedings.mlr.press/v89/nacson19b.html
PDF: http://proceedings.mlr.press/v89/nacson19b/nacson19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-nacson19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Nacson
given: Mor Shpigel
- family: Lee
given: Jason
- family: Gunasekar
given: Suriya
- family: Savarese
given: Pedro Henrique Pamplona
- family: Srebro
given: Nathan
- family: Soudry
given: Daniel
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3420-3428
id: nacson19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3420
lastpage: 3428
published: 2019-04-11 00:00:00 +0000
- title: 'Structured Neural Topic Models for Reviews'
abstract: 'We present Variational Aspect-based Latent Topic Allocation (VALTA), a family of autoencoding topic models that learn aspect-based representations of reviews. VALTA defines a user-item encoder that maps bag-of-words vectors for combined reviews associated with each paired user and item onto structured embeddings, which in turn define per-aspect topic weights. We model individual reviews in a structured manner by inferring an aspect assignment for each sentence in a given review, where the per-aspect topic weights obtained by the user-item encoder serve to define a mixture over topics, conditioned on the aspect. The result is an autoencoding neural topic model for reviews, which can be trained in a fully unsupervised manner to learn topics that are structured into aspects. Experimental evaluation on a large number of datasets demonstrates that aspects are interpretable, yield higher coherence scores than non-structured autoencoding topic model variants, and can be utilized to perform aspect-based comparison and genre discovery.'
volume: 89
URL: http://proceedings.mlr.press/v89/esmaeili19b.html
PDF: http://proceedings.mlr.press/v89/esmaeili19b/esmaeili19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-esmaeili19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Esmaeili
given: Babak
- family: Huang
given: Hongyi
- family: Wallace
given: Byron
- family: Meent
given: Jan-Willem van de
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3429-3439
id: esmaeili19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3429
lastpage: 3439
published: 2019-04-11 00:00:00 +0000
- title: 'Adaptive Minimax Regret against Smooth Logarithmic Losses over High-Dimensional l1-Balls via Envelope Complexity'
abstract: 'We develop a new theoretical framework, the envelope complexity, to analyze the minimax regret with logarithmic loss functions. Within the framework, we derive a Bayesian predictor that adaptively achieves the minimax regret over high-dimensional l1-balls within a factor of two. The prior is newly derived for achieving the minimax regret and is called the spike-and-tails (ST) prior after its shape. The resulting regret bound is so simple that it is completely determined by the smoothness of the loss function and the radius of the balls, up to logarithmic factors, and it has a generalized form of existing regret/risk bounds.'
volume: 89
URL: http://proceedings.mlr.press/v89/miyaguchi19a.html
PDF: http://proceedings.mlr.press/v89/miyaguchi19a/miyaguchi19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-miyaguchi19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Miyaguchi
given: Kohei
- family: Yamanishi
given: Kenji
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3440-3448
id: miyaguchi19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3440
lastpage: 3448
published: 2019-04-11 00:00:00 +0000
- title: 'Low-Dimensional Density Ratio Estimation for Covariate Shift Correction'
abstract: 'Covariate shift is a prevalent setting for supervised learning in the wild when the training and test data are drawn from different time periods, different but related domains, or via different sampling strategies. This paper addresses a transfer learning setting, with covariate shift between source and target domains. Most existing methods for correcting covariate shift exploit density ratios of the features to reweight the source-domain data, and when the features are high-dimensional, the estimated density ratios may suffer from large estimation variances, leading to poor prediction performance under covariate shift. In this work, we investigate the dependence of covariate shift correction performance on the dimensionality of the features, and propose a correction method that finds a low-dimensional representation of the features, which takes into account features relevant to the target $Y$, and exploits the density ratio of this representation for importance reweighting. We discuss the factors that affect the performance of our method, and demonstrate its capabilities on both pseudo-real data and real-world applications.'
volume: 89
URL: http://proceedings.mlr.press/v89/stojanov19a.html
PDF: http://proceedings.mlr.press/v89/stojanov19a/stojanov19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-stojanov19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Stojanov
given: Petar
- family: Gong
given: Mingming
- family: Carbonell
given: Jaime
- family: Zhang
given: Kun
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3449-3458
id: stojanov19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3449
lastpage: 3458
published: 2019-04-11 00:00:00 +0000
- title: 'Evaluating model calibration in classification'
abstract: 'Probabilistic classifiers output a probability distribution on target classes rather than just a class prediction. Besides providing a clear separation of prediction and decision making, the main advantage of probabilistic models is their ability to represent uncertainty about predictions. In safety-critical applications, it is pivotal for a model to possess an adequate sense of uncertainty, which for probabilistic classifiers translates into outputting probability distributions that are consistent with the empirical frequencies observed from realized outcomes. A classifier with such a property is called calibrated. In this work, we develop a general theoretical calibration evaluation framework grounded in probability theory, and point out subtleties present in model calibration evaluation that lead to refined interpretations of existing evaluation techniques. Lastly, we propose new ways to quantify and visualize miscalibration in probabilistic classification, including novel multidimensional reliability diagrams.'
volume: 89
URL: http://proceedings.mlr.press/v89/vaicenavicius19a.html
PDF: http://proceedings.mlr.press/v89/vaicenavicius19a/vaicenavicius19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-vaicenavicius19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Vaicenavicius
given: Juozas
- family: Widmann
given: David
- family: Andersson
given: Carl
- family: Lindsten
given: Fredrik
- family: Roll
given: Jacob
- family: Schön
given: Thomas
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3459-3467
id: vaicenavicius19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3459
lastpage: 3467
published: 2019-04-11 00:00:00 +0000
- title: 'Towards Gradient Free and Projection Free Stochastic Optimization'
abstract: 'This paper focuses on the problem of \emph{constrained} \emph{stochastic} optimization. A zeroth order Frank-Wolfe algorithm is proposed, which in addition to the projection-free nature of the vanilla Frank-Wolfe algorithm makes it gradient free. Under convexity and smoothness assumptions, we show that the proposed algorithm converges to the optimal objective function at a rate $O\left(1/T^{1/3}\right)$, where $T$ denotes the iteration count. In particular, the primal sub-optimality gap is shown to have a dimension dependence of $O\left(d^{1/3}\right)$, which is the best known dimension dependence among all zeroth order optimization algorithms with one directional derivative per iteration. For non-convex functions, we obtain the \emph{Frank-Wolfe} gap to be $O\left(d^{1/3}T^{-1/4}\right)$. Experiments on black-box optimization setups demonstrate the efficacy of the proposed algorithm.'
volume: 89
URL: http://proceedings.mlr.press/v89/sahu19a.html
PDF: http://proceedings.mlr.press/v89/sahu19a/sahu19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-sahu19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Sahu
given: Anit Kumar
- family: Zaheer
given: Manzil
- family: Kar
given: Soummya
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3468-3477
id: sahu19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3468
lastpage: 3477
published: 2019-04-11 00:00:00 +0000
- title: 'On Multi-Cause Approaches to Causal Inference with Unobserved Counfounding: Two Cautionary Failure Cases and A Promising Alternative'
abstract: 'Unobserved confounding is a central barrier to drawing causal inferences from observational data. Several authors have recently proposed that this barrier can be overcome in the case where one attempts to infer the effects of several variables simultaneously. In this paper, we present two simple, analytical counterexamples that challenge the general claims that are central to these approaches. We discuss some reasons for these failures and suggest directions for obtaining sufficient conditions for causal identification. Despite these negative results, we show that a simple modification to the multi-cause setting, incorporating a proxy or negative control variable, solves many of the problems highlighted by the examples, and suggest a way forward for causal inference with high-dimensional action spaces.'
volume: 89
URL: http://proceedings.mlr.press/v89/d-amour19a.html
PDF: http://proceedings.mlr.press/v89/d-amour19a/d-amour19a.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-d-amour19a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: D’Amour
given: Alexander
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3478-3486
id: d-amour19a
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3478
lastpage: 3486
published: 2019-04-11 00:00:00 +0000
- title: 'Data-Driven Approach to Multiple-Source Domain Adaptation'
abstract: 'A key problem in domain adaptation is determining what to transfer across different domains. We propose a data-driven method to represent these changes across multiple source domains and perform unsupervised domain adaptation. We assume that the joint distributions follow a specific generating process and have a small number of identifiable changing parameters, and develop a data-driven method to identify the changing parameters by learning low-dimensional representations of the changing class-conditional distributions across multiple source domains. The learned low-dimensional representations enable us to reconstruct the target-domain joint distribution from unlabeled target-domain data, and further enable predicting the labels in the target domain. We demonstrate the efficacy of this method by conducting experiments on synthetic and real datasets.'
volume: 89
URL: http://proceedings.mlr.press/v89/stojanov19b.html
PDF: http://proceedings.mlr.press/v89/stojanov19b/stojanov19b.pdf
edit: https://github.com/mlresearch/v89/edit/gh-pages/_posts/2019-04-11-stojanov19b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of Machine Learning Research'
publisher: 'PMLR'
author:
- family: Stojanov
given: Petar
- family: Gong
given: Mingming
- family: Carbonell
given: Jaime
- family: Zhang
given: Kun
editor:
- family: Chaudhuri
given: Kamalika
- family: Sugiyama
given: Masashi
page: 3487-3496
id: stojanov19b
issued:
date-parts:
- 2019
- 4
- 11
firstpage: 3487
lastpage: 3496
published: 2019-04-11 00:00:00 +0000