- title: 'Minimax Gaussian Classification & Clustering'
abstract: 'We present minimax bounds for classification and clustering error in the setting where covariates are drawn from a mixture of two isotropic Gaussian distributions. Here, we define clustering error in a discriminative fashion, demonstrating fundamental connections between classification (supervised) and clustering (unsupervised). For both classification and clustering, our lower bounds show that without enough samples, the best any classifier or clustering rule can do is close to random guessing. For classification, as part of our upper bound analysis, we show that Fisher''s linear discriminant achieves a fast minimax rate $\Theta(1/n)$ with enough samples $n$. For clustering, as part of our upper bound analysis, we show that a clustering rule constructed using principal component analysis achieves the minimax rate with enough samples. We also provide lower and upper bounds for the high-dimensional sparse setting where the dimensionality of the covariates $p$ is potentially larger than the number of samples $n$, but where the difference between the Gaussian means is sparse.
'
volume: 54
URL: http://proceedings.mlr.press/v54/li17a.html
PDF: http://proceedings.mlr.press/v54/li17a/li17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-li17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Li
given: Tianyang
- family: Yi
given: Xinyang
- family: Carmanis
given: Constantine
- family: Ravikumar
given: Pradeep
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1-9
id: li17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1
lastpage: 9
published: 2017-04-10 00:00:00 +0000
- title: 'Conditions beyond treewidth for tightness of higher-order LP relaxations'
abstract: 'Linear programming (LP) relaxations are a popular method to attempt to find a most likely configuration of a discrete graphical model. If a solution to the relaxed problem is obtained at an integral vertex then the solution is guaranteed to be exact and we say that the relaxation is tight. We consider binary pairwise models and introduce new methods which allow us to demonstrate refined conditions for tightness of LP relaxations in the Sherali-Adams hierarchy. Our results include showing that for higher order LP relaxations, treewidth is not precisely the right way to characterize tightness. This work is primarily theoretical, with insights that can improve efficiency in practice.'
volume: 54
URL: http://proceedings.mlr.press/v54/rowland17a.html
PDF: http://proceedings.mlr.press/v54/rowland17a/rowland17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-rowland17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Rowland
given: Mark
- family: Pacchiano
given: Aldo
- family: Weller
given: Adrian
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 10-18
id: rowland17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 10
lastpage: 18
published: 2017-04-10 00:00:00 +0000
- title: 'Large-Scale Data-Dependent Kernel Approximation'
abstract: 'Learning a computationally efficient kernel from data is an important machine learning problem. The majority of kernels in the literature do not leverage the geometry of the data, and those that do are computationally infeasible for contemporary datasets. Recent advances in approximation techniques have expanded the applicability of the kernel methodology to scale linearly with the data size. Data-dependent kernels, which could leverage this computational advantage, have however not yet seen the benefit. Here we derive an approximate large-scale learning procedure for data-dependent kernels that is efficient and performs well in practice. We provide a Lemma that can be used to derive the asymptotic convergence of the approximation in the limit of infinite random features, and, under certain conditions, an estimate of the convergence speed. We empirically prove that our construction represents a valid, yet efficient approximation of the data-dependent kernel. For large-scale datasets of millions of datapoints, where the proposed method is now applicable for the first time, we notice a significant performance boost over both baselines consisting of data independent kernels and of kernel approximations, at comparable computational cost.'
volume: 54
URL: http://proceedings.mlr.press/v54/ionescu17a.html
PDF: http://proceedings.mlr.press/v54/ionescu17a/ionescu17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-ionescu17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Ionescu
given: Catalin
- family: Popa
given: Alin
- family: Sminchisescu
given: Cristian
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 19-27
id: ionescu17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 19
lastpage: 27
published: 2017-04-10 00:00:00 +0000
- title: 'Clustering from Multiple Uncertain Experts'
abstract: 'Utilizing expert input often improves clustering performance. However in a knowledge discovery problem, ground truth is unknown even to an expert. Thus, instead of one expert, we solicit the opinion from multiple experts. The key question motivating this work is: which experts should be assigned higher weights when there is disagreement on whether to put a pair of samples in the same group? To model the uncertainty in constraints from different experts, we build a probabilistic model for pairwise constraints through jointly modeling each expert’s accuracy and the mapping from features to latent cluster assignments. After learning our probabilistic discriminative clustering model and accuracies of different experts, 1) samples that were not annotated by any expert can be clustered using the discriminative clustering model; and 2) experts with higher accuracies are automatically assigned higher weights in determining the latent cluster assignments. Experimental results on UCI benchmark datasets and a real-world disease subtyping dataset demonstrate that our proposed approach outperforms competing alternatives, including semi-crowdsourced clustering, semi-supervised clustering with constraints from majority voting, and consensus clustering.'
volume: 54
URL: http://proceedings.mlr.press/v54/chang17a.html
PDF: http://proceedings.mlr.press/v54/chang17a/chang17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-chang17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Chang
given: Yale
- family: Chen
given: Junxiang
- family: Cho
given: Michael
- family: Castaldi
given: Peter
- family: Silverman
given: Ed
- family: Dy
given: Jennifer
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 28-36
id: chang17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 28
lastpage: 36
published: 2017-04-10 00:00:00 +0000
- title: 'Online Nonnegative Matrix Factorization with General Divergences'
abstract: 'We develop a unified and systematic framework for performing online nonnegative matrix factorization under a wide variety of important divergences. The online nature of our algorithms makes them particularly amenable to large-scale data. We prove that the sequence of learned dictionaries converges almost surely to the set of critical points of the expected loss function. Experimental results demonstrate the computational efficiency and outstanding performances of our algorithms on several real-life applications, including topic modeling, document clustering and foreground-background separation.'
volume: 54
URL: http://proceedings.mlr.press/v54/zhao17a.html
PDF: http://proceedings.mlr.press/v54/zhao17a/zhao17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-zhao17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Zhao
given: Renbo
- family: Tan
given: Vincent
- family: Xu
given: Huan
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 37-45
id: zhao17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 37
lastpage: 45
published: 2017-04-10 00:00:00 +0000
- title: 'ASAGA: Asynchronous Parallel SAGA'
abstract: 'We describe ASAGA, an asynchronous parallel version of the incremental gradient algorithm SAGA that enjoys fast linear convergence rates. Through a novel perspective, we revisit and clarify a subtle but important technical issue present in a large fraction of the recent convergence rate proofs for asynchronous parallel optimization algorithms, and propose a simplification of the recently introduced “perturbed iterate” framework that resolves it. We thereby prove that ASAGA can obtain a theoretical linear speedup on multi-core systems even without sparsity assumptions. We present results of an implementation on a 40-core architecture illustrating the practical speedup as well as the hardware overhead.'
volume: 54
URL: http://proceedings.mlr.press/v54/leblond17a.html
PDF: http://proceedings.mlr.press/v54/leblond17a/leblond17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-leblond17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Leblond
given: Rémi
- family: Pedregosa
given: Fabian
- family: Lacoste-Julien
given: Simon
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 46-54
id: leblond17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 46
lastpage: 54
published: 2017-04-10 00:00:00 +0000
- title: 'Lower Bounds on Active Learning for Graphical Model Selection'
abstract: 'We consider the problem of estimating the underlying graph associated with a Markov random field, with the added twist that the decoding algorithm can iteratively choose which subsets of nodes to sample based on the previous samples, resulting in an active learning setting. Considering both Ising and Gaussian models, we provide algorithm-independent lower bounds for high-probability recovery within the class of degree-bounded graphs. Our main results are minimax lower bounds for the active setting that match the best known lower bounds for the passive setting, which in turn are known to be tight in several cases of interest. Our analysis is based on Fano’s inequality, along with novel mutual information bounds for the active learning setting, and the application of restricted graph ensembles. While we consider ensembles that are similar or identical to those used in the passive setting, we require different analysis techniques, with a key challenge being bounding a mutual information quantity associated with observed subsets of nodes, as opposed to full observations.'
volume: 54
URL: http://proceedings.mlr.press/v54/scarlett17a.html
PDF: http://proceedings.mlr.press/v54/scarlett17a/scarlett17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-scarlett17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Scarlett
given: Jonathan
- family: Cevher
given: Volkan
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 55-64
id: scarlett17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 55
lastpage: 64
published: 2017-04-10 00:00:00 +0000
- title: 'Non-square matrix sensing without spurious local minima via the Burer-Monteiro approach'
abstract: 'We consider the non-square matrix sensing problem, under restricted isometry property (RIP) assumptions. We focus on the non-convex formulation, where any rank-r matrix $X ∈R^m x n$ is represented as $UV^T$, where $U ∈R^m x r$ and $V ∈R^n x r$. In this paper, we complement recent findings on the non-convex geometry of the analogous PSD setting [5], and show that matrix factorization does not introduce any spurious local minima, under RIP. '
volume: 54
URL: http://proceedings.mlr.press/v54/park17a.html
PDF: http://proceedings.mlr.press/v54/park17a/park17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-park17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Park
given: Dohyung
- family: Kyrillidis
given: Anastasios
- family: Carmanis
given: Constantine
- family: Sanghavi
given: Sujay
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 65-74
id: park17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 65
lastpage: 74
published: 2017-04-10 00:00:00 +0000
- title: 'Sparse Accelerated Exponential Weights'
abstract: 'We consider the stochastic optimization problem where a convex function is minimized observing recursively the gradients. We introduce SAEW, a new procedure that accelerates exponential weights procedures with the slow rate $1/\sqrtT$ to procedures achieving the fast rate $1/T$. Under the strong convexity of the risk, we achieve the optimal rate of convergence for approximating sparse parameters in $R^d$. The acceleration is achieved by using successive averaging steps in an online fashion. The procedure also produces sparse estimators thanks to additional hard threshold steps. '
volume: 54
URL: http://proceedings.mlr.press/v54/gaillard17a.html
PDF: http://proceedings.mlr.press/v54/gaillard17a/gaillard17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-gaillard17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Gaillard
given: Pierre
- family: Wintenberger
given: Olivier
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 75-82
id: gaillard17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 75
lastpage: 82
published: 2017-04-10 00:00:00 +0000
- title: 'On the Learnability of Fully-Connected Neural Networks'
abstract: 'Despite the empirical success of deep neural networks, there is limited theoretical understanding on the learnability of these models using a polynomial-time algorithm. In this paper, we characterize the learnability of fully-connected neural networks via both positive and negative results. We focus on $\ell_1$-regularized networks, where the $\ell_1$-norm of the incoming weights of every neuron is assumed to be bounded by a constant $B > 0$. Our first result shows that such networks are properly learnable in $\text{poly}(n,d,\exp(1/ε^2))$ time, where $n$ and $d$ are the sample size and the input dimension, and $ε> 0$ is the gap to optimality. The bound is achieved by repeatedly sampling over a low-dimensional manifold so as to ensure approximate optimality, but avoids the $\exp(d)$ cost of exhaustively searching over the parameter space. We also establish a hardness result showing that the exponential dependence on $1/ε$ is unavoidable unless $\bf RP = \bf NP$. Our second result shows that the exponential dependence on $1/ε$ can be avoided by exploiting the underlying structure of the data distribution. In particular, if the positive and negative examples can be separated with margin $γ> 0$ by an unknown neural network, then the network can be learned in $\text{poly}(n,d,1/ε)$ time. The bound is achieved by an ensemble method which uses the first algorithm as a weak learner. We further show that the separability assumption can be weakened to tolerate noisy labels. Finally, we show that the exponential dependence on $1/γ$ is unimprovable under a certain cryptographic assumption.'
volume: 54
URL: http://proceedings.mlr.press/v54/zhang17a.html
PDF: http://proceedings.mlr.press/v54/zhang17a/zhang17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-zhang17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Zhang
given: Yuchen
- family: Lee
given: Jason
- family: Wainwright
given: Martin
- family: Jordan
given: Michael I.
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 83-91
id: zhang17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 83
lastpage: 91
published: 2017-04-10 00:00:00 +0000
- title: 'An Information-Theoretic Route from Generalization in Expectation to Generalization in Probability'
abstract: 'One fundamental goal in any learning algorithm is to mitigate its risk for overfitting. Mathematically, this requires that the learning algorithm enjoys a small generalization risk, which is defined either in expectation or in probability. Both types of generalization are commonly used in the literature. For instance, generalization in expectation has been used to analyze algorithms, such as ridge regression and SGD, whereas generalization in probability is used in the VC theory, among others. Recently, a third notion of generalization has been studied, called uniform generalization, which requires that the generalization risk vanishes uniformly in expectation across all bounded parametric losses. It has been shown that uniform generalization is, in fact, equivalent to an information-theoretic stability constraint, and that it recovers classical results in learning theory. It is achievable under various settings, such as sample compression schemes, finite hypothesis spaces, finite domains, and differential privacy. However, the relationship between uniform generalization and concentration remained unknown. In this paper, we answer this question by proving that, while a generalization in expectation does not imply a generalization in probability, a uniform generalization in expectation does imply concentration. We establish a chain rule for the uniform generalization risk of the composition of hypotheses and use it to derive a large deviation bound. Finally, we prove that the bound is tight.'
volume: 54
URL: http://proceedings.mlr.press/v54/alabdulmohsin17a.html
PDF: http://proceedings.mlr.press/v54/alabdulmohsin17a/alabdulmohsin17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-alabdulmohsin17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Alabdulmohsin
given: Ibrahim
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 92-100
id: alabdulmohsin17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 92
lastpage: 100
published: 2017-04-10 00:00:00 +0000
- title: 'Nearly Instance Optimal Sample Complexity Bounds for Top-k Arm Selection'
abstract: 'In the Best-k-Arm problem, we are given n stochastic bandit arms, each associated with an unknown reward distribution. We are required to identify the k arms with the largest means by taking as few samples as possible. In this paper, we make progress towards a complete characterization of the instance-wise sample complexity bounds for the Best-k-Arm problem. On the lower bound side, we obtain a novel complexity term to measure the sample complexity that every Best-k-Arm instance requires. This is derived by an interesting and nontrivial reduction from the Best-1-Arm problem. We also provide an elimination-based algorithm that matches the instance-wise lower bound within doubly-logarithmic factors. The sample complexity of our algorithm strictly dominates the state-of-the-art for Best-k-Arm (module constant factors).'
volume: 54
URL: http://proceedings.mlr.press/v54/chen17a.html
PDF: http://proceedings.mlr.press/v54/chen17a/chen17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-chen17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Chen
given: Lijie
- family: Li
given: Jian
- family: Qiao
given: Mingda
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 101-110
id: chen17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 101
lastpage: 110
published: 2017-04-10 00:00:00 +0000
- title: 'Guaranteed Non-convex Optimization: Submodular Maximization over Continuous Domains'
abstract: 'Submodular continuous functions are a category of (generally) non-convex/non-concave functions with a wide spectrum of applications. We characterize these functions and demonstrate that they can be maximized efficiently with approximation guarantees. Specifically, i) We introduce the weak DR property that gives a unified characterization of submodularity for all set, integer-lattice and continuous functions; ii) for maximizing monotone DR-submodular continuous functions under general down-closed convex constraints, we propose a Frank-Wolfe variant with (1-1/e) approximation guarantee, and sub-linear convergence rate; iii) for maximizing general non-monotone submodular continuous functions subject to box constraints, we propose a DoubleGreedy algorithm with 1/3 approximation guarantee. Submodular continuous functions naturally find applications in various real-world settings, including influence and revenue maximization with continuous assignments, sensor energy management, facility location, etc. Experimental results show that the proposed algorithms efficiently generate superior solutions compared to baseline algorithms.'
volume: 54
URL: http://proceedings.mlr.press/v54/bian17a.html
PDF: http://proceedings.mlr.press/v54/bian17a/bian17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-bian17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Bian
given: Andrew An
- family: Mirzasoleiman
given: Baharan
- family: Buhmann
given: Joachim
- family: Krause
given: Andreas
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 111-120
id: bian17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 111
lastpage: 120
published: 2017-04-10 00:00:00 +0000
- title: 'Tensor-Dictionary Learning with Deep Kruskal-Factor Analysis'
abstract: 'A multi-way factor analysis model is introduced for tensor-variate data of any order. Each data item is represented as a (sparse) sum of Kruskal decompositions, a Kruskal- factor analysis (KFA). KFA is nonparametric and can infer both the tensor-rank of each dictionary atom and the number of dictionary atoms. The model is adapted for online learning, which allows dictionary learning on large data sets. After KFA is introduced, the model is extended to a deep convolutional tensor-factor analysis, supervised by a Bayesian SVM. The experiments section demonstrates the improvement of KFA over vectorized approaches (e.g., BPFA), tensor decompositions, and convolutional neural networks (CNN) in multi-way denoising, blind inpainting, and image classification. The improvement in PSNR for the inpainting results over other methods exceeds 1dB in several cases and we achieve state of the art results on Caltech101 image classification. '
volume: 54
URL: http://proceedings.mlr.press/v54/stevens17a.html
PDF: http://proceedings.mlr.press/v54/stevens17a/stevens17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-stevens17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Stevens
given: Andrew
- family: Pu
given: Yunchen
- family: Sun
given: Yannan
- family: Spell
given: Gregory
- family: Carin
given: Lawrence
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 121-129
id: stevens17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 121
lastpage: 129
published: 2017-04-10 00:00:00 +0000
- title: 'Consistent and Efficient Nonparametric Different-Feature Selection'
abstract: 'Two-sample feature selection is a ubiquitous problem in both scientific and engineering studies. We propose a feature selection method to find features that describe a difference in two probability distributions. The proposed method is nonparametric and does not assume any specific parametric models on data distributions. We show that the proposed method is computationally efficient and does not require any extra computation for model selection. Moreover, we prove that the proposed method provides a consistent estimator of features under mild conditions. Our experimental results show that the proposed method outperforms the current method with regard to both accuracy and computation time.'
volume: 54
URL: http://proceedings.mlr.press/v54/hara17a.html
PDF: http://proceedings.mlr.press/v54/hara17a/hara17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-hara17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Hara
given: Satoshi
- family: Katsuki
given: Takayuki
- family: Yanagisawa
given: Hiroki
- family: Ono
given: Takafumi
- family: Okamoto
given: Ryo
- family: Takeuchi
given: Shigeki
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 130-138
id: hara17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 130
lastpage: 138
published: 2017-04-10 00:00:00 +0000
- title: 'Annular Augmentation Sampling'
abstract: 'The exponentially large sample space of general binary probabilistic models renders intractable standard operations such as exact marginalization, inference, and normalization. Typically, researchers deal with these distributions via deterministic approximations, the class of belief propagation methods being a prominent example. Comparatively, Markov Chain Monte Carlo methods have been significantly less used in this domain. In this work, we introduce an auxiliary variable MCMC scheme that samples from an annular augmented space, translating to a great circle path around the hypercube of the binary sample space. This annular augmentation sampler explores the sample space more effectively than coordinate-wise samplers and has no tunable parameters, leading to substantial performance gains in estimating quantities of interest in large binary models. We extend the method to incorporate into the sampler any existing mean-field approximation (such as from belief propagation), leading to further performance improvements. Empirically, we consider a range of large Ising models and an application to risk factors for heart disease.'
volume: 54
URL: http://proceedings.mlr.press/v54/fagan17a.html
PDF: http://proceedings.mlr.press/v54/fagan17a/fagan17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-fagan17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Fagan
given: Francois
- family: Bhandari
given: Jalaj
- family: Cunningham
given: John
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 139-147
id: fagan17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 139
lastpage: 147
published: 2017-04-10 00:00:00 +0000
- title: 'Less than a Single Pass: Stochastically Controlled Stochastic Gradient'
abstract: 'We develop and analyze a procedure for gradient-based optimization that we refer to as stochastically controlled stochastic gradient (SCSG). As a member of the SVRG family of algorithms, SCSG makes use of gradient estimates at two scales. Unlike most existing algorithms in this family, both the computation cost and the communication cost of SCSG do not necessarily scale linearly with the sample size n; indeed, these costs are independent of n when the target accuracy is small. An experimental evaluation of SCSG on the MNIST dataset shows that it can yield accurate results on this dataset on a single commodity machine with a memory footprint of only 2.6MB and only eight disk accesses.'
volume: 54
URL: http://proceedings.mlr.press/v54/lei17a.html
PDF: http://proceedings.mlr.press/v54/lei17a/lei17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-lei17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Lei
given: Lihua
- family: Jordan
given: Michael
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 148-156
id: lei17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 148
lastpage: 156
published: 2017-04-10 00:00:00 +0000
- title: 'Learning Time Series Detection Models from Temporally Imprecise Labels'
abstract: 'In this paper, we consider a new low-quality label learning problem: learning time series detection models from temporally imprecise labels. In this problem, the data consist of a set of input time series, and supervision is provided by a sequence of noisy time stamps corresponding to the occurrence of positive class events. Such temporally imprecise labels commonly occur in areas like mobile health research where human annotators are tasked with labeling the occurrence of very short duration events. We propose a general learning framework for this problem that can accommodate different base classifiers and noise models. We present results on real mobile health data showing that the proposed framework significantly outperforms a number of alternatives including assuming that the label time stamps are noise-free, transforming the problem into the multiple instance learning framework, and learning on labels that were manually re-aligned.'
volume: 54
URL: http://proceedings.mlr.press/v54/adams17a.html
PDF: http://proceedings.mlr.press/v54/adams17a/adams17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-adams17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Adams
given: Roy
- family: Marlin
given: Ben
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 157-165
id: adams17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 157
lastpage: 165
published: 2017-04-10 00:00:00 +0000
- title: 'Learning Cost-Effective and Interpretable Treatment Regimes'
abstract: 'Decision makers, such as doctors and judges, make crucial decisions such as recommending treatments to patients, and granting bails to defendants on a daily basis. Such decisions typically involve weighting the potential benefits of taking an action against the costs involved. In this work, we aim to automate this task of learning cost-effective, interpretable and actionable treatment regimes. We formulate this as a problem of learning a decision list – a sequence of if-then-else rules – which maps characteristics of subjects (eg., diagnostic test results of patients) to treatments. We propose a novel objective to construct a decision list which maximizes outcomes for the population, and minimizes overall costs. Since we do not observe the outcomes corresponding to counterfactual scenarios, we use techniques from causal inference literature to infer them. We model the problem of learning the decision list as a Markov Decision Process (MDP) and employ a variant of the Upper Confidence Bound for Trees (UCT) strategy which leverages customized checks for pruning the search space effectively. Experimental results on real world observational data capturing judicial bail decisions and treatment recommendations for asthma patients demonstrate the effectiveness of our approach.'
volume: 54
URL: http://proceedings.mlr.press/v54/lakkaraju17a.html
PDF: http://proceedings.mlr.press/v54/lakkaraju17a/lakkaraju17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-lakkaraju17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Lakkaraju
given: Himabindu
- family: Rudin
given: Cynthia
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 166-175
id: lakkaraju17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 166
lastpage: 175
published: 2017-04-10 00:00:00 +0000
- title: 'Linear Thompson Sampling Revisited'
abstract: 'We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic linear bandit setting. While we obtain a regret bound of order $O(d^3/2\sqrtT)$ as in previous results, the proof sheds new light on the functioning of the TS. We leverage on the structure of the problem to show how the regret is related to the sensitivity (i.e., the gradient) of the objective function and how selecting optimal arms associated to \textitoptimistic parameters does control it. Thus we show that TS can be seen as a generic randomized algorithm where the sampling distribution is designed to have a fixed probability of being optimistic, at the cost of an additional $\sqrtd$ regret factor compared to a UCB-like approach. Furthermore, we show that our proof can be readily applied to regularized linear optimization and generalized linear model problems.'
volume: 54
URL: http://proceedings.mlr.press/v54/abeille17a.html
PDF: http://proceedings.mlr.press/v54/abeille17a/abeille17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-abeille17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Abeille
given: Marc
- family: Lazaric
given: Alessandro
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 176-184
id: abeille17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 176
lastpage: 184
published: 2017-04-10 00:00:00 +0000
- title: 'A Sub-Quadratic Exact Medoid Algorithm'
abstract: 'We present a new algorithm, ‘trimed’ for obtaining the medoid of a set, that is the element of the set which minimises the mean distance to all other elements. The algorithm is shown to have, under certain assumptions, expected run time $O(N^(3/2))$ in $R^d$ where N is the set size, making it the first sub-quadratic exact medoid algorithm for $d > 1$. Experiments show that it performs very well on spatial network data, frequently requiring two orders of magnitude fewer distance calculations than state-of-the-art approximate algorithms. As an application, we show how trimed can be used as a component in an accelerated K-medoids algorithm, and then how it can be relaxed to obtain further computational gains with only a minor loss in cluster quality. '
volume: 54
URL: http://proceedings.mlr.press/v54/newling17a.html
PDF: http://proceedings.mlr.press/v54/newling17a/newling17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-newling17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Newling
given: James
- family: Fleuret
given: Francois
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 185-193
id: newling17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 185
lastpage: 193
published: 2017-04-10 00:00:00 +0000
- title: 'Minimax Density Estimation for Growing Dimension'
abstract: 'This paper presents minimax rates for density estimation when the data dimension $d$ is allowed to grow with the number of observations $n$ rather than remaining fixed as in previous analyses. We prove a non-asymptotic lower bound which gives the worst-case rate over standard classes of smooth densities, and we show that kernel density estimators achieve this rate. We also give oracle choices for the bandwidth and derive the fastest rate $d$ can grow with $n$ to maintain estimation consistency.'
volume: 54
URL: http://proceedings.mlr.press/v54/mcdonald17a.html
PDF: http://proceedings.mlr.press/v54/mcdonald17a/mcdonald17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-mcdonald17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: McDonald
given: Daniel
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 194-203
id: mcdonald17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 194
lastpage: 203
published: 2017-04-10 00:00:00 +0000
- title: 'Estimating Density Ridges by Direct Estimation of Density-Derivative-Ratios'
abstract: 'Estimation of \emphdensity ridges has been gathering a great deal of attention since it enables us to reveal lower-dimensional structures hidden in data. Recently, \emphsubspace constrained mean shift (SCMS) was proposed as a practical algorithm for density ridge estimation. A key technical ingredient in SCMS is to accurately estimate the ratios of the density derivatives to the density. SCMS takes a three-step approach for this purpose — first estimating the data density, then computing its derivatives, and finally taking their ratios. However, this three-step approach can be unreliable because a good density estimator does not necessarily mean a good density derivative estimator and division by an estimated density could significantly magnify the estimation error. To overcome these problems, we propose a novel method that directly estimates the ratios without going through density estimation and division. Our proposed estimator has an analytic-form solution and it can be computed efficiently. We further establish a non-parametric convergence bound for the proposed ratio estimator. Finally, based on this direct ratio estimator, we develop a practical algorithm for density ridge estimation and experimentally demonstrate its usefulness on a variety of datasets.'
volume: 54
URL: http://proceedings.mlr.press/v54/sasaki17a.html
PDF: http://proceedings.mlr.press/v54/sasaki17a/sasaki17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-sasaki17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Sasaki
given: Hiroaki
- family: Kanamori
given: Takafumi
- family: Sugiyama
given: Masashi
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 204-212
id: sasaki17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 204
lastpage: 212
published: 2017-04-10 00:00:00 +0000
- title: 'Learning Theory for Conditional Risk Minimization'
abstract: 'In this work we study the learnability of stochastic processes with respect to the conditional risk, i.e. the existence of a learning algorithm that improves its next-step performance with the amount of observed data. We introduce a notion of pairwise discrepancy between conditional distributions at different times steps and show how certain properties of these discrepancies can be used to construct a successful learning algorithm. Our main results are two theorems that establish criteria for learnability for many classes of stochastic processes, including all special cases studied previously in the literature. '
volume: 54
URL: http://proceedings.mlr.press/v54/zimin17a.html
PDF: http://proceedings.mlr.press/v54/zimin17a/zimin17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-zimin17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Zimin
given: Alexander
- family: Lampert
given: Christoph
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 213-222
id: zimin17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 213
lastpage: 222
published: 2017-04-10 00:00:00 +0000
- title: 'Near-optimal Bayesian Active Learning with Correlated and Noisy Tests'
abstract: ' We consider the Bayesian active learning and experimental design problem, where the goal is to learn the value of some unknown target variable through a sequence of informative, noisy tests. In contrast to prior work, we focus on the challenging, yet practically relevant setting where test outcomes can be conditionally dependent given the hidden target variable. Under such assumptions, common heuristics, such as greedily performing tests that maximize the reduction in uncertainty of the target, often perform poorly. We propose ECED, a novel, efficient active learning algorithm, and prove strong theoretical guarantees that hold with correlated, noisy tests. Rather than directly optimizing the prediction error, at each step, ECED picks the test that maximizes the gain in a surrogate objective, which takes into account the dependencies between tests. Our analysis relies on an information-theoretic auxiliary function to track the progress of ECED, and utilizes adaptive submodularity to attain the approximation bound. We demonstrate strong empirical performance of ECED on two problem instances, including a Bayesian experimental design task intended to distinguish among economic theories of how people make risky decisions, and an active preference learning task via pairwise comparisons.'
volume: 54
URL: http://proceedings.mlr.press/v54/chen17b.html
PDF: http://proceedings.mlr.press/v54/chen17b/chen17b.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-chen17b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Chen
given: Yuxin
- family: Hassani
given: Hamed
- family: Krause
given: Andreas
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 223-231
id: chen17b
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 223
lastpage: 231
published: 2017-04-10 00:00:00 +0000
- title: 'Learning Nash Equilibrium for General-Sum Markov Games from Batch Data'
abstract: 'This paper addresses the problem of learning a Nash equilibrium in $γ$-discounted multiplayer general-sum Markov Games (MGs) in a batch setting. As the number of players increases in MG, the agents may either collaborate or team apart to increase their final rewards. One solution to address this problem is to look for a Nash equilibrium. Although, several techniques were found for the subcase of two-player zero-sum MGs, those techniques fail to find a Nash equilibrium in general-sum Markov Games. In this paper, we introduce a new definition of $ε$-Nash equilibrium in MGs which grasps the strategy’s quality for multiplayer games. We prove that minimizing the norm of two Bellman-like residuals implies to learn such an $ε$-Nash equilibrium. Then, we show that minimizing an empirical estimate of the $L_p$ norm of these Bellman-like residuals allows learning for general-sum games within the batch setting. Finally, we introduce a neural network architecture that successfully learns a Nash equilibrium in generic multiplayer general-sum turn-based MGs.'
volume: 54
URL: http://proceedings.mlr.press/v54/perolat17a.html
PDF: http://proceedings.mlr.press/v54/perolat17a/perolat17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-perolat17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Perolat
given: Julien
- family: Strub
given: Florian
- family: Piot
given: Bilal
- family: Pietquin
given: Olivier
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 232-241
id: perolat17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 232
lastpage: 241
published: 2017-04-10 00:00:00 +0000
- title: 'Distance Covariance Analysis'
abstract: 'We propose a dimensionality reduction method to identify linear projections that capture interactions between two or more sets of variables. The method, distance covariance analysis (DCA), can detect both linear and nonlinear relationships, and can take dependent variables into account. On previous testbeds and a new testbed that systematically assesses the ability to detect both linear and nonlinear interactions, DCA performs better than or comparable to existing methods, while being one of the fastest methods. To showcase the versatility of DCA, we also applied it to three different neurophysiological datasets.'
volume: 54
URL: http://proceedings.mlr.press/v54/cowley17a.html
PDF: http://proceedings.mlr.press/v54/cowley17a/cowley17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-cowley17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Cowley
given: Benjamin
- family: Semedo
given: Joao
- family: Zandvakili
given: Amin
- family: Smith
given: Matthew
- family: Kohn
given: Adam
- family: Yu
given: Byron
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 242-251
id: cowley17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 242
lastpage: 251
published: 2017-04-10 00:00:00 +0000
- title: 'Phase Retrieval Meets Statistical Learning Theory: A Flexible Convex Relaxation'
abstract: 'We propose a flexible convex relaxation for the phase retrieval problem that operates in the natural domain of the signal. Therefore, we avoid the prohibitive computational cost associated with “lifting” and semidefinite programming (SDP) in methods such as PhaseLift and compete with recently developed non-convex techniques for phase retrieval. We relax the quadratic equations for phaseless measurements to inequality constraints each of which representing a symmetric “slab”. Through a simple convex program, our proposed estimator finds an extreme point of the intersection of these slabs that is best aligned with a given anchor vector. We characterize geometric conditions that certify success of the proposed estimator. Furthermore, using classic results in statistical learning theory, we show that for random measurements the geometric certificates hold with high probability at an optimal sample complexity. Phase transition of our estimator is evaluated through simulations. Our numerical experiments also suggest that the proposed method can solve phase retrieval problems with coded diffraction measurements as well.'
volume: 54
URL: http://proceedings.mlr.press/v54/bahmani17a.html
PDF: http://proceedings.mlr.press/v54/bahmani17a/bahmani17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-bahmani17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Bahmani
given: Sohail
- family: Romberg
given: Justin
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 252-260
id: bahmani17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 252
lastpage: 260
published: 2017-04-10 00:00:00 +0000
- title: 'Regret Bounds for Lifelong Learning'
abstract: 'We consider the problem of transfer learning in an online setting. Different tasks are presented sequentially and processed by a within-task algorithm. We propose a lifelong learning strategy which refines the underlying data representation used by the within-task algorithm, thereby transferring information from one task to the next. We show that when the within-task algorithm comes with some regret bound, our strategy inherits this good property. Our bounds are in expectation for a general loss function, and uniform for a convex loss. We discuss applications to dictionary learning and finite set of predictors. In the latter case, we improve previous $O(1/\sqrtm)$ bounds to $O(1/m)$, where $m$ is the per task sample size.'
volume: 54
URL: http://proceedings.mlr.press/v54/alquier17a.html
PDF: http://proceedings.mlr.press/v54/alquier17a/alquier17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-alquier17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Alquier
given: Pierre
- family: Mai
given: The Tien
- family: Pontil
given: Massimiliano
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 261-269
id: alquier17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 261
lastpage: 269
published: 2017-04-10 00:00:00 +0000
- title: 'Poisson intensity estimation with reproducing kernels'
abstract: 'Despite the fundamental nature of the Poisson process in the theory and application of stochastic processes, and its attractive generalizations (e.g. Cox process), few tractable nonparametric modeling approaches exist, especially in high dimensional settings. In this paper we develop a new, computationally tractable Reproducing Kernel Hilbert Space (RKHS) formulation for the inhomogeneous Poisson process. We model the square root of the intensity as an RKHS function. The modeling challenge is that the usual representer theorem arguments no longer apply due to the form of the inhomogeneous Poisson process likelihood. However, we prove that the representer theorem does hold in an appropriately transformed RKHS, guaranteeing that the optimization of the penalized likelihood can be cast as a tractable finite-dimensional problem. The resulting approach is simple to implement, and readily scales to high dimensions and large-scale datasets. '
volume: 54
URL: http://proceedings.mlr.press/v54/flaxman17a.html
PDF: http://proceedings.mlr.press/v54/flaxman17a/flaxman17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-flaxman17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Flaxman
given: Seth
- family: Teh
given: Yee Whye
- family: Sejdinovic
given: Dino
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 270-279
id: flaxman17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 270
lastpage: 279
published: 2017-04-10 00:00:00 +0000
- title: 'Generalized Pseudolikelihood Methods for Inverse Covariance Estimation'
abstract: 'We introduce PseudoNet, a new pseudolikelihood-based estimator of the inverse covariance matrix, that has a number of useful statistical and computational properties. We show, through detailed experiments with synthetic and also real-world finance as well as wind power data, that PseudoNet outperforms related methods in terms of estimation error and support recovery, making it well-suited for use in a downstream application, where obtaining low estimation error can be important. We also show, under regularity conditions, that PseudoNet is consistent. Our proof assumes the existence of accurate estimates of the diagonal entries of the underlying inverse covariance matrix; we additionally provide a two-step method to obtain these estimates, even in a high-dimensional setting, going beyond the proofs for related methods. Unlike other pseudolikelihood-based methods, we also show that PseudoNet does not saturate, i.e., in high dimensions, there is no hard limit on the number of nonzero entries in the PseudoNet estimate. We present a fast algorithm as well as screening rules that make computing the PseudoNet estimate over a range of tuning parameters tractable.'
volume: 54
URL: http://proceedings.mlr.press/v54/ali17a.html
PDF: http://proceedings.mlr.press/v54/ali17a/ali17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-ali17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Ali
given: Alnur
- family: Khare
given: Kshitij
- family: Oh
given: Sang-Yun
- family: Rajaratnam
given: Bala
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 280-288
id: ali17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 280
lastpage: 288
published: 2017-04-10 00:00:00 +0000
- title: 'Removing Phase Transitions from Gibbs Measures'
abstract: 'Gibbs measures are a fundamental class of distributions for the analysis of high dimensional data. Phase transitions, which are also known as degeneracy in the network science literature, are an emergent property of these models that well describe many physical systems. However, the reach of the Gibbs measure is now far outside the realm of physical systems, and in many of these domains multiphase behavior is a nuisance. This nuisance often makes distribution fitting impossible due to failure of the MCMC sampler, and even when an MLE fit is possible, if the solution is near a phase transition point, the plausibility of the fit can be highly questionable. We introduce a modification to the Gibbs distribution that reduces the effects of phase transitions, and with properly chosen hyper-parameters, provably removes all multiphase behavior. We show that this new distribution is just as easy to fit via MCMCMLE as the Gibbs measure, and provide examples in the Ising model from statistical physics and ERGMs from network science.'
volume: 54
URL: http://proceedings.mlr.press/v54/fellows17a.html
PDF: http://proceedings.mlr.press/v54/fellows17a/fellows17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-fellows17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Fellows
given: Ian
- family: Handcock
given: Mark
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 289-297
id: fellows17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 289
lastpage: 297
published: 2017-04-10 00:00:00 +0000
- title: 'Performance Bounds for Graphical Record Linkage'
abstract: 'Record linkage involves merging records in large, noisy databases to remove duplicate entities. It has become an important area because of its widespread occurrence in bibliometrics, public health, official statistics production, political science, and beyond. Traditional linkage methods directly linking records to one another are computationally infeasible as the number of records grows. As a result, it is increasingly common for researchers to treat linkage as a clustering task, in which each latent entity is associated with one or more noisy database records. We critically assess performance bounds using the Kullback-Leibler (KL) divergence under a Bayesian record linkage framework, making connections to Kolchin partition models. We provide an upper bound using the KL divergence and a lower bound on the minimum probability of misclassifying a latent entity. We give insights for when our bounds hold using simulated data and provide practical user guidance. '
volume: 54
URL: http://proceedings.mlr.press/v54/steorts17a.html
PDF: http://proceedings.mlr.press/v54/steorts17a/steorts17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-steorts17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Steorts
given: Rebecca C.
- family: Barnes
given: Mattew
- family: Neiswanger
given: Willie
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 298-306
id: steorts17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 298
lastpage: 306
published: 2017-04-10 00:00:00 +0000
- title: 'Regret Bounds for Transfer Learning in Bayesian Optimisation'
abstract: 'This paper studies the regret bound of two transfer learning algorithms in Bayesian optimisation. The first algorithm models any difference between the source and target functions as a noise process. The second algorithm proposes a new way to model the difference between the source and target as a Gaussian process which is then used to adapt the source data. We show that in both cases the regret bounds are tighter than in the no transfer case. We also experimentally compare the performance of these algorithms relative to no transfer learning and demonstrate benefits of transfer learning.'
volume: 54
URL: http://proceedings.mlr.press/v54/shilton17a.html
PDF: http://proceedings.mlr.press/v54/shilton17a/shilton17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-shilton17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Shilton
given: Alistair
- family: Gupta
given: Sunil
- family: Rana
given: Santu
- family: Venkatesh
given: Svetha
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 307-315
id: shilton17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 307
lastpage: 315
published: 2017-04-10 00:00:00 +0000
- title: 'Scaling Submodular Maximization via Pruned Submodularity Graphs'
abstract: 'We propose a new random pruning method (called “submodular sparsification (SS)”) to reduce the cost of submodular maximization. The pruning is applied via a “submodularity graph” over the $n$ ground elements, where each directed edge is associated with a pairwise dependency defined by the submodular function. In each step, SS prunes a $1-1/\sqrt{c}$ (for $c>1$) fraction of the nodes using weights on edges computed based on only a small number ($O(\log n)$) of randomly sampled nodes. The algorithm requires $\log_\sqrt{c}n$ steps with a small and highly parallelizable per-step computation. An accuracy-speed tradeoff parameter $c$, set as $c = 8$, leads to a fast shrink rate $\sqrt{2}/4$ and small iteration complexity $\log_{2\sqrt{2}}n$. Analysis shows that w.h.p., the greedy algorithm on the pruned set of size $O(\log^2 n)$ can achieve a guarantee similar to that of processing the original dataset. In news and video summarization tasks, SS is able to substantially reduce both computational costs and memory usage, while maintaining (or even slightly exceeding) the quality of the original (and much more costly) greedy algorithm.'
volume: 54
URL: http://proceedings.mlr.press/v54/zhou17a.html
PDF: http://proceedings.mlr.press/v54/zhou17a/zhou17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-zhou17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Zhou
given: Tianyi
- family: Ouyang
given: Hua
- family: Bilmes
given: Jeff
- family: Chang
given: Yi
- family: Guestrin
given: Carlos
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 316-324
id: zhou17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 316
lastpage: 324
published: 2017-04-10 00:00:00 +0000
- title: 'Localized Lasso for High-Dimensional Regression'
abstract: 'We introduce the localized Lasso, which learns models that both are interpretable and have a high predictive power in problems with high dimensionality d and small sample size n. More specifically, we consider a function defined by local sparse models, one at each data point. We introduce sample-wise network regularization to borrow strength across the models, and sample-wise exclusive group sparsity (a.k.a., l12 norm) to introduce diversity into the choice of feature sets in the local models. The local models are interpretable in terms of similarity of their sparsity patterns. The cost function is convex, and thus has a globally optimal solution. Moreover, we propose a simple yet efficient iterative least-squares based optimization procedure for the localized Lasso, which does not need a tuning parameter, and is guaranteed to converge to a globally optimal solution. The solution is empirically shown to outperform alternatives for both simulated and genomic personalized/precision medicine data.'
volume: 54
URL: http://proceedings.mlr.press/v54/yamada17a.html
PDF: http://proceedings.mlr.press/v54/yamada17a/yamada17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-yamada17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Yamada
given: Makoto
- family: Koh
given: Takeuchi
- family: Iwata
given: Tomoharu
- family: Shawe-Taylor
given: John
- family: Kaski
given: Samuel
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 325-333
id: yamada17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 325
lastpage: 333
published: 2017-04-10 00:00:00 +0000
- title: 'Encrypted Accelerated Least Squares Regression'
abstract: 'Information that is stored in an encrypted format is, by definition, usually not amenable to statistical analysis or machine learning methods. In this paper we present detailed analysis of coordinate and accelerated gradient descent algorithms which are capable of fitting least squares and penalised ridge regression models, using data encrypted under a fully homomorphic encryption scheme. Gradient descent is shown to dominate in terms of encrypted computational speed, and theoretical results are proven to give parameter bounds which ensure correctness of decryption. The characteristics of encrypted computation are empirically shown to favour a non-standard acceleration technique. This demonstrates the possibility of approximating conventional statistical regression methods using encrypted data without compromising privacy.'
volume: 54
URL: http://proceedings.mlr.press/v54/esperanca17a.html
PDF: http://proceedings.mlr.press/v54/esperanca17a/esperanca17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-esperanca17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Esperanca
given: Pedro
- family: Aslett
given: Louis
- family: Holmes
given: Chris
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 334-343
id: esperanca17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 334
lastpage: 343
published: 2017-04-10 00:00:00 +0000
- title: 'Random Consensus Robust PCA'
abstract: 'This paper presents R2PCA, a random consensus method for robust principal component analysis. R2PCA takes RANSAC’s principle of using as little data as possible one step further. It iteratively selects small subsets of the data to identify pieces of the principal components, to then stitch them together. We show that if the principal components are in general position and the errors are sufficiently sparse, R2PCA will exactly recover the principal components with probability 1, in lieu of assumptions on coherence or the distribution of the sparse errors, and even under adversarial settings. R2PCA enjoys many advantages: it works well under noise, its computational complexity scales linearly in the ambient dimension, it is easily parallelizable, and due to its low sample complexity, it can be used in settings where data is so large it cannot even be stored in memory. We complement our theoretical findings with synthetic and real data experiments showing that r2pca outperforms state-of-the-art methods in a broad range of settings.'
volume: 54
URL: http://proceedings.mlr.press/v54/pimentel-alarcon17a.html
PDF: http://proceedings.mlr.press/v54/pimentel-alarcon17a/pimentel-alarcon17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-pimentel-alarcon17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Pimentel-Alarcon
given: Daniel
- family: Nowak
given: Robert
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 344-352
id: pimentel-alarcon17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 344
lastpage: 352
published: 2017-04-10 00:00:00 +0000
- title: 'Gray-box Inference for Structured Gaussian Process Models'
abstract: 'We develop an automated variational inference method for Bayesian structured prediction problems with Gaussian process (GP) priors and linear-chain likelihoods. Our approach does not need to know the details of the structured likelihood model and can scale up to a large number of observations. Furthermore, we show that the required expected likelihood term and its gradients in the variational objective (ELBO) can be estimated efficiently by using expectations over very low-dimensional Gaussian distributions. Optimization of the ELBO is fully parallelizable over sequences and amenable to stochastic optimization, which we use along with control variate techniques to make our framework useful in practice. Results on a set of natural language processing tasks show that our method can be as good as (and sometimes better than, in particular with respect to expected log-likelihood) hard-coded approaches including SVM-struct and CRF, and overcomes the scalability limitations of previous inference algorithms based on sampling. Overall, this is a fundamental step to developing automated inference methods for Bayesian structured prediction.'
volume: 54
URL: http://proceedings.mlr.press/v54/galliani17a.html
PDF: http://proceedings.mlr.press/v54/galliani17a/galliani17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-galliani17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Galliani
given: Pietro
- family: Dezfouli
given: Amir
- family: Bonilla
given: Edwin
- family: Quadrianto
given: Novi
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 353-361
id: galliani17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 353
lastpage: 361
published: 2017-04-10 00:00:00 +0000
- title: 'Frank-Wolfe Algorithms for Saddle Point Problems'
abstract: 'We extend the Frank-Wolfe (FW) optimization algorithm to solve constrained smooth convex-concave saddle point (SP) problems. Remarkably, the method only requires access to linear minimization oracles. Leveraging recent advances in FW optimization, we provide the first proof of convergence of a FW-type saddle point solver over polytopes, thereby partially answering a 30 year-old conjecture. We also survey other convergence results and highlight gaps in the theoretical underpinnings of FW-style algorithms. Motivating applications without known efficient alternatives are explored through structured prediction with combinatorial penalties as well as games over matching polytopes involving an exponential number of constraints.'
volume: 54
URL: http://proceedings.mlr.press/v54/gidel17a.html
PDF: http://proceedings.mlr.press/v54/gidel17a/gidel17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-gidel17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Gidel
given: Gauthier
- family: Jebara
given: Tony
- family: Lacoste-Julien
given: Simon
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 362-371
id: gidel17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 362
lastpage: 371
published: 2017-04-10 00:00:00 +0000
- title: 'A Framework for Optimal Matching for Causal Inference'
abstract: 'We propose a novel framework for matching estimators for causal effect from observational data that is based on minimizing the dual norm of estimation error when expressed as an operator. We show that many popular matching estimators can be expressed as optimal in this framework, including nearest-neighbor matching, coarsened exact matching, and mean-matched sampling. This reveals their motivation and aptness as structural priors formulated by embedding the effect in a particular functional space. This also gives rise to a range of new, kernel-based matching estimators that arise when one embeds the effect in a reproducing kernel Hilbert space. Depending on the case, these estimators can be found using either quadratic optimization or integer optimization. We show that estimators based on universal kernels are universally consistent without model specification. In empirical results using both synthetic and real data, the new, kernel-based estimators outperform all standard causal estimators in estimation error.'
volume: 54
URL: http://proceedings.mlr.press/v54/kallus17a.html
PDF: http://proceedings.mlr.press/v54/kallus17a/kallus17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-kallus17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Kallus
given: Nathan
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 372-381
id: kallus17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 372
lastpage: 381
published: 2017-04-10 00:00:00 +0000
- title: 'Quantifying the accuracy of approximate diffusions and Markov chains'
abstract: 'Markov chains and diffusion processes are indispensable tools in machine learning and statistics that are used for inference, sampling, and modeling. With the growth of large-scale datasets, the computational cost associated with simulating these stochastic processes can be considerable, and many algorithms have been proposed to approximate the underlying Markov chain or diffusion. A fundamental question is how the computational savings trade off against the statistical error incurred due to approximations. This paper develops general results that address this question. We bound the Wasserstein distance between the equilibrium distributions of two diffusions as a function of their mixing rates and the deviation in their drifts. We show that this error bound is tight in simple Gaussian settings. Our general result on continuous diffusions can be discretized to provide insights into the computational–statistical trade-off of Markov chains. As an illustration, we apply our framework to derive finite-sample error bounds of approximate unadjusted Langevin dynamics. We characterize computation-constrained settings where, by using fast-to-compute approximate gradients in the Langevin dynamics, we obtain more accurate samples compared to using the exact gradients. Finally, as an additional application of our approach, we quantify the accuracy of approximate zig-zag sampling. Our theoretical analyses are supported by simulation experiments. '
volume: 54
URL: http://proceedings.mlr.press/v54/huggins17a.html
PDF: http://proceedings.mlr.press/v54/huggins17a/huggins17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-huggins17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Huggins
given: Jonathan
- family: Zou
given: James
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 382-391
id: huggins17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 382
lastpage: 391
published: 2017-04-10 00:00:00 +0000
- title: 'Stochastic Rank-1 Bandits'
abstract: 'We propose stochastic rank-1 bandits, a class of online learning problems where at each step a learning agent chooses a pair of row and column arms, and receives the product of their values as a reward. The main challenge of the problem is that the individual values of the row and column are unobserved. We assume that these values are stochastic and drawn independently. We propose a computationally-efficient algorithm for solving our problem, which we call Rank1Elim. We derive a O((K + L) (1 / Delta) log n) upper bound on its n-step regret, where K is the number of rows, L is the number of columns, and Delta is the minimum of the row and column gaps; under the assumption that the mean row and column rewards are bounded away from zero. To the best of our knowledge, we present the first bandit algorithm that finds the maximum entry of a rank-1 matrix whose regret is linear in K + L, 1 / Delta, and log n. We also derive a nearly matching lower bound. Finally, we evaluate Rank1Elim empirically on multiple problems. We observe that it leverages the structure of our problems and can learn near-optimal solutions even if our modeling assumptions are mildly violated.'
volume: 54
URL: http://proceedings.mlr.press/v54/katariya17a.html
PDF: http://proceedings.mlr.press/v54/katariya17a/katariya17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-katariya17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Katariya
given: Sumeet
- family: Kveton
given: Branislav
- family: Szepesvari
given: Csaba
- family: Vernade
given: Claire
- family: Wen
given: Zheng
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 392-401
id: katariya17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 392
lastpage: 401
published: 2017-04-10 00:00:00 +0000
- title: 'On the Troll-Trust Model for Edge Sign Prediction in Social Networks'
abstract: 'In the problem of edge sign prediction, we are given a directed graph (representing a social network), and our task is to predict the binary labels of the edges (i.e., the positive or negative nature of the social relationships). Many successful heuristics for this problem are based on the troll-trust features, estimating at each node the fraction of outgoing and incoming positive/negative edges. We show that these heuristics can be understood, and rigorously analyzed, as approximators to the Bayes optimal classifier for a simple probabilistic model of the edge labels. We then show that the maximum likelihood estimator for this model approximately corresponds to the predictions of a Label Propagation algorithm run on a transformed version of the original social graph. Extensive experiments on a number of real-world datasets show that this algorithm is competitive against state-of-the-art classifiers in terms of both accuracy and scalability. Finally, we show that troll-trust features can also be used to derive online learning algorithms which have theoretical guarantees even when edges are adversarially labeled.'
volume: 54
URL: http://proceedings.mlr.press/v54/falher17a.html
PDF: http://proceedings.mlr.press/v54/falher17a/falher17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-falher17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Falher
given: Géraud Le
- family: Cesa-Bianchi
given: Nicolo
- family: Gentile
given: Claudio
- family: Vitale
given: Fabio
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 402-411
id: falher17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 402
lastpage: 411
published: 2017-04-10 00:00:00 +0000
- title: 'Online Optimization of Smoothed Piecewise Constant Functions'
abstract: 'We study online optimization of smoothed piecewise constant functions over the domain [0, 1). This is motivated by the problem of adaptively picking parameters of learning algorithms as in the recently introduced framework by Gupta and Roughgarden (2016). Majority of the machine learning literature has focused on Lipschitz-continuous functions or functions with bounded gradients. This is with good reason-any learning algorithm suffers linear regret even against piecewise constant functions that are chosen adversarially, arguably the simplest of non-Lipschitz continuous functions. The smoothed setting we consider is inspired by the seminal work of Spielman and Teng (2004) and the recent work of Gupta and Roughgarden (2016)-in this setting, the sequence of functions may be chosen by an adversary, however, with some uncertainty in the location of discontinuities. We give algorithms that achieve sublinear regret in the full information and bandit settings.'
volume: 54
URL: http://proceedings.mlr.press/v54/cohen-addad17a.html
PDF: http://proceedings.mlr.press/v54/cohen-addad17a/cohen-addad17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-cohen-addad17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Cohen-Addad
given: Vincent
- family: Kanade
given: Varun
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 412-420
id: cohen-addad17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 412
lastpage: 420
published: 2017-04-10 00:00:00 +0000
- title: 'Combinatorial Topic Models using Small-Variance Asymptotics'
abstract: 'Modern topic models typically have a probabilistic formulation, and derive their inference algorithms based on Latent Dirichlet Allocation (LDA) and its variants. In contrast, we approach topic modeling via combinatorial optimization, and take a small-variance limit of LDA to derive a new objective function. We minimize this objective by using ideas from combinatorial optimization, obtaining a new, fast, and high-quality topic modeling algorithm. In particular, we show that our results are not only significantly better than traditional SVA algorithms, but also truly competitive with popular LDA-based approaches; we also discuss the (dis)similarities between our approach and its probabilistic counterparts.'
volume: 54
URL: http://proceedings.mlr.press/v54/jiang17a.html
PDF: http://proceedings.mlr.press/v54/jiang17a/jiang17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-jiang17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Jiang
given: Ke
- family: Sra
given: Suvrit
- family: Kulis
given: Brian
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 421-429
id: jiang17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 421
lastpage: 429
published: 2017-04-10 00:00:00 +0000
- title: 'ConvNets with Smooth Adaptive Activation Functions for Regression'
abstract: 'Within Neural Networks (NN), the parameters of Adaptive Activation Functions (AAF) control the shapes of activation functions. These parameters are trained along with other parameters in the NN. AAFs have improved performance of Convolutional Neural Networks (CNN) in multiple classification tasks. In this paper, we propose and apply AAFs on CNNs for regression tasks. We argue that applying AAFs in the regression (second-to-last) layer of a NN can significantly decrease the bias of the regression NN. However, using existing AAFs may lead to overfitting. To address this problem, we propose a Smooth Adaptive Activation Function (SAAF) with a piecewise polynomial form which can approximate any continuous function to arbitrary degree of error, while having a bounded Lipschitz constant for given bounded model parameters. As a result, NNs with SAAF can avoid overfitting by simply regularizing model parameters. We empirically evaluated CNNs with SAAFs and achieved state-of-the-art results on age and pose estimation datasets.'
volume: 54
URL: http://proceedings.mlr.press/v54/hou17a.html
PDF: http://proceedings.mlr.press/v54/hou17a/hou17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-hou17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Hou
given: Le
- family: Samaras
given: Dimitris
- family: Kurc
given: Tahsin
- family: Gao
given: Yi
- family: Saltz
given: Joel
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 430-439
id: hou17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 430
lastpage: 439
published: 2017-04-10 00:00:00 +0000
- title: 'Rapid Mixing Swendsen-Wang Sampler for Stochastic Partitioned Attractive Models'
abstract: 'The Gibbs sampler is the most popular Markov chain used for learning and inference problems in Graphical Models (GM). These tasks are computationally intractable in general, and the Gibbs sampler often suffers from slow mixing. In this paper, we study the Swendsen-Wang dynamics which is a more sophisticated Markov chain designed to overcome bottlenecks that impede Gibbs sampler. We prove O(log n) mixing time for attractive binary pairwise GMs (i.e., ferromagnetic Ising models) on stochastic partitioned graphs having n vertices, under some mild conditions including low temperature regions where the Gibbs sampler provably mixes exponentially slow. Our experiments also confirm that the Swendsen-Wang sampler significantly outperforms the Gibbs sampler for learning parameters of attractive GMs.'
volume: 54
URL: http://proceedings.mlr.press/v54/park17b.html
PDF: http://proceedings.mlr.press/v54/park17b/park17b.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-park17b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Park
given: Sejun
- family: Jang
given: Yunhun
- family: Galanis
given: Andreas
- family: Shin
given: Jinwoo
- family: Stefankovic
given: Daniel
- family: Vigoda
given: Eric
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 440-449
id: park17b
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 440
lastpage: 449
published: 2017-04-10 00:00:00 +0000
- title: 'Efficient Rank Aggregation via Lehmer Codes'
abstract: 'We propose a novel rank aggregation method based on converting permutations into their corresponding Lehmer codes or other subdiagonal images. Lehmer codes, also known as inversion vectors, are vector representations of permutations in which each coordinate can take values not restricted by the values of other coordinates. This transformation allows for decoupling of the coordinates and for performing aggregation via simple scalar median or mode computations. We present simulation results illustrating the performance of this completely parallelizable approach and analytically prove that both the mode and median aggregation procedure recover the correct centroid aggregate with small sample complexity when the permutations are drawn according to the well-known Mallows models. The proposed Lehmer code approach may also be used on partial rankings, with similar performance guarantees. '
volume: 54
URL: http://proceedings.mlr.press/v54/li17b.html
PDF: http://proceedings.mlr.press/v54/li17b/li17b.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-li17b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Li
given: Pan
- family: Mazumdar
given: Arya
- family: Milenkovic
given: Olgica
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 450-459
id: li17b
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 450
lastpage: 459
published: 2017-04-10 00:00:00 +0000
- title: 'Nonlinear ICA of Temporally Dependent Stationary Sources'
abstract: 'We develop a nonlinear generalization of independent component analysis (ICA) or blind source separation, based on temporal dependencies (e.g. autocorrelations). We introduce a nonlinear generative model where the independent sources are assumed to be temporally dependent, non-Gaussian, and stationary, and we observe arbitrarily nonlinear mixtures of them. We develop a method for estimating the model (i.e. separating the sources) based on logistic regression in a neural network which learns to discriminate between a short temporal window of the data vs. a temporal window of temporally permuted data. We prove that the method estimates the sources for general smooth mixing nonlinearities, assuming the sources have sufficiently strong temporal dependencies, and these dependencies are in a certain way different from dependencies found in Gaussian processes. For Gaussian (and similar) sources, the method estimates the nonlinear part of the mixing. We thus provide the first rigorous and general proof of identifiability of nonlinear ICA for temporally dependent sources, together with a practical method for its estimation.'
volume: 54
URL: http://proceedings.mlr.press/v54/hyvarinen17a.html
PDF: http://proceedings.mlr.press/v54/hyvarinen17a/hyvarinen17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-hyvarinen17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Hyvarinen
given: Aapo
- family: Morioka
given: Hiroshi
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 460-469
id: hyvarinen17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 460
lastpage: 469
published: 2017-04-10 00:00:00 +0000
- title: 'Stochastic Difference of Convex Algorithm and its Application to Training Deep Boltzmann Machines'
abstract: 'Difference of convex functions (DC) programming is an important approach to nonconvex optimization problems because these structures can be encountered in several fields. Effective optimization methods, called DC algorithms, have been developed in deterministic optimization literature. In machine learning, a lot of important learning problems such as the Boltzmann machines (BMs) can be formulated as DC programming. However, there is no DC-like algorithm guaranteed by convergence rate analysis for stochastic problems that are more suitable settings for machine learning tasks. In this paper, we propose a stochastic variant of DC algorithm and give computational complexities to converge to a stationary point under several situations. Moreover, we show our method includes expectation-maximization (EM) and Monte Carlo EM (MCEM) algorithm as special cases on training BMs. In other words, we extend EM/MCEM algorithm to more effective methods from DC viewpoint with theoretical convergence guarantees. Experimental results indicate that our method performs well for training binary restricted Boltzmann machines and deep Boltzmann machines without pre-training.'
volume: 54
URL: http://proceedings.mlr.press/v54/nitanda17a.html
PDF: http://proceedings.mlr.press/v54/nitanda17a/nitanda17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-nitanda17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Nitanda
given: Atsushi
- family: Suzuki
given: Taiji
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 470-478
id: nitanda17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 470
lastpage: 478
published: 2017-04-10 00:00:00 +0000
- title: 'Global Convergence of Non-Convex Gradient Descent for Computing Matrix Squareroot'
abstract: 'While there has been a significant amount of work studying gradient descent techniques for non-convex optimization problems over the last few years, all existing results establish either local convergence with good rates or global convergence with highly suboptimal rates, for many problems of interest. In this paper, we take the first step in getting the best of both worlds – establishing global convergence and obtaining a good rate of convergence for the problem of computing squareroot of a positive semidefinite (PSD) matrix, which is a widely studied problem in numerical linear algebra with applications in machine learning and statistics among others. Given a PSD matrix M and a PSD starting point $U_0$, we show that gradient descent with appropriately chosen step-size finds an epsilon-accurate squareroot of M in $O(α\log(||M-U_0||_F^2 / ε))$ iterations, where $α= (\max{||U_0||^2, ||M||} / \min{\sigma_min^2(U_0), \sigma_min(M)} )^3/2$. Our result is the first to establish global convergence for this problem and that it is robust to errors in each iteration. A key contribution of our work is the general proof technique which we believe should further excite research in understanding deterministic and stochastic variants of simple non-convex gradient descent algorithms with good global convergence rates for other problems in machine learning and numerical linear algebra. '
volume: 54
URL: http://proceedings.mlr.press/v54/jain17a.html
PDF: http://proceedings.mlr.press/v54/jain17a/jain17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-jain17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Jain
given: Prateek
- family: Jin
given: Chi
- family: Kakade
given: Sham
- family: Netrapalli
given: Praneeth
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 479-488
id: jain17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 479
lastpage: 488
published: 2017-04-10 00:00:00 +0000
- title: 'Reparameterization Gradients through Acceptance-Rejection Sampling Algorithms'
abstract: 'Variational inference using the reparameterization trick has enabled large-scale approximate Bayesian inference in complex probabilistic models, leveraging stochastic optimization to sidestep intractable expectations. The reparameterization trick is applicable when we can simulate a random variable by applying a differentiable deterministic function on an auxiliary random variable whose distribution is fixed. For many distributions of interest (such as the gamma or Dirichlet), simulation of random variables relies on acceptance-rejection sampling. The discontinuity introduced by the accept-reject step means that standard reparameterization tricks are not applicable. We propose a new method that lets us leverage reparameterization gradients even when variables are outputs of a acceptance-rejection sampling algorithm. Our approach enables reparameterization on a larger class of variational distributions. In several studies of real and synthetic data, we show that the variance of the estimator of the gradient is significantly lower than other state-of-the-art methods. This leads to faster convergence of stochastic gradient variational inference.'
volume: 54
URL: http://proceedings.mlr.press/v54/naesseth17a.html
PDF: http://proceedings.mlr.press/v54/naesseth17a/naesseth17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-naesseth17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Naesseth
given: Christian
- family: Ruiz
given: Francisco
- family: Linderman
given: Scott
- family: Blei
given: David
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 489-498
id: naesseth17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 489
lastpage: 498
published: 2017-04-10 00:00:00 +0000
- title: 'Asymptotically exact inference in differentiable generative models'
abstract: 'Many generative models can be expressed as a differentiable function of random inputs drawn from some simple probability density. This framework includes both deep generative architectures such as Variational Autoencoders and a large class of procedurally defined simulator models. We present a method for performing efficient MCMC inference in such models when conditioning on observations of the model output. For some models this offers an asymptotically exact inference method where Approximate Bayesian Computation might otherwise be employed. We use the intuition that inference corresponds to integrating a density across the manifold corresponding to the set of inputs consistent with the observed outputs. This motivates the use of a constrained variant of Hamiltonian Monte Carlo which leverages the smooth geometry of the manifold to coherently move between inputs exactly consistent with observations. We validate the method by performing inference tasks in a diverse set of models.'
volume: 54
URL: http://proceedings.mlr.press/v54/graham17a.html
PDF: http://proceedings.mlr.press/v54/graham17a/graham17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-graham17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Graham
given: Matthew
- family: Storkey
given: Amos
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 499-508
id: graham17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 499
lastpage: 508
published: 2017-04-10 00:00:00 +0000
- title: 'Decentralized Collaborative Learning of Personalized Models over Networks'
abstract: 'We consider a set of learning agents in a collaborative peer-to-peer network, where each agent learns a personalized model according to its own learning objective. The question addressed in this paper is: how can agents improve upon their locally trained model by communicating with other agents that have similar objectives? We introduce and analyze two asynchronous gossip algorithms running in a fully decentralized manner. Our first approach, inspired from label propagation, aims to smooth pre-trained local models over the network while accounting for the confidence that each agent has in its initial model. In our second approach, agents jointly learn and propagate their model by making iterative updates based on both their local dataset and the behavior of their neighbors. To optimize this challenging objective, our decentralized algorithm is based on ADMM.'
volume: 54
URL: http://proceedings.mlr.press/v54/vanhaesebrouck17a.html
PDF: http://proceedings.mlr.press/v54/vanhaesebrouck17a/vanhaesebrouck17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-vanhaesebrouck17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Vanhaesebrouck
given: Paul
- family: Bellet
given: Aurélien
- family: Tommasi
given: Marc
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 509-517
id: vanhaesebrouck17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 509
lastpage: 517
published: 2017-04-10 00:00:00 +0000
- title: 'Contextual Bandits with Latent Confounders: An NMF Approach'
abstract: 'Motivated by online recommendation and advertising systems, we consider a causal model for stochastic contextual bandits with a latent low-dimensional confounder. In our model, there are $L$ observed contexts and $K$ arms of the bandit. The observed context influences the reward obtained through a latent confounder variable with cardinality $m$ ($m ≪L,K$). The arm choice and the latent confounder causally determines the reward while the observed context is correlated with the confounder. Under this model, the $L \times K$ mean reward matrix $\mathbfU$ (for each context in $[L]$ and each arm in $[K]$) factorizes into non-negative factors $\mathbfA$ ($L \times m$) and $\mathbfW$ ($m \times K$). This insight enables us to propose an $ε$-greedy NMF-Bandit algorithm that designs a sequence of interventions (selecting specific arms), that achieves a balance between learning this low-dimensional structure and selecting the best arm to minimize regret. Our algorithm achieves a regret of $\mathcalO\left(L\mathrmpoly(m, \log K) \log T \right)$ at time $T$, as compared to $\mathcalO(LK\log T)$ for conventional contextual bandits, assuming a constant gap between the best arm and the rest for each context. These guarantees are obtained under mild sufficiency conditions on the factors that are weaker versions of the well-known Statistical RIP condition. We further propose a class of generative models that satisfy our sufficient conditions, and derive a lower bound of $\mathcalO\left(Km\log T\right)$. These are the first regret guarantees for online matrix completion with bandit feedback, when the rank is greater than one. We further compare the performance of our algorithm with the state of the art, on synthetic and real world data-sets.'
volume: 54
URL: http://proceedings.mlr.press/v54/sen17a.html
PDF: http://proceedings.mlr.press/v54/sen17a/sen17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-sen17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Sen
given: Rajat
- family: Shanmugam
given: Karthikeyan
- family: Kocaoglu
given: Murat
- family: Dimakis
given: Alex
- family: Shakkottai
given: Sanjay
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 518-527
id: sen17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 518
lastpage: 527
published: 2017-04-10 00:00:00 +0000
- title: 'Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets'
abstract: 'Bayesian optimization has become a successful tool for hyperparameter optimization of machine learning algorithms, such as support vector machines or deep neural networks. Despite its success, for large datasets, training and validating a single configuration often takes hours, days, or even weeks, which limits the achievable performance. To accelerate hyperparameter optimization, we propose a generative model for the validation error as a function of training set size, which is learned during the optimization process and allows exploration of preliminary configurations on small subsets, by extrapolating to the full dataset. We construct a Bayesian optimization procedure, dubbed FABOLAS, which models loss and training time as a function of dataset size and automatically trades off high information gain about the global optimum against computational cost. Experiments optimizing support vector machines and deep neural networks show that FABOLAS often finds high-quality solutions 10 to 100 times faster than other state-of-the-art Bayesian optimization methods or the recently proposed bandit strategy Hyperband.'
volume: 54
URL: http://proceedings.mlr.press/v54/klein17a.html
PDF: http://proceedings.mlr.press/v54/klein17a/klein17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-klein17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Klein
given: Aaron
- family: Falkner
given: Stefan
- family: Bartels
given: Simon
- family: Hennig
given: Philipp
- family: Hutter
given: Frank
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 528-536
id: klein17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 528
lastpage: 536
published: 2017-04-10 00:00:00 +0000
- title: 'Least-Squares Log-Density Gradient Clustering for Riemannian Manifolds'
abstract: 'Mean shift is a mode-seeking clustering algorithm that has been successfully used in a wide range of applications such as image segmentation and object tracking. To further improve the clustering performance, mean shift has been extended to various directions, including generalization to handle data on Riemannian manifolds and extension to directly estimating the density gradient without density estimation. In this paper, we combine these ideas and propose a novel mode-seeking algorithm for Riemannian manifolds with direct density-gradient estimation. Although the idea of combining the two extensions is rather straightforward, directly estimating the density gradient on Riemannian manifolds is mathematically challenging. We will provide a mathematically sound algorithm and demonstrate its usefulness through experiments.'
volume: 54
URL: http://proceedings.mlr.press/v54/ashizawa17a.html
PDF: http://proceedings.mlr.press/v54/ashizawa17a/ashizawa17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-ashizawa17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Ashizawa
given: Mina
- family: Sasaki
given: Hiroaki
- family: Sakai
given: Tomoya
- family: Sugiyama
given: Masashi
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 537-546
id: ashizawa17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 537
lastpage: 546
published: 2017-04-10 00:00:00 +0000
- title: 'Fast column generation for atomic norm regularization'
abstract: 'We consider optimization problems that consist in minimizing a quadratic function under an atomic norm regularization or constraint. In the line of work on conditional gradient algorithms, we show that the fully corrective Frank-Wolfe (FCFW) algorithm — which is most naturally reformulated as a column generation algorithm in the regularized case — can be made particularly efficient for difficult problems in this family by solving the simplicial or conical subproblems produced by FCFW using a special instance of a classical active set algorithm for quadratic programming that generalizes the min-norm point algorithm.'
volume: 54
URL: http://proceedings.mlr.press/v54/vinyes17a.html
PDF: http://proceedings.mlr.press/v54/vinyes17a/vinyes17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-vinyes17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Vinyes
given: Marina
- family: Obozinski
given: Guillaume
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 547-556
id: vinyes17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 547
lastpage: 556
published: 2017-04-10 00:00:00 +0000
- title: 'Bayesian Hybrid Matrix Factorisation for Data Integration'
abstract: 'We introduce a novel Bayesian hybrid matrix factorisation model (HMF) for data integration, based on combining multiple matrix factorisation methods, that can be used for in- and out-of-matrix prediction of missing values. The model is very general and can be used to integrate many datasets across different entity types, including repeated experiments, similarity matrices, and very sparse datasets. We apply our method on two biological applications, and extensively compare it to state-of-the-art machine learning and matrix factorisation models. For in-matrix predictions on drug sensitivity datasets we obtain consistently better performances than existing methods. This is especially the case when we increase the sparsity of the datasets. Furthermore, we perform out-of-matrix predictions on methylation and gene expression datasets, and obtain the best results on two of the three datasets, especially when the predictivity of datasets is high.'
volume: 54
URL: http://proceedings.mlr.press/v54/brouwer17a.html
PDF: http://proceedings.mlr.press/v54/brouwer17a/brouwer17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-brouwer17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Brouwer
given: Thomas
- family: Lio
given: Pietro
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 557-566
id: brouwer17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 557
lastpage: 566
published: 2017-04-10 00:00:00 +0000
- title: 'Co-Occurring Directions Sketching for Approximate Matrix Multiply'
abstract: 'We introduce co-occurring directions sketching, a deterministic algorithm for approximate matrix product (AMM), in the streaming model. We show that co-occurring directions achieves a better error bound for AMM than other randomized and deterministic approaches for AMM. Co-occurring directions gives a (1 + epsilon) - approximation of the optimal low rank approximation of a matrix product. Empirically our algorithm outperforms competing methods for AMM, for a small sketch size. We validate empirically our theoretical findings and algorithms.'
volume: 54
URL: http://proceedings.mlr.press/v54/mroueh17a.html
PDF: http://proceedings.mlr.press/v54/mroueh17a/mroueh17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-mroueh17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Mroueh
given: Youssef
- family: Marcheret
given: Etienne
- family: Goel
given: Vaibahava
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 567-575
id: mroueh17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 567
lastpage: 575
published: 2017-04-10 00:00:00 +0000
- title: 'Exploration-Exploitation in MDPs with Options'
abstract: 'While a large body of empirical results show that temporally-extended actions and options may significantly affect the learning performance of an agent, the theoretical understanding of how and when options can be beneficial in online reinforcement learning is relatively limited. In this paper, we derive an upper and lower bound on the regret of a variant of UCRL using options. While we first analyze the algorithm in the general case of semi-Markov decision processes (SMDPs), we show how these results can be translated to the specific case of MDPs with options and we illustrate simple scenarios in which the regret of learning with options can be provably much smaller than the regret suffered when learning with primitive actions. '
volume: 54
URL: http://proceedings.mlr.press/v54/fruit17a.html
PDF: http://proceedings.mlr.press/v54/fruit17a/fruit17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-fruit17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Fruit
given: Ronan
- family: Lazaric
given: Alessandro
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 576-584
id: fruit17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 576
lastpage: 584
published: 2017-04-10 00:00:00 +0000
- title: 'Local Perturb-and-MAP for Structured Prediction'
abstract: 'Conditional random fields (CRFs) provide a powerful tool for structured prediction, but cast significant challenges in both the learning and inference steps. Approximation techniques are widely used in both steps, which should be considered jointly to guarantee good performance (a.k.a. “inferning"). Perturb-and-MAP models provide a promising alternative to CRFs, but require global combinatorial optimization and hence they are usable only on specific models. In this work, we present a new Local Perturb-and-MAP (locPMAP) framework that replaces the global optimization with a local optimization by exploiting our observed connection between locPMAP and the pseudolikelihood of the original CRF model. We test our approach on three different vision tasks and show that our method achieves consistently improved performance over other approximate inference techniques optimized to a pseudolikelihood objective. Additionally, we demonstrate that we can integrate our method in the fully convolutional network framework to increase our model’s complexity. Finally, our observed connection between locPMAP and the pseudolikelihood leads to a novel perspective for understanding and using pseudolikelihood. '
volume: 54
URL: http://proceedings.mlr.press/v54/bertasius17a.html
PDF: http://proceedings.mlr.press/v54/bertasius17a/bertasius17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-bertasius17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Bertasius
given: Gedas
- family: Liu
given: Qiang
- family: Torresani
given: Lorenzo
- family: Shi
given: Jianbo
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 585-594
id: bertasius17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 585
lastpage: 594
published: 2017-04-10 00:00:00 +0000
- title: 'Gradient Boosting on Stochastic Data Streams'
abstract: 'Boosting is a popular ensemble algorithm that generates more powerful learners by linearly combining base models from a simpler hypothesis class. In this work, we investigate the problem of adapting batch gradient boosting for minimizing convex loss functions to online setting where the loss at each iteration is i.i.d sampled from an unknown distribution. To generalize from batch to online, we first introduce the definition of online weak learning edge with which for strongly convex and smooth loss functions, we present an algorithm, Streaming Gradient Boosting (SGB) with exponential shrinkage guarantees in the number of weak learners. We further present an adaptation of SGB to optimize non-smooth loss functions, for which we derive a $O(\ln(N)/N)$ convergence rate. We also show that our analysis can extend to adversarial online learning setting under a stronger assumption that the online weak learning edge will hold in adversarial setting. We finally demonstrate experimental results showing that in practice our algorithms can achieve competitive results as classic gradient boosting while using less computation.'
volume: 54
URL: http://proceedings.mlr.press/v54/hu17a.html
PDF: http://proceedings.mlr.press/v54/hu17a/hu17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-hu17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Hu
given: Hanzhang
- family: Sun
given: Wen
- family: Venkatraman
given: Arun
- family: Hebert
given: Martial
- family: Bagnell
given: Andrew
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 595-603
id: hu17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 595
lastpage: 603
published: 2017-04-10 00:00:00 +0000
- title: 'Online Learning and Blackwell Approachability with Partial Monitoring: Optimal Convergence Rates'
abstract: 'Blackwell approachability is an online learning setup generalizing the classical problem of regret minimization by allowing for instance multi-criteria optimization, global (online) optimization of a convex loss, or online linear optimization under some cumulative constraint. We consider partial monitoring where the decision maker does not necessarily observe the outcomes of his decision (unlike the traditional regret/bandit literature). Instead, he receives a random signal correlated to the decision–outcome pair, or only to the outcome. We construct, for the first time, approachability algorithms with convergence rate of order $O(T^-1/2)$ when the signal is independent of the decision and of order $O(T^-1/3)$ in the case of general signals. Those rates are optimal in the sense that they cannot be improved without further assumption on the structure of the objectives and/or the signals.'
volume: 54
URL: http://proceedings.mlr.press/v54/kwon17a.html
PDF: http://proceedings.mlr.press/v54/kwon17a/kwon17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-kwon17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Kwon
given: Joon
- family: Perchet
given: Vianney
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 604-613
id: kwon17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 604
lastpage: 613
published: 2017-04-10 00:00:00 +0000
- title: 'Tensor Decompositions via Two-Mode Higher-Order SVD (HOSVD)'
abstract: 'Tensor decompositions have rich applications in statistics and machine learning, and developing efficient, accurate algorithms for the problem has received much attention recently. Here, we present a new method built on Kruskal’s uniqueness theorem to decompose symmetric, nearly orthogonally decomposable tensors. Unlike the classical higher-order singular value decomposition which unfolds a tensor along a single mode, we consider unfoldings along two modes and use rank-1 constraints to characterize the underlying components. This tensor decomposition method provably handles a greater level of noise compared to previous methods and achieves a high estimation accuracy. Numerical results demonstrate that our algorithm is robust to various noise distributions and that it performs especially favorably as the order increases.'
volume: 54
URL: http://proceedings.mlr.press/v54/wang17a.html
PDF: http://proceedings.mlr.press/v54/wang17a/wang17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-wang17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Wang
given: Miaoyan
- family: Song
given: Yun
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 614-622
id: wang17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 614
lastpage: 622
published: 2017-04-10 00:00:00 +0000
- title: 'Beta calibration: a well-founded and easily implemented improvement on logistic calibration for binary classifiers'
abstract: 'For optimal decision making under variable class distributions and misclassification costs a classifier needs to produce well-calibrated estimates of the posterior probability. Isotonic calibration is a powerful non-parametric method that is however prone to overfitting on smaller datasets; hence a parametric method based on the logistic curve is commonly used. While logistic calibration is designed for normally distributed per-class scores, we demonstrate experimentally that many classifiers including Naive Bayes and Adaboost suffer from a particular distortion where these score distributions are heavily skewed. In such cases logistic calibration can easily yield probability estimates that are worse than the original scores. Moreover, the logistic curve family does not include the identity function, and hence logistic calibration can easily uncalibrate a perfectly calibrated classifier. In this paper we solve all these problems with a richer class of calibration maps based on the beta distribution. We derive the method from first principles and show that fitting it is as easy as fitting a logistic curve. Extensive experiments show that beta calibration is superior to logistic calibration for Naive Bayes and Adaboost. '
volume: 54
URL: http://proceedings.mlr.press/v54/kull17a.html
PDF: http://proceedings.mlr.press/v54/kull17a/kull17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-kull17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Kull
given: Meelis
- family: Filho
given: Telmo Silva
- family: Flach
given: Peter
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 623-631
id: kull17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 623
lastpage: 631
published: 2017-04-10 00:00:00 +0000
- title: 'Detecting Dependencies in Sparse, Multivariate Databases Using Probabilistic Programming and Non-parametric Bayes'
abstract: 'Datasets with hundreds of variables and many missing values are commonplace. In this setting, it is both statistically and computationally challenging to detect true predictive relationships between variables and also to suppress false positives. This paper proposes an approach that combines probabilistic programming, information theory, and non-parametric Bayes. It shows how to use Bayesian non-parametric modeling to (i) build an ensemble of joint probability models for all the variables; (ii) efficiently detect marginal independencies; and (iii) estimate the conditional mutual information between arbitrary subsets of variables, subject to a broad class of constraints. Users can access these capabilities using BayesDB, a probabilistic programming platform for probabilistic data analysis, by writing queries in a simple, SQL-like language. This paper demonstrates empirically that the method can (i) detect context-specific (in)dependencies on challenging synthetic problems and (ii) yield improved sensitivity and specificity over baselines from statistics and machine learning, on a real-world database of over 300 sparsely observed indicators of macroeconomic development and public health.'
volume: 54
URL: http://proceedings.mlr.press/v54/saad17a.html
PDF: http://proceedings.mlr.press/v54/saad17a/saad17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-saad17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Saad
given: Feras
- family: Mansinghka
given: Vikash
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 632-641
id: saad17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 632
lastpage: 641
published: 2017-04-10 00:00:00 +0000
- title: 'High-dimensional Time Series Clustering via Cross-Predictability'
abstract: 'The key to time series clustering is how to characterize the similarity between any two time series. In this paper, we explore a new similarity metric called “cross-predictability”: the degree to which a future value in each time series is predicted by past values of the others. However, it is challenging to estimate such cross-predictability among time series in the high-dimensional regime, where the number of time series is much larger than the length of each time series. We address this challenge with a sparsity assumption: only time series in the same cluster have significant cross-predictability with each other. We demonstrate that this approach is computationally attractive, and provide a theoretical proof that the proposed algorithm will identify the correct clustering structure with high probability under certain conditions. To the best of our knowledge, this is the first practical high-dimensional time series clustering algorithm with a provable guarantee. We evaluate with experiments on both synthetic data and real-world data, and results indicate that our method can achieve more than 80% clustering accuracy on real-world data, which is 20% higher than the state-of-art baselines.'
volume: 54
URL: http://proceedings.mlr.press/v54/hong17a.html
PDF: http://proceedings.mlr.press/v54/hong17a/hong17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-hong17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Hong
given: Dezhi
- family: Gu
given: Quanquan
- family: Whitehouse
given: Kamin
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 642-651
id: hong17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 642
lastpage: 651
published: 2017-04-10 00:00:00 +0000
- title: 'Minimax Approach to Variable Fidelity Data Interpolation'
abstract: 'Engineering problems often involve data sources of variable fidelity with different costs of obtaining an observation. In particular, one can use both a cheap low fidelity function (e.g. a computational experiment with a CFD code) and an expensive high fidelity function (e.g. a wind tunnel experiment) to generate a data sample in order to construct a regression model of a high fidelity function. The key question in this setting is how the sizes of the high and low fidelity data samples should be selected in order to stay within a given computational budget and maximize accuracy of the regression model prior to committing resources on data acquisition. In this paper we obtain minimax interpolation errors for single and variable fidelity scenarios for a multivariate Gaussian process regression. Evaluation of the minimax errors allows us to identify cases when the variable fidelity data provides better interpolation accuracy than the exclusively high fidelity data for the same computational budget. These results allow us to calculate the optimal shares of variable fidelity data samples under the given computational budget constraint. Real and synthetic data experiments suggest that using the obtained optimal shares often outperforms natural heuristics in terms of the regression accuracy. '
volume: 54
URL: http://proceedings.mlr.press/v54/zaytsev17a.html
PDF: http://proceedings.mlr.press/v54/zaytsev17a/zaytsev17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-zaytsev17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Zaytsev
given: Alexey
- family: Burnaev
given: Evgeny
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 652-661
id: zaytsev17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 652
lastpage: 661
published: 2017-04-10 00:00:00 +0000
- title: 'Data Driven Resource Allocation for Distributed Learning'
abstract: 'In distributed machine learning, data is dispatched to multiple machines for processing. Motivated by the fact that similar data points often belong to the same or similar classes, and more generally, classification rules of high accuracy tend to be “locally simple but globally complex” (Vapnik and Bottou, 1993), we propose data dependent dispatching that takes advantage of such structure. We present an in-depth analysis of this model, providing new algorithms with provable worst-case guarantees, analysis proving existing scalable heuristics perform well in natural non worst-case conditions, and techniques for extending a dispatching rule from a small sample to the entire distribution. We overcome novel technical challenges to satisfy important conditions for accurate distributed learning, including fault tolerance and balancedness. We empirically compare our approach with baselines based on random partitioning, balanced partition trees, and locality sensitive hashing, showing that we achieve significantly higher accuracy on both synthetic and real world image and advertising datasets. We also demonstrate that our technique strongly scales with the available computing power.'
volume: 54
URL: http://proceedings.mlr.press/v54/dick17a.html
PDF: http://proceedings.mlr.press/v54/dick17a/dick17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-dick17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Dick
given: Travis
- family: Li
given: Mu
- family: Pillutla
given: Venkata Krishna
- family: White
given: Colin
- family: Balcan
given: Nina
- family: Smola
given: Alex
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 662-671
id: dick17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 662
lastpage: 671
published: 2017-04-10 00:00:00 +0000
- title: 'Learning Nonparametric Forest Graphical Models with Prior Information'
abstract: 'We present a framework for incorporating prior information into nonparametric estimation of graphical models. To avoid distributional assumptions, we restrict the graph to be a forest and build on the work of forest density estimation (FDE). We reformulate the FDE approach from a Bayesian perspective, and introduce prior distributions on the graphs. As two concrete examples, we apply this framework to estimating scale-free graphs and learning multiple graphs with similar structures. The resulting algorithms are equivalent to finding a maximum spanning tree of a weighted graph with a penalty term on the connectivity pattern of the graph. We solve the optimization problem via a minorize-maximization procedure with Kruskal’s algorithm. Simulations show that the proposed methods outperform competing parametric methods, and are robust to the true data distribution. They also lead to improvement in predictive power and interpretability in two real data sets.'
volume: 54
URL: http://proceedings.mlr.press/v54/zhu17a.html
PDF: http://proceedings.mlr.press/v54/zhu17a/zhu17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-zhu17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Zhu
given: Yuancheng
- family: Liu
given: Zhe
- family: Sun
given: Siqi
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 672-680
id: zhu17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 672
lastpage: 680
published: 2017-04-10 00:00:00 +0000
- title: 'Sparse Randomized Partition Trees for Nearest Neighbor Search'
abstract: 'Randomized partition trees have recently been shown to be very effective in solving nearest neighbor search problem. In spite of enjoying strong theoretical guarantee, it suffers from high space complexity, since each internal node of the tree needs to store a $d$ dimensional projection direction leading to a $O(nd)$ space complexity for a dataset of size $n$. Inspired by the fast Johnson-Lindenstrauss transform, in this paper, we propose a sparse version of randomized partition tree where each internal node needs to store only a few non-zero entries, as opposed to all $d$ entries, leading to significant space savings without sacrificing much in terms of nearest neighbor search accuracy. As a by product of this, query time of our proposed method is slightly better than that of its non-sparse counterpart for large dataset size. Our theoretical results indicate that our proposed method enjoys the same theoretical guarantee as that of the original non-sparse RP-tree. Experimental evaluations on four real world dataset strongly suggest that nearest neighbor search performance of our proposed sparse RP-tree is very similar to that of its non-sparse counterpart in terms of accuracy and number of retrieved points.'
volume: 54
URL: http://proceedings.mlr.press/v54/sinha17a.html
PDF: http://proceedings.mlr.press/v54/sinha17a/sinha17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-sinha17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Sinha
given: Kaushik
- family: Keivani
given: Omid
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 681-689
id: sinha17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 681
lastpage: 689
published: 2017-04-10 00:00:00 +0000
- title: 'Horde of Bandits using Gaussian Markov Random Fields'
abstract: 'The gang of bandits (GOB) model [7] is a recent contextual bandits framework that shares information between a set of bandit problems, related by a known (possibly noisy) graph. This model is useful in problems like recommender systems where the large number of users makes it vital to transfer information between users. Despite its effectiveness, the existing GOB model can only be applied to small problems due to its quadratic time-dependence on the number of nodes. Existing solutions to combat the scalability issue require an often-unrealistic clustering assumption. By exploiting a connection to Gaussian Markov random fields (GMRFs), we show that the GOB model can be made to scale to much larger graphs without additional assumptions. In addition, we propose a Thompson sampling algorithm which uses the recent GMRF sampling-by-perturbation technique, allowing it to scale to even larger problems (leading to a “horde” of bandits). We give regret bounds and experimental results for GOB with Thompson sampling and epoch-greedy algorithms, indicating that these methods are as good as or significantly better than ignoring the graph or adopting a clustering-based approach. Finally, when an existing graph is not available, we propose a heuristic for learning it on the fly and show promising results. '
volume: 54
URL: http://proceedings.mlr.press/v54/vaswani17a.html
PDF: http://proceedings.mlr.press/v54/vaswani17a/vaswani17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-vaswani17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Vaswani
given: Sharan
- family: Schmidt
given: Mark
- family: Lakshmanan
given: Laks
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 690-699
id: vaswani17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 690
lastpage: 699
published: 2017-04-10 00:00:00 +0000
- title: 'Random projection design for scalable implicit smoothing of randomly observed stochastic processes'
abstract: 'Sampling at random timestamps, long range dependencies, and scale hamper standard meth- ods for multivariate time series analysis. In this paper we present a novel estimator for cross-covariance of randomly observed time series which unravels the dynamics of an unobserved stochastic process. We analyze the statistical properties of our estimator without needing the assumption that observation timestamps are independent from the process of interest and show that our solution is not hindered by the issues affecting standard estimators for cross-covariance. We implement and evaluate our statistically sound and scalable approach in the distributed setting using Apache Spark and demonstrate its ability to unravel causal dynamics on both simulations and high-frequency financial trading data.'
volume: 54
URL: http://proceedings.mlr.press/v54/belletti17a.html
PDF: http://proceedings.mlr.press/v54/belletti17a/belletti17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-belletti17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Belletti
given: Francois
- family: Sparks
given: Evan
- family: Bayen
given: Alexandre
- family: Gonzalez
given: Joseph
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 700-708
id: belletti17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 700
lastpage: 708
published: 2017-04-10 00:00:00 +0000
- title: 'Trading off Rewards and Errors in Multi-Armed Bandits'
abstract: 'In multi-armed bandits, the most common objective is the maximization of the cumulative reward. Alternative settings include active exploration, where a learner tries to gain accurate estimates of the rewards of all arms. While these objectives are contrasting, in many scenarios it is desirable to trade off rewards and errors. For instance, in educational games the designer wants to gather generalizable knowledge about the behavior of the students and teaching strategies (small estimation errors) but, at the same time, the system needs to avoid giving a bad experience to the players, who may leave the system permanently (large reward). In this paper, we formalize this tradeoff and introduce the ForcingBalance algorithm whose performance is provably close to the best possible tradeoff strategy. Finally, we demonstrate on real-world educational data that ForcingBalance returns useful information about the arms without compromising the overall reward.'
volume: 54
URL: http://proceedings.mlr.press/v54/erraqabi17a.html
PDF: http://proceedings.mlr.press/v54/erraqabi17a/erraqabi17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-erraqabi17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Erraqabi
given: Akram
- family: Lazaric
given: Alessandro
- family: Valko
given: Michal
- family: Brunskill
given: Emma
- family: Liu
given: Yun-En
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 709-717
id: erraqabi17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 709
lastpage: 717
published: 2017-04-10 00:00:00 +0000
- title: 'Adaptive ADMM with Spectral Penalty Parameter Selection'
abstract: 'The alternating direction method of multipliers (ADMM) is a versatile tool for solving a wide range of constrained optimization problems. However, its performance is highly sensitive to a penalty parameter, making ADMM often unreliable and hard to automate for a non-expert user. We tackle this weakness of ADMM by proposing a method that adaptively tunes the penalty parameter to achieve fast convergence. The resulting adaptive ADMM (AADMM) algorithm, inspired by the successful Barzilai-Borwein spectral method for gradient descent, yields fast convergence and relative insensitivity to the initial stepsize and problem scaling.'
volume: 54
URL: http://proceedings.mlr.press/v54/xu17a.html
PDF: http://proceedings.mlr.press/v54/xu17a/xu17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-xu17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Xu
given: Zheng
- family: Figueiredo
given: Mario
- family: Goldstein
given: Tom
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 718-727
id: xu17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 718
lastpage: 727
published: 2017-04-10 00:00:00 +0000
- title: 'The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits'
abstract: 'Stochastic linear bandits are a natural and simple generalisation of finite-armed bandits with numerous practical applications. Current approaches focus on generalising existing techniques for finite-armed bandits, notably the otimism principle and Thompson sampling. Prior analysis has mostly focussed on the worst-case setting. We analyse the asymptotic regret and show matching upper and lower bounds on what is achievable. Surprisingly, our results show that no algorithm based on optimism or Thompson sampling will ever achieve the optimal rate. In fact, they can be arbitrarily far from optimal, even in very simple cases. This is a disturbing result because these techniques are standard tools that are widely used for sequential optimisation, for example, generalised linear bandits and reinforcement learning. '
volume: 54
URL: http://proceedings.mlr.press/v54/lattimore17a.html
PDF: http://proceedings.mlr.press/v54/lattimore17a/lattimore17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-lattimore17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Lattimore
given: Tor
- family: Szepesvari
given: Csaba
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 728-737
id: lattimore17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 728
lastpage: 737
published: 2017-04-10 00:00:00 +0000
- title: 'Dynamic Collaborative Filtering With Compound Poisson Factorization'
abstract: 'Model-based collaborative filtering (CF) analyzes user–item interactions to infer latent factors that represent user preferences and item characteristics in order to predict future interactions. Most CF approaches assume that these latent factors are static; however, in most CF data, user preferences and item perceptions drift over time. Here, we propose a new conjugate and numerically stable dynamic matrix factorization (DCPF) based on hierarchical Poisson factorization that models the smoothly drifting latent factors using gamma-Markov chains. We propose a conjugate gamma chain construction that is numerically stable within our compound-Poisson framework. We then derive a scalable stochastic variational inference approach to estimate the parameters of our model. We apply our model to time-stamped ratings data sets from Netflix, Yelp, and Last.fm. We empirically demonstrate that DCPF achieves a higher predictive accuracy than state-of-the-art static and dynamic factorization algorithms. '
volume: 54
URL: http://proceedings.mlr.press/v54/jerfel17a.html
PDF: http://proceedings.mlr.press/v54/jerfel17a/jerfel17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-jerfel17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Jerfel
given: Ghassen
- family: Basbug
given: Mehmet
- family: Engelhardt
given: Barbara
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 738-747
id: jerfel17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 738
lastpage: 747
published: 2017-04-10 00:00:00 +0000
- title: 'Rank Aggregation and Prediction with Item Features'
abstract: 'We study the problem of rank aggregation with features, where both pairwise comparisons and item features are available to help the rank aggregation task. Observing that traditional rank aggregation methods disregard features, while models adapted from learning-to-rank task are sensitive to feature noise, we propose a general model to learn a total ranking by balancing between comparisons and feature information jointly. As a result, our proposed model takes advantage of item features and is also robust to noise. More importantly, we study the effectiveness of item features in our model and show that given sufficiently informative features, the sample complexity of our model can be asymptotically lower than models based only on comparisons for deriving an $ε$-accurate ranking. The results theoretically justify that our model can achieve efficient learning by leveraging item feature information. In addition, we show that the proposed model can also be extended to two other related problems—online rank aggregation and rank prediction of new items. Finally, experiments show that our model is more effective and robust compared to existing methods on both synthetic and real datasets. '
volume: 54
URL: http://proceedings.mlr.press/v54/chiang17a.html
PDF: http://proceedings.mlr.press/v54/chiang17a/chiang17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-chiang17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Chiang
given: Kai-Yang
- family: Hsieh
given: Cho-Jui
- family: Dhillon
given: Inderjit
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 748-756
id: chiang17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 748
lastpage: 756
published: 2017-04-10 00:00:00 +0000
- title: 'Robust and Efficient Computation of Eigenvectors in a Generalized Spectral Method for Constrained Clustering'
abstract: 'FAST-GE is a generalized spectral method for constrained clustering [Cucuringu et al., AISTATS 2016]. It incorporates the must-link and cannot-link constraints into two Laplacian matrices and then minimizes a Rayleigh quotient via solving a generalized eigenproblem, and is considered to be simple and scalable. However, there are two unsolved issues. Theoretically, since both Laplacian matrices are positive semi-definite and the corresponding pencil is singular, it is not proven whether the minimum of the Rayleigh quotient exists and is equivalent to an eigenproblem. Computationally, the locally optimal block preconditioned conjugate gradient (LOBPCG) method is not designed for solving the eigenproblem of a singular pencil. In fact, to the best of our knowledge, there is no existing eigensolver that is immediately applicable. In this paper, we provide solutions to these two critical issues. We prove a generalization of Courant-Fischer variational principle for the Laplacian singular pencil. We propose a regularization for the pencil so that LOBPCG is applicable. We demonstrate the robustness and efficiency of proposed solutions for constrained image segmentation. The proposed theoretical and computational solutions can be applied to eigenproblems of positive semi-definite pencils arising in other machine learning algorithms, such as generalized linear discriminant analysis in dimension reduction and multisurface classification via eigenvectors. '
volume: 54
URL: http://proceedings.mlr.press/v54/jiang17b.html
PDF: http://proceedings.mlr.press/v54/jiang17b/jiang17b.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-jiang17b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Jiang
given: Chengming
- family: Xie
given: Huiqing
- family: Bai
given: Zhaojun
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 757-766
id: jiang17b
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 757
lastpage: 766
published: 2017-04-10 00:00:00 +0000
- title: 'Information-theoretic limits of Bayesian network structure learning'
abstract: 'In this paper, we study the information-theoretic limits of learning the structure of Bayesian networks (BNs), on discrete as well as continuous random variables, from a finite number of samples. We show that the minimum number of samples required by any procedure to recover the correct structure grows as $Ω(m)$ and $Ω(k \log m + (k^2)/m)$ for non-sparse and sparse BNs respectively, where m is the number of variables and k is the maximum number of parents per node. We provide a simple recipe, based on an extension of the Fano’s inequality, to obtain information-theoretic limits of structure recovery for any exponential family BN. We instantiate our result for specific conditional distributions in the exponential family to characterize the fundamental limits of learning various commonly used BNs, such as conditional probability table based networks, Gaussian BNs, noisy-OR networks, and logistic regression networks. En route to obtaining our main results, we obtain tight bounds on the number of sparse and non-sparse essential-DAGs. Finally, as a byproduct, we recover the information-theoretic limits of sparse variable selection for logistic regression.'
volume: 54
URL: http://proceedings.mlr.press/v54/ghoshal17a.html
PDF: http://proceedings.mlr.press/v54/ghoshal17a/ghoshal17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-ghoshal17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Ghoshal
given: Asish
- family: Honorio
given: Jean
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 767-775
id: ghoshal17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 767
lastpage: 775
published: 2017-04-10 00:00:00 +0000
- title: 'Markov Chain Truncation for Doubly-Intractable Inference'
abstract: 'Computing partition functions, the normalizing constants of probability distributions, is often hard. Variants of importance sampling give unbiased estimates of a normalizer Z, however, unbiased estimates of the reciprocal 1/Z are harder to obtain. Unbiased estimates of 1/Z allow Markov chain Monte Carlo sampling of “doubly-intractable” distributions, such as the parameter posterior for Markov Random Fields or Exponential Random Graphs. We demonstrate how to construct unbiased estimates for 1/Z given access to black-box importance sampling estimators for Z. We adapt recent work on random series truncation and Markov chain coupling, producing estimators with lower variance and a higher percentage of positive estimates than before. Our debiasing algorithms are simple to implement, and have some theoretical and empirical advantages over existing methods.'
volume: 54
URL: http://proceedings.mlr.press/v54/wei17a.html
PDF: http://proceedings.mlr.press/v54/wei17a/wei17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-wei17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Wei
given: Colin
- family: Murray
given: Iain
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 776-784
id: wei17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 776
lastpage: 784
published: 2017-04-10 00:00:00 +0000
- title: 'Regression Uncertainty on the Grassmannian'
abstract: 'Trends in longitudinal or cross-sectional studies over time are often captured through regression models. In their simplest manifestation, these regression models are formulated in $R^n$. However, in the context of imaging studies, the objects of interest which are to be regressed are frequently best modeled as elements of a Riemannian manifold. Regression on such spaces can be accomplished through geodesic regression. This paper develops an approach to compute confidence intervals for geodesic regression models. The approach is general, but illustrated and specifically developed for the Grassmann manifold, which allows us, e.g., to regress shapes or linear dynamical systems. Extensions to other manifolds can be obtained in a similar manner. We demonstrate our approach for regression with 2D/3D shapes using synthetic and real data.'
volume: 54
URL: http://proceedings.mlr.press/v54/hong17b.html
PDF: http://proceedings.mlr.press/v54/hong17b/hong17b.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-hong17b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Hong
given: Yi
- family: Yang
given: Xiao
- family: Kwitt
given: Roland
- family: Styner
given: Martin
- family: Niethammer
given: Marc
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 785-793
id: hong17b
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 785
lastpage: 793
published: 2017-04-10 00:00:00 +0000
- title: 'Attributing Hacks'
abstract: 'In this paper, we describe an algorithm for estimating the provenance of hacks on websites. That is, given properties of sites and the temporal occurrence of attacks, we are able to attribute individual attacks to joint causes and vulnerabilities, as well as estimating the evolution of these vulnerabilities over time. Specifically, we use hazard regression with a time-varying additive hazard function parameterized in a generalized linear form. The activation coefficients on each feature are continuous-time functions over time. We formulate the problem of learning these functions as a constrained variational maximum likelihood estimation problem with total variation penalty and show that the optimal solution is a 0th order spline (a piecewise constant function) with a finite number of adaptively chosen knots. This allows the inference problem to be solved efficiently and at scale by solving a finite dimensional optimization problem. Extensive experiments on real data sets show that our method significantly outperforms Cox’s proportional hazard model. We also conduct case studies and verify that the fitted functions are indeed recovering vulnerable features.'
volume: 54
URL: http://proceedings.mlr.press/v54/liu17a.html
PDF: http://proceedings.mlr.press/v54/liu17a/liu17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-liu17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Liu
given: Ziqi
- family: Smola
given: Alex
- family: Soska
given: Kyle
- family: Wang
given: Yu-Xiang
- family: Zheng
given: Qinghua
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 794-802
id: liu17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 794
lastpage: 802
published: 2017-04-10 00:00:00 +0000
- title: 'Unsupervised Sequential Sensor Acquisition'
abstract: 'In many security and healthcare systems a sequence of sensors/tests are used for detection and diagnosis. Each test outputs a prediction of the latent state, and carries with it inherent costs. Our objective is to learn strategies for selecting tests to optimize accuracy and costs. Unfortunately it is often impossible to acquire in-situ ground truth annotations and we are left with the problem of unsupervised sensor selection (USS). We pose USS as a version of stochastic partial monitoring problem with an unusual reward structure (even noisy annotations are unavailable). Unsurprisingly no learner can achieve sublinear regret without further assumptions. To this end we propose the notion of weak-dominance. This is a condition on the joint probability distribution of test outputs and latent state and says that whenever a test is accurate on an example, a later test in the sequence is likely to be accurate as well.'
volume: 54
URL: http://proceedings.mlr.press/v54/hanawal17a.html
PDF: http://proceedings.mlr.press/v54/hanawal17a/hanawal17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-hanawal17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Hanawal
given: Manjesh
- family: Szepesvari
given: Csaba
- family: Saligrama
given: Venkatesh
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 803-811
id: hanawal17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 803
lastpage: 811
published: 2017-04-10 00:00:00 +0000
- title: 'A Stochastic Nonconvex Splitting Method for Symmetric Nonnegative Matrix Factorization'
abstract: 'Symmetric nonnegative matrix factorization (SymNMF) plays an important role in applications of many data analytics problems such as community detection, document clustering and image segmentation. In this paper, we consider a stochastic SymNMF problem in which the observation matrix is generated in a random and sequential manner. We propose a stochastic nonconvex splitting method, which not only guarantees convergence to the set of stationary points of the problem (in the mean-square sense), but further achieves a sublinear convergence rate. Numerical results show that for clustering problems over both synthetic and real world datasets, the proposed algorithm converges quickly to the set of stationary points.'
volume: 54
URL: http://proceedings.mlr.press/v54/lu17a.html
PDF: http://proceedings.mlr.press/v54/lu17a/lu17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-lu17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Lu
given: Songtao
- family: Hong
given: Mingyi
- family: Wang
given: Zhengdao
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 812-821
id: lu17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 812
lastpage: 821
published: 2017-04-10 00:00:00 +0000
- title: 'Hierarchically-partitioned Gaussian Process Approximation'
abstract: 'The Gaussian process (GP) is a simple yet powerful probabilistic framework for various machine learning tasks. However, exact algorithms for learning and prediction are prohibitive to be applied to large datasets due to inherent computational complexity. To overcome this main limitation, various techniques have been proposed, and in particular, local GP algorithms that scales “truly linearly” with respect to the dataset size. In this paper, we introduce a hierarchical model based on local GP for large-scale datasets, which stacks inducing points over inducing points in layers. By using different kernels in each layer, the overall model becomes multi-scale and is able to capture both long- and short-range dependencies. We demonstrate the effectiveness of our model by speed-accuracy performance on challenging real-world datasets.'
volume: 54
URL: http://proceedings.mlr.press/v54/lee17a.html
PDF: http://proceedings.mlr.press/v54/lee17a/lee17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-lee17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Lee
given: Byung-Jun
- family: Lee
given: Jongmin
- family: Kim
given: Kee-Eung
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 822-831
id: lee17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 822
lastpage: 831
published: 2017-04-10 00:00:00 +0000
- title: 'Scalable Learning of Non-Decomposable Objectives'
abstract: 'Modern retrieval systems are often driven by an underlying machine learning model. The goal of such systems is to identify and possibly rank the few most relevant items for a given query or context. Thus, such systems are typically evaluated using a ranking-based performance metric such as the area under the precision-recall curve, the F score, precision at fixed recall, etc. Obviously, it is desirable to train such systems to optimize the metric of interest. In practice, due to the scalability limitations of existing approaches for optimizing such objectives, large-scale retrieval systems are instead trained to maximize classification accuracy, in the hope that performance as measured via the true objective will also be favorable. In this work we present a unified framework that, using straightforward building block bounds, allows for highly scalable optimization of a wide range of ranking-based objectives. We demonstrate the advantage of our approach on several real-life retrieval problems that are significantly larger than those considered in the literature, while achieving substantial improvement in performance over the accuracy-objective baseline. '
volume: 54
URL: http://proceedings.mlr.press/v54/eban17a.html
PDF: http://proceedings.mlr.press/v54/eban17a/eban17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-eban17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Eban
given: Elad
- family: Schain
given: Mariano
- family: Mackey
given: Alan
- family: Gordon
given: Ariel
- family: Rifkin
given: Ryan
- family: Elidan
given: Gal
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 832-840
id: eban17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 832
lastpage: 840
published: 2017-04-10 00:00:00 +0000
- title: 'CPSG-MCMC: Clustering-Based Preprocessing method for Stochastic Gradient MCMC'
abstract: 'In recent years, stochastic gradient Markov Chain Monte Carlo (SG-MCMC) methods have been raised to process large-scale dataset by iterative learning from small minibatches. However, the high variance caused by naive subsampling usually slows down the convergence to the desired posterior distribution. In this paper, we propose an effective subsampling strategy to reduce the variance based on a failed attempt to do importance sampling. In particular, before sampling, we partition the dataset with k-means clustering algorithm in a preprocessing step and use the fixed clustering throughout the entire MCMC simulation. Then during simulation, we approximate the gradient of log-posterior via summing the estimated gradient of each cluster. The resulting procedure is surprisingly simple without enhancing the complexity of the original algorithm during sampling procedure. We apply our Clustering-based Preprocessing strategy on stochastic gradient Langevin dynamics, stochastic gradient Hamilton Monte Carlo and stochastic gradient Riemann Langevin dynamics. Empirically, we provide thorough numerical results to back up the effectiveness and efficiency of our approach.'
volume: 54
URL: http://proceedings.mlr.press/v54/fu17a.html
PDF: http://proceedings.mlr.press/v54/fu17a/fu17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-fu17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Fu
given: Tianfan
- family: Zhang
given: Zhihua
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 841-850
id: fu17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 841
lastpage: 850
published: 2017-04-10 00:00:00 +0000
- title: 'Comparison-Based Nearest Neighbor Search'
abstract: 'We consider machine learning in a comparison-based setting where we are given a set of points in a metric space, but we have no access to the actual distances between the points. Instead, we can only ask an oracle whether the distance between two points i and j is smaller than the distance between the points i and k. We are concerned with data structures and algorithms to find nearest neighbors based on such comparisons. We focus on a simple yet effective algorithm that recursively splits the space by first selecting two random pivot points and then assigning all other points to the closer of the two (comparison tree). We prove that if the metric space satisfies certain expansion conditions, then with high probability the height of the comparison tree is logarithmic in the number of points, leading to efficient search performance. We also provide an upper bound for the failure probability to return the true nearest neighbor. Experiments show that the comparison tree is competitive with algorithms that have access to the actual distance values, and needs less triplet comparisons than other competitors.'
volume: 54
URL: http://proceedings.mlr.press/v54/haghiri17a.html
PDF: http://proceedings.mlr.press/v54/haghiri17a/haghiri17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-haghiri17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Haghiri
given: Siavash
- family: Ghoshdastidar
given: Debarghya
- family: Luxburg
given: Ulrike von
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 851-859
id: haghiri17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 851
lastpage: 859
published: 2017-04-10 00:00:00 +0000
- title: 'A Unified Optimization View on Generalized Matching Pursuit and Frank-Wolfe'
abstract: 'Two of the most fundamental prototypes of greedy optimization are the matching pursuit and Frank-Wolfe algorithms. In this paper we take a unified view on both classes of methods, leading to the first explicit convergence rates of matching pursuit methods in an optimization sense, for general sets of atoms. We derive sublinear (1/t) convergence for both classes on general smooth objectives, and linear convergence on strongly convex objectives, as well as a clear correspondence of algorithm variants. Our presented algorithms and rates are affine invariant, and do not need any incoherence or sparsity assumptions.'
volume: 54
URL: http://proceedings.mlr.press/v54/locatello17a.html
PDF: http://proceedings.mlr.press/v54/locatello17a/locatello17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-locatello17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Locatello
given: Francesco
- family: Khanna
given: Rajiv
- family: Tschannen
given: Michael
- family: Jaggi
given: Martin
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 860-868
id: locatello17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 860
lastpage: 868
published: 2017-04-10 00:00:00 +0000
- title: 'Faster Coordinate Descent via Adaptive Importance Sampling'
abstract: 'Coordinate descent methods employ random partial updates of decision variables in order to solve huge-scale convex optimization problems. In this work, we introduce new adaptive rules for the random selection of their updates. By adaptive, we mean that our selection rules are based on the dual residual or the primal-dual gap estimates and can change at each iteration. We theoretically characterize the performance of our selection rules and demonstrate improvements over the state-of-the-art, and extend our theory and algorithms to general convex objectives. Numerical evidence with hinge-loss support vector machines and Lasso confirm that the practice follows the theory. '
volume: 54
URL: http://proceedings.mlr.press/v54/perekrestenko17a.html
PDF: http://proceedings.mlr.press/v54/perekrestenko17a/perekrestenko17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-perekrestenko17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Perekrestenko
given: Dmytro
- family: Cevher
given: Volkan
- family: Jaggi
given: Martin
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 869-877
id: perekrestenko17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 869
lastpage: 877
published: 2017-04-10 00:00:00 +0000
- title: 'Conjugate-Computation Variational Inference : Converting Variational Inference in Non-Conjugate Models to Inferences in Conjugate Models'
abstract: 'Variational inference is computationally challenging in models that contain both conjugate and non-conjugate terms. Methods specifically designed for conjugate models, even though computationally efficient, find it difficult to deal with non-conjugate terms. On the other hand, stochastic-gradient methods can handle the non-conjugate terms but they usually ignore the conjugate structure of the model which might result in slow convergence. In this paper, we propose a new algorithm called Conjugate-computation Variational Inference (CVI) which brings the best of the two worlds together – it uses conjugate computations for the conjugate terms and employs stochastic gradients for the rest. We derive this algorithm by using a stochastic mirror-descent method in the mean-parameter space, and then expressing each gradient step as a variational inference in a conjugate model. We demonstrate our algorithm’s applicability to a large class of models and establish its convergence. Our experimental results show that our method converges much faster than the methods that ignore the conjugate structure of the model. '
volume: 54
URL: http://proceedings.mlr.press/v54/khan17a.html
PDF: http://proceedings.mlr.press/v54/khan17a/khan17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-khan17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Khan
given: Mohammad
- family: Lin
given: Wu
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 878-887
id: khan17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 878
lastpage: 887
published: 2017-04-10 00:00:00 +0000
- title: 'Hit-and-Run for Sampling and Planning in Non-Convex Spaces'
abstract: 'We propose the Hit-and-Run algorithm for planning and sampling problems in non- convex spaces. For sampling, we show the first analysis of the Hit-and-Run algorithm in non-convex spaces and show that it mixes fast as long as certain smoothness conditions are satisfied. In particular, our analysis reveals an intriguing connection between fast mixing and the existence of smooth measure-preserving mappings from a convex space to the non-convex space. For planning, we show advantages of Hit-and- Run compared to state-of-the-art planning methods such as Rapidly-Exploring Random Trees.'
volume: 54
URL: http://proceedings.mlr.press/v54/abbasi-yadkori17a.html
PDF: http://proceedings.mlr.press/v54/abbasi-yadkori17a/abbasi-yadkori17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-abbasi-yadkori17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Abbasi-Yadkori
given: Yasin
- family: Bartlett
given: Peter
- family: Gabillon
given: Victor
- family: Malek
given: Alan
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 888-895
id: abbasi-yadkori17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 888
lastpage: 895
published: 2017-04-10 00:00:00 +0000
- title: 'DP-EM: Differentially Private Expectation Maximization'
abstract: 'The iterative nature of the expectation maximization (EM) algorithm presents a challenge for privacy-preserving estimation, as each iteration increases the amount of noise needed. We propose a practical private EM algorithm that overcomes this challenge using two innovations: (1) a novel moment perturbation formulation for differentially private EM (DP-EM), and (2) the use of two recently developed composition methods to bound the privacy “cost” of multiple EM iterations: the moments accountant (MA) and zero-mean concentrated differential privacy (zCDP). Both MA and zCDP bound the moment generating function of the privacy loss random variable and achieve a refined tail bound, which effectively decrease the amount of additive noise. We present empirical results showing the benefits of our approach, as well as similar performance between these two composition methods in the DP-EM setting for Gaussian mixture models. Our approach can be readily extended to many iterative learning algorithms, opening up various exciting future directions. '
volume: 54
URL: http://proceedings.mlr.press/v54/park17c.html
PDF: http://proceedings.mlr.press/v54/park17c/park17c.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-park17c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Park
given: Mijung
- family: Foulds
given: James
- family: Choudhary
given: Kamalika
- family: Welling
given: Max
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 896-904
id: park17c
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 896
lastpage: 904
published: 2017-04-10 00:00:00 +0000
- title: 'On the Hyperprior Choice for the Global Shrinkage Parameter in the Horseshoe Prior'
abstract: 'The horseshoe prior has proven to be a noteworthy alternative for sparse Bayesian estimation, but as shown in this paper, the results can be sensitive to the prior choice for the global shrinkage hyperparameter. We argue that the previous default choices are dubious due to their tendency to favor solutions with more unshrunk coefficients than we typically expect a priori. This can lead to bad results if this parameter is not strongly identified by data. We derive the relationship between the global parameter and the effective number of nonzeros in the coefficient vector, and show an easy and intuitive way of setting up the prior for the global parameter based on our prior beliefs about the number of nonzero coefficients in the model. The results on real world data show that one can benefit greatly – in terms of improved parameter estimates, prediction accuracy, and reduced computation time – from transforming even a crude guess for the number of nonzero coefficients into the prior for the global parameter using our framework.'
volume: 54
URL: http://proceedings.mlr.press/v54/piironen17a.html
PDF: http://proceedings.mlr.press/v54/piironen17a/piironen17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-piironen17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Piironen
given: Juho
- family: Vehtari
given: Aki
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 905-913
id: piironen17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 905
lastpage: 913
published: 2017-04-10 00:00:00 +0000
- title: 'Bayesian Learning and Inference in Recurrent Switching Linear Dynamical Systems'
abstract: 'Many natural systems, such as neurons firing in the brain or basketball teams traversing a court, give rise to time series data with complex, nonlinear dynamics. We can gain insight into these systems by decomposing the data into segments that are each explained by simpler dynamic units. Building on switching linear dynamical systems (SLDS), we develop a model class and Bayesian inference algorithms that not only discover these dynamical units but also, by learning how transition probabilities depend on observations or continuous latent states, explain their switching behavior. Our key innovation is to design these recurrent SLDS models to enable recent Pólya-gamma auxiliary variable techniques and thus make approximate Bayesian learning and inference in these models easy, fast, and scalable.'
volume: 54
URL: http://proceedings.mlr.press/v54/linderman17a.html
PDF: http://proceedings.mlr.press/v54/linderman17a/linderman17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-linderman17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Linderman
given: Scott
- family: Johnson
given: Matthew
- family: Miller
given: Andrew
- family: Adams
given: Ryan
- family: Blei
given: David
- family: Paninski
given: Liam
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 914-922
id: linderman17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 914
lastpage: 922
published: 2017-04-10 00:00:00 +0000
- title: 'Efficient Algorithm for Sparse Tensor-variate Gaussian Graphical Models via Gradient Descent'
abstract: 'We study the sparse tensor-variate Gaussian graphical model (STGGM), where each way of the tensor follows a multivariate normal distribution whose precision matrix has sparse structures. In order to estimate the precision matrices, we propose a sparsity constrained maximum likelihood estimator. However, due to the complex structure of the tensor-variate GGMs, the likelihood based estimator is non-convex, which poses great challenges for both computation and theoretical analysis. In order to address these challenges, we propose an efficient alternating gradient descent algorithm to solve this estimator, and prove that, under certain conditions on the initial estimator, our algorithm is guaranteed to linearly converge to the unknown precision matrices up to the optimal statistical error. Experiments on both synthetic data and real world brain imaging data corroborate our theory.'
volume: 54
URL: http://proceedings.mlr.press/v54/xu17b.html
PDF: http://proceedings.mlr.press/v54/xu17b/xu17b.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-xu17b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Xu
given: Pan
- family: Zhang
given: Tingting
- family: Gu
given: Quanquan
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 923-932
id: xu17b
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 923
lastpage: 932
published: 2017-04-10 00:00:00 +0000
- title: 'Minimax-optimal semi-supervised regression on unknown manifolds'
abstract: 'We consider semi-supervised regression when the predictor variables are drawn from an unknown manifold. A simple two step approach to this problem is to: (i) estimate the manifold geodesic distance between any pair of points using both the labeled and unlabeled instances; and (ii) apply a k nearest neighbor regressor based on these distance estimates. We prove that given sufficiently many unlabeled points, this simple method of geodesic kNN regression achieves the optimal finite-sample minimax bound on the mean squared error, as if the manifold were known. Furthermore, we show how this approach can be efficiently implemented, requiring only O(k N log N) operations to estimate the regression function at all N labeled and unlabeled points. We illustrate this approach on two datasets with a manifold structure: indoor localization using WiFi fingerprints and facial pose estimation. In both cases, geodesic kNN is more accurate and much faster than the popular Laplacian eigenvector regressor.'
volume: 54
URL: http://proceedings.mlr.press/v54/moscovich17a.html
PDF: http://proceedings.mlr.press/v54/moscovich17a/moscovich17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-moscovich17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Moscovich
given: Amit
- family: Jaffe
given: Ariel
- family: Boaz
given: Nadler
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 933-942
id: moscovich17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 933
lastpage: 942
published: 2017-04-10 00:00:00 +0000
- title: 'Improved Strongly Adaptive Online Learning using Coin Betting'
abstract: 'This paper describes a new parameter-free online learning algorithm for changing environments. In comparing against algorithms with the same time complexity as ours, we obtain a strongly adaptive regret bound that is a factor of at least $\sqrt\log(T)$ better, where $T$ is the time horizon. Empirical results show that our algorithm outperforms state-of-the-art methods in learning with expert advice and metric learning scenarios. '
volume: 54
URL: http://proceedings.mlr.press/v54/jun17a.html
PDF: http://proceedings.mlr.press/v54/jun17a/jun17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-jun17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Jun
given: Kwang-Sung
- family: Orabona
given: Francesco
- family: Wright
given: Stephen
- family: Willett
given: Rebecca
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 943-951
id: jun17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 943
lastpage: 951
published: 2017-04-10 00:00:00 +0000
- title: 'Black-box Importance Sampling'
abstract: 'Importance sampling is widely used in machine learning and statistics, but its power is limited by the restriction of using simple proposals for which the importance weights can be tractably calculated. We address this problem by studying black-box importance sampling methods that calculate importance weights for samples generated from any unknown proposal or black-box mechanism. Our method allows us to use better and richer proposals to solve difficult problems, and (somewhat counter-intuitively) also has the additional benefit of improving the estimation accuracy beyond typical importance sampling. Both theoretical and empirical analyses are provided. '
volume: 54
URL: http://proceedings.mlr.press/v54/liu17b.html
PDF: http://proceedings.mlr.press/v54/liu17b/liu17b.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-liu17b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Liu
given: Qiang
- family: Lee
given: Jason
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 952-961
id: liu17b
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 952
lastpage: 961
published: 2017-04-10 00:00:00 +0000
- title: 'Fairness Constraints: Mechanisms for Fair Classification'
abstract: 'Algorithmic decision making systems are ubiquitous across a wide variety of online as well as offline services. These systems rely on complex learning methods and vast amounts of data to optimize the service functionality, satisfaction of the end user and profitability. However, there is a growing concern that these automated decisions can lead, even in the absence of intent, to a lack of fairness, i.e., their outcomes can disproportionately hurt (or, benefit) particular groups of people sharing one or more sensitive attributes (e.g., race, sex). In this paper, we introduce a flexible mechanism to design fair classifiers by leveraging a novel intuitive measure of decision boundary (un)fairness. We instantiate this mechanism with two well-known classifiers, logistic regression and support vector machines, and show on real-world data that our mechanism allows for a fine-grained control on the degree of fairness, often at a small cost in terms of accuracy.'
volume: 54
URL: http://proceedings.mlr.press/v54/zafar17a.html
PDF: http://proceedings.mlr.press/v54/zafar17a/zafar17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-zafar17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Zafar
given: Muhammad Bilal
- family: Valera
given: Isabel
- family: Rogriguez
given: Manuel Gomez
- family: Gummadi
given: Krishna P.
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 962-970
id: zafar17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 962
lastpage: 970
published: 2017-04-10 00:00:00 +0000
- title: 'Frequency Domain Predictive Modelling with Aggregated Data'
abstract: 'Existing work in spatio-temporal data analysis invariably assumes data available as individual measurements with localised estimates. However, for many applications like econometrics, financial forecasting and climate science, data is often obtained as aggregates. Data aggregation presents severe mathematical challenges to learning and inference, and application of standard techniques is susceptible to ecological fallacy. In this manuscript we investigate the problem of predictive linear modelling in the scenario where data is aggregated in a non-uniform manner across targets and features. We introduce a novel formulation of the problem in the frequency domain, and develop algorithmic techniques that exploit the duality properties of Fourier analysis to bypass the inherent structural challenges of this setting. We provide theoretical guarantees for generalisation error for our estimation procedure and extend our analysis to capture approximation effects arising from aliasing. Finally, we perform empirical evaluation to demonstrate the efficacy of our algorithmic aproach in predictive modelling on synthetic data, and on three real datasets from agricultural studies, ecological surveys and climate science. approximation effects arising from aliasing. Finally, we perform empirical evaluation to demonstrate the efficacy of our algorithmic aproach in predictive modelling on synthetic data, and on three real datasets from agricultural studies, ecological surveys and climate science.'
volume: 54
URL: http://proceedings.mlr.press/v54/bhowmik17a.html
PDF: http://proceedings.mlr.press/v54/bhowmik17a/bhowmik17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-bhowmik17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Bhowmik
given: Avradeep
- family: Ghosh
given: Joydeep
- family: Koyejo
given: Oluwasanmi
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 971-980
id: bhowmik17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 971
lastpage: 980
published: 2017-04-10 00:00:00 +0000
- title: 'A Unified Computational and Statistical Framework for Nonconvex Low-rank Matrix Estimation'
abstract: 'We propose a unified framework for estimating low-rank matrices through nonconvex optimization based on gradient descent algorithm. Our framework is quite general and can be applied to both noisy and noiseless observations. In the general case with noisy observations, we show that our algorithm is guaranteed to linearly converge to the unknown low-rank matrix up to a minimax optimal statistical error, provided an appropriate initial estimator. While in the generic noiseless setting, our algorithm converges to the unknown low-rank matrix at a linear rate and enables exact recovery with optimal sample complexity. In addition, we develop a new initialization algorithm to provide the desired initial estimator, which outperforms existing initialization algorithms for nonconvex low-rank matrix estimation. We illustrate the superiority of our framework through three examples: matrix regression, matrix completion, and one-bit matrix completion. We also corroborate our theory through extensive experiments on synthetic data.'
volume: 54
URL: http://proceedings.mlr.press/v54/wang17b.html
PDF: http://proceedings.mlr.press/v54/wang17b/wang17b.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-wang17b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Wang
given: Lingxiao
- family: Zhang
given: Xiao
- family: Gu
given: Quanquan
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 981-990
id: wang17b
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 981
lastpage: 990
published: 2017-04-10 00:00:00 +0000
- title: 'A New Class of Private Chi-Square Hypothesis Tests'
abstract: 'In this paper, we develop new test statistics for hypothesis testing over differentially private data. These statistics are designed specifically so that their asymptotic distributions, after accounting for privacy noise, match the asymptotics of the non-private chi-square tests for testing if the multinomial data parameters lie in lower dimensional manifolds (examples include goodness-of-fit and independence testing). Empirically, these new test statistics outperform prior work, which focused on noisy versions of existing statistics.'
volume: 54
URL: http://proceedings.mlr.press/v54/rogers17a.html
PDF: http://proceedings.mlr.press/v54/rogers17a/rogers17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-rogers17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Rogers
given: Ryan
- family: Kifer
given: Daniel
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 991-1000
id: rogers17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 991
lastpage: 1000
published: 2017-04-10 00:00:00 +0000
- title: 'A Learning Theory of Ranking Aggregation'
abstract: 'Originally formulated in Social Choice theory, Ranking Aggregation, also referred to as Consensus Ranking, has motivated the development of numerous statistical models since the middle of the 20th century. Recently, the analysis of ranking/preference data has been the subject of a renewed interest in machine-learning, boosted by modern applications such as meta-search engines, giving rise to the design of various scalable algorithmic approaches for approximately computing ranking medians, viewed as solutions of a discrete (generally NP-hard) minimization problem. This paper develops a statistical learning theory for ranking aggregation in a general probabilistic setting (avoiding any rigid ranking model assumptions), assessing the generalization ability of empirical ranking medians. Universal rate bounds are established and the situations where convergence occurs at an exponential rate are fully characterized. Minimax lower bounds are also proved, showing that the rate bounds we obtain are optimal.'
volume: 54
URL: http://proceedings.mlr.press/v54/korba17a.html
PDF: http://proceedings.mlr.press/v54/korba17a/korba17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-korba17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Korba
given: Anna
- family: Clemencon
given: Stéphan
- family: Sibony
given: Eric
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1001-1010
id: korba17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1001
lastpage: 1010
published: 2017-04-10 00:00:00 +0000
- title: 'Anomaly Detection in Extreme Regions via Empirical MV-sets on the Sphere'
abstract: 'Extreme regions in the feature space are of particular concern for anomaly detection: anomalies are likely to be located in the tails, whereas data scarcity in such regions makes it difficult to distinguish between large normal instances and anomalies. This paper presents an unsupervised algorithm for anomaly detection in extreme regions. We propose a Minimum Volume set (MV-set) approach relying on multivariate extreme value theory. This framework includes a canonical pre-processing step, which addresses the issue of output sensitivity to standardization choices. The resulting data representation on the sphere highlights the dependence structure of the extremal observations. Anomaly detection is then cast as a MV-set estimation problem on the sphere, where volume is measured by the spherical measure and mass refers to the angular measure. An anomaly then corresponds to an unusual observation given that one of its variables is large. A preliminary rate bound analysis is carried out for the learning method we introduce and its computational advantages are discussed and illustrated by numerical experiments.'
volume: 54
URL: http://proceedings.mlr.press/v54/thomas17a.html
PDF: http://proceedings.mlr.press/v54/thomas17a/thomas17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-thomas17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Thomas
given: Albert
- family: Clemencon
given: Stéphan
- family: Gramfort
given: Alexandre
- family: Sabourin
given: Anne
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1011-1019
id: thomas17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1011
lastpage: 1019
published: 2017-04-10 00:00:00 +0000
- title: 'Structured adaptive and random spinners for fast machine learning computations'
abstract: 'We consider an efficient computational framework for speeding up several machine learning algorithms with almost no loss of accuracy. The proposed framework relies on projections via structured matrices that we call Structured Spinners, which are formed as products of three structured matrix-blocks that incorporate rotations. The approach is highly generic, i.e. i) structured matrices under consideration can either be fully-randomized or learned, ii) our structured family contains as special cases all previously considered structured schemes, iii) the setting extends to the non-linear case where the projections are followed by non-linear functions, and iv) the method finds numerous applications including kernel approximations via random feature maps, dimensionality reduction algorithms,new fast cross-polytope LSH techniques, deep learning, convex optimization algorithms via Newton sketches, quantization with random projection trees, and more. The proposed framework comes with theoretical guarantees characterizing the capacity of the structured model in reference to its unstructured counterpart and is based on a general theoretical principle that we describe in the paper. As a consequence of our theoretical analysis, we provide the first theoretical guarantees for one of the most efficient existing LSH algorithms based on the HD 3 HD 2 HD 1 structured matrix [Andoni et al., 2015]. The exhaustive experimental evaluation confirms the accuracy and efficiency of structured spinners for a variety of different applications.'
volume: 54
URL: http://proceedings.mlr.press/v54/bojarski17a.html
PDF: http://proceedings.mlr.press/v54/bojarski17a/bojarski17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-bojarski17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Bojarski
given: Mariusz
- family: Choromanska
given: Anna
- family: Choromanski
given: Krzysztof
- family: Fagan
given: Francois
- family: Gouy-Pailler
given: Cedric
- family: Morvan
given: Anne
- family: Sakr
given: Nouri
- family: Sarlos
given: Tamas
- family: Atif
given: Jamal
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1020-1029
id: bojarski17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1020
lastpage: 1029
published: 2017-04-10 00:00:00 +0000
- title: 'Complementary Sum Sampling for Likelihood Approximation in Large Scale Classification'
abstract: 'We consider training probabilistic classifiers in the case that the number of classes is too large to perform exact normalisation over all classes. We show that the source of high variance in standard sampling approximations is due to simply not including the correct class of the datapoint into the approximation. To account for this we explicitly sum over a subset of classes and sample the remaining. We show that this simple approach is competitive with recently introduced non likelihood-based approximations. '
volume: 54
URL: http://proceedings.mlr.press/v54/botev17a.html
PDF: http://proceedings.mlr.press/v54/botev17a/botev17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-botev17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Botev
given: Aleksandar
- family: Zheng
given: Bowen
- family: Barber
given: David
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1030-1038
id: botev17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1030
lastpage: 1038
published: 2017-04-10 00:00:00 +0000
- title: 'Learning Optimal Interventions'
abstract: 'Our goal is to identify beneficial interventions from observational data. We consider interventions that are narrowly focused (impacting few covariates) and may be tailored to each individual or globally enacted over a population. For applications where harmful intervention is drastically worse than proposing no change, we propose a conservative definition of the optimal intervention. Assuming the underlying relationship remains invariant under intervention, we develop efficient algorithms to identify the optimal intervention policy from limited data and provide theoretical guarantees for our approach in a Gaussian Process setting. Although our methods assume covariates can be precisely adjusted, they remain capable of improving outcomes in misspecified settings with unintentional downstream effects. Empirically, our approach identifies good interventions in two practical applications: gene perturbation and writing improvement. '
volume: 54
URL: http://proceedings.mlr.press/v54/mueller17a.html
PDF: http://proceedings.mlr.press/v54/mueller17a/mueller17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-mueller17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Mueller
given: Jonas
- family: Reshef
given: David
- family: Du
given: George
- family: Jaakkola
given: Tommi
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1039-1047
id: mueller17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1039
lastpage: 1047
published: 2017-04-10 00:00:00 +0000
- title: 'A Lower Bound on the Partition Function of Attractive Graphical Models in the Continuous Case'
abstract: 'Computing the partition function of an arbitrary graphical model is generally intractable. As a result, approximate inference techniques such as loopy belief propagation and expectation propagation are used to compute an approximation to the true partition function. However, due to general issues of intractability in the continuous case, our understanding of these approximations is relatively limited. In particular, a number of theoretical results known for these approximations in the discrete case are missing in the continuous case. In this work, we use graph covers to extend several such results from the discrete case to the continuous case. Specifically, we provide a graph cover based upper bound for continuous graphical models, and we use this characterization (along with a continuous analog of a discrete correlation-type inequality) to show that the Bethe partition function also provides a lower bound on the true partition function of attractive graphical models in the continuous case.'
volume: 54
URL: http://proceedings.mlr.press/v54/ruozzi17a.html
PDF: http://proceedings.mlr.press/v54/ruozzi17a/ruozzi17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-ruozzi17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Ruozzi
given: Nicholas
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1048-1056
id: ruozzi17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1048
lastpage: 1056
published: 2017-04-10 00:00:00 +0000
- title: 'Scalable Variational Inference for Super Resolution Microscopy'
abstract: 'Super-resolution microscopy methods have become essential tools in biology, opening up a variety of new questions that were previously inaccessible with standard light microscopy methods. In this paper we develop new Bayesian image processing methods that extend the reach of super-resolution microscopy even further. Our method couples variational inference techniques with a data summarization based on Laplace approximation to ensure computational scalability. Our formulation makes it straightforward to incorporate prior information about the underlying sample to further improve accuracy. The proposed method obtains dramatic resolution improvements over previous methods while retaining computational tractability.'
volume: 54
URL: http://proceedings.mlr.press/v54/sun17a.html
PDF: http://proceedings.mlr.press/v54/sun17a/sun17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-sun17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Sun
given: Ruoxi
- family: Archer
given: Evan
- family: Paninski
given: Liam
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1057-1065
id: sun17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1057
lastpage: 1065
published: 2017-04-10 00:00:00 +0000
- title: 'Linear Convergence of Stochastic Frank Wolfe Variants'
abstract: 'In this paper, we show that the Away-step Stochastic Frank-Wolfe (ASFW) and Pairwise Stochastic Frank-Wolfe (PSFW) algorithms converge linearly in expectation. We also show that if an algorithm convergences linearly in expectation then it converges linearly almost surely. In order to prove these results, we develop a novel proof technique based on concepts of empirical processes and concentration inequalities. As far as we know, this technique has not been used previously to derive the convergence rates of stochastic optimization algorithms. In large- scale numerical experiments, ASFW and PSFW perform as well as or better than their stochastic competitors in actual CPU time.'
volume: 54
URL: http://proceedings.mlr.press/v54/goldfarb17a.html
PDF: http://proceedings.mlr.press/v54/goldfarb17a/goldfarb17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-goldfarb17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Goldfarb
given: Donald
- family: Iyengar
given: Garud
- family: Zhou
given: Chaoxu
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1066-1074
id: goldfarb17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1066
lastpage: 1074
published: 2017-04-10 00:00:00 +0000
- title: 'Sequential Graph Matching with Sequential Monte Carlo'
abstract: 'We develop a novel probabilistic model for graph matchings and develop practical inference methods for supervised and unsupervised learning of the parameters of this model. The framework we develop admits joint inference on the parameters and the matching. Furthermore, our framework generalizes naturally to $K$-partite hypergraph matchings or set packing problems. The sequential formulation of the graph matching process naturally leads to sequential Monte Carlo algorithms which can be combined with various parameter inference methods. We apply our method to image matching problems, document ranking, and our own novel quadripartite matching problem arising from the field of computational forestry.'
volume: 54
URL: http://proceedings.mlr.press/v54/jun17b.html
PDF: http://proceedings.mlr.press/v54/jun17b/jun17b.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-jun17b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Jun
given: Seong-Hwan
- family: Wong
given: Samuel W.K.
- family: Zidek
given: James
- family: Bouchard-Cote
given: Alexandre
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1075-1084
id: jun17b
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1075
lastpage: 1084
published: 2017-04-10 00:00:00 +0000
- title: 'Fast rates with high probability in exp-concave statistical learning'
abstract: 'We present an algorithm for the statistical learning setting with a bounded exp-concave loss in d dimensions that obtains excess risk $O(d \log(1/δ)/n)$ with probability $1 - δ$. The core technique is to boost the confidence of recent in-expectation O(d/n) excess risk bounds for empirical risk minimization (ERM), without sacrificing the rate, by leveraging a Bernstein condition which holds due to exp-concavity. We also show that a regret bound for any online learner in this setting translates to a high probability excess risk bound for the corresponding online-to-batch conversion of the online learner. Lastly, we present high probability bounds for the exp-concave model selection aggregation problem that are quantile-adaptive in a certain sense. One bound obtains a nearly optimal rate without requiring the loss to be Lipschitz continuous, and another requires Lipschitz continuity but obtains the optimal rate.'
volume: 54
URL: http://proceedings.mlr.press/v54/mehta17a.html
PDF: http://proceedings.mlr.press/v54/mehta17a/mehta17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-mehta17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Mehta
given: Nishant
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1085-1093
id: mehta17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1085
lastpage: 1093
published: 2017-04-10 00:00:00 +0000
- title: 'Generalization Error of Invariant Classifiers'
abstract: 'This paper studies the generalization error of invariant classifiers. In particular, we consider the common scenario where the classification task is invariant to certain transformations of the input, and that the classifier is constructed (or learned) to be invariant to these transformations. Our approach relies on factoring the input space into a product of a base space and a set of transformations. We show that whereas the generalization error of a non-invariant classifier is proportional to the complexity of the input space, the generalization error of an invariant classifier is proportional to the complexity of the base space. We also derive a set of sufficient conditions on the geometry of the base space and the set of transformations that ensure that the complexity of the base space is much smaller than the complexity of the input space. Our analysis applies to general classifiers such as convolutional neural networks. We demonstrate the implications of the developed theory for such classifiers with experiments on the MNIST and CIFAR-10 datasets.'
volume: 54
URL: http://proceedings.mlr.press/v54/sokolic17a.html
PDF: http://proceedings.mlr.press/v54/sokolic17a/sokolic17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-sokolic17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Sokolic
given: Jure
- family: Giryes
given: Raja
- family: Sapiro
given: Guillermo
- family: Rodrigues
given: Miguel
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1094-1103
id: sokolic17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1094
lastpage: 1103
published: 2017-04-10 00:00:00 +0000
- title: 'Learning with Feature Feedback: from Theory to Practice'
abstract: 'In supervised learning, a human annotator only needs to assign each data point (document, image, etc.) its correct label. But in many situations, the human can also provide richer feedback at essentially no extra cost. In this paper, we examine a particular type of feature feedback that has been used, with some success, in information retrieval and in computer vision. We formalize two models of feature feedback, give learning algorithms for them, and quantify their usefulness in the learning process. Our experiments also show the efficacy of these methods.'
volume: 54
URL: http://proceedings.mlr.press/v54/poulis17a.html
PDF: http://proceedings.mlr.press/v54/poulis17a/poulis17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-poulis17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Poulis
given: Stefanos
- family: Dasgupta
given: Sanjoy
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1104-1113
id: poulis17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1104
lastpage: 1113
published: 2017-04-10 00:00:00 +0000
- title: 'Optimistic Planning for the Stochastic Knapsack Problem'
abstract: 'The stochastic knapsack problem is a stochastic resource allocation problem that arises frequently and yet is exceptionally hard to solve. We derive and study an optimistic planning algorithm specifically designed for the stochastic knapsack problem. Unlike other optimistic planning algorithms for MDPs, our algorithm, OpStoK, avoids the use of discounting and is adaptive to the amount of resources available. We achieve this behavior by means of a concentration inequality that simultaneously applies to capacity and reward estimates. Crucially, we are able to guarantee that the aforementioned confidence regions hold collectively over all time steps by an application of Doob’s inequality. We demonstrate that the method returns an $ε$-optimal solution to the stochastic knapsack problem with high probability. To the best of our knowledge, our algorithm is the first which provides such guarantees for the stochastic knapsack problem. Furthermore, our algorithm is an anytime algorithm and will return a good solution even if stopped prematurely. This is particularly important given the difficulty of the problem. We also provide theoretical conditions to guarantee OpStoK does not expand all policies and demonstrate favorable performance in a simple experimental setting.'
volume: 54
URL: http://proceedings.mlr.press/v54/pike-burke17a.html
PDF: http://proceedings.mlr.press/v54/pike-burke17a/pike-burke17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-pike-burke17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Pike-Burke
given: Ciara
- family: Grunewalder
given: Steffen
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1114-1122
id: pike-burke17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1114
lastpage: 1122
published: 2017-04-10 00:00:00 +0000
- title: 'Identifying Groups of Strongly Correlated Variables through Smoothed Ordered Weighted $L_1$-norms'
abstract: 'The failure of LASSO to identify groups of correlated predictors in linear regression has sparked significant research interest. Recently, various norms were proposed, which can be best described as instances of ordered weighted $\ell_1$ norms (OWL), as an alternative to $\ell_1$ regularization used in LASSO. OWL can identify groups of correlated variables but it forces the model to be constant within a group. This artifact induces unnecessary bias in the model estimation. In this paper we take a submodular perspective and show that OWL can be posed as the Lovász extension of a suitably defined submodular function. The submodular perspective not only explains the group-wise constant behavior of OWL, but also suggests alternatives. The main contribution of this paper is smoothed OWL (SOWL), a new family of norms, which not only identifies the groups but also allows the model to be flexible inside a group. We establish several algorithmic and theoretical properties of SOWL including group identification and model consistency. We also provide algorithmic tools to compute the SOWL norm and its proximal operator, whose computational complexity $O(d\log d)$ is significantly better than that of general purpose solvers in $O(d^2\log d)$. In our experiments, SOWL compares favorably with respect to OWL in the regimes of interest.'
volume: 54
URL: http://proceedings.mlr.press/v54/sankaran17a.html
PDF: http://proceedings.mlr.press/v54/sankaran17a/sankaran17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-sankaran17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Sankaran
given: Raman
- family: Bach
given: Francis
- family: Bhattacharya
given: Chiranjib
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1123-1131
id: sankaran17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1123
lastpage: 1131
published: 2017-04-10 00:00:00 +0000
- title: 'Tracking Objects with Higher Order Interactions via Delayed Column Generation'
abstract: 'We study the problem of multi-target tracking and data association in video. We formulate this in terms of selecting a subset of high-quality tracks subject to the constraint that no pair of selected tracks is associated with a common detection (of an object). This objective is equivalent to the classic NP-hard problem of finding a maximum-weight set packing (MWSP) where tracks correspond to sets and is made further difficult since the number of candidate tracks grows exponentially in the number of detections. We present a relaxation of this combinatorial problem that uses a column generation formulation where the pricing problem is solved via dynamic programming to efficiently explore the space of tracks. We employ row generation to tighten the bound in such a way as to preserve efficient inference in the pricing problem. We show the practical utility of this algorithm for pedestrian and particle tracking.'
volume: 54
URL: http://proceedings.mlr.press/v54/wang17c.html
PDF: http://proceedings.mlr.press/v54/wang17c/wang17c.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-wang17c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Wang
given: Shaofei
- family: Wolf
given: Steffen
- family: Fowlkes
given: Charless
- family: Yarkony
given: Julian
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1132-1140
id: wang17c
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1132
lastpage: 1140
published: 2017-04-10 00:00:00 +0000
- title: 'Belief Propagation in Conditional RBMs for Structured Prediction'
abstract: 'Restricted Boltzmann machines (RBMs) and conditional RBMs (CRBMs) are popular models for a wide range of applications. In previous work, learning on such models has been dominated by contrastive divergence (CD) and its variants. Belief propagation (BP) algorithms are believed to be slow for structured prediction on conditional RBMs (e.g., Mnih et al. [2011]), and not as good as CD when applied in learning (e.g., Larochelle et al. [2012]). In this work, we present a matrix-based implementation of belief propagation algorithms on CRBMs, which is easily scalable to tens of thousands of visible and hidden units. We demonstrate that, in both maximum likelihood and max-margin learning, training conditional RBMs with BP as the inference routine can provide significantly better results than current state-of-the-art CD methods on structured prediction problems. We also include practical guidelines on training CRBMs with BP, and some insights on the interaction of learning and inference algorithms for CRBMs.'
volume: 54
URL: http://proceedings.mlr.press/v54/ping17a.html
PDF: http://proceedings.mlr.press/v54/ping17a/ping17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-ping17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Ping
given: Wei
- family: Ihler
given: Alex
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1141-1149
id: ping17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1141
lastpage: 1149
published: 2017-04-10 00:00:00 +0000
- title: 'Sketching Meets Random Projection in the Dual: A Provable Recovery Algorithm for Big and High-dimensional Data'
abstract: 'Sketching techniques scale up machine learning algorithms by reducing the sample size or dimensionality of massive data sets, without sacrificing their statistical properties. In this paper, we study sketching from an optimization point of view. We first show that the iterative Hessian sketch is an optimization process with \emphpreconditioning and develop an \emphaccelerated version using this insight together with conjugate gradient descent. Next, we establish a primal-dual connection between the Hessian sketch and dual random projection, which allows us to develop an \emphaccelerated iterative dual random projection method by applying the preconditioned conjugate gradient descent on the dual problem. Finally, we tackle the problems of large sample size and high-dimensionality in massive data sets by developing the \emphprimal-dual sketch. The primal-dual sketch iteratively sketches the primal and dual formulations and requires only a logarithmic number of calls to solvers of small sub-problems to recover the optimum of the original problem up to \empharbitrary precision. Extensive experiments on synthetic and real data sets complement our theoretical results.'
volume: 54
URL: http://proceedings.mlr.press/v54/wang17d.html
PDF: http://proceedings.mlr.press/v54/wang17d/wang17d.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-wang17d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Wang
given: Jialei
- family: Lee
given: Jason
- family: Mahdavi
given: Mehrdad
- family: Kolar
given: Mladen
- family: Srebro
given: Nati
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1150-1158
id: wang17d
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1150
lastpage: 1158
published: 2017-04-10 00:00:00 +0000
- title: 'Finite-sum Composition Optimization via Variance Reduced Gradient Descent'
abstract: 'The stochastic composition optimization proposed recently by Wang et al. [2014] minimizes the objective with the composite expectation form: $\min_x (\mathbbE_iF_i ∘\mathbbE_j G_j)(x).$ It summarizes many important applications in machine learning, statistics, and finance. In this paper, we consider the finite-sum scenario for composition optimization: $\min_x f (x) := \frac1n \sum_i = 1^n F_i \left( \frac1m \sum_j = 1^m G_j (x) \right)$. In this paper, two algorithms are proposed to solve this problem by combining the stochastic compositional gradient descent (SCGD) and the stochastic variance reduced gradient (SVRG) technique. A constant linear convergence rate is proved for strongly convex optimization, which substantially improves the sublinear rate $O(K^-0.8)$ of the best known algorithm. '
volume: 54
URL: http://proceedings.mlr.press/v54/lian17a.html
PDF: http://proceedings.mlr.press/v54/lian17a/lian17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-lian17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Lian
given: Xiangru
- family: Wang
given: Mengdi
- family: Liu
given: Ji
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1159-1167
id: lian17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1159
lastpage: 1167
published: 2017-04-10 00:00:00 +0000
- title: 'A Fast and Scalable Joint Estimator for Learning Multiple Related Sparse Gaussian Graphical Models'
abstract: 'Estimating multiple sparse Gaussian Graphical Models (sGGMs) jointly for many related tasks (large $K$) under a high-dimensional (large $p$) situation is an important task. Most previous studies for the joint estimation of multiple sGGMs rely on penalized log-likelihood estimators that involve expensive and difficult non-smooth optimizations. We propose a novel approach, FASJEM for \underlinefast and \underlinescalable \underlinejoint structure-\underlineestimation of \underlinemultiple sGGMs at a large scale. As the first study of joint sGGM using the M-estimator framework, our work has three major contributions: (1) We solve FASJEM through an entry-wise manner which is parallelizable. (2) We choose a proximal algorithm to optimize FASJEM. This improves the computational efficiency from $O(Kp^3)$ to $O(Kp^2)$ and reduces the memory requirement from $O(Kp^2)$ to $O(K)$. (3) We theoretically prove that FASJEM achieves a consistent estimation with a convergence rate of $O(\log(Kp)/n_tot)$. On several synthetic and four real-world datasets, FASJEM shows significant improvements over baselines on accuracy, computational complexity and memory costs.'
volume: 54
URL: http://proceedings.mlr.press/v54/wang17e.html
PDF: http://proceedings.mlr.press/v54/wang17e/wang17e.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-wang17e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Wang
given: Beilun
- family: Gao
given: Ji
- family: Qi
given: Yanjun
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1168-1177
id: wang17e
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1168
lastpage: 1177
published: 2017-04-10 00:00:00 +0000
- title: 'Communication-efficient Distributed Sparse Linear Discriminant Analysis'
abstract: 'We propose a communication-efficient distributed estimation method for sparse linear discriminant analysis (LDA) in the high dimensional regime. Our method distributes the data of size N into m machines, and estimates a local sparse LDA estimator on each machine using the data subset of size N/m. After the distributed estimation, our method aggregates the debiased local estimators from m machines, and sparsifies the aggregated estimator. We show that the aggregated estimator attains the same statistical rate as the centralized estimation method, as long as the number of machines m is chosen appropriately. Moreover, we prove that our method can attain the model selection consistency under a milder condition than the centralized method. Experiments on both synthetic and real datasets corroborate our theory.'
volume: 54
URL: http://proceedings.mlr.press/v54/tian17a.html
PDF: http://proceedings.mlr.press/v54/tian17a/tian17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-tian17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Tian
given: Lu
- family: Gu
given: Quanquan
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1178-1187
id: tian17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1178
lastpage: 1187
published: 2017-04-10 00:00:00 +0000
- title: 'Sketchy Decisions: Convex Low-Rank Matrix Optimization with Optimal Storage'
abstract: 'This paper concerns a fundamental class of convex matrix optimization problems. It presents the first algorithm that uses optimal storage and provably computes a low-rank approximation of a solution. In particular, when all solutions have low rank, the algorithm converges to a solution. This algorithm, SketchyCGM, modifies a standard convex optimization scheme, the conditional gradient method, to store only a small randomized sketch of the matrix variable. After the optimization terminates, the algorithm extracts a low-rank approximation of the solution from the sketch. In contrast to nonconvex heuristics, the guarantees for SketchyCGM do not rely on statistical models for the problem data. Numerical work demonstrates the benefits of SketchyCGM over heuristics.'
volume: 54
URL: http://proceedings.mlr.press/v54/yurtsever17a.html
PDF: http://proceedings.mlr.press/v54/yurtsever17a/yurtsever17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-yurtsever17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Yurtsever
given: Alp
- family: Udell
given: Madeleine
- family: Tropp
given: Joel
- family: Cevher
given: Volkan
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1188-1196
id: yurtsever17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1188
lastpage: 1196
published: 2017-04-10 00:00:00 +0000
- title: 'Modal-set estimation with an application to clustering'
abstract: 'We present a procedure that can estimate – with statistical consistency guarantees – any local-maxima of a density, under benign distributional conditions. The procedure estimates all such local maxima, or modal-sets, of any bounded shape or dimension, including usual point-modes. In practice, modal-sets can arise as dense low-dimensional structures in noisy data, and more generally serve to better model the rich variety of locally dense structures in data. The procedure is then shown to be competitive on clustering applications, and moreover is quite stable to a wide range of settings of its tuning parameter. '
volume: 54
URL: http://proceedings.mlr.press/v54/jiang17c.html
PDF: http://proceedings.mlr.press/v54/jiang17c/jiang17c.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-jiang17c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Jiang
given: Heinrich
- family: Kpotufe
given: Samory
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1197-1206
id: jiang17c
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1197
lastpage: 1206
published: 2017-04-10 00:00:00 +0000
- title: 'Compressed Least Squares Regression revisited'
abstract: 'We revisit compressed least squares (CLS) regression as originally analyzed in Maillard and Munos (2009) and later on in Kaban (2014) with some refinements. Given a set of high-dimensional inputs, CLS applies a random projection and then performs least squares regression based on the projected inputs of lower dimension. This approach can be beneficial with regard to both computation (yielding a smaller least squares problem) and statistical performance (reducing the estimation error). We will argue below that the outcome of previous analysis of the procedure is not meaningful in typical situations, yielding a bound on the prediction error that is inferior to ordinary least squares while requiring the dimension of the projected data to be of the same order as the original dimension. As a fix, we subsequently present a modified analysis with meaningful implications that much better reflects empirical results with simulated and real data.'
volume: 54
URL: http://proceedings.mlr.press/v54/slawski17a.html
PDF: http://proceedings.mlr.press/v54/slawski17a/slawski17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-slawski17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Slawski
given: Martin
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1207-1215
id: slawski17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1207
lastpage: 1215
published: 2017-04-10 00:00:00 +0000
- title: 'Diverse Neural Network Learns True Target Functions'
abstract: 'Neural networks are a powerful class of functions that can be trained with simple gradient descent to achieve state-of-the-art performance on a variety of applications. Despite their practical success, there is a paucity of results that provide theoretical guarantees on why they are so effective. Lying in the center of the problem is the difficulty of analyzing the non-convex loss function with potentially numerous local minima and saddle points. Can neural networks corresponding to the stationary points of the loss function learn the true target function? If yes, what are the key factors contributing to such nice optimization properties? In this paper, we answer these questions by analyzing one-hidden-layer neural networks with ReLU activation, and show that despite the non-convexity, neural networks with diverse units have no spurious local minima. We bypass the non-convexity issue by directly analyzing the first order optimality condition, and show that the loss can be made arbitrarily small if the minimum singular value of the “extended feature matrix” is large enough. We make novel use of techniques from kernel methods and geometric discrepancy, and identify a new relation linking the smallest singular value to the spectrum of a kernel function associated with the activation function and to the diversity of the units. Our results also suggest a novel regularization function to promote unit diversity for potentially better generalization ability.'
volume: 54
URL: http://proceedings.mlr.press/v54/xie17a.html
PDF: http://proceedings.mlr.press/v54/xie17a/xie17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-xie17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Xie
given: Bo
- family: Liang
given: Yingyu
- family: Song
given: Le
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1216-1224
id: xie17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1216
lastpage: 1224
published: 2017-04-10 00:00:00 +0000
- title: 'Local Group Invariant Representations via Orbit Embeddings'
abstract: 'Invariance to nuisance transformations is one of the desirable properties of effective representations. We consider transformations that form a group and propose an approach based on kernel methods to derive local group invariant representations. Locality is achieved by defining a suitable probability distribution over the group which in turn induces distributions in the input feature space. We learn a decision function over these distributions by appealing to the powerful framework of kernel methods and generate local invariant random feature maps via kernel approximations. We show uniform convergence bounds for kernel approximation and provide generalization bounds for learning with these features. We evaluate our method on three real datasets, including Rotated MNIST and CIFAR-10, and observe that it outperforms competing kernel based approaches. The proposed method also outperforms deep CNN on Rotated MNIST and performs comparably to the recently proposed group-equivariant CNN.'
volume: 54
URL: http://proceedings.mlr.press/v54/raj17a.html
PDF: http://proceedings.mlr.press/v54/raj17a/raj17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-raj17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Raj
given: Anant
- family: Kumar
given: Abhishek
- family: Mroueh
given: Youssef
- family: Fletcher
given: Tom
- family: Schoelkopf
given: Bernhard
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1225-1235
id: raj17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1225
lastpage: 1235
published: 2017-04-10 00:00:00 +0000
- title: 'Relativistic Monte Carlo '
abstract: 'Hamiltonian Monte Carlo (HMC) is a popular Markov chain Monte Carlo (MCMC) algorithm that generates proposals for a Metropolis-Hastings algorithm by simulating the dynamics of a Hamiltonian system. However, HMC is sensitive to large time discretizations and performs poorly if there is a mismatch between the spatial geometry of the target distribution and the scales of the momentum distribution. In particular the mass matrix of HMC is hard to tune well. In order to alleviate these problems we propose relativistic Hamiltonian Monte Carlo, a version of HMC based on relativistic dynamics that introduces a maximum velocity on particles. We also derive stochastic gradient versions of the algorithm and show that the resulting algorithms bear interesting relationships to gradient clipping, RMSprop, Adagrad and Adam, popular optimisation methods in deep learning. Based on this, we develop relativistic stochastic gradient descent by taking the zero-temperature limit of relativistic stochastic gradient Hamiltonian Monte Carlo. In experiments we show that the relativistic algorithms perform better than classical Newtonian variants and Adam.'
volume: 54
URL: http://proceedings.mlr.press/v54/lu17b.html
PDF: http://proceedings.mlr.press/v54/lu17b/lu17b.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-lu17b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Lu
given: Xiaoyu
- family: Perrone
given: Valerio
- family: Hasenclever
given: Leonard
- family: Teh
given: Yee Whye
- family: Vollmer
given: Sebastian
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1236-1245
id: lu17b
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1236
lastpage: 1245
published: 2017-04-10 00:00:00 +0000
- title: 'Thompson Sampling for Linear-Quadratic Control Problems'
abstract: 'We consider the exploration-exploitation tradeoff in linear quadratic (LQ) control problems, where the state dynamics is linear and the cost function is quadratic in states and controls. We analyze the regret of Thompson sampling (TS) (a.k.a. posterior-sampling for reinforcement learning) in the frequentist setting, i.e., when the parameters characterizing the LQ dynamics are fixed. Despite the empirical and theoretical success in a wide range of problems from multi-armed bandit to linear bandit, we show that when studying the frequentist regret TS in control problems, we need to trade-off the frequency of sampling optimistic parameters and the frequency of switches in the control policy. This results in an overall regret of $O(T^2/3)$, which is significantly worse than the regret $O(\sqrtT)$ achieved by the optimism-in-face-of-uncertainty algorithm in LQ control problems.'
volume: 54
URL: http://proceedings.mlr.press/v54/abeille17b.html
PDF: http://proceedings.mlr.press/v54/abeille17b/abeille17b.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-abeille17b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Abeille
given: Marc
- family: Lazaric
given: Alessandro
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1246-1254
id: abeille17b
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1246
lastpage: 1254
published: 2017-04-10 00:00:00 +0000
- title: 'Fast Classification with Binary Prototypes'
abstract: 'In this work, we propose a new technique for \emphfast k-nearest neighbor (k-NN) classification in which the original database is represented via a small set of learned binary prototypes. The training phase simultaneously learns a hash function which maps the data points to binary codes, and a set of representative binary prototypes. In the prediction phase, we first hash the query into a binary code and then do the k-NN classification using the binary prototypes as the database. Our approach speeds up k-NN classification in two aspects. First, we compress the database into a smaller set of prototypes such that k-NN search only goes through a smaller set rather than the whole dataset. Second, we reduce the original space to a compact binary embedding, where the Hamming distance between two binary codes is very efficient to compute. We propose a formulation to learn the hash function and prototypes such that the classification error is minimized. We also provide a novel theoretical analysis of the proposed technique in terms of Bayes error consistency. Empirically, our method is much faster than the state-of-the-art k-NN compression methods with comparable accuracy.'
volume: 54
URL: http://proceedings.mlr.press/v54/zhong17a.html
PDF: http://proceedings.mlr.press/v54/zhong17a/zhong17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-zhong17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Zhong
given: Kai
- family: Guo
given: Ruiqi
- family: Kumar
given: Sanjiv
- family: Yan
given: Bowei
- family: Simcha
given: David
- family: Dhillon
given: Inderjit
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1255-1263
id: zhong17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1255
lastpage: 1263
published: 2017-04-10 00:00:00 +0000
- title: 'Prediction Performance After Learning in Gaussian Process Regression'
abstract: 'This paper considers the quantification of the prediction performance in Gaussian process regression. The standard approach is to base the prediction error bars on the theoretical predictive variance, which is a lower bound on the mean square-error (MSE). This approach, however, does not take into account that the statistical model is learned from the data. We show that this omission leads to a systematic underestimation of the prediction errors. Starting from a generalization of the Cramér-Rao bound, we derive a more accurate MSE bound which provides a measure of uncertainty for prediction of Gaussian processes. The improved bound is easily computed and we illustrate it using synthetic and real data examples.'
volume: 54
URL: http://proceedings.mlr.press/v54/wagberg17a.html
PDF: http://proceedings.mlr.press/v54/wagberg17a/wagberg17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-wagberg17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Wagberg
given: Johan
- family: Zachariah
given: Dave
- family: Schon
given: Thomas
- family: Stoica
given: Petre
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1264-1272
id: wagberg17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1264
lastpage: 1272
published: 2017-04-10 00:00:00 +0000
- title: 'Communication-Efficient Learning of Deep Networks from Decentralized Data'
abstract: 'Modern mobile devices have access to a wealth of data suitable for learning models, which in turn can greatly improve the user experience on the device. For example, language models can improve speech recognition and text entry, and image models can automatically select good photos. However, this rich data is often privacy sensitive, large in quantity, or both, which may preclude logging to the data center and training there using conventional approaches. We advocate an alternative that leaves the training data distributed on the mobile devices, and learns a shared model by aggregating locally-computed updates. We term this decentralized approach Federated Learning. We present a practical method for the federated learning of deep networks based on iterative model averaging, and conduct an extensive empirical evaluation, considering five different model architectures and four datasets. These experiments demonstrate the approach is robust to the unbalanced and non-IID data distributions that are a defining characteristic of this setting. Communication costs are the principal constraint, and we show a reduction in required communication rounds by 10-100x as compared to synchronized stochastic gradient descent. '
volume: 54
URL: http://proceedings.mlr.press/v54/mcmahan17a.html
PDF: http://proceedings.mlr.press/v54/mcmahan17a/mcmahan17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-mcmahan17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: McMahan
given: Brendan
- family: Moore
given: Eider
- family: Ramage
given: Daniel
- family: Hampson
given: Seth
- family: Arcas
given: Blaise Aguera y
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1273-1282
id: mcmahan17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1273
lastpage: 1282
published: 2017-04-10 00:00:00 +0000
- title: 'Learning Structured Weight Uncertainty in Bayesian Neural Networks'
abstract: 'Deep neural networks (DNNs) are increasingly popular in modern machine learning. Bayesian learning affords the opportunity to quantify posterior uncertainty on DNN model parameters. Most existing work adopts independent Gaussian priors on the model weights, ignoring possible structural information. In this paper, we consider the matrix variate Gaussian (MVG) distribution to model structured correlations within the weights of a DNN. To make posterior inference feasible, a reparametrization is proposed for the MVG prior, simplifying the complex MVG-based model to an equivalent yet simpler model with independent Gaussian priors on the transformed weights. Consequently, we develop a scalable Bayesian online inference algorithm by adopting the recently proposed probabilistic backpropagation framework. Experiments on several synthetic and real datasets indicate the superiority of our model, achieving competitive performance in terms of model likelihood and predictive root mean square error. Importantly, it also yields faster convergence speed compared to related Bayesian DNN models.'
volume: 54
URL: http://proceedings.mlr.press/v54/sun17b.html
PDF: http://proceedings.mlr.press/v54/sun17b/sun17b.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-sun17b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Sun
given: Shengyang
- family: Chen
given: Changyou
- family: Carin
given: Lawrence
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1283-1292
id: sun17b
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1283
lastpage: 1292
published: 2017-04-10 00:00:00 +0000
- title: 'Signal-based Bayesian Seismic Monitoring'
abstract: 'Detecting weak seismic events from noisy sensors is a difficult perceptual task. We formulate this task as Bayesian inference and propose a generative model of seismic events and signals across a network of spatially distributed stations. Our system, SIGVISA, is the first to directly model seismic waveforms, allowing it to incorporate a rich representation of the physics underlying the signal generation process. We use Gaussian processes over wavelet parameters to predict detailed waveform fluctuations based on historical events, while degrading smoothly to simple parametric envelopes in regions with no historical seismicity. Evaluating on data from the western US, we recover three times as many events as previous work, and reduce mean location errors by a factor of four while greatly increasing sensitivity to low-magnitude events.'
volume: 54
URL: http://proceedings.mlr.press/v54/moore17a.html
PDF: http://proceedings.mlr.press/v54/moore17a/moore17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-moore17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Moore
given: David
- family: Russell
given: Stuart
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1293-1301
id: moore17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1293
lastpage: 1301
published: 2017-04-10 00:00:00 +0000
- title: 'Learning the Network Structure of Heterogeneous Data via Pairwise Exponential Markov Random Fields'
abstract: 'Markov random fields (MRFs) are a useful tool for modeling relationships present in large and high-dimensional data. Often, this data comes from various sources and can have diverse distributions, for example a combination of numerical, binary, and categorical variables. Here, we define the pairwise exponential Markov random field (PE-MRF), an approach capable of modeling exponential family distributions in heterogeneous domains. We develop a scalable method of learning the graphical structure across the variables by solving a regularized approximated maximum likelihood problem. Specifically, we first derive a tractable upper bound on the log-partition function. We then use this upper bound to derive the group graphical lasso, a generalization of the classic graphical lasso problem to heterogeneous domains. To solve this problem, we develop a fast algorithm based on the alternating direction method of multipliers (ADMM). We also prove that our estimator is sparsistent, with guaranteed recovery of the true underlying graphical structure, and that it has a polynomially faster runtime than the current state-of-the-art method for learning such distributions. Experiments on synthetic and real-world examples demonstrate that our approach is both efficient and accurate at uncovering the structure of heterogeneous data.'
volume: 54
URL: http://proceedings.mlr.press/v54/park17d.html
PDF: http://proceedings.mlr.press/v54/park17d/park17d.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-park17d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Park
given: Youngsuk
- family: Hallac
given: David
- family: Boyd
given: Stephen
- family: Leskovec
given: Jure
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1302-1310
id: park17d
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1302
lastpage: 1310
published: 2017-04-10 00:00:00 +0000
- title: 'Discovering and Exploiting Additive Structure for Bayesian Optimization'
abstract: 'Bayesian optimization has proven invaluable for black-box optimization of expensive functions. Its main limitation is its exponential complexity with respect to the dimensionality of the search space using typical kernels. Luckily, many objective functions can be decomposed into additive subproblems, which can be optimized independently. We investigate how to automatically discover such (typically unknown) additive structure while simultaneously exploiting it through Bayesian optimization. We propose an efficient algorithm based on Metropolis-Hastings sampling and demonstrate its efficacy empirically on synthetic and real-world data sets. Throughout all our experiments we reliably discover hidden additive structure whenever it exists and exploit it to yield significantly faster convergence.'
volume: 54
URL: http://proceedings.mlr.press/v54/gardner17a.html
PDF: http://proceedings.mlr.press/v54/gardner17a/gardner17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-gardner17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Gardner
given: Jacob
- family: Guo
given: Chuan
- family: Weinberger
given: Kilian
- family: Garnett
given: Roman
- family: Grosse
given: Roger
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1311-1319
id: gardner17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1311
lastpage: 1319
published: 2017-04-10 00:00:00 +0000
- title: 'Lipschitz Density-Ratios, Structured Data, and Data-driven Tuning'
abstract: 'Density-ratio estimation (i.e. estimating $f = f_Q/f_P$ for two unknown distributions Q and P) has proved useful in many Machine Learning tasks, e.g., risk-calibration in transfer-learning, two-sample tests, and also useful in common techniques such importance sampling and bias correction. While there are many important analyses of this estimation problem, the present paper derives convergence rates in other practical settings that are less understood, namely, extensions of traditional Lipschitz smoothness conditions, and common high-dimensional settings with structured data (e.g. manifold data, sparse data). Various interesting facts, which hold in earlier settings, are shown to extend to these settings. Namely, (1) optimal rates depend only on the smoothness of the ratio f, and not on the densities $f_Q$, $f_P$, supporting the belief that plugging in estimates for $f_Q$, $f_P$ is suboptimal; (2) optimal rates depend only on the intrinsic dimension of data, i.e. this problem – unlike density estimation – escapes the curse of dimension. We further show that near-optimal rates are attainable by estimators tuned from data alone, i.e. with no prior distributional information. This last fact is of special interest in unsupervised settings such as this one, where only oracle rates seem to be known, i.e., rates which assume critical distributional information usually unavailable in practice. '
volume: 54
URL: http://proceedings.mlr.press/v54/kpotufe17a.html
PDF: http://proceedings.mlr.press/v54/kpotufe17a/kpotufe17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-kpotufe17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Kpotufe
given: Samory
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1320-1328
id: kpotufe17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1320
lastpage: 1328
published: 2017-04-10 00:00:00 +0000
- title: 'Spatial Decompositions for Large Scale SVMs'
abstract: 'Although support vector machines (SVMs) are theoretically well understood, their underlying optimization problem becomes very expensive if, for example, hundreds of thousands of samples and a non-linear kernel are considered. Several approaches have been proposed in the past to address this serious limitation. In this work we investigate a decomposition strategy that learns on small, spatially defined data chunks. Our contributions are two fold: On the theoretical side we establish an oracle inequality for the overall learning method using the hinge loss, and show that the resulting rates match those known for SVMs solving the complete optimization problem with Gaussian kernels. On the practical result we compare our approach to learning SVMs on small, randomly chosen chunks. Here it turns out that for comparable training times our approach is significantly faster during testing and also reduces the test error in most cases significantly. Furthermore, we show that our approach easily scales up to 10 million training samples: including hyper-parameter selection using cross validation, the entire training only takes a few hours on a single machine. Finally, we report an experiment on 32 million training samples. All experiments used liquidSVM (Steinwart and Thomann, 2017)'
volume: 54
URL: http://proceedings.mlr.press/v54/thomann17a.html
PDF: http://proceedings.mlr.press/v54/thomann17a/thomann17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-thomann17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Thomann
given: Philipp
- family: Blaschzyk
given: Ingrid
- family: Meister
given: Mona
- family: Steinwart
given: Ingo
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1329-1337
id: thomann17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1329
lastpage: 1337
published: 2017-04-10 00:00:00 +0000
- title: 'Inference Compilation and Universal Probabilistic Programming'
abstract: 'We introduce a method for using deep neural networks to amortize the cost of inference in models from the family induced by universal probabilistic programming languages, establishing a framework that combines the strengths of probabilistic programming and deep learning methods. We call what we do “compilation of inference” because our method transforms a denotational specification of an inference problem in the form of a probabilistic program written in a universal programming language into a trained neural network denoted in a neural network specification language. When at test time this neural network is fed observational data and executed, it performs approximate inference in the original model specified by the probabilistic program. Our training objective and learning procedure are designed to allow the trained neural network to be used as a proposal distribution in a sequential importance sampling inference engine. We illustrate our method on mixture models and Captcha solving and show significant speedups in the efficiency of inference.'
volume: 54
URL: http://proceedings.mlr.press/v54/le17a.html
PDF: http://proceedings.mlr.press/v54/le17a/le17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-le17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Le
given: Tuan Anh
- family: Baydin
given: Atilim Gunes
- family: Wood
given: Frank
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1338-1348
id: le17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1338
lastpage: 1348
published: 2017-04-10 00:00:00 +0000
- title: 'Active Positive Semidefinite Matrix Completion: Algorithms, Theory and Applications'
abstract: 'In this paper we provide simple, computationally efficient, active algorithms for completion of symmetric positive semidefinite matrices. Our proposed algorithms are based on adaptive Nyström sampling, and are allowed to actively query any element in the matrix, and obtain a possibly noisy estimate of the queried element. We establish sample complexity guarantees on the recovery of the matrix in the max-norm and in the process establish new theoretical results, potentially of independent interest, on adaptive Nyström sampling. We demonstrate the efficacy of our algorithms on problems in multi-armed bandits and kernel dimensionality reduction.'
volume: 54
URL: http://proceedings.mlr.press/v54/bhargava17a.html
PDF: http://proceedings.mlr.press/v54/bhargava17a/bhargava17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-bhargava17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Bhargava
given: Aniruddha
- family: Ganti
given: Ravi
- family: Nowak
given: Rob
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1349-1357
id: bhargava17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1349
lastpage: 1357
published: 2017-04-10 00:00:00 +0000
- title: 'Information Projection and Approximate Inference for Structured Sparse Variables'
abstract: 'Approximate inference via information projection has been recently introduced as a general-purpose technique for efficient probabilistic inference given sparse variables. This manuscript goes beyond classical sparsity by proposing efficient algorithms for approximate inference via information projection that are applicable to any structure on the set of variables that admits enumeration using matroid or knapsack constraints. Further, leveraging recent advances in submodular optimization, we provide an efficient greedy algorithm with strong optimization-theoretic guarantees. The class of probabilistic models that can be expressed in this way is quite broad and, as we show, includes group sparse regression, group sparse principal components analysis and sparse collective matrix factorization, among others. Empirical results on simulated data and high dimensional neuroimaging data highlight the superior performance of the information projection approach as compared to established baselines for a range of probabilistic models.'
volume: 54
URL: http://proceedings.mlr.press/v54/khanna17a.html
PDF: http://proceedings.mlr.press/v54/khanna17a/khanna17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-khanna17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Khanna
given: Rajiv
- family: Ghosh
given: Joydeep
- family: Poldrack
given: Rusell
- family: Koyejo
given: Oluwasanmi
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1358-1366
id: khanna17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1358
lastpage: 1366
published: 2017-04-10 00:00:00 +0000
- title: 'On the Interpretability of Conditional Probability Estimates in the Agnostic Setting'
abstract: 'We study the interpretability of conditional probability estimates for binary classification under the agnostic setting or scenario. Under the agnostic setting, conditional probability estimates do not necessarily reflect the true conditional probabilities. Instead, they have a certain calibration property: among all data points that the classifier has predicted $P(Y = 1|X) = p$, $p$ portion of them actually have label $Y = 1$. For cost-sensitive decision problems, this calibration property provides adequate support for us to use Bayes Decision Theory. In this paper, we define a novel measure for the calibration property together with its empirical counterpart, and prove an uniform convergence result between them. This new measure enables us to formally justify the calibration property of conditional probability estimations, and provides new insights on the problem of estimating and calibrating conditional probabilities. '
volume: 54
URL: http://proceedings.mlr.press/v54/gao17a.html
PDF: http://proceedings.mlr.press/v54/gao17a/gao17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-gao17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Gao
given: Yihan
- family: Parameswaran
given: Aditya
- family: Peng
given: Jian
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1367-1374
id: gao17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1367
lastpage: 1374
published: 2017-04-10 00:00:00 +0000
- title: 'Linking Micro Event History to Macro Prediction in Point Process Models'
abstract: 'User behaviors in social networks are microscopic with fine grained temporal information. Predicting a macroscopic quantity based on users’ collective behaviors is an important problem. However, existing works are mainly problem-specific models for the microscopic behaviors and typically design approximation or heuristic algorithms to compute the macroscopic quantity. In this paper, we propose a unifying framework with a jump stochastic differential equation model that systematically links the microscopic event data and macroscopic inference, and the theory to approximate its probability distribution. We showed that our method can better predict the user behaviors in real-world applications.'
volume: 54
URL: http://proceedings.mlr.press/v54/wang17f.html
PDF: http://proceedings.mlr.press/v54/wang17f/wang17f.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-wang17f.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Wang
given: Yichen
- family: Ye
given: Xiaojing
- family: Zhou
given: Haomin
- family: Zha
given: Hongyuan
- family: Song
given: Le
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1375-1384
id: wang17f
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1375
lastpage: 1384
published: 2017-04-10 00:00:00 +0000
- title: 'Initialization and Coordinate Optimization for Multi-way Matching'
abstract: 'We consider the problem of consistently matching multiple sets of elements to each other, which is a common task in fields such as computer vision. To solve the underlying NP-hard objective, existing methods often relax or approximate it, but end up with unsatisfying empirical performance due to a misaligned objective. We propose a coordinate update algorithm that directly optimizes the target objective. By using pairwise alignment information to build an undirected graph and initializing the permutation matrices along the edges of its Maximum Spanning Tree, our algorithm successfully avoids bad local optima. Theoretically, with high probability our algorithm guarantees an optimal solution under reasonable noise assumptions. Empirically, our algorithm consistently and significantly outperforms existing methods on several benchmark tasks on real datasets.'
volume: 54
URL: http://proceedings.mlr.press/v54/tang17a.html
PDF: http://proceedings.mlr.press/v54/tang17a/tang17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-tang17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Tang
given: Da
- family: Jebara
given: Tony
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1385-1393
id: tang17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1385
lastpage: 1393
published: 2017-04-10 00:00:00 +0000
- title: 'Optimal Recovery of Tensor Slices'
abstract: 'We consider the problem of large scale matrix recovery given side information in the form of additional matrices of conforming dimension. This is a parsimonious model that captures a number of interesting problems including context and location aware recommendations, personalized ‘tag’ learning, demand learning with side information, etc. Viewing the matrix we seek to recover and the side information we have as slices of a tensor, we consider the problem of Slice Recovery, which is to recover specific slices of a tensor from noisy observations of the tensor. We provide an efficient algorithm to recover slices of structurally ’simple’ tensors given noisy observations of the tensor’s entries; our definition of simplicity subsumes low-rank tensors for a variety of definitions of tensor rank. Our algorithm is practical for large datasets and provides a significant performance improvement over state of the art incumbent approaches to tensor recovery. We establish theoretical recovery guarantees that under reasonable assumptions are minimax optimal for slice recovery. These guarantees also imply the first minimax optimal guarantees for recovering tensors of low Tucker rank and general noise. Experiments on data from a music streaming service demonstrate the performance and scalability of our algorithm.'
volume: 54
URL: http://proceedings.mlr.press/v54/farias17a.html
PDF: http://proceedings.mlr.press/v54/farias17a/farias17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-farias17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Farias
given: Vivek
- family: Li
given: Andrew
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1394-1402
id: farias17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1394
lastpage: 1402
published: 2017-04-10 00:00:00 +0000
- title: 'Efficient Online Multiclass Prediction on Graphs via Surrogate Losses'
abstract: ' We develop computationally efficient algorithms for online multi-class prediction. Our construction is based on carefully-chosen data-dependent surrogate loss functions, and the new methods enjoy strong mistake bound guarantees. To illustrate the technique, we study the combinatorial problem of node classification and develop a prediction strategy that is linear-time per round. In contrast, the offline benchmark is NP-hard to compute in general. We demonstrate the empirical performance of the method on several datasets. '
volume: 54
URL: http://proceedings.mlr.press/v54/rakhlin17a.html
PDF: http://proceedings.mlr.press/v54/rakhlin17a/rakhlin17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-rakhlin17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Rakhlin
given: Alexander
- family: Sridharan
given: Karthik
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1403-1411
id: rakhlin17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1403
lastpage: 1411
published: 2017-04-10 00:00:00 +0000
- title: 'Distribution of Gaussian Process Arc Lengths'
abstract: ' We present the first treatment of the arc length of the GP with more than a single output dimension. GPs are commonly used for tasks such as trajectory modelling, where path length is a crucial quantity of interest. Previously, only paths in one dimension have been considered, with no theoretical consideration of higher dimensional problems. We fill the gap in the existing literature by deriving the moments of the arc length for a stationary GP with multiple output dimensions. A new method is used to derive the mean of a one-dimensional GP over a finite interval, by considering the distribution of the arc length integrand. This technique is used to derive an approximate distribution over the arc length of a vector valued GP in $\mathbbR^n$ by moment matching the distribution. Numerical simulations confirm our theoretical derivations.'
volume: 54
URL: http://proceedings.mlr.press/v54/bewsher17a.html
PDF: http://proceedings.mlr.press/v54/bewsher17a/bewsher17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-bewsher17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Bewsher
given: Justin
- family: Tosi
given: Alessandra
- family: Osborne
given: Michael
- family: Roberts
given: Stephen
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1412-1420
id: bewsher17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1412
lastpage: 1420
published: 2017-04-10 00:00:00 +0000
- title: 'Distributed Adaptive Sampling for Kernel Matrix Approximation'
abstract: 'Most kernel-based methods, such as kernel regression, kernel PCA, ICA, or $k$-means clustering, do not scale to large datasets, because constructing and storing the kernel matrix $K_n$ requires at least $O(n^2)$ time and space for $n$ samples. Recent works (Alaoui 2014, Musco 2016) show that sampling points with replacement according to their ridge leverage scores (RLS) generates small dictionaries of relevant points with strong spectral approximation guarantees for $K_n$. The drawback of RLS-based methods is that computing exact RLS requires constructing and storing the whole kernel matrix. In this paper, we introduce SQUEAK, a new algorithm for kernel approximation based on RLS sampling that \emphsequentially processes the dataset, storing a dictionary which creates accurate kernel matrix approximations with a number of points that only depends on the effective dimension $d_eff(γ)$ of the dataset. Moreover since all the RLS estimations are efficiently performed using only the small dictionary, SQUEAK never constructs the whole matrix $K_n$, runs in linear time $\widetildeO(nd_eff(γ)^3)$ w.r.t. $n$, and requires only a single pass over the dataset. We also propose a parallel and distributed version of SQUEAK achieving similar accuracy in as little as $\widetildeO(\log(n)d_eff(γ)^3)$ time.'
volume: 54
URL: http://proceedings.mlr.press/v54/calandriello17a.html
PDF: http://proceedings.mlr.press/v54/calandriello17a/calandriello17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-calandriello17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Calandriello
given: Daniele
- family: Lazaric
given: Alessandro
- family: Valko
given: Michal
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1421-1429
id: calandriello17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1421
lastpage: 1429
published: 2017-04-10 00:00:00 +0000
- title: 'Binary and Multi-Bit Coding for Stable Random Projections'
abstract: 'The recent work [17] developed a 1-bit compressed sensing (CS) algorithm based on $α$-stable random projections. Although the work in [17] showed that the method is a strong competitor of other existing 1-bit algorithms, the procedure requires knowing $K$, the sparsity. Note that $K$ is the $l_0$ norm of the signal. Other existing 1-bit CS algorithms require the $l_2$ norm of the signal. In this paper, we develop an estimation procedure for the $l_α$ norm of the signal, where $0<α\leq2$ from binary or multi-bit measurements. We demonstrate that using a simple closed-form estimator with merely 1-bit information does not result in a significant loss of accuracy if the parameter is chosen appropriately. Theoretical tail bounds are also provided. Using 2 or more bits per measurement reduces the variance and importantly, stabilizes the estimate so that the variance is not too sensitive to chosen parameters.'
volume: 54
URL: http://proceedings.mlr.press/v54/li17c.html
PDF: http://proceedings.mlr.press/v54/li17c/li17c.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-li17c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Li
given: Ping
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1430-1438
id: li17c
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1430
lastpage: 1438
published: 2017-04-10 00:00:00 +0000
- title: 'Spectral Methods for Correlated Topic Models'
abstract: 'In this paper we propose guaranteed spectral methods for learning a broad range of topic models, which generalize the popular Latent Dirichlet Allocation (LDA). We overcome the limitation of LDA to incorporate arbitrary topic correlations, by assuming that the hidden topic proportions are drawn from a flexible class of Normalized Infinitely Divisible (NID) distributions. NID distributions are generated by normalizing a family of independent Infinitely Divisible (ID) random variables. The Dirichlet distribution is a special case obtained by normalizing a set of Gamma random variables. We prove that this flexible topic model class can be learnt via spectral methods using only moments up to the third order, with (low order) polynomial sample and computational complexity. The proof is based on a key new technique derived here that allows us to diagonalize the moments of the NID distribution through an efficient procedure that requires evaluating only univariate integrals, despite the fact that we are handling high dimensional multivariate moments. In order to assess the performance of our proposed Latent NID topic model, we use two real datasets of articles collected from New York Times and Pubmed. Our experiments yield improved perplexity on both datasets compared with the baseline.'
volume: 54
URL: http://proceedings.mlr.press/v54/arabshahi17a.html
PDF: http://proceedings.mlr.press/v54/arabshahi17a/arabshahi17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-arabshahi17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Arabshahi
given: Forough
- family: Anandkumar
given: Anima
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1439-1447
id: arabshahi17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1439
lastpage: 1447
published: 2017-04-10 00:00:00 +0000
- title: 'Label Filters for Large Scale Multilabel Classification'
abstract: 'When assigning labels to a test instance, most multilabel and multiclass classifiers systematically evaluate every single label to decide whether it is relevant or not. This linear scan over labels becomes prohibitive when the number of labels is very large. To alleviate this problem we propose a two step approach where computationally efficient label filters pre-select a small set of candidate labels before the base multiclass or multilabel classifier is applied. The label filters select candidate labels by projecting a test instance on a filtering line, and retaining only the labels that have training instances in the vicinity of this projection. The filter parameters are learned directly from data by solving a constraint optimization problem, and are independent of the base multilabel classifier. The proposed label filters can be used in conjunction with any multiclass or multilabel classifier that requires a linear scan over the labels, and speed up prediction by orders of magnitude without significant impact on performance.'
volume: 54
URL: http://proceedings.mlr.press/v54/niculescu-mizil17a.html
PDF: http://proceedings.mlr.press/v54/niculescu-mizil17a/niculescu-mizil17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-niculescu-mizil17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Niculescu-Mizil
given: Alexandru
- family: Abbasnejad
given: Ehsan
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1448-1457
id: niculescu-mizil17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1448
lastpage: 1457
published: 2017-04-10 00:00:00 +0000
- title: 'Learning from Conditional Distributions via Dual Embeddings'
abstract: 'Many machine learning tasks, such as learning with invariance and policy evaluation in reinforcement learning, can be characterized as problems of learning from conditional distributions. In such problems, each sample x itself is associated with a conditional distribution $p(z|x)$ represented by samples $z_i_i=1^M$, and the goal is to learn a function f that links these conditional distributions to target values y. These problems become very challenging when we only have limited samples or in the extreme case only one sample from each conditional distribution. Commonly used approaches either assume that z is independent of x, or require an overwhelmingly large set of samples from each conditional distribution. To address these challenges, we propose a novel approach which employs a new min-max reformulation of the learning from conditional distribution problem. With such new reformulation, we only need to deal with the joint distribution p(z,x). We also design an efficient learning algorithm, Embedding-SGD, and establish theoretical sample complexity for such problems. Finally, our numerical experiments, on both synthetic and real-world datasets, show that the proposed approach can significantly improve over existing algorithms.'
volume: 54
URL: http://proceedings.mlr.press/v54/dai17a.html
PDF: http://proceedings.mlr.press/v54/dai17a/dai17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-dai17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Dai
given: Bo
- family: He
given: Niao
- family: Pan
given: Yunpeng
- family: Boots
given: Byron
- family: Song
given: Le
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1458-1467
id: dai17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1458
lastpage: 1467
published: 2017-04-10 00:00:00 +0000
- title: 'Sequential Multiple Hypothesis Testing with Type I Error Control'
abstract: 'This work studies multiple hypothesis testing in the setting when we obtain data sequentially and may choose when to stop sampling. We summarize the notion of a sequential p-value (one that can be continually updated and still maintain a type I error guarantee) and provide several examples from the literature. This tool allows us to convert step-up or step-down multiple hypothesis testing procedures in the fixed-horizon setting (which includes Benjamini-Hochberg, Holm, and Bonferroni) into sequential versions that allow the statistician to reject a hypothesis as soon as the sequential p-value reaches a threshold. We show that if the original procedure has a type I error guarantee in a certain family (including FDR and FWER), then the sequential conversion inherits an analogous guarantee. The conversion also allows for allocating samples in a data-dependent way, and we provide simulated experiments demonstrating an increased number of rejections when compared to the fixed-horizon setting.'
volume: 54
URL: http://proceedings.mlr.press/v54/malek17a.html
PDF: http://proceedings.mlr.press/v54/malek17a/malek17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-malek17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Malek
given: Alan
- family: Katariya
given: Sumeet
- family: Chow
given: Yinlam
- family: Ghavamzadeh
given: Mohammad
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1468-1476
id: malek17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1468
lastpage: 1476
published: 2017-04-10 00:00:00 +0000
- title: 'A Maximum Matching Algorithm for Basis Selection in Spectral Learning'
abstract: 'We present a solution to scale spectral algorithms for learning sequence functions. We are interested in the case where these functions are sparse (that is, for most sequences they return 0). Spectral algorithms reduce the learning problem to the task of computing an SVD decomposition over a special type of matrix called the Hankel matrix. This matrix is designed to capture the relevant statistics of the training sequences. What is crucial is that to capture long range dependencies we must consider very large Hankel matrices. Thus the computation of the SVD becomes a critical bottleneck. Our solution finds a subset of rows and columns of the Hankel that realizes a compact and informative Hankel submatrix. The novelty lies in the way that this subset is selected: we exploit a maximal bipartite matching combinatorial algorithm to look for a sub-block with full structural rank, and show how computation of this sub-block can be further improved by exploiting the specific structure of Hankel matrices.'
volume: 54
URL: http://proceedings.mlr.press/v54/quattoni17a.html
PDF: http://proceedings.mlr.press/v54/quattoni17a/quattoni17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-quattoni17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Quattoni
given: Ariadna
- family: Carreras
given: Xavier
- family: Gallé
given: Matthias
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1477-1485
id: quattoni17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1477
lastpage: 1485
published: 2017-04-10 00:00:00 +0000
- title: 'Value-Aware Loss Function for Model-based Reinforcement Learning'
abstract: 'We consider the problem of estimating the transition probability kernel to be used by a model-based reinforcement learning (RL) algorithm. We argue that estimating a generative model that minimizes a probabilistic loss, such as the log-loss, is an overkill because it does not take into account the underlying structure of decision problem and the RL algorithm that intends to solve it. We introduce a loss function that takes the structure of the value function into account. We provide a finite-sample upper bound for the loss function showing the dependence of the error on model approximation error, number of samples, and the complexity of the model space. We also empirically compare the method with the maximum likelihood estimator on a simple problem.'
volume: 54
URL: http://proceedings.mlr.press/v54/farahmand17a.html
PDF: http://proceedings.mlr.press/v54/farahmand17a/farahmand17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-farahmand17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Farahmand
given: Amir-Massoud
- family: Barreto
given: Andre
- family: Nikovski
given: Daniel
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1486-1494
id: farahmand17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1486
lastpage: 1494
published: 2017-04-10 00:00:00 +0000
- title: 'Convergence Rate of Stochastic k-means'
abstract: 'We analyze online (Bottou & Bengio, 1994) and mini-batch (Sculley, 2010) k-means variants. Both scale up the widely used Lloyd’s algorithm via stochastic approximation, and have become popular for large-scale clustering and unsupervised feature learning. We show, for the first time, that they have global convergence towards “local optima” at rate $O(1/t)$ under general conditions. In addition, we show that if the dataset is clusterable, stochastic k-means with suitable initialization converges to an optimal k-means solution at rate $O(1/t)$ with high probability. The k-means objective is non-convex and non-differentiable; we exploit ideas from non-convex gradient-based optimization by providing a novel characterization of the trajectory of the k-means algorithm on its solution space, and circumvent its non-differentiability via geometric insights about the k-means update.'
volume: 54
URL: http://proceedings.mlr.press/v54/tang17b.html
PDF: http://proceedings.mlr.press/v54/tang17b/tang17b.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-tang17b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Tang
given: Cheng
- family: Monteleoni
given: Claire
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1495-1503
id: tang17b
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1495
lastpage: 1503
published: 2017-04-10 00:00:00 +0000
- title: 'Automated Inference with Adaptive Batches'
abstract: 'Classical stochastic gradient methods for optimization rely on noisy gradient approximations that become progressively less accurate as iterates approach a solution. The large noise and small signal in the resulting gradients makes it difficult to use them for adaptive stepsize selection and automatic stopping. We propose alternative “big batch” SGD schemes that adaptively grow the batch size over time to maintain a nearly constant signal-to-noise ratio in the gradient approximation. The resulting methods have similar convergence rates to classical SGD, and do not require convexity of the objective. The high fidelity gradients enable automated learning rate selection and do not require stepsize decay. Big batch methods are thus easily automated and can run with little or no oversight.'
volume: 54
URL: http://proceedings.mlr.press/v54/de17a.html
PDF: http://proceedings.mlr.press/v54/de17a/de17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-de17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: De
given: Soham
- family: Yadav
given: Abhay
- family: Jacobs
given: David
- family: Goldstein
given: Tom
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1504-1513
id: de17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1504
lastpage: 1513
published: 2017-04-10 00:00:00 +0000
- title: 'Scalable Convex Multiple Sequence Alignment via Entropy-Regularized Dual Decomposition'
abstract: 'Multiple Sequence Alignment (MSA) is one of the fundamental tasks in biological sequence analysis that underlies applications such as phylogenetic trees, profiles, and structure prediction. The task, however, is NP-hard, and the current practice resorts to heuristic and local-search methods. Recently, a convex optimization approach for MSA was proposed based on the concept of atomic norm, which demonstrates significant improvement over existing methods in the quality of alignments. However, the convex program is challenging to solve due to the constraint given by the intersection of two atomic-norm balls, for which the existing algorithm can only handle sequences of length up to 50, with an iteration complexity subject to constants of unknown relation to the natural parameters of MSA. In this work, we propose an accelerated dual decomposition algorithm that exploits entropy regularization to induce closed-form solutions for each atomic-norm-constrained subproblem, giving a single-loop algorithm of iteration complexity linear to the problem size (total length of all sequences). The proposed algorithm gives significantly better alignments than existing methods on sequences of length up to hundreds, where the existing convex programming method fails to converge in one day.'
volume: 54
URL: http://proceedings.mlr.press/v54/zhang17b.html
PDF: http://proceedings.mlr.press/v54/zhang17b/zhang17b.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-zhang17b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Zhang
given: Jiong
- family: Yen
given: Ian En-Hsu
- family: Ravikumar
given: Pradeep
- family: Dhillon
given: Inderjit
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1514-1522
id: zhang17b
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1514
lastpage: 1522
published: 2017-04-10 00:00:00 +0000
- title: 'Robust Causal Estimation in the Large-Sample Limit without Strict Faithfulness'
abstract: 'Causal effect estimation from observational data is an important and much studied research topic. The instrumental variable (IV) and local causal discovery (LCD) patterns are canonical examples of settings where a closed-form expression exists for the causal effect of one variable on another, given the presence of a third variable. Both rely on faithfulness to infer that the latter only influences the target effect via the cause variable. In reality, it is likely that this assumption only holds approximately and that there will be at least some form of weak interaction. This brings about the paradoxical situation that, in the large-sample limit, no predictions are made, as detecting the weak edge invalidates the setting. We introduce an alternative approach by replacing strict faithfulness with a prior that reflects the existence of many ’weak’ (irrelevant) and ’strong’ interactions. We obtain a posterior distribution over the target causal effect estimator which shows that, in many cases, we can still make good estimates. We demonstrate the approach in an application on a simple linear-Gaussian setting, using the MultiNest sampling algorithm, and compare it with established techniques to show our method is robust even when strict faithfulness is violated.'
volume: 54
URL: http://proceedings.mlr.press/v54/bucur17a.html
PDF: http://proceedings.mlr.press/v54/bucur17a/bucur17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-bucur17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Bucur
given: Ioan Gabriel
- family: Claassen
given: Tom
- family: Heskes
given: Tom
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1523-1531
id: bucur17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1523
lastpage: 1531
published: 2017-04-10 00:00:00 +0000
- title: 'Learning Graphical Games from Behavioral Data: Sufficient and Necessary Conditions'
abstract: 'In this paper we obtain sufficient and necessary conditions on the number of samples required for exact recovery of the pure-strategy Nash equilibria (PSNE) set of a graphical game from noisy observations of joint actions. We consider sparse linear influence games — a parametric class of graphical games with linear payoffs, and represented by directed graphs of n nodes (players) and in-degree of at most k. We show that one can efficiently recover the PSNE set of a linear influence game with $O(k^2 \log n)$ samples, under very general observation models. On the other hand, we show that $Ω(k \log n)$ samples are necessary for any procedure to recover the PSNE set from observations of joint actions.'
volume: 54
URL: http://proceedings.mlr.press/v54/ghoshal17b.html
PDF: http://proceedings.mlr.press/v54/ghoshal17b/ghoshal17b.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-ghoshal17b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Ghoshal
given: Asish
- family: Honorio
given: Jean
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1532-1540
id: ghoshal17b
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1532
lastpage: 1540
published: 2017-04-10 00:00:00 +0000
- title: 'Non-Count Symmetries in Boolean & Multi-Valued Prob. Graphical Models'
abstract: 'Lifted inference algorithms commonly exploit symmetries in a probabilistic graphical model (PGM) for efficient inference. However, existing algorithms for Boolean-valued domains can identify only those pairs of states as symmetric, in which the number of ones and zeros match exactly (count symmetries). Moreover, algorithms for lifted inference in multi-valued domains also compute a multi-valued extension of count symmetries only. These algorithms miss many symmetries in a domain. In this paper, we present first algorithms to compute non-count symmetries in both Boolean-valued and multi-valued domains. Our methods can also find symmetries between multi-valued variables that have different domain cardinalities. The key insight in the algorithms is that they change the unit of symmetry computation from a variable to a variable-value (VV) pair. Our experiments find that exploiting these symmetries in MCMC can obtain substantial computational gains over existing algorithms.'
volume: 54
URL: http://proceedings.mlr.press/v54/anand17a.html
PDF: http://proceedings.mlr.press/v54/anand17a/anand17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-anand17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Anand
given: Ankit
- family: Noothigattu
given: Ritesh
- family: Singla
given: Parag
- family: Mausam
given:
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1541-1549
id: anand17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1541
lastpage: 1549
published: 2017-04-10 00:00:00 +0000
- title: 'Greedy Direction Method of Multiplier for MAP Inference of Large Output Domain'
abstract: 'Maximum-a-Posteriori (MAP) inference lies at the heart of Graphical Models and Structured Prediction. Despite the intractability of exact MAP inference, approximated methods based on LP relaxations have exhibited superior performance across a wide range of applications. Yet for problems involving large output domains (i.e., the state space for each variable is large), standard LP relaxations can easily give rise to a large number of variables and constraints which are beyond the limit of existing optimization algorithms. In this paper, we introduce an effective MAP inference method for problems with large output domains. The method builds upon alternating minimization of an Augmented Lagrangian that exploits the sparsity of messages through greedy optimization techniques. A key feature of our greedy approach is to introduce variables in an on-demand manner with a pre-built data structure over local factors. This results in a single-loop algorithm of sublinear cost per iteration and O(log(1/epsilon))-type iteration complexity to achieve epsilon sub-optimality. In addition, we introduce a variant of GDMM for binary MAP inference problems with a large number of factors. Empirically, the proposed algorithms demonstrate orders of magnitude speedup over state-of-the-art MAP inference techniques on MAP inference problems including Segmentation, Alignment, Protein Folding, Graph Matching, and Pairwise-Interacted Multilabel Prediction.'
volume: 54
URL: http://proceedings.mlr.press/v54/huang17a.html
PDF: http://proceedings.mlr.press/v54/huang17a/huang17a.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-huang17a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Huang
given: Xiangru
- family: Yen
given: Ian En-Hsu
- family: Zhang
given: Ruohan
- family: Huang
given: Qixing
- family: Ravikumar
given: Pradeep
- family: Dhillon
given: Inderjit
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1550-1559
id: huang17a
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1550
lastpage: 1559
published: 2017-04-10 00:00:00 +0000
- title: 'Scalable Greedy Feature Selection via Weak Submodularity'
abstract: 'Greedy algorithms are widely used for problems in machine learning such as feature selection and set function optimization. Unfortunately, for large datasets, the running time of even greedy algorithms can be quite high. This is because for each greedy step we need to refit a model or calculate a function using the previously selected choices and the new candidate. Two algorithms that are faster approximations to the greedy forward selection were introduced recently [Mirzasoleiman et al., 2013, 2015]. They achieve better performance by exploiting stochastic evaluation and distributed computation respectively. Both algorithms have provable performance guarantees for submodular functions. In this paper we show that divergent from previously held opinion, submodularity is not required to obtain approximation guarantees for these two algorithms. Specifically, we show that a generalized concept of weak submodularity suffices to give multiplicative approximation guarantees. Our result extends the applicability of these algorithms to a larger class of functions. Furthermore, we show that a bounded submodularity ratio can be used to provide data dependent bounds that can sometimes be tighter also for submodular functions. We empirically validate our work by showing superior performance of fast greedy approximations versus several established baselines on artificial and real datasets. '
volume: 54
URL: http://proceedings.mlr.press/v54/khanna17b.html
PDF: http://proceedings.mlr.press/v54/khanna17b/khanna17b.pdf
edit: https://github.com/mlresearch/v54/edit/gh-pages/_posts/2017-04-10-khanna17b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Khanna
given: Rajiv
- family: Elenberg
given: Ethan
- family: Dimakis
given: Alex
- family: Negahban
given: Sahand
- family: Ghosh
given: Joydeep
editor:
- family: Singh
given: Aarti
- family: Zhu
given: Jerry
page: 1560-1568
id: khanna17b
issued:
date-parts:
- 2017
- 4
- 10
firstpage: 1560
lastpage: 1568
published: 2017-04-10 00:00:00 +0000