- title: 'Strong Coresets for Hard and Soft Bregman Clustering with Applications to Exponential Family Mixtures'
abstract: 'Coresets are efficient representations of data sets such that models trained on the coreset are provably competitive with models trained on the original data set. As such, they have been successfully used to scale up clustering models such as K-Means and Gaussian mixture models to massive data sets. However, until now, the algorithms and the corresponding theory were usually specific to each clustering problem. We propose a single, practical algorithm to construct strong coresets for a large class of hard and soft clustering problems based on Bregman divergences. This class includes hard clustering with popular distortion measures such as the Squared Euclidean distance, the Mahalanobis distance, KL-divergence and Itakura-Saito distance. The corresponding soft clustering problems are directly related to popular mixture models due to a dual relationship between Bregman divergences and Exponential family distributions. Our theoretical results further imply a randomized polynomial-time approximation scheme for hard clustering. We demonstrate the practicality of the proposed algorithm in an empirical evaluation.'
volume: 51
URL: http://proceedings.mlr.press/v51/lucic16.html
PDF: http://proceedings.mlr.press/v51/lucic16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-lucic16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Lucic
given: Mario
- family: Bachem
given: Olivier
- family: Krause
given: Andreas
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1-9
id: lucic16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1
lastpage: 9
published: 2016-05-02 00:00:00 +0000
- title: 'Revealing Graph Bandits for Maximizing Local Influence'
abstract: 'We study a graph bandit setting where the objective of the learner is to detect the most influential node of a graph by requesting as little information from the graph as possible. One of the relevant applications for this setting is marketing in social networks, where the marketer aims at finding and taking advantage of the most influential customers. The existing approaches for bandit problems on graphs require either partial or complete knowledge of the graph. In this paper, we do not assume any knowledge of the graph, but we consider a setting where it can be gradually discovered in a sequential and active way. At each round, the learner chooses a node of the graph and the only information it receives is a stochastic set of the nodes that the chosen node is currently influencing. To address this setting, we propose BARE, a bandit strategy for which we prove a regret guarantee that scales with the detectable dimension, a problem dependent quantity that is often much smaller than the number of nodes.'
volume: 51
URL: http://proceedings.mlr.press/v51/carpentier16a.html
PDF: http://proceedings.mlr.press/v51/carpentier16a.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-carpentier16a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Carpentier
given: Alexandra
- family: Valko
given: Michal
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 10-18
id: carpentier16a
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 10
lastpage: 18
published: 2016-05-02 00:00:00 +0000
- title: 'Convex Block-sparse Linear Regression with Expanders – Provably'
abstract: 'Sparse matrices are favorable objects in machine learning and optimization. When such matrices are used, in place of dense ones, the overall complexity requirements in optimization can be significantly reduced in practice, both in terms of space and run-time. Prompted by this observation, we study a convex optimization scheme for block-sparse recovery from linear measurements. To obtain linear sketches, we use expander matrices, i.e., sparse matrices containing only few non-zeros per column. Hitherto, to the best of our knowledge, such algorithmic solutions have been only studied from a non-convex perspective. Our aim here is to theoretically characterize the performance of convex approaches under such setting. Our key novelty is the expression of the recovery error in terms of the model-based norm, while assuring that solution lives in the model. To achieve this, we show that sparse model-based matrices satisfy a group version of the null-space property. Our experimental findings on synthetic and real applications support our claims for faster recovery in the convex setting – as opposed to using dense sensing matrices, while showing a competitive recovery performance.'
volume: 51
URL: http://proceedings.mlr.press/v51/kyrillidis16.html
PDF: http://proceedings.mlr.press/v51/kyrillidis16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-kyrillidis16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Kyrillidis
given: Anastasios
- family: Bah
given: Bubacarr
- family: Hasheminezhad
given: Rouzbeh
- family: Tran Dinh
given: Quoc
- family: Baldassarre
given: Luca
- family: Cevher
given: Volkan
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 19-27
id: kyrillidis16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 19
lastpage: 27
published: 2016-05-02 00:00:00 +0000
- title: 'C3: Lightweight Incrementalized MCMC for Probabilistic Programs using Continuations and Callsite Caching'
abstract: 'Lightweight, source-to-source transformation approaches to implementing MCMC for probabilistic programming languages are popular for their simplicity, support of existing deterministic code, and ability to execute on existing fast runtimes. However, they are also inefficient, requiring a complete re-execution of the program on every Metropolis Hastings proposal. We present a new extension to the lightweight approach, C3, which enables efficient, incrementalized re-execution of MH proposals. C3 is based on two core ideas: transforming probabilistic programs into continuation passing style (CPS), and caching the results of function calls. It is particularly effective at speeding up recursive programs with many local latent variables. We show that on several common models, C3 reduces proposal runtime by 20-100x, in some cases reducing runtime complexity from linear in model size to constant. We also demonstrate nearly an order of magnitude speedup on a complex inverse procedural modeling application.'
volume: 51
URL: http://proceedings.mlr.press/v51/ritchie16.html
PDF: http://proceedings.mlr.press/v51/ritchie16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-ritchie16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Ritchie
given: Daniel
- family: Stuhlmüller
given: Andreas
- family: Goodman
given: Noah
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 28-37
id: ritchie16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 28
lastpage: 37
published: 2016-05-02 00:00:00 +0000
- title: 'Clamping Improves TRW and Mean Field Approximations'
abstract: 'We examine the effect of clamping variables for approximate inference in undirected graphical models with pairwise relationships and discrete variables. For any number of variable labels, we demonstrate that clamping and summing approximate sub-partition functions can lead only to a decrease in the partition function estimate for TRW, and an increase for the naive mean ﬁeld method, in each case guaranteeing an improvement in the approximation and bound. We next focus on binary variables, add the Bethe approximation to consideration and examine ways to choose good variables to clamp, introducing new methods. We show the importance of identifying highly frustrated cycles, and of checking the singleton entropy of a variable. We explore the value of our methods by empirical analysis and draw lessons to guide practitioners.'
volume: 51
URL: http://proceedings.mlr.press/v51/weller16a.html
PDF: http://proceedings.mlr.press/v51/weller16a.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-weller16a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Weller
given: Adrian
- family: Domke
given: Justin
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 38-46
id: weller16a
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 38
lastpage: 46
published: 2016-05-02 00:00:00 +0000
- title: 'Tightness of LP Relaxations for Almost Balanced Models'
abstract: 'Linear programming (LP) relaxations are widely used to attempt to identify a most likely conﬁguration of a discrete graphical model. In some cases, the LP relaxation attains an optimum vertex at an integral location and thus guarantees an exact solution to the original optimization problem. When this occurs, we say that the LP relaxation is tight. Here we consider binary pairwise models and derive sufﬁcient conditions for guaranteed tightness of (i) the standard LP relaxation on the local polytope LP+LOC, and (ii) the LP relaxation on the triplet-consistent polytope LP+TRI (the next level in the Sherali-Adams hierarchy). We provide simple new proofs of earlier results and derive significant novel results including that LP+TRI is tight for any model where each block is balanced or almost balanced, and a decomposition theorem that may be used to break apart complex models into smaller pieces. An almost balanced (sub-)model is one that contains no frustrated cycles except through one privileged variable.'
volume: 51
URL: http://proceedings.mlr.press/v51/weller16b.html
PDF: http://proceedings.mlr.press/v51/weller16b.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-weller16b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Weller
given: Adrian
- family: Rowland
given: Mark
- family: Sontag
given: David
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 47-55
id: weller16b
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 47
lastpage: 55
published: 2016-05-02 00:00:00 +0000
- title: 'Control Functionals for Quasi-Monte Carlo Integration'
abstract: 'Quasi-Monte Carlo (QMC) methods are being adopted in statistical applications due to the increasingly challenging nature of numerical integrals that are now routinely encountered. For integrands with d-dimensions and derivatives of order α, an optimal QMC rule converges at a best-possible rate O(N^-α/d). However, in applications the value of αcan be unknown and/or a rate-optimal QMC rule can be unavailable. Standard practice is to employ \alpha_L-optimal QMC where the lower bound \alpha_L ≤αis known, but in general this does not exploit the full power of QMC. One solution is to trade-off numerical integration with functional approximation. This strategy is explored herein and shown to be well-suited to modern statistical computation. A challenging application to robotic arm data demonstrates a substantial variance reduction in predictions for mechanical torques.'
volume: 51
URL: http://proceedings.mlr.press/v51/oates16.html
PDF: http://proceedings.mlr.press/v51/oates16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-oates16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Oates
given: Chris
- family: Girolami
given: Mark
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 56-65
id: oates16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 56
lastpage: 65
published: 2016-05-02 00:00:00 +0000
- title: 'Probability Inequalities for Kernel Embeddings in Sampling without Replacement'
abstract: 'The \emphkernel embedding of distributions is a popular machine learning technique to manipulate probability distributions and an integral part of numerous applications. Its empirical counterpart is an estimate from a finite dataset of samples from the distribution under consideration. However, for large-scale learning problems the empirical kernel embedding becomes infeasible to compute and approximate, constant time, solutions are necessary. Instead of the full dataset, a random subset of smaller size can be used to calculate the empirical kernel embedding, known as \emphsampling without replacement. In this work we generalize the results of (Serfling 1974) to quantify the difference between this two estimates. We derive probability inequalities for the kernel embedding and more general inequalities for Banach space valued martingales in the setting of sampling without replacement.'
volume: 51
URL: http://proceedings.mlr.press/v51/schneider16.html
PDF: http://proceedings.mlr.press/v51/schneider16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-schneider16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Schneider
given: Markus
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 66-74
id: schneider16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 66
lastpage: 74
published: 2016-05-02 00:00:00 +0000
- title: 'Sparse Representation of Multivariate Extremes with Applications to Anomaly Ranking'
abstract: 'Extremes play a special role in Anomaly Detection. Beyond inference and simulation purposes, probabilistic tools borrowed from Extreme Value Theory (EVT), such as the \textitangular measure, can also be used to design novel statistical learning methods for Anomaly Detection/ranking. This paper proposes a new algorithm based on multivariate EVT to learn how to rank observations in a high dimensional space with respect to their degree of ‘abnormality’. The procedure relies on an original dimension-reduction technique in the extreme domain that possibly produces a sparse representation of multivariate extremes and allows to gain insight into the dependence structure thereof, escaping the curse of dimensionality. The representation output by the unsupervised methodology we propose here can be combined with any Anomaly Detection technique tailored to non-extreme data. As it performs linearly with the dimension and almost linearly in the data (in O(d n \log n)), it fits to large scale problems. The approach in this paper is novel in that EVT has never been used in its multivariate version in the field of Anomaly Detection. Illustrative experimental results provide strong empirical evidence of the relevance of our approach.'
volume: 51
URL: http://proceedings.mlr.press/v51/goix16.html
PDF: http://proceedings.mlr.press/v51/goix16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-goix16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Goix
given: Nicolas
- family: Sabourin
given: Anne
- family: Clémençon
given: Stéphan
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 75-83
id: goix16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 75
lastpage: 83
published: 2016-05-02 00:00:00 +0000
- title: 'A Robust-Equitable Copula Dependence Measure for Feature Selection'
abstract: 'Feature selection aims to select relevant features to improve the performance of predictors. Many feature selection methods depend on the choice of dependence measures. To select features that have complex nonlinear relationships with the response variable, the dependence measure should be equitable: treating linear and nonlinear relationships equally. In this paper we introduce the concept of robust-equitability and a robust-equitable dependence measure copula correlation (Ccor). This measure has the following advantages compared to existing dependence measures: it is robust to different relationship forms and robust to unequal sample sizes of different features. In contrast, existing dependence measures cannot take these factors into account simultaneously. Experiments on synthetic and real-world datasets confirm our theoretical analysis, and illustrates its advantage in feature selection.'
volume: 51
URL: http://proceedings.mlr.press/v51/chang16.html
PDF: http://proceedings.mlr.press/v51/chang16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-chang16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Chang
given: Yale
- family: Li
given: Yi
- family: Ding
given: Adam
- family: Dy
given: Jennifer
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 84-92
id: chang16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 84
lastpage: 92
published: 2016-05-02 00:00:00 +0000
- title: 'Random Forest for the Contextual Bandit Problem'
abstract: 'To address the contextual bandit problem, we propose an online random forest algorithm. The analysis of the proposed algorithm is based on the sample complexity needed to find the optimal decision stump. Then, the decision stumps are recursively stacked in a random collection of decision trees, BANDIT FOREST. We show that the proposed algorithm is optimal up to logarithmic factors. The dependence of the sample complexity upon the number of contextual variables is logarithmic. The computational cost of the proposed algorithm with respect to the time horizon is linear. These analytical results allow the proposed algorithm to be efficient in real applications , where the number of events to process is huge, and where we expect that some contextual variables, chosen from a large set, have potentially non-linear dependencies with the rewards. In the experiments done to illustrate the theoretical analysis, BANDIT FOREST obtain promising results in comparison with state-of-the-art algorithms.'
volume: 51
URL: http://proceedings.mlr.press/v51/feraud16.html
PDF: http://proceedings.mlr.press/v51/feraud16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-feraud16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Féraud
given: Raphaël
- family: Allesiardo
given: Robin
- family: Urvoy
given: Tanguy
- family: Clérot
given: Fabrice
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 93-101
id: feraud16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 93
lastpage: 101
published: 2016-05-02 00:00:00 +0000
- title: 'Inverse Reinforcement Learning with Simultaneous Estimation of Rewards and Dynamics'
abstract: 'Inverse Reinforcement Learning (IRL) describes the problem of learning an unknown reward function of a Markov Decision Process (MDP) from observed behavior of an agent. Since the agent’s behavior originates in its policy and MDP policies depend on both the stochastic system dynamics as well as the reward function, the solution of the inverse problem is significantly influenced by both. Current IRL approaches assume that if the transition model is unknown, additional samples from the system’s dynamics are accessible, or the observed behavior provides enough samples of the system’s dynamics to solve the inverse problem accurately. These assumptions are often not satisfied. To overcome this, we present a gradient-based IRL approach that simultaneously estimates the system’s dynamics. By solving the combined optimization problem, our approach takes into account the bias of the demonstrations, which stems from the generating policy. The evaluation on a synthetic MDP and a transfer learning task shows improvements regarding the sample efficiency as well as the accuracy of the estimated reward functions and transition models.'
volume: 51
URL: http://proceedings.mlr.press/v51/herman16.html
PDF: http://proceedings.mlr.press/v51/herman16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-herman16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Herman
given: Michael
- family: Gindele
given: Tobias
- family: Wagner
given: Jörg
- family: Schmitt
given: Felix
- family: Burgard
given: Wolfram
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 102-110
id: herman16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 102
lastpage: 110
published: 2016-05-02 00:00:00 +0000
- title: 'Learning Sparse Additive Models with Interactions in High Dimensions'
abstract: 'A function f: \mathbbR^d →\mathbbR is referred to as a Sparse Additive Model (SPAM), if it is of the form f(x) = \sum_l ∈S \phi_l(x_l), where S ⊂[d], |S| ≪d. Assuming \phi_l’s and S to be unknown, the problem of estimating f from its samples has been studied extensively. In this work, we consider a generalized SPAM, allowing for second order interaction terms. For some S_1 ⊂[d], S_2 ⊂[d] \choose 2, the function f is assumed to be of the form: f(x) = \sum_p ∈S_1 \phi_p (x_p) + \sum_(l,l’) ∈S_2 \phi_l,l’ (x_l, x_l’). Assuming \phi_p, \phi_(l,l’), S_1 and S_2 to be unknown, we provide a randomized algorithm that queries f and exactly recovers S_1,S_2. Consequently, this also enables us to estimate the underlying \phi_p, \phi_l,l’. We derive sample complexity bounds for our scheme and also extend our analysis to include the situation where the queries are corrupted with noise – either stochastic, or arbitrary but bounded. Lastly, we provide simulation results on synthetic data, that validate our theoretical findings.'
volume: 51
URL: http://proceedings.mlr.press/v51/tyagi16.html
PDF: http://proceedings.mlr.press/v51/tyagi16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-tyagi16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Tyagi
given: Hemant
- family: Kyrillidis
given: Anastasios
- family: Gärtner
given: Bernd
- family: Krause
given: Andreas
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 111-120
id: tyagi16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 111
lastpage: 120
published: 2016-05-02 00:00:00 +0000
- title: 'Bipartite Correlation Clustering: Maximizing Agreements'
abstract: 'In Bipartite Correlation Clustering (BCC) we are given a complete bipartite graph G with ’+’ and ’-’ edges, and we seek a vertex clustering that maximizes the number of agreements: the number of all ’+’ edges within clusters plus all ’-’ edges cut across clusters. BCC is known to be NP-hard [5]. We present a novel approximation algorithm for k-BCC, a variant of BCC with an upper bound k on the number of clusters. Our algorithm outputs a k-clustering that provably achieves a number of agreements within a multiplicative (1-δ)-factor from the optimal, for any desired accuracy δ. It relies on solving a combinatorially constrained bilinear maximization on the bi-adjacency matrix of G. It runs in time exponential in k and 1/δ, but linear in the size of the input. Further, we show that, in the (unconstrained) BCC setting, an (1-δ)-approximation can be achieved by O(1/δ) clusters regardless of the size of the graph. In turn, our k-BCC algorithm implies an Efficient PTAS for the BCC objective of maximizing agreements.'
volume: 51
URL: http://proceedings.mlr.press/v51/asteris16.html
PDF: http://proceedings.mlr.press/v51/asteris16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-asteris16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Asteris
given: Megasthenis
- family: Kyrillidis
given: Anastasios
- family: Papailiopoulos
given: Dimitris
- family: Dimakis
given: Alexandros
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 121-129
id: asteris16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 121
lastpage: 129
published: 2016-05-02 00:00:00 +0000
- title: 'Breaking Sticks and Ambiguities with Adaptive Skip-gram'
abstract: 'The recently proposed Skip-gram model is a powerful method for learning high-dimensional word representations that capture rich semantic relationships between words. However, Skip-gram as well as most prior work on learning word representations does not take into account word ambiguity and maintain only single representation per word. Although a number of Skip-gram modifications were proposed to overcome this limitation and learn multi-prototype word representations, they either require a known number of word meanings or learn them using greedy heuristic approaches. In this paper we propose the Adaptive Skip-gram model which is a nonparametric Bayesian extension of Skip-gram capable to automatically learn the required number of representations for all words at desired semantic resolution. We derive efficient online variational learning algorithm for the model and empirically demonstrate its efficiency on word-sense induction task.'
volume: 51
URL: http://proceedings.mlr.press/v51/bartunov16.html
PDF: http://proceedings.mlr.press/v51/bartunov16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-bartunov16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Bartunov
given: Sergey
- family: Kondrashkin
given: Dmitry
- family: Osokin
given: Anton
- family: Vetrov
given: Dmitry
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 130-138
id: bartunov16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 130
lastpage: 138
published: 2016-05-02 00:00:00 +0000
- title: 'Top Arm Identification in Multi-Armed Bandits with Batch Arm Pulls'
abstract: 'We introduce a new multi-armed bandit (MAB) problem in which arms must be sampled in batches, rather than one at a time. This is motivated by applications in social media monitoring and biological experimentation where such batch constraints naturally arise. This paper develops and analyzes algorithms for batch MABs and top arm identification, for both fixed confidence and fixed budget settings. Our main theoretical results show that the batch constraint does not significantly affect the sample complexity of top arm identification compared to unconstrained MAB algorithms. Alternatively, if one views a batch as the fundamental sampling unit, then the results can be interpreted as showing that the sample complexity of batch MABs can be significantly less than traditional MABs. We demonstrate the new batch MAB algorithms with simulations and in two interesting real-world applications: (i) microwell array experiments for identifying genes that are important in virus replication and (ii) finding the most active users in Twitter on a specific topic.'
volume: 51
URL: http://proceedings.mlr.press/v51/jun16.html
PDF: http://proceedings.mlr.press/v51/jun16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-jun16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Jun
given: Kwang-Sung
- family: Jamieson
given: Kevin
- family: Nowak
given: Robert
- family: Zhu
given: Xiaojin
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 139-148
id: jun16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 139
lastpage: 148
published: 2016-05-02 00:00:00 +0000
- title: 'Limits on Sparse Support Recovery via Linear Sketching with Random Expander Matrices'
abstract: 'Linear sketching is a powerful tool for the problem of sparse signal recovery, having numerous applications such as compressive sensing, data stream computing, graph sketching, and routing. Motivated by applications where the \emphpositions of the non-zero entries in a sparse vector are of primary interest, we consider the problem of \emphsupport recovery from a linear sketch taking the form \mathbfY = \mathbfXβ+ \mathbfZ. We focus on a widely-used expander-based construction in the columns of the measurement matrix \mathbfX ∈\mathbbR^n \times p are random permutations of a sparse binary vector containing d ≪n ones and n-d zeros. We provide a sharp characterization of the number of measurements required for an information-theoretically optimal decoder, thus permitting a precise comparison to the i.i.d. Gaussian construction. Our findings reveal both positive and negative results, showing that the performance nearly matches the Gaussian construction at moderate-to-high noise levels, while being worse by an arbitrarily large factor at low noise levels.'
volume: 51
URL: http://proceedings.mlr.press/v51/scarlett16.html
PDF: http://proceedings.mlr.press/v51/scarlett16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-scarlett16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Scarlett
given: Jonathan
- family: Cevher
given: Volkan
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 149-158
id: scarlett16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 149
lastpage: 158
published: 2016-05-02 00:00:00 +0000
- title: 'Maximum Likelihood for Variance Estimation in High-Dimensional Linear Models'
abstract: 'We study maximum likelihood estimators (MLEs) for the residual variance, the signal-to-noise ratio, and other variance parameters in high-dimensional linear models. These parameters are essential in many statistical applications involving regression diagnostics, inference, tuning parameter selection for high-dimensional regression, and other applications, including genetics. The estimators that we study are not new, and have been widely used for variance component estimation in linear random-effects models. However, our analysis is new and it implies that the MLEs, which were devised for random-effects models, may also perform very well in high-dimensional linear models with fixed-effects, which are more commonly studied in some areas of high-dimensional statistics. The MLEs are shown to be consistent and asymptotically normal in fixed-effects models with random design, in asymptotic settings where the number of predictors (p) is proportional to the number of observations (n). Moreover, the estimators’ asymptotic variance can be given explicitly in terms moments of the Marcenko-Pastur distribution. A variety of analytical and empirical results show that the MLEs outperform other, previously proposed estimators for variance parameters in high-dimensional linear models with fixed-effects. More broadly, the results in this paper illustrate a strategy for drawing connections between fixed- and random-effects models in high dimensions, which may be useful in other applications.'
volume: 51
URL: http://proceedings.mlr.press/v51/dicker16.html
PDF: http://proceedings.mlr.press/v51/dicker16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-dicker16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Dicker
given: Lee H.
- family: Erdogdu
given: Murat A.
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 159-167
id: dicker16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 159
lastpage: 167
published: 2016-05-02 00:00:00 +0000
- title: 'Scalable Gaussian Process Classification via Expectation Propagation'
abstract: 'Variational methods have been recently considered for scaling the training process of Gaussian process classifiers to large datasets. As an alternative, we describe here how to train these classifiers efficiently using expectation propagation (EP). The proposed EP method allows to train Gaussian process classifiers on very large datasets, with millions of instances, that were out of the reach of previous implementations of EP. More precisely, it can be used for (i) training in a distributed fashion where the data instances are sent to different nodes in which the required computations are carried out, and for (ii) maximizing an estimate of the marginal likelihood using a stochastic approximation of the gradient. Several experiments involving large datasets show that the method described is competitive with the variational approach.'
volume: 51
URL: http://proceedings.mlr.press/v51/hernandez-lobato16.html
PDF: http://proceedings.mlr.press/v51/hernandez-lobato16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-hernandez-lobato16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Hernandez-Lobato
given: Daniel
- family: Hernandez-Lobato
given: Jose Miguel
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 168-176
id: hernandez-lobato16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 168
lastpage: 176
published: 2016-05-02 00:00:00 +0000
- title: 'Precision Matrix Estimation in High Dimensional Gaussian Graphical Models with Faster Rates'
abstract: 'In this paper, we present a new estimator for precision matrix in high dimensional Gaussian graphical models. At the core of the proposed estimator is a collection of node-wise linear regression with nonconvex penalty. In contrast to existing estimators for Gaussian graphical models with O(s\sqrt\log d/n) estimation error bound in terms of spectral norm, where s is the maximum degree of a graph, the proposed estimator could attain O(s/\sqrtn+\sqrt\log d/n) spectral norm based convergence rate in the best case, and it is no worse than exiting estimators in general. In addition, our proposed estimator enjoys the oracle property under a milder condition than existing estimators. We show through extensive experiments on both synthetic and real datasets that our estimator outperforms the state-of-the art estimators.'
volume: 51
URL: http://proceedings.mlr.press/v51/wang16a.html
PDF: http://proceedings.mlr.press/v51/wang16a.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-wang16a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Wang
given: Lingxiao
- family: Ren
given: Xiang
- family: Gu
given: Quanquan
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 177-185
id: wang16a
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 177
lastpage: 185
published: 2016-05-02 00:00:00 +0000
- title: 'On the Reducibility of Submodular Functions'
abstract: 'The scalability of submodular optimization methods is critical for their usability in practice. In this paper, we study the reducibility of submodular functions, a property that enables us to reduce the solution space of submodular optimization problems without performance loss. We introduce the concept of reducibility using marginal gains. Then we show that by adding perturbation, we can endow irreducible functions with reducibility, based on which we propose the perturbation-reduction optimization framework. Our theoretical analysis proves that given the perturbation scales, the reducibility gain could be computed, and the performance loss has additive upper bounds. We further conduct empirical studies and the results demonstrate that our proposed framework significantly accelerates existing optimization methods for irreducible submodular functions with a cost of only small performance losses.'
volume: 51
URL: http://proceedings.mlr.press/v51/mei16.html
PDF: http://proceedings.mlr.press/v51/mei16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-mei16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Mei
given: Jincheng
- family: Zhang
given: Hao
- family: Lu
given: Bao-Liang
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 186-194
id: mei16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 186
lastpage: 194
published: 2016-05-02 00:00:00 +0000
- title: 'Accelerated Stochastic Gradient Descent for Minimizing Finite Sums'
abstract: 'We propose an optimization method for minimizing the finite sums of smooth convex functions. Our method incorporates an accelerated gradient descent (AGD) and a stochastic variance reduction gradient (SVRG) in a mini-batch setting. An important feature of the method is that it can be directly applied to general convex and semi-strongly convex problems that is a weaker condition than strong convexity. We show that our method achieves a better overall complexity for the general convex problems and linear convergence for optimal strongly convex problems. Moreover we prove the fast iteration complexity of our method. Our experiments show the effectiveness of our method.'
volume: 51
URL: http://proceedings.mlr.press/v51/nitanda16.html
PDF: http://proceedings.mlr.press/v51/nitanda16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-nitanda16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Nitanda
given: Atsushi
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 195-203
id: nitanda16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 195
lastpage: 203
published: 2016-05-02 00:00:00 +0000
- title: 'Fast Convergence of Online Pairwise Learning Algorithms'
abstract: 'Pairwise learning usually refers to a learning task which involves a loss function depending on pairs of examples, among which most notable ones are bipartite ranking, metric learning and AUC maximization. In this paper, we focus on online learning algorithms for pairwise learning problems without strong convexity, for which all previously known algorithms achieve a convergence rate of \mathcalO(1/\sqrtT) after T iterations. In particular, we study an online learning algorithm for pairwise learning with a least-square loss function in an unconstrained setting. We prove that the convergence of its last iterate can converge to the desired minimizer at a rate arbitrarily close to \mathcalO(1/T) up to logarithmic factor. The rates for this algorithm are established in high probability under the assumptions of polynomially decaying step sizes.'
volume: 51
URL: http://proceedings.mlr.press/v51/boissier16.html
PDF: http://proceedings.mlr.press/v51/boissier16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-boissier16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Boissier
given: Martin
- family: Lyu
given: Siwei
- family: Ying
given: Yiming
- family: Zhou
given: Ding-Xuan
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 204-212
id: boissier16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 204
lastpage: 212
published: 2016-05-02 00:00:00 +0000
- title: 'Computationally Efficient Bayesian Learning of Gaussian Process State Space Models'
abstract: 'Gaussian processes allow for flexible specification of prior assumptions of unknown dynamics in state space models. We present a procedure for efficient Bayesian learning in Gaussian process state space models, where the representation is formed by projecting the problem onto a set of approximate eigenfunctions derived from the prior covariance structure. Learning under this family of models can be conducted using a carefully crafted particle MCMC algorithm. This scheme is computationally efficient and yet allows for a fully Bayesian treatment of the problem. Compared to conventional system identification tools or existing learning methods, we show competitive performance and reliable quantification of uncertainties in the model.'
volume: 51
URL: http://proceedings.mlr.press/v51/svensson16.html
PDF: http://proceedings.mlr.press/v51/svensson16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-svensson16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Svensson
given: Andreas
- family: Solin
given: Arno
- family: Särkkä
given: Simo
- family: Schön
given: Thomas
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 213-221
id: svensson16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 213
lastpage: 221
published: 2016-05-02 00:00:00 +0000
- title: 'Generalized Ideal Parent (GIP): Discovering non-Gaussian Hidden Variables'
abstract: 'A formidable challenge in uncertainty modeling in general, and when learning Bayesian networks in particular, is the discovery of unknown hidden variables. Few works that tackle this task are typically limited to discrete or Gaussian domains, or to tree structures. We propose a novel general purpose approach for discovering hidden variables in flexible non-Gaussian domains using the powerful class of Gaussian copula networks. Briefly, we define the concept of a hypothetically optimal predictor of variable and show it can be used to discover useful hidden variables in the expressive framework of copula networks. Our approach leads to performance and compactness advantages over competitors in a variety of domains.'
volume: 51
URL: http://proceedings.mlr.press/v51/tenzer16.html
PDF: http://proceedings.mlr.press/v51/tenzer16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-tenzer16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Tenzer
given: Yaniv
- family: Elidan
given: Gal
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 222-230
id: tenzer16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 222
lastpage: 230
published: 2016-05-02 00:00:00 +0000
- title: 'On Sparse Variational Methods and the Kullback-Leibler Divergence between Stochastic Processes'
abstract: 'The variational framework for learning inducing variables (Titsias, 2009) has had a large impact on the Gaussian process literature. The framework may be interpreted as minimizing a rigorously defined Kullback-Leibler divergence between the approximating and posterior processes. To our knowledge this connection has thus far gone unremarked in the literature. In this paper we give a substantial generalization of the literature on this topic. We give a new proof of the result for infinite index sets which allows inducing points that are not data points and likelihoods that depend on all function values. We then discuss augmented index sets and show that, contrary to previous works, marginal consistency of augmentation is not enough to guarantee consistency of variational inference with the original model. We then characterize an extra condition where such a guarantee is obtainable. Finally we show how our framework sheds light on interdomain sparse approximations and sparse approximations for Cox processes.'
volume: 51
URL: http://proceedings.mlr.press/v51/matthews16.html
PDF: http://proceedings.mlr.press/v51/matthews16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-matthews16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Matthews
given: Alexander G. de G.
- family: Hensman
given: James
- family: Turner
given: Richard
- family: Ghahramani
given: Zoubin
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 231-239
id: matthews16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 231
lastpage: 239
published: 2016-05-02 00:00:00 +0000
- title: 'Non-stochastic Best Arm Identification and Hyperparameter Optimization'
abstract: 'Motivated by the task of hyperparameter optimization, we introduce the \em non-stochastic best-arm identification problem. We identify an attractive algorithm for this setting that makes no assumptions on the convergence behavior of the arms’ losses, has no free-parameters to adjust, provably outperforms the uniform allocation baseline in favorable conditions, and performs comparably (up to \log factors) otherwise. Next, by leveraging the iterative nature of many learning algorithms, we cast hyperparameter optimization as an instance of non-stochastic best-arm identification. Our empirical results show that, by allocating more resources to promising hyperparameter settings, our approach achieves comparable test accuracies an order of magnitude faster than the uniform strategy. The robustness and simplicity of our approach makes it well-suited to ultimately replace the uniform strategy currently used in most machine learning software packages.'
volume: 51
URL: http://proceedings.mlr.press/v51/jamieson16.html
PDF: http://proceedings.mlr.press/v51/jamieson16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-jamieson16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Jamieson
given: Kevin
- family: Talwalkar
given: Ameet
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 240-248
id: jamieson16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 240
lastpage: 248
published: 2016-05-02 00:00:00 +0000
- title: 'A Linearly-Convergent Stochastic L-BFGS Algorithm'
abstract: 'We propose a new stochastic L-BFGS algorithm and prove a linear convergence rate for strongly convex and smooth functions. Our algorithm draws heavily from a recent stochastic variant of L-BFGS proposed in Byrd et al. (2014) as well as a recent approach to variance reduction for stochastic gradient descent from Johnson and Zhang (2013). We demonstrate experimentally that our algorithm performs well on large-scale convex and non-convex optimization problems, exhibiting linear convergence and rapidly solving the optimization problems to high levels of precision. Furthermore, we show that our algorithm performs well for a wide-range of step sizes, often differing by several orders of magnitude.'
volume: 51
URL: http://proceedings.mlr.press/v51/moritz16.html
PDF: http://proceedings.mlr.press/v51/moritz16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-moritz16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Moritz
given: Philipp
- family: Nishihara
given: Robert
- family: Jordan
given: Michael
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 249-258
id: moritz16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 249
lastpage: 258
published: 2016-05-02 00:00:00 +0000
- title: 'No Regret Bound for Extreme Bandits'
abstract: 'Algorithms for hyperparameter optimization abound, all of which work well under different and often unverifiable assumptions. Motivated by the general challenge of sequentially choosing which algorithm to use, we study the more specific task of choosing among distributions to use for random hyperparameter optimization. This work is naturally framed in the extreme bandit setting, which deals with sequentially choosing which distribution from a collection to sample in order to minimize (maximize) the single best cost (reward). Whereas the distributions in the standard bandit setting are primarily characterized by their means, a number of subtleties arise when we care about the minimal cost as opposed to the average cost. For example, there may not be a well-defined “best” distribution as there is in the standard bandit setting. The best distribution depends on the rewards that have been obtained and on the remaining time horizon. Whereas in the standard bandit setting, it is sensible to compare policies with an oracle which plays the single best arm, in the extreme bandit setting, there are multiple sensible oracle models. We define a sensible notion of “extreme regret” in the extreme bandit setting, which parallels the concept of regret in the standard bandit setting. We then prove that no policy can asymptotically achieve no extreme regret. '
volume: 51
URL: http://proceedings.mlr.press/v51/nishihara16.html
PDF: http://proceedings.mlr.press/v51/nishihara16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-nishihara16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Nishihara
given: Robert
- family: Lopez-Paz
given: David
- family: Bottou
given: Leon
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 259-267
id: nishihara16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 259
lastpage: 267
published: 2016-05-02 00:00:00 +0000
- title: 'Tensor vs. Matrix Methods: Robust Tensor Decomposition under Block Sparse Perturbations'
abstract: 'Robust tensor CP decomposition involves decomposing a tensor into low rank and sparse components. We propose a novel non-convex iterative algorithm with guaranteed recovery. It alternates between low-rank CP decomposition through gradient ascent (a variant of the tensor power method), and hard thresholding of the residual. We prove convergence to the globally optimal solution under natural incoherence conditions on the low rank component, and bounded level of sparse perturbations. We compare our method with natural baselines, which apply robust matrix PCA either to the \em flattened tensor, or to the matrix slices of the tensor. Our method can provably handle a far greater level of perturbation when the sparse tensor is block-structured. This naturally occurs in many applications such as the foreground-background separation task in videos. Our experiments validate these findings. Thus, we establish that tensor methods can tolerate a higher level of gross corruptions compared to matrix methods.'
volume: 51
URL: http://proceedings.mlr.press/v51/anandkumar16.html
PDF: http://proceedings.mlr.press/v51/anandkumar16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-anandkumar16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Anandkumar
given: Anima
- family: Jain
given: Prateek
- family: Shi
given: Yang
- family: Niranjan
given: U. N.
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 268-276
id: anandkumar16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 268
lastpage: 276
published: 2016-05-02 00:00:00 +0000
- title: 'Online Learning to Rank with Feedback at the Top'
abstract: 'We consider an online learning to rank setting in which, at each round, an oblivious adversary generates a list of m documents, pertaining to a query, and the learner ranks the documents according to assigned scores. The adversary then generates a relevance vector and the learner updates its ranker according to the feedback received. We consider the setting where the feedback is restricted to be the relevance levels of only the top k documents in the ranked list, for k ≪m. However, the performance of learner is judged based on the unrevealed full relevance vectors, using an appropriate learning to rank loss function. We develop efficient algorithms for well known losses in the pointwise, pairwise and listwise families. We also prove that no online algorithm can have sublinear regret, with top 1 feedback, for any loss that is calibrated with respect to NDCG. We apply our algorithms on benchmark datasets demonstrating efficient online learning of a ranking function from highly restricted feedback.'
volume: 51
URL: http://proceedings.mlr.press/v51/chaudhuri16.html
PDF: http://proceedings.mlr.press/v51/chaudhuri16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-chaudhuri16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Chaudhuri
given: Sougata
- family: Tewari
given: Ambuj Tewari
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 277-285
id: chaudhuri16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 277
lastpage: 285
published: 2016-05-02 00:00:00 +0000
- title: 'Survey Propagation beyond Constraint Satisfaction Problems'
abstract: 'Survey propagation (SP) is a message passing procedure that attempts to model all the fixed points of Belief Propagation (BP), thereby improving BP’s approximation in loopy graphs where BP’s assumptions do not hold. For this, SP messages represent distributions over BP messages. Unfortunately this requirement makes SP intractable beyond constraint satisfaction problems because, to perform general SP updates, one has to operate on distributions over a continuous domain. We propose an approximation scheme to efficiently extend the application of SP to marginalization in binary pairwise graphical models. Our approximate SP has O(DK\log(DK)t) complexity per iteration, where t is the complexity of BP per iteration, D is the maximum node degree and K is a resolution constant controlling the approximation’s fidelity. Our experiments show that this method can track many BP fixed points, achieving a high marginalization accuracy within a few iterations, in difficult settings where BP is often non-convergent and inaccurate.'
volume: 51
URL: http://proceedings.mlr.press/v51/srinivasa16.html
PDF: http://proceedings.mlr.press/v51/srinivasa16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-srinivasa16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Srinivasa
given: Christopher
- family: Ravanbakhsh
given: Siamak
- family: Frey
given: Brendan
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 286-295
id: srinivasa16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 286
lastpage: 295
published: 2016-05-02 00:00:00 +0000
- title: 'Score Permutation Based Finite Sample Inference for Generalized AutoRegressive Conditional Heteroskedasticity (GARCH) Models'
abstract: 'A standard model of (conditional) heteroscedasticity, i.e., the phenomenon that the variance of a process changes over time, is the Generalized AutoRegressive Conditional Heteroskedasticity (GARCH) model, which is especially important for economics and finance. GARCH models are typically estimated by the Quasi-Maximum Likelihood (QML) method, which works under mild statistical assumptions. Here, we suggest a finite sample approach, called ScoPe, to construct distribution-free confidence regions around the QML estimate, which have exact coverage probabilities, despite no additional assumptions about moments are made. ScoPe is inspired by the recently developed Sign-Perturbed Sums (SPS) method, which however cannot be applied in the GARCH case. ScoPe works by perturbing the score function using randomly permuted residuals. This produces alternative samples which lead to exact confidence regions. Experiments on simulated and stock market data are also presented, and ScoPe is compared with the asymptotic theory and bootstrap approaches.'
volume: 51
URL: http://proceedings.mlr.press/v51/csaji16.html
PDF: http://proceedings.mlr.press/v51/csaji16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-csaji16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Csáji
given: Balázs Csanád
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 296-304
id: csaji16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 296
lastpage: 304
published: 2016-05-02 00:00:00 +0000
- title: 'CRAFT: ClusteR-specific Assorted Feature selecTion'
abstract: 'We present a hierarchical Bayesian framework for clustering with cluster-specific feature selection. We derive a simplified model, CRAFT, by analyzing the asymptotic behavior of the log posterior formulations in a nonparametric MAP-based clustering setting in this framework. CRAFT handles assorted data, i.e., both numeric and categorical data, and the underlying objective functions are intuitively appealing. The resulting algorithm is simple to implement and scales nicely, requires minimal parameter tuning, obviates the need to specify the number of clusters a priori, and compares favorably with other state-of-the-art methods on real datasets. We also provide empirical evidence on carefully designed synthetic data sets to highlight the robustness of the algorithm to recover the underlying feature subspaces, even when the average dimensionality of the features across clusters is misspecified. Besides, the framework seamlessly allows for multiple views of clustering by interpolating between the two extremes of cluster-specific feature selection and global selection, and recovers the DP-means objective under the degenerate setting of clustering without feature selection.'
volume: 51
URL: http://proceedings.mlr.press/v51/garg16.html
PDF: http://proceedings.mlr.press/v51/garg16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-garg16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Garg
given: Vikas K.
- family: Rudin
given: Cynthia
- family: Jaakkola
given: Tommi
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 305-313
id: garg16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 305
lastpage: 313
published: 2016-05-02 00:00:00 +0000
- title: 'Time-Varying Gaussian Process Bandit Optimization'
abstract: 'We consider the sequential Bayesian optimization problem with bandit feedback, adopting a formulation that allows for the reward function to vary with time. We model the reward function using a Gaussian process whose evolution obeys a simple Markov model. We introduce two natural extensions of the classical Gaussian process upper confidence bound (GP-UCB) algorithm. The first, R-GP-UCB, resets GP-UCB at regular intervals. The second, TV-GP-UCB, instead forgets about old data in a smooth fashion. Our main contribution comprises of novel regret bounds for these algorithms, providing an explicit characterization of the trade-off between the time horizon and the rate at which the function varies. We illustrate the performance of the algorithms on both synthetic and real data, and we find the gradual forgetting of TV-GP-UCB to perform favorably compared to the sharp resetting of R-GP-UCB. Moreover, both algorithms significantly outperform classical GP-UCB, since it treats stale and fresh data equally.'
volume: 51
URL: http://proceedings.mlr.press/v51/bogunovic16.html
PDF: http://proceedings.mlr.press/v51/bogunovic16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-bogunovic16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Bogunovic
given: Ilija
- family: Scarlett
given: Jonathan
- family: Cevher
given: Volkan
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 314-323
id: bogunovic16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 314
lastpage: 323
published: 2016-05-02 00:00:00 +0000
- title: 'Bayes-Optimal Effort Allocation in Crowdsourcing: Bounds and Index Policies'
abstract: 'We consider effort allocation in crowdsourcing, where we wish to assign labeling tasks to imperfect homogeneous crowd workers to maximize overall accuracy in a continuous-time Bayesian setting, subject to budget and time constraints. The Bayes-optimal policy for this problem is the solution to a partially observable Markov decision process, but the curse of dimensionality renders the computation infeasible. Following a similar approach to the Lagrangian Relaxation technique in Adelman and Mersereau (2008), we provide a computationally tractable instance-specific upper bound on the value of this Bayes-optimal policy, which can in turn be used to bound the optimality gap of any other sub-optimal policy. In an approach similar in spirit to the Whittle index for restless multi-armed bandits, we provide an index policy for effort allocation in crowdsourcing and demonstrate numerically that it outperforms other state-of-the-art policies and performs close to optimal.'
volume: 51
URL: http://proceedings.mlr.press/v51/hu16a.html
PDF: http://proceedings.mlr.press/v51/hu16a.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-hu16a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Hu
given: Weici
- family: Frazier
given: Peter
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 324-332
id: hu16a
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 324
lastpage: 332
published: 2016-05-02 00:00:00 +0000
- title: 'Bayesian Markov Blanket Estimation'
abstract: 'This paper considers a Bayesian view for estimating the Markov blanket of a set of query variables, where the set of potential neighbours here is big. We factorize the posterior such that the Markov blanket is conditionally independent of the network of the potential neighbours. By exploiting this blockwise decoupling, we derive analytic expressions for posterior conditionals. Subsequently, we develop an inference scheme, which makes use of the factorization. As a result, estimation of a sub-network is possible without inferring an entire network. Since the resulting Gibbs sampler scales linearly with the number of variables, it can handle relatively large neighbourhoods. The proposed scheme results in faster convergence and superior mixing of the Markov chain than existing Bayesian network estimation techniques.'
volume: 51
URL: http://proceedings.mlr.press/v51/kaufmann16.html
PDF: http://proceedings.mlr.press/v51/kaufmann16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-kaufmann16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Kaufmann
given: Dinu
- family: Parbhoo
given: Sonali
- family: Wieczorek
given: Aleksander
- family: Keller
given: Sebastian
- family: Adametz
given: David
- family: Roth
given: Volker
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 333-341
id: kaufmann16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 333
lastpage: 341
published: 2016-05-02 00:00:00 +0000
- title: 'Dreaming More Data: Class-dependent Distributions over Diffeomorphisms for Learned Data Augmentation'
abstract: 'Data augmentation is a key element in training high-dimensional models. In this approach, one synthesizes new observations by applying pre-specified transformations to the original training data; e.g. new images are formed by rotating old ones. Current augmentation schemes, however, rely on manual specification of the applied transformations, making data augmentation an implicit form of feature engineering. With an eye towards true end-to-end learning, we suggest learning the applied transformations on a per-class basis. Particularly, we align image pairs within each class under the assumption that the spatial transformation between images belongs to a large class of diffeomorphisms. We then learn a class-specific probabilistic generative models of the transformations in a Riemannian submanifold of the Lie group of diffeomorphisms. We demonstrate significant performance improvements in training deep neural nets over manually-specified augmentation schemes. Our code and augmented datasets are available online.'
volume: 51
URL: http://proceedings.mlr.press/v51/hauberg16.html
PDF: http://proceedings.mlr.press/v51/hauberg16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-hauberg16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Hauberg
given: Søren
- family: Freifeld
given: Oren
- family: Larsen
given: Anders Boesen Lindbo
- family: Fisher
given: John
- family: Hansen
given: Lars
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 342-350
id: hauberg16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 342
lastpage: 350
published: 2016-05-02 00:00:00 +0000
- title: 'Unsupervised Ensemble Learning with Dependent Classifiers'
abstract: 'In unsupervised ensemble learning, one obtains predictions from multiple sources or classifiers, yet without knowing the reliability and expertise of each source, and with no labeled data to assess it. The task is to combine these possibly conﬂicting predictions into an accurate meta-learner. Most works to date assumed perfect diversity between the different sources, a property known as conditional independence. In realistic scenarios, however, this assumption is often violated, and ensemble learners based on it can be severely sub-optimal. The key challenges we address in this paper are: (i) how to detect, in an unsupervised manner, strong violations of conditional independence; and (ii) construct a suitable meta-learner. To this end we introduce a statistical model that allows for dependencies between classifiers. Based on this model, we develop novel unsupervised methods to detect strongly dependent classifiers, better estimate their accuracies, and construct an improved meta-learner. Using both artificial and real datasets, we showcase the importance of taking classifier dependencies into account and the competitive performance of our approach.'
volume: 51
URL: http://proceedings.mlr.press/v51/jaffe16.html
PDF: http://proceedings.mlr.press/v51/jaffe16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-jaffe16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Jaffe
given: Ariel
- family: Fetaya
given: Ethan
- family: Nadler
given: Boaz
- family: Jiang
given: Tingting
- family: Kluger
given: Yuval
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 351-360
id: jaffe16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 351
lastpage: 360
published: 2016-05-02 00:00:00 +0000
- title: 'Multi-Level Cause-Effect Systems'
abstract: 'We present a domain-general account of causation that applies to settings in which macro-level causal relations between two systems are of interest, but the relevant causal features are poorly understood and have to be aggregated from vast arrays of micro-measurements. Our approach generalizes that of Chalupka et. al. (2015) to the setting in which the macro-level effect is not specified. We formalize the connection between micro- and macro-variables in such situations and provide a coherent framework describing causal relations at multiple levels of analysis. We present an algorithm that discovers macro-variable causes and effects from micro-level measurements obtained from an experiment. We further show how to design experiments to discover macro-variables from observational micro-variable data. Finally, we show that under specific conditions, one can identify multiple levels of causal structure. Throughout the article, we use a simulated neuroscience multi-unit recording experiment to illustrate the ideas and the algorithms.'
volume: 51
URL: http://proceedings.mlr.press/v51/chalupka16.html
PDF: http://proceedings.mlr.press/v51/chalupka16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-chalupka16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Chalupka
given: Krzysztof
- family: Eberhardt
given: Frederick
- family: Perona
given: Pietro
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 361-369
id: chalupka16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 361
lastpage: 369
published: 2016-05-02 00:00:00 +0000
- title: 'Deep Kernel Learning'
abstract: 'We introduce scalable deep kernels, which combine the structural properties of deep learning architectures with the non-parametric flexibility of kernel methods. Specifically, we transform the inputs of a spectral mixture base kernel with a deep architecture, using local kernel interpolation, inducing points, and structure exploiting (Kronecker and Toeplitz) algebra for a scalable kernel representation. These closed-form kernels can be used as drop-in replacements for standard kernels, with benefits in expressive power and scalability. We jointly learn the properties of these kernels through the marginal likelihood of a Gaussian process. Inference and learning cost O(n) for n training points, and predictions cost O(1) per test point. On a large and diverse collection of applications, including a dataset with 2 million examples, we show improved performance over scalable Gaussian processes with flexible kernel learning models, and stand-alone deep architectures.'
volume: 51
URL: http://proceedings.mlr.press/v51/wilson16.html
PDF: http://proceedings.mlr.press/v51/wilson16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-wilson16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Wilson
given: Andrew Gordon
- family: Hu
given: Zhiting
- family: Salakhutdinov
given: Ruslan
- family: Xing
given: Eric P.
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 370-378
id: wilson16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 370
lastpage: 378
published: 2016-05-02 00:00:00 +0000
- title: 'Nearly Optimal Classification for Semimetrics'
abstract: 'We initiate the rigorous study of classification in semimetric spaces, which are point sets with a distance function that is non-negative and symmetric, but need not satisfy the triangle inequality. We define the \em density dimension \dens and discover that it plays a central role in the statistical and algorithmic feasibility of learning in semimetric spaces. We compute this quantity for several widely used semimetrics and present nearly optimal sample compression algorithms, which are then used to obtain generalization guarantees, including fast rates. Our claim of near-optimality holds in both computational and statistical senses. When the sample has radius R and margin γ, we show that it can be compressed down to roughly d=(R/γ)^\dens points, and further that finding a significantly better compression is algorithmically intractable unless P=NP. This compression implies generalization via standard Occam-type arguments, to which we provide a nearly matching lower bound.'
volume: 51
URL: http://proceedings.mlr.press/v51/gottlieb16.html
PDF: http://proceedings.mlr.press/v51/gottlieb16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-gottlieb16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Gottlieb
given: Lee-Ad
- family: Kontorovich
given: Aryeh
- family: Nisnevitch
given: Pinhas
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 379-388
id: gottlieb16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 379
lastpage: 388
published: 2016-05-02 00:00:00 +0000
- title: 'Latent Point Process Allocation'
abstract: 'We introduce a probabilistic model for the factorisation of continuous Poisson process rate functions. Our model can be thought of as a topic model for Poisson point processes in which each point is assigned to one of a set of latent rate functions that are shared across multiple outputs. We show that the model brings a means of incorporating structure in point process inference beyond the state-of-the-art. We derive an efficient variational inference scheme for the model based on sparse Gaussian processes that scales linearly in the number of data points. Finally, we demonstrate, using examples from spatial and temporal statistics, how the model can be used for discovering hidden structure with greater precision than standard frequentist approaches.'
volume: 51
URL: http://proceedings.mlr.press/v51/lloyd16.html
PDF: http://proceedings.mlr.press/v51/lloyd16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-lloyd16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Lloyd
given: Chris
- family: Gunter
given: Tom
- family: Osborne
given: Michael
- family: Roberts
given: Stephen
- family: Nickson
given: Tom
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 389-397
id: lloyd16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 389
lastpage: 397
published: 2016-05-02 00:00:00 +0000
- title: 'K2-ABC: Approximate Bayesian Computation with Kernel Embeddings'
abstract: 'Complicated generative models often result in a situation where computing the likelihood of observed data is intractable, while simulating from the conditional density given a parameter value is relatively easy. Approximate Bayesian Computation (ABC) is a paradigm that enables simulation-based posterior inference in such cases by measuring the similarity between simulated and observed data in terms of a chosen set of summary statistics. However, there is no general rule to construct sufficient summary statistics for complex models. Insufficient summary statistics will leak information, which leads to ABC algorithms yielding samples from an incorrect posterior. In this paper, we propose a fully nonparametric ABC paradigm which circumvents the need for manually selecting summary statistics. Our approach, K2-ABC, uses maximum mean discrepancy (MMD) to construct a dissimilarity measure between the observed and simulated data. The embedding of an empirical distribution of the data into a reproducing kernel Hilbert space plays a role of the summary statistic and is sufficient whenever the corresponding kernels are characteristic. Experiments on a simulated scenario and a real-world biological problem illustrate the effectiveness of the proposed algorithm.'
volume: 51
URL: http://proceedings.mlr.press/v51/park16.html
PDF: http://proceedings.mlr.press/v51/park16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-park16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Park
given: Mijung
- family: Jitkrittum
given: Wittawat
- family: Sejdinovic
given: Dino
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 398-407
id: park16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 398
lastpage: 407
published: 2016-05-02 00:00:00 +0000
- title: 'Bayesian Generalised Ensemble Markov Chain Monte Carlo'
abstract: 'Bayesian generalised ensemble (BayesGE) is a new method that addresses two major drawbacks of standard Markov chain Monte Carlo algorithms for inference in high-dimensional probability models: inapplicability to estimate the partition function and poor mixing properties. BayesGE uses a Bayesian approach to iteratively update the belief about the density of states (distribution of the log likelihood under the prior) for the model, with the dual purpose of enhancing the sampling efficiency and making the estimation of the partition function tractable. We benchmark BayesGE on Ising and Potts systems and show that it compares favourably to existing state-of-the-art methods.'
volume: 51
URL: http://proceedings.mlr.press/v51/frellsen16.html
PDF: http://proceedings.mlr.press/v51/frellsen16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-frellsen16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Frellsen
given: Jes
- family: Winther
given: Ole
- family: Ghahramani
given: Zoubin
- family: Ferkinghoff-Borg
given: Jesper
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 408-416
id: frellsen16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 408
lastpage: 416
published: 2016-05-02 00:00:00 +0000
- title: 'A Lasso-based Sparse Knowledge Gradient Policy for Sequential Optimal Learning'
abstract: 'We propose a sequential learning policy for noisy discrete global optimization and ranking and selection (R&S) problems with high dimensional sparse belief functions, where there are hundreds or even thousands of features, but only a small portion of these features contain explanatory power. Our problem setting, motivated by the experimental sciences, arises where we have to choose which experiment to run next. Here the experiments are time-consuming and expensive. We derive a sparse knowledge gradient (SpKG) decision policy based on the \ell_1-penalized regression Lasso to identify the sparsity pattern before our budget is exhausted. This policy is a unique and novel hybrid of Bayesian R&S with a frequentist learning approach. Theoretically, we provide the error bound of the posterior mean estimate, which has shown to be at the minimax optimal \sqrts\log p/n rate. Controlled experiments on both synthetic data and real application for automatically designing experiments to identify the structure of an RNA molecule show that the algorithm efficiently learns the correct set of nonzero parameters. It also outperforms several other learning policies.'
volume: 51
URL: http://proceedings.mlr.press/v51/li16a.html
PDF: http://proceedings.mlr.press/v51/li16a.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-li16a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Li
given: Yan
- family: Liu
given: Han
- family: Powell
given: Warren
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 417-425
id: li16a
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 417
lastpage: 425
published: 2016-05-02 00:00:00 +0000
- title: 'Optimal Statistical and Computational Rates for One Bit Matrix Completion'
abstract: 'We consider one bit matrix completion under rank constraint. We present an estimator based on rank constrained maximum likelihood estimation, and an efficient greedy algorithm to solve it approximately based on an extension of conditional gradient descent. The output of the proposed algorithm converges at a linear rate to the underlying true low-rank matrix up to the optimal statistical estimation error rate, i.e., O(\sqrtrn\log(n)/|Ω|), where n is the dimension of the underlying matrix and |Ω| is the number of observed entries. Our work establishes the first computationally efficient approach with provable guarantee for optimal estimation in one bit matrix completion. Our theory is supported by thorough numerical results.'
volume: 51
URL: http://proceedings.mlr.press/v51/ni16.html
PDF: http://proceedings.mlr.press/v51/ni16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-ni16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Ni
given: Renkun
- family: Gu
given: Quanquan
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 426-434
id: ni16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 426
lastpage: 434
published: 2016-05-02 00:00:00 +0000
- title: 'PAC-Bayesian Bounds based on the Rényi Divergence'
abstract: 'We propose a simplified proof process for PAC-Bayesian generalization bounds, that allows to divide the proof in four successive inequalities, easing the "customization" of PAC-Bayesian theorems. We also propose a family of PAC-Bayesian bounds based on the Rényi divergence between the prior and posterior distributions, whereas most PAC-Bayesian bounds are based on the Kullback-Leibler divergence. Finally, we present an empirical evaluation of the tightness of each inequality of the simplified proof, for both the classical PAC-Bayesian bounds and those based on the Rényi divergence.'
volume: 51
URL: http://proceedings.mlr.press/v51/begin16.html
PDF: http://proceedings.mlr.press/v51/begin16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-begin16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Bégin
given: Luc
- family: Germain
given: Pascal
- family: Laviolette
given: François
- family: Roy
given: Jean-Francis
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 435-444
id: begin16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 435
lastpage: 444
published: 2016-05-02 00:00:00 +0000
- title: 'Simple and Scalable Constrained Clustering: a Generalized Spectral Method'
abstract: 'We present a simple spectral approach to the well-studied constrained clustering problem. It captures constrained clustering as a generalized eigenvalue problem with graph Laplacians. The algorithm works in nearly-linear time and provides concrete guarantees for the quality of the clusters, at least for the case of 2-way partitioning. In practice this translates to a very fast implementation that consistently outperforms existing spectral approaches both in speed and quality.'
volume: 51
URL: http://proceedings.mlr.press/v51/cucuringu16.html
PDF: http://proceedings.mlr.press/v51/cucuringu16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-cucuringu16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Cucuringu
given: Mihai
- family: Koutis
given: Ioannis
- family: Chawla
given: Sanjay
- family: Miller
given: Gary
- family: Peng
given: Richard
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 445-454
id: cucuringu16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 445
lastpage: 454
published: 2016-05-02 00:00:00 +0000
- title: 'Geometry Aware Mappings for High Dimensional Sparse Factors'
abstract: 'While matrix factorisation models are ubiquitous in large scale recommendation and search, real time application of such models requires inner product computations over an intractably large set of item factors. In this manuscript we present a novel framework that exploits structural properties of sparse vectors, using the inverted index representation, to significantly reduce the run time computational cost of factorisation models. We develop techniques that use geometry aware permutation maps on a tessellated unit sphere to obtain high dimensional sparse embeddings for latent factors with sparsity patterns related to angular closeness of the original latent factors. We also design several efficient and deterministic realisations within this framework and demonstrate with experiments that our techniques lead to faster run time operation with minimal loss of accuracy.'
volume: 51
URL: http://proceedings.mlr.press/v51/bhowmik16.html
PDF: http://proceedings.mlr.press/v51/bhowmik16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-bhowmik16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Bhowmik
given: Avradeep
- family: Liu
given: Nathan
- family: Zhong
given: Erheng
- family: Bhaskar
given: Badri
- family: Rajan
given: Suju
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 455-463
id: bhowmik16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 455
lastpage: 463
published: 2016-05-02 00:00:00 +0000
- title: 'Generalizing Pooling Functions in Convolutional Neural Networks: Mixed, Gated, and Tree'
abstract: 'We seek to improve deep neural networks by generalizing the pooling operations that play a central role in current architectures. We pursue a careful exploration of approaches to allow pooling to learn and to adapt to complex and variable patterns. The two primary directions lie in (1) learning a pooling function via (two strategies of) combining of max and average pooling, and (2) learning a pooling function in the form of a tree-structured fusion of pooling filters that are themselves learned. In our experiments every generalized pooling operation we explore improves performance when used in place of average or max pooling. We experimentally demonstrate that the proposed pooling operations provide a boost in invariance properties relative to conventional pooling and set the state of the art on several widely adopted benchmark datasets; they are also easy to implement, and can be applied within various deep neural network architectures. These benefits come with only a light increase in computational overhead during training and a very modest increase in the number of model parameters.'
volume: 51
URL: http://proceedings.mlr.press/v51/lee16a.html
PDF: http://proceedings.mlr.press/v51/lee16a.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-lee16a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Lee
given: Chen-Yu
- family: Gallagher
given: Patrick W.
- family: Tu
given: Zhuowen
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 464-472
id: lee16a
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 464
lastpage: 472
published: 2016-05-02 00:00:00 +0000
- title: 'Rivalry of Two Families of Algorithms for Memory-Restricted Streaming PCA'
abstract: 'We study the problem of recovering the subspace spanned by the first k principal components of d-dimensional data under the streaming setting, with a memory bound of O(kd). Two families of algorithms are known for this problem. The first family is based on the framework of stochastic gradient descent. Nevertheless, the convergence rate of the family can be seriously affected by the learning rate of the descent steps and deserves more serious study. The second family is based on the power method over blocks of data, but setting the block size for its existing algorithms is not an easy task. In this paper, we analyze the convergence rate of a representative algorithm with decayed learning rate (Oja and Karhunen, 1985) in the first family for the general k>1 case. Moreover, we propose a novel algorithm for the second family that sets the block sizes automatically and dynamically with faster convergence rate. We then conduct empirical studies that fairly compare the two families on real-world data. The studies reveal the advantages and disadvantages of these two families.'
volume: 51
URL: http://proceedings.mlr.press/v51/li16b.html
PDF: http://proceedings.mlr.press/v51/li16b.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-li16b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Li
given: Chun-Liang
- family: Lin
given: Hsuan-Tien
- family: Lu
given: Chi-Jen
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 473-481
id: li16b
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 473
lastpage: 481
published: 2016-05-02 00:00:00 +0000
- title: 'Quantization based Fast Inner Product Search'
abstract: 'We propose a quantization based approach for fast approximate Maximum Inner Product Search (MIPS). Each database vector is quantized in multiple subspaces via a set of codebooks, learned directly by minimizing the inner product quantization error. Then, the inner product of a query to a database vector is approximated as the sum of inner products with the subspace quantizers. Different from recently proposed LSH approaches to MIPS, the database vectors and queries do not need to be augmented in a higher dimensional feature space. We also provide a theoretical analysis of the proposed approach, consisting of the concentration results under mild assumptions. Furthermore, if a small set of held-out queries is given at the training time, we propose a modified codebook learning procedure which further improves the accuracy. Experimental results on a variety of datasets including those arising from deep neural networks show that the proposed approach significantly outperforms the existing state-of-the-art.'
volume: 51
URL: http://proceedings.mlr.press/v51/guo16a.html
PDF: http://proceedings.mlr.press/v51/guo16a.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-guo16a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Guo
given: Ruiqi
- family: Kumar
given: Sanjiv
- family: Choromanski
given: Krzysztof
- family: Simcha
given: David
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 482-490
id: guo16a
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 482
lastpage: 490
published: 2016-05-02 00:00:00 +0000
- title: 'An Improved Convergence Analysis of Cyclic Block Coordinate Descent-type Methods for Strongly Convex Minimization'
abstract: 'The cyclic block coordinate descent-type (CBCD-type) methods have shown remarkable computational performance for solving strongly convex minimization problems. Typical applications include many popular statistical machine learning methods such as elastic-net regression, ridge penalized logistic regression, and sparse additive regression. Existing optimization literature has shown that the CBCD-type methods attain iteration complexity of O(p⋅\log(1/ε)), where εis a pre-specified accuracy of the objective value, and p is the number of blocks. However, such iteration complexity explicitly depends on p, and therefore is at least p times worse than those of gradient descent methods. To bridge this theoretical gap, we propose an improved convergence analysis for the CBCD-type methods. In particular, we first show that for a family of quadratic minimization problems, the iteration complexity of the CBCD-type methods matches that of the GD methods in term of dependency on p (up to a \log^2 p factor). Thus our complexity bounds are sharper than the existing bounds by at least a factor of p/\log^2p. We also provide a lower bound to confirm that our improved complexity bounds are tight (up to a \log^2 p factor) if the largest and smallest eigenvalues of the Hessian matrix do not scale with p. Finally, we generalize our analysis to other strongly convex minimization problems beyond quadratic ones.'
volume: 51
URL: http://proceedings.mlr.press/v51/li16c.html
PDF: http://proceedings.mlr.press/v51/li16c.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-li16c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Li
given: Xingguo
- family: Zhao
given: Tuo
- family: Arora
given: Raman
- family: Liu
given: Han
- family: Hong
given: Mingyi
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 491-499
id: li16c
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 491
lastpage: 499
published: 2016-05-02 00:00:00 +0000
- title: 'Learning Structured Low-Rank Representation via Matrix Factorization'
abstract: 'A vast body of recent works in the literature has shown that exploring structures beyond data low-rankness can boost the performance of subspace clustering methods such as Low-Rank Representation (LRR). It has also been well recognized that the matrix factorization framework might offer more flexibility on pursuing underlying structures of the data. In this paper, we propose to learn structured LRR by factorizing the nuclear norm regularized matrix, which leads to our proposed non-convex formulation NLRR. Interestingly, this formulation of NLRR provides a general framework for unifying a variety of popular algorithms including LRR, dictionary learning, robust principal component analysis, sparse subspace clustering, etc. Several variants of NLRR are also proposed, for example, to promote sparsity while preserving low-rankness. We design a practical algorithm for NLRR and its variants, and establish theoretical guarantee for the stability of the solution and the convergence of the algorithm. Perhaps surprisingly, the computational and memory cost of NLRR can be reduced by roughly one order of magnitude compared to the cost of LRR. Experiments on extensive simulations and real datasets confirm the robustness of efficiency of NLRR and the variants.'
volume: 51
URL: http://proceedings.mlr.press/v51/shen16.html
PDF: http://proceedings.mlr.press/v51/shen16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-shen16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Shen
given: Jie
- family: Li
given: Ping
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 500-509
id: shen16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 500
lastpage: 509
published: 2016-05-02 00:00:00 +0000
- title: 'A PAC RL Algorithm for Episodic POMDPs'
abstract: 'Many interesting real world domains involve reinforcement learning (RL) in partially observable environments. Efficient learning in such domains is important, but existing sample complexity bounds for partially observable RL are at least exponential in the episode length. We give, to our knowledge, the first partially observable RL algorithm with a polynomial bound on the number of episodes on which the algorithm may not achieve near-optimal performance. Our algorithm is suitable for an important class of episodic POMDPs. Our approach builds on recent advances in the method of moments for latent variable model estimation.'
volume: 51
URL: http://proceedings.mlr.press/v51/guo16b.html
PDF: http://proceedings.mlr.press/v51/guo16b.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-guo16b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Guo
given: Zhaohan Daniel
- family: Doroudi
given: Shayan
- family: Brunskill
given: Emma
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 510-518
id: guo16b
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 510
lastpage: 518
published: 2016-05-02 00:00:00 +0000
- title: 'Large Scale Distributed Semi-Supervised Learning Using Streaming Approximation'
abstract: 'Traditional graph-based semi-supervised learning (SSL) approaches are not suited for massive data and large label scenarios since they scale linearly with the number of edges |E| and distinct labels m. To deal with the large label size problem, recent works propose sketch-based methods to approximate the label distribution per node thereby achieving a space reduction from O(m) to O(\log m), under certain conditions. In this paper, we present a novel streaming graph-based SSL approximation that effectively captures the sparsity of the label distribution and further reduces the space complexity per node to O(1). We also provide a distributed version of the algorithm that scales well to large data sizes. Experiments on real-world datasets demonstrate that the new method achieves better performance than existing state-of-the-art algorithms with significant reduction in memory footprint. Finally, we propose a robust graph augmentation strategy using unsupervised deep learning architectures that yields further significant quality gains for SSL in natural language applications.'
volume: 51
URL: http://proceedings.mlr.press/v51/ravi16.html
PDF: http://proceedings.mlr.press/v51/ravi16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-ravi16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Ravi
given: Sujith
- family: Diao
given: Qiming
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 519-528
id: ravi16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 519
lastpage: 528
published: 2016-05-02 00:00:00 +0000
- title: 'Large-Scale Optimization Algorithms for Sparse Conditional Gaussian Graphical Models'
abstract: 'This paper addresses the problem of scalable optimization for L1-regularized conditional Gaussian graphical models. Conditional Gaussian graphical models generalize the well-known Gaussian graphical models to conditional distributions to model the output network influenced by conditioning input variables. While highly scalable optimization methods exist for sparse Gaussian graphical model estimation, state-of-the-art methods for conditional Gaussian graphical models are not efficient enough and more importantly, fail due to memory constraints for very large problems. In this paper, we propose a new optimization procedure based on a Newton method that efficiently iterates over two sub-problems, leading to drastic improvement in computation time compared to the previous methods. We then extend our method to scale to large problems under memory constraints, using block coordinate descent to limit memory usage while achieving fast convergence. Using synthetic and genomic data, we show that our methods can solve problems with millions of variables and tens of billions of parameters to high accuracy on a single machine.'
volume: 51
URL: http://proceedings.mlr.press/v51/mccarter16.html
PDF: http://proceedings.mlr.press/v51/mccarter16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-mccarter16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: McCarter
given: Calvin
- family: Kim
given: Seyoung
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 528-537
id: mccarter16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 528
lastpage: 537
published: 2016-05-02 00:00:00 +0000
- title: 'Graph Connectivity in Noisy Sparse Subspace Clustering'
abstract: 'Subspace clustering is the problem of clustering data points into a union of low-dimensional linear/affine subspaces. It is the mathematical abstraction of many important problems in computer vision, image processing and machine learning. A line of recent work [4, 19, 24, 20] provided strong theoretical guarantee for sparse subspace cluster- ing [4], the state-of-the-art algorithm for sub- space clustering, on both noiseless and noisy data sets. It was shown that under mild conditions, with high probability no two points from different subspaces are clustered together. Such guarantee, however, is not sufficient for the clustering to be correct, due to the notorious “graph connectivity problem” [15]. In this paper, we investigate the graph connectivity problem for noisy sparse sub-space clustering and show that a simple post-processing procedure is capable of delivering consistent clustering under certain “general position” or “restricted eigenvalue” assumptions. We also show that our condition is almost tight with adversarial noise perturbation by constructing a counter-example. These results provide the first exact clustering guarantee of noisy SSC for subspaces of dimension greater then 3.'
volume: 51
URL: http://proceedings.mlr.press/v51/wang16b.html
PDF: http://proceedings.mlr.press/v51/wang16b.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-wang16b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Wang
given: Yining
- family: Wang
given: Yu-Xiang
- family: Singh
given: Aarti
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 538-546
id: wang16b
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 538
lastpage: 546
published: 2016-05-02 00:00:00 +0000
- title: 'The Nonparametric Kernel Bayes Smoother'
abstract: 'Recently, significant progress has been made developing kernel mean expressions for Bayesian inference. An important success in this domain is the nonparametric kernel Bayes’ filter (nKB-filter), which can be used for sequential inference in state space models. We expand upon this work by introducing a smoothing algorithm, the nonparametric kernel Bayes’ smoother (nKB-smoother) which relies on kernel Bayesian inference through the kernel sum rule and kernel Bayes’ rule. We derive the smoothing equations, analyze the computational cost, and show smoothing consistency. We summarize the algorithm, which is simple to implement, requiring only matrix multiplications and the output of the nKB-filter. Finally, we report experimental results that compare the nKB-smoother to previous parametric and nonparametric approaches to Bayesian filtering and smoothing. In the supplementary materials, we show that the combination of the nKB-filter and the nKB-smoother allows marginal kernel mean computation, which gives an alternative to kernel belief propagation.'
volume: 51
URL: http://proceedings.mlr.press/v51/nishiyama16.html
PDF: http://proceedings.mlr.press/v51/nishiyama16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-nishiyama16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Nishiyama
given: Yu
- family: Afsharinejad
given: Amir
- family: Naruse
given: Shunsuke
- family: Boots
given: Byron
- family: Song
given: Le
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 547-555
id: nishiyama16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 547
lastpage: 555
published: 2016-05-02 00:00:00 +0000
- title: 'Universal Models of Multivariate Temporal Point Processes'
abstract: 'With the rapidly increasing availability of event stream data there is growing interest in multivariate temporal point process models to capture both qualitative and quantitative features of this type of data. Recent research on multivariate point processes have focused in inference and estimation problems for restricted classes of models such as continuous time Bayesian networks, Markov jump processes, Gaussian Cox processes, and Hawkes Processes. In this paper, we study the expressive power and learnability of Graphical Event Models (GEMs) – the analogue of directed graphical models for multivariate temporal point processes. In particular, we describe a set of Graphical Event Models (GEMs) and show that this class can universally approximate any smooth multivariate temporal point process. We also describe a universal learning algorithm for this class of GEMs and show, under a mild set of assumptions, learnability results for both the dependency structures and distributions in this class. Our consistency results demonstrate the possibility of learning about both qualitative and quantitative dependencies from rich event stream data.'
volume: 51
URL: http://proceedings.mlr.press/v51/gunawardana16.html
PDF: http://proceedings.mlr.press/v51/gunawardana16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-gunawardana16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Gunawardana
given: Asela
- family: Meek
given: Chris
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 556-563
id: gunawardana16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 556
lastpage: 563
published: 2016-05-02 00:00:00 +0000
- title: 'Online Relative Entropy Policy Search using Reproducing Kernel Hilbert Space Embeddings'
abstract: 'Kernel methods have been successfully applied to reinforcement learning problems to address some challenges such as high dimensional and continuous states, value function approximation and state transition probability modeling. In this paper, we develop an online policy search algorithm based on a recent state-of-the-art algorithm REPS-RKHS that uses conditional kernel embeddings. Our online algorithm inherits the advantages of REPS-RKHS, including the ability to learn non-parametric control policies for infinite horizon continuous MDPs with high- dimensional sensory representations. Different from the original REPS-RKHS algorithm which is based on batch learning, the proposed online algorithm updates the model in an online fashion and thus is able to capture and respond to rapid changes in the system dynamics. In addition, the online update operation takes constant time (i.e., independent of the sample size n), which is much more efficient computationally and allows the policy to be continuously revised. Experiments on different domains are conducted and results show that our online algorithm outperforms the original algorithm.'
volume: 51
URL: http://proceedings.mlr.press/v51/chen16a.html
PDF: http://proceedings.mlr.press/v51/chen16a.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-chen16a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Chen
given: Zhitang
- family: Poupart
given: Pascal
- family: Geng
given: Yanhui
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 573-581
id: chen16a
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 573
lastpage: 581
published: 2016-05-02 00:00:00 +0000
- title: 'Relationship between PreTraining and Maximum Likelihood Estimation in Deep Boltzmann Machines'
abstract: 'A pretraining algorithm, which is a layer-by-layer greedy learning algorithm, for a deep Boltzmann machine (DBM) is presented in this paper. By considering the deep belief net type of pretraining for the DBM, which is a simplified version of the original pretraining of the DBM, two interesting theoretical facts about pretraining can be obtained. (1) By applying two different types of approximation, a replacing approximation by using a Bayesian network and a Bethe type of approximation based on the cluster variation method, to two different parts of the true log-likelihood function of the DBM, pretraining can be derived from a variational approximation of the original maximum likelihood estimation. (2) It can be ensured that the pretraining improves the variational bound of the true log-likelihood function of the DBM. These two theoretical results will help deepen our understanding of deep learning. Moreover, on the basis of the theoretical results, we discuss the original pretraining of the DBM in the latter part of this paper.'
volume: 51
URL: http://proceedings.mlr.press/v51/yasuda16.html
PDF: http://proceedings.mlr.press/v51/yasuda16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-yasuda16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Yasuda
given: Muneki
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 582-590
id: yasuda16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 582
lastpage: 590
published: 2016-05-02 00:00:00 +0000
- title: 'Enumerating Equivalence Classes of Bayesian Networks using EC Graphs'
abstract: 'We consider the problem of learning Bayesian network structures from complete data. In particular, we consider the enumeration of their k-best equivalence classes. We propose a new search space for A* search, called the EC graph, that facilitates the enumeration of equivalence classes, by representing the space of completed, partially directed acyclic graphs. We also propose a canonization of this search space, called the EC tree, which further improves the efficiency of enumeration. Empirically, our approach is orders of magnitude more efficient than the state-of-the-art at enumerating equivalence classes.'
volume: 51
URL: http://proceedings.mlr.press/v51/chen16b.html
PDF: http://proceedings.mlr.press/v51/chen16b.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-chen16b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Chen
given: Eunice Yuh-Jie
- family: Choi
given: Arthur Choi
- family: Darwiche
given: Adnan
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 591-599
id: chen16b
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 591
lastpage: 599
published: 2016-05-02 00:00:00 +0000
- title: 'Low-Rank and Sparse Structure Pursuit via Alternating Minimization'
abstract: 'In this paper, we present a nonconvex alternating minimization optimization algorithm for low-rank and sparse structure pursuit. Compared with convex relaxation based methods, the proposed algorithm is computationally more efficient for large scale problems. In our study, we define a notion of bounded difference of gradients, based on which we rigorously prove that with suitable initialization, the proposed nonconvex optimization algorithm enjoys linear convergence to the global optima and exactly recovers the underlying low rank and sparse matrices under standard conditions such as incoherence and sparsity conditions. For a wide range of statistical models such as multi-task learning and robust principal component analysis (RPCA), our algorithm provides a principled approach to learning the low rank and sparse structures with provable guarantee. Thorough experiments on both synthetic and real datasets backup our theory.'
volume: 51
URL: http://proceedings.mlr.press/v51/gu16.html
PDF: http://proceedings.mlr.press/v51/gu16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-gu16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Gu
given: Quanquan
- family: Wang
given: Zhaoran Wang
- family: Liu
given: Han
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 600-609
id: gu16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 600
lastpage: 609
published: 2016-05-02 00:00:00 +0000
- title: 'NuC-MKL: A Convex Approach to Non Linear Multiple Kernel Learning'
abstract: 'Multiple Kernel Learning (MKL) methods are known for their effectiveness in solving classification and regression problems involving multi-modal data. Many MKL approaches use linear combination of base kernels, resulting in somewhat limited feature representations. Several non-linear MKL formulations were proposed recently. They provide much higher dimensional feature spaces, and, therefore, richer representations. However, these methods often lead to non-convex optimization and to intractable number of optimization parameters. In this paper, we propose a new non-linear MKL method that utilizes nuclear norm regularization and leads to convex optimization problem. The proposed Nuclear-norm-Constrained MKL (NuC-MKL) algorithm converges faster, and requires smaller number of calls to an SVM solver, as compared to other competing methods. Moreover, the number of the model support vectors in our approach is usually much smaller, as compared to other methods. This suggests that our algorithm is more resilient to overfitting. We test our algorithm on several known benchmarks, and show that it equals or outperforms the state-of-the-art MKL methods on all these data sets.'
volume: 51
URL: http://proceedings.mlr.press/v51/meirom16.html
PDF: http://proceedings.mlr.press/v51/meirom16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-meirom16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Meirom
given: Eli
- family: Kisilev
given: Pavel
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 610-619
id: meirom16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 610
lastpage: 619
published: 2016-05-02 00:00:00 +0000
- title: 'Tractable and Scalable Schatten Quasi-Norm Approximations for Rank Minimization'
abstract: 'The Schatten quasi-norm was introduced to bridge the gap between the trace norm and rank function. However, existing algorithms are too slow or even impractical for large-scale problems. Motivated by the equivalence relation between the trace norm and its bilinear spectral penalty, we define two tractable Schatten norms, i.e. the bi-trace and tri-trace norms, and prove that they are in essence the Schatten-1/2 and 1/3 quasi-norms, respectively. By applying the two defined Schatten quasi-norms to various rank minimization problems such as MC and RPCA, we only need to solve much smaller factor matrices. We design two efficient linearized alternating minimization algorithms to solve our problems and establish that each bounded sequence generated by our algorithms converges to a critical point. We also provide the restricted strong convexity (RSC) based and MC error bounds for our algorithms. Our experimental results verified both the efficiency and effectiveness of our algorithms compared with the state-of-the-art methods.'
volume: 51
URL: http://proceedings.mlr.press/v51/shang16.html
PDF: http://proceedings.mlr.press/v51/shang16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-shang16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Shang
given: Fanhua
- family: Liu
given: Yuanyuan
- family: Cheng
given: James
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 620-629
id: shang16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 620
lastpage: 629
published: 2016-05-02 00:00:00 +0000
- title: 'Fast Dictionary Learning with a Smoothed Wasserstein Loss'
abstract: 'We consider in this paper the dictionary learning problem when the observations are normalized histograms of features. This problem can be tackled using non-negative matrix factorization approaches, using typically Euclidean or Kullback-Leibler fitting errors. Because these fitting errors are separable and treat each feature on equal footing, they are blind to any similarity the features may share. We assume in this work that we have prior knowledge on these features. To leverage this side-information, we propose to use the Wasserstein (\textita.k.a. earth mover’s or optimal transport) distance as the fitting error between each original point and its reconstruction, and we propose scalable algorithms to to so. Our methods build upon Fenchel duality and entropic regularization of Wasserstein distances, which improves not only speed but also computational stability. We apply these techniques on face images and text documents. We show in particular that we can learn dictionaries (topics) for bag-of-word representations of texts using words that may not have appeared in the original texts, or even words that come from a different language than that used in the texts.'
volume: 51
URL: http://proceedings.mlr.press/v51/rolet16.html
PDF: http://proceedings.mlr.press/v51/rolet16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-rolet16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Rolet
given: Antoine
- family: Cuturi
given: Marco
- family: Peyré
given: Gabriel
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 630-638
id: rolet16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 630
lastpage: 638
published: 2016-05-02 00:00:00 +0000
- title: 'New Resistance Distances with Global Information on Large Graphs'
abstract: 'We consider the problem that on large random geometric graphs, random walk-based distances between nodes do not carry global information such as cluster structure. Instead, as the graphs become larger, the distances contain mainly the obsolete information of local density of the nodes. Many distances or similarity measures between nodes on a graph have been proposed but none are both proved to overcome this problem or computationally feasible even for small graphs. We propose new distance functions between nodes for this problem. The idea is to use electrical flows with different energy functions. Our proposed distances are proved analytically to be metrics in L^p spaces, to keep global information, avoiding the problem, and can be computed efficiently for large graphs. Our experiments with synthetic and real data confirmed the theoretical properties and practical performances of our proposed distances.'
volume: 51
URL: http://proceedings.mlr.press/v51/nguyen16a.html
PDF: http://proceedings.mlr.press/v51/nguyen16a.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-nguyen16a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Nguyen
given: Canh Hao
- family: Mamitsuka
given: Hiroshi
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 639-647
id: nguyen16a
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 639
lastpage: 647
published: 2016-05-02 00:00:00 +0000
- title: 'Batch Bayesian Optimization via Local Penalization'
abstract: 'The popularity of Bayesian optimization methods for efficient exploration of parameter spaces has lead to a series of papers applying Gaussian processes as surrogates in the optimization of functions. However, most proposed approaches only allow the exploration of the parameter space to occur sequentially. Often, it is desirable to simultaneously propose batches of parameter values to explore. This is particularly the case when large parallel processing facilities are available. These could either be computational or physical facets of the process being optimized. Batch methods, however, require the modeling of the interaction between the different evaluations in the batch, which can be expensive in complex scenarios. We investigate this issue and propose a highly effective heuristic based on an estimate of the function’s Lipschitz constant that captures the most important aspect of this interaction–local repulsion–at negligible computational overhead. A penalized acquisition function is used to collect batches of points minimizing the non-parallelizable computational effort. The resulting algorithm compares very well, in run-time, with much more elaborate alternatives.'
volume: 51
URL: http://proceedings.mlr.press/v51/gonzalez16a.html
PDF: http://proceedings.mlr.press/v51/gonzalez16a.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-gonzalez16a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Gonzalez
given: Javier
- family: Dai
given: Zhenwen
- family: Hennig
given: Philipp
- family: Lawrence
given: Neil
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 648-657
id: gonzalez16a
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 648
lastpage: 657
published: 2016-05-02 00:00:00 +0000
- title: 'Nonparametric Budgeted Stochastic Gradient Descent'
abstract: 'One of the most challenging problems in kernel online learning is to bound the model size. Budgeted kernel online learning addresses this issue by bounding the model size to a predefined budget. However, determining an appropriate value for such predefined budget is arduous. In this paper, we propose the Nonparametric Budgeted Stochastic Gradient Descent that allows the model size to automatically grow with data in a principled way. We provide theoretical analysis to show that our framework is guaranteed to converge for a large collection of loss functions (e.g. Hinge, Logistic, L2, L1, and \varepsilon-insensitive) which enables the proposed algorithm to perform both classification and regression tasks without hurting the ideal convergence rate O\left(\frac1T\right) of the standard Stochastic Gradient Descent. We validate our algorithm on the real-world datasets to consolidate the theoretical claims.'
volume: 51
URL: http://proceedings.mlr.press/v51/le16.html
PDF: http://proceedings.mlr.press/v51/le16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-le16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Le
given: Trung
- family: Nguyen
given: Vu
- family: Nguyen
given: Tu Dinh
- family: Phung
given: Dinh
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 654-572
id: le16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 654
lastpage: 572
published: 2016-05-02 00:00:00 +0000
- title: 'Learning Relationships between Data Obtained Independently'
abstract: 'The aim of this paper is to provide a new method for learning the relationships between data that have been obtained independently. Unlike existing methods like matching, the proposed technique does not require any contextual information, provided that the dependency between the variables of interest is monotone. It can therefore be easily combined with matching in order to exploit the advantages of both methods. This technique can be described as a mix between quantile matching, and deconvolution. We provide for it a theoretical and an empirical validation.'
volume: 51
URL: http://proceedings.mlr.press/v51/carpentier16b.html
PDF: http://proceedings.mlr.press/v51/carpentier16b.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-carpentier16b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Carpentier
given: Alexandra
- family: Schlueter
given: Teresa
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 658-666
id: carpentier16b
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 658
lastpage: 666
published: 2016-05-02 00:00:00 +0000
- title: 'Fast and Scalable Structural SVM with Slack Rescaling'
abstract: 'We present an efficient method for training slack-rescaled structural SVM. Although finding the most violating label in a margin-rescaled formulation is often easy since the target function decomposes with respect to the structure, this is not the case for a slack-rescaled formulation, and finding the most violated label might be very difficult. Our core contribution is an efficient method for finding the most-violating-label in a slack-rescaled formulation, given an oracle that returns the most-violating-label in a (slightly modified) margin-rescaled formulation. We show that our method enables accurate and scalable training for slack-rescaled SVMs, reducing runtime by an order of magnitude compared to previous approaches to slack-rescaled SVMs.'
volume: 51
URL: http://proceedings.mlr.press/v51/choi16.html
PDF: http://proceedings.mlr.press/v51/choi16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-choi16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Choi
given: Heejin
- family: Meshi
given: Ofer
- family: Srebro
given: Nathan
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 667-675
id: choi16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 667
lastpage: 675
published: 2016-05-02 00:00:00 +0000
- title: 'Probabilistic Approximate Least-Squares'
abstract: 'Least-squares and kernel-ridge / Gaussian process regression are among the foundational algorithms of statistics and machine learning. Famously, the worst-case cost of exact nonparametric regression grows cubically with the data-set size; but a growing number of approximations have been developed that estimate good solutions at lower cost. These algorithms typically return point estimators, without measures of uncertainty. Leveraging recent results casting elementary linear algebra operations as probabilistic inference, we propose a new approximate method for nonparametric least-squares that affords a probabilistic uncertainty estimate over the error between the approximate and exact least-squares solution (this is not the same as the posterior variance of the associated Gaussian process regressor). This allows estimating the error of the least-squares solution on a subset of the data relative to the full-data solution. The uncertainty can be used to control the computational effort invested in the approximation. Our algorithm has linear cost in the data-set size, and a simple formal form, so that it can be implemented with a few lines of code in programming languages with linear algebra functionality.'
volume: 51
URL: http://proceedings.mlr.press/v51/bartels16.html
PDF: http://proceedings.mlr.press/v51/bartels16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-bartels16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Bartels
given: Simon
- family: Hennig
given: Philipp
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 676-684
id: bartels16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 676
lastpage: 684
published: 2016-05-02 00:00:00 +0000
- title: 'Approximate Inference Using DC Programming For Collective Graphical Models'
abstract: 'Collective graphical models (CGMs) provide a framework for reasoning about a population of independent and identically distributed individuals when only noisy and aggregate observations are given. Previous approaches for inference in CGMs work on a junction-tree representation, thereby highly limiting their scalability. To remedy this, we show how the Bethe entropy approximation naturally arises for the inference problem in CGMs. We reformulate the resulting optimization problem as a difference-of-convex functions program that can capture different types of CGM noise models. Using the concave-convex procedure, we then develop a scalable message-passing algorithm. Empirically, our approach is highly scalable and accurate for large graphs, more than an order-of-magnitude faster than a generic optimization solver, and is guaranteed to converge unlike the previous message-passing approach NLBP that fails in several loopy graphs.'
volume: 51
URL: http://proceedings.mlr.press/v51/nguyen16b.html
PDF: http://proceedings.mlr.press/v51/nguyen16b.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-nguyen16b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Nguyen
given: Thien
- family: Kumar
given: Akshat
- family: Lau
given: Hoong Chuin
- family: Sheldon
given: Daniel
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 685-693
id: nguyen16b
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 685
lastpage: 693
published: 2016-05-02 00:00:00 +0000
- title: 'Sequential Inference for Deep Gaussian Process'
abstract: 'A deep Gaussian process (DGP) is a deep network in which each layer is modelled with a Gaussian process (GP). It is a flexible model that can capture highly-nonlinear functions for complex data sets. However, the network structure of DGP often makes inference computationally expensive. In this paper, we propose an efficient sequential inference framework for DGP, where the data is processed sequentially. We also propose two DGP extensions to handle heteroscedasticity and multi-task learning. Our experimental evaluation shows the effectiveness of our sequential inference framework on a number of important learning tasks.'
volume: 51
URL: http://proceedings.mlr.press/v51/wang16c.html
PDF: http://proceedings.mlr.press/v51/wang16c.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-wang16c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Wang
given: Yali
- family: Brubaker
given: Marcus
- family: Chaib-Draa
given: Brahim
- family: Urtasun
given: Raquel
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 694-703
id: wang16c
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 694
lastpage: 703
published: 2016-05-02 00:00:00 +0000
- title: 'Variational Tempering'
abstract: 'Variational inference (VI) combined with data subsampling enables approximate posterior inference with large data sets for otherwise intractable models, but suffers from poor local optima. We first formulate a deterministic annealing approach for the generic class of conditionally conjugate exponential family models. This algorithm uses a temperature parameter that deterministically deforms the objective and reduces this parameter over the course of the optimization. A well-known drawback in annealing is the choice of the annealing schedule. We therefore introduce variational tempering, a variational algorithm that introduces a temperature latent variable to the model. In contrast to related work in the Markov chain Monte Carlo literature, this algorithm results in adaptive annealing schedules. Lastly, we develop local variational tempering, which assigns a latent temperature to each data point; this allows for dynamic annealing that varies across data. Compared to the traditional VI, all proposed approaches find improved predictive likelihoods on held-out data.'
volume: 51
URL: http://proceedings.mlr.press/v51/mandt16.html
PDF: http://proceedings.mlr.press/v51/mandt16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-mandt16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Mandt
given: Stephan
- family: McInerney
given: James
- family: Abrol
given: Farhan
- family: Ranganath
given: Rajesh
- family: Blei
given: David
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 704-712
id: mandt16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 704
lastpage: 712
published: 2016-05-02 00:00:00 +0000
- title: 'On Convergence of Model Parallel Proximal Gradient Algorithm for Stale Synchronous Parallel System'
abstract: 'With ever growing data volume and model size, an error-tolerant, communication efficient, yet versatile parallel algorithm has become a vital part for the success of many large-scale applications. In this work we propose mspg, an extension of the flexible proximal gradient algorithm to the model parallel and stale synchronous setting. The worker machines of mspg operate asynchronously as long as they are not too far apart, and they communicate efficiently through a dedicated parameter server. Theoretically, we provide a rigorous analysis of the various convergence properties of mspg, and a salient feature of our analysis is its seamless generality that allows both nonsmooth and nonconvex functions. Under mild conditions, we prove the whole iterate sequence of mspg converges to a critical point (which is optimal under convexity assumptions). We further provide an economical implementation of mspg, completely bypassing the need of keeping a local full model. We confirm our theoretical findings through numerical experiments.'
volume: 51
URL: http://proceedings.mlr.press/v51/zhou16.html
PDF: http://proceedings.mlr.press/v51/zhou16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-zhou16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Zhou
given: Yi
- family: Yu
given: Yaoliang
- family: Dai
given: Wei
- family: Liang
given: Yingbin
- family: Xing
given: Eric
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 713-722
id: zhou16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 713
lastpage: 722
published: 2016-05-02 00:00:00 +0000
- title: 'Scalable MCMC for Mixed Membership Stochastic Blockmodels'
abstract: 'We propose a stochastic gradient Markov chain Monte Carlo (SG-MCMC) algorithm for scalable inference in mixed-membership stochastic blockmodels (MMSB). Our algorithm is based on the stochastic gradient Riemannian Langevin sampler and achieves both faster speed and higher accuracy at every iteration than the current state-of-the-art algorithm based on stochastic variational inference. In addition we develop an approximation that can handle models that entertain a very large number of communities. The experimental results show that SG-MCMC strictly dominates competing algorithms in all cases.'
volume: 51
URL: http://proceedings.mlr.press/v51/li16d.html
PDF: http://proceedings.mlr.press/v51/li16d.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-li16d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Li
given: Wenzhe
- family: Ahn
given: Sungjin
- family: Welling
given: Max
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 723-731
id: li16d
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 723
lastpage: 731
published: 2016-05-02 00:00:00 +0000
- title: 'Non-Stationary Gaussian Process Regression with Hamiltonian Monte Carlo'
abstract: 'We present a novel approach for non-stationary Gaussian process regression (GPR), where the three key parameters – noise variance, signal variance and lengthscale – can be simultaneously input-dependent. We develop gradient-based inference methods to learn the unknown function and the non-stationary model parameters, without requiring any model approximations. For inferring the full posterior distribution we use Hamiltonian Monte Carlo (HMC), which conveniently extends the analytical gradient-based GPR learning by guiding the sampling with the gradients. The MAP solution can also be learned with gradient ascent. In experiments on several synthetic datasets and in modelling of temporal gene expression, the non-stationary GPR is shown to give major improvement when modeling realistic input-dependent dynamics.'
volume: 51
URL: http://proceedings.mlr.press/v51/heinonen16.html
PDF: http://proceedings.mlr.press/v51/heinonen16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-heinonen16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Heinonen
given: Markus
- family: Mannerström
given: Henrik
- family: Rousu
given: Juho
- family: Kaski
given: Samuel
- family: Lähdesmäki
given: Harri
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 732-740
id: heinonen16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 732
lastpage: 740
published: 2016-05-02 00:00:00 +0000
- title: 'A Deep Generative Deconvolutional Image Model'
abstract: 'A deep generative model is developed for representation and analysis of images, based on a hierarchical convolutional dictionary-learning framework. Stochastic unpooling is employed to link consecutive layers in the model, yielding top-down image generation. A Bayesian support vector machine is linked to the top-layer features, yielding max-margin discrimination. Deep deconvolutional inference is employed when testing, to infer the latent features, and the top-layer features are connected with the max-margin classifier for discrimination tasks. The algorithm is efficiently trained via Monte Carlo expectation-maximization (MCEM), with implementation on graphical processor units (GPUs) for efficient large-scale learning, and fast testing. Excellent results are obtained on several benchmark datasets, including ImageNet, demonstrating that the proposed model achieves results that are highly competitive with similarly sized convolutional neural networks.'
volume: 51
URL: http://proceedings.mlr.press/v51/pu16.html
PDF: http://proceedings.mlr.press/v51/pu16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-pu16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Pu
given: Yunchen
- family: Yuan
given: Win
- family: Stevens
given: Andrew
- family: Li
given: Chunyuan
- family: Carin
given: Lawrence
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 741-750
id: pu16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 741
lastpage: 750
published: 2016-05-02 00:00:00 +0000
- title: 'Distributed Multi-Task Learning'
abstract: 'We consider the problem of distributed multi-task learning, where each machine learns a separate, but related, task. Specifically, each machine learns a linear predictor in high-dimensional space, where all tasks share the same small support. We present a communication-efficient estimator based on the debiased lasso and show that it is comparable with the optimal centralized method.'
volume: 51
URL: http://proceedings.mlr.press/v51/wang16d.html
PDF: http://proceedings.mlr.press/v51/wang16d.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-wang16d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Wang
given: Jialei
- family: Kolar
given: Mladen
- family: Srerbo
given: Nathan
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 751-760
id: wang16d
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 751
lastpage: 760
published: 2016-05-02 00:00:00 +0000
- title: 'A Fixed-Point Operator for Inference in Variational Bayesian Latent Gaussian Models'
abstract: 'Latent Gaussian Models (LGM) provide a rich modeling framework with general inference procedures. The variational approximation offers an effective solution for such models and has attracted a significant amount of interest. Recent work proposed a fixed-point (FP) update procedure to optimize the covariance matrix in the variational solution and demonstrated its efficacy in specific models. The paper makes three contributions. First, it shows that the same approach can be used more generally in extensions of LGM. Second, it provides an analysis identifying conditions for the convergence of the FP method. Third, it provides an extensive experimental evaluation in Gaussian processes, sparse Gaussian processes, and generalized linear models, with several non-conjugate observation likelihoods, showing wide applicability of the FP method and a significant advantage over gradient based optimization.'
volume: 51
URL: http://proceedings.mlr.press/v51/sheth16.html
PDF: http://proceedings.mlr.press/v51/sheth16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-sheth16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Sheth
given: Rishit
- family: Khardon
given: Roni
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 761-769
id: sheth16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 761
lastpage: 769
published: 2016-05-02 00:00:00 +0000
- title: 'Learning Probabilistic Submodular Diversity Models Via Noise Contrastive Estimation'
abstract: 'Modeling diversity of sets of items is important in many applications such as product recommendation and data summarization. Probabilistic submodular models, a family of models including the determinantal point process, form a natural class of distributions, encouraging effects such as diversity, repulsion and coverage. Current models, however, are limited to small and medium number of items due to the high time complexity for learning and inference. In this paper, we propose FLID, a novel log-submodular diversity model that scales to large numbers of items and can be efficiently learned using noise contrastive estimation. We show that our model achieves state of the art performance in terms of model fit, but can be also learned orders of magnitude faster. We demonstrate the wide applicability of our model using several experiments.'
volume: 51
URL: http://proceedings.mlr.press/v51/tschiatschek16.html
PDF: http://proceedings.mlr.press/v51/tschiatschek16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-tschiatschek16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Tschiatschek
given: Sebastian
- family: Djolonga
given: Josip
- family: Krause
given: Andreas
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 770-779
id: tschiatschek16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 770
lastpage: 779
published: 2016-05-02 00:00:00 +0000
- title: 'Fast Saddle-Point Algorithm for Generalized Dantzig Selector and FDR Control with Ordered L1-Norm'
abstract: 'In this paper we propose a primal-dual proximal extragradient algorithm to solve the generalized Dantzig selector (GDS) estimation problem, based on a new convex-concave saddle-point (SP) reformulation. Our new formulation makes it possible to adopt recent developments in saddle-point optimization, to achieve the optimal O(1/k) rate of convergence. Compared to the optimal non-SP algorithms, ours do not require specification of sensitive parameters that affect algorithm performance or solution quality. We also provide a new analysis showing a possibility of local acceleration to achieve the rate of O(1/k^2) in special cases even without strong convexity or strong smoothness. As an application, we propose a GDS equipped with the ordered \ell_1-norm, showing its false discovery rate control properties in variable selection. Algorithm performance is compared between ours and other alternatives, including the linearized ADMM, Nesterov’s smoothing, Nemirovski’s mirror-prox, and the accelerated hybrid proximal extragradient techniques.'
volume: 51
URL: http://proceedings.mlr.press/v51/lee16b.html
PDF: http://proceedings.mlr.press/v51/lee16b.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-lee16b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Lee
given: Sangkyun
- family: Brzyski
given: Damian
- family: Bogdan
given: Malgorzata
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 780-789
id: lee16b
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 780
lastpage: 789
published: 2016-05-02 00:00:00 +0000
- title: 'GLASSES: Relieving The Myopia Of Bayesian Optimisation'
abstract: 'We present GLASSES: Global optimisation with Look-Ahead through Stochastic Simulation and Expected-loss Search. The majority of global optimisation approaches in use are myopic, in only considering the impact of the next function value; the non-myopic approaches that do exist are able to consider only a handful of future evaluations. Our novel algorithm, GLASSES, permits the consideration of dozens of evaluations into the future. This is done by approximating the ideal look-ahead loss function, which is expensive to evaluate, by a cheaper alternative in which the future steps of the algorithm are simulated beforehand. An Expectation Propagation algorithm is used to compute the expected value of the loss. We show that the far-horizon planning thus enabled leads to substantive performance gains in empirical tests.'
volume: 51
URL: http://proceedings.mlr.press/v51/gonzalez16b.html
PDF: http://proceedings.mlr.press/v51/gonzalez16b.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-gonzalez16b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Gonzalez
given: Javier
- family: Osborne
given: Michael
- family: Lawrence
given: Neil
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 790-799
id: gonzalez16b
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 790
lastpage: 799
published: 2016-05-02 00:00:00 +0000
- title: 'Stochastic Variational Inference for the HDP-HMM'
abstract: 'We derive a variational inference algorithm for the HDP-HMM based on the two-level stick breaking construction. This construction has previously been applied to the hierarchical Dirichlet processes (HDP) for mixed membership models, allowing for efficient handling of the coupled weight parameters. However, the same algorithm is not directly applicable to HDP-based infinite hidden Markov models (HDP-HMM) because of extra sequential dependencies in the Markov chain. In this paper we provide a solution to this problem by deriving a variational inference algorithm for the HDP-HMM, as well as its stochastic extension, for which all parameter updates are in closed form. We apply our algorithm to sequential text analysis and audio signal analysis, comparing our results with the beam-sampled iHMM, the parametric HMM, and other variational inference approximations.'
volume: 51
URL: http://proceedings.mlr.press/v51/zhang16a.html
PDF: http://proceedings.mlr.press/v51/zhang16a.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-zhang16a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Zhang
given: Aonan
- family: Gultekin
given: San
- family: Paisley
given: John
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 800-808
id: zhang16a
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 800
lastpage: 808
published: 2016-05-02 00:00:00 +0000
- title: 'Stochastic Neural Networks with Monotonic Activation Functions'
abstract: 'We propose a Laplace approximation that creates a stochastic unit from any smooth monotonic activation function, using only Gaussian noise. This paper investigates the application of this stochastic approximation in training a family of Restricted Boltzmann Machines (RBM) that are closely linked to Bregman divergences. This family, that we call exponential family RBM (Exp-RBM), is a subset of the exponential family Harmoniums that expresses family members through a choice of smooth monotonic non-linearity for each neuron. Using contrastive divergence along with our Gaussian approximation, we show that Exp-RBM can learn useful representations using novel stochastic units.'
volume: 51
URL: http://proceedings.mlr.press/v51/ravanbakhsh16.html
PDF: http://proceedings.mlr.press/v51/ravanbakhsh16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-ravanbakhsh16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Ravanbakhsh
given: Siamak
- family: Poczos
given: Barnabas
- family: Schneider
given: Jeff
- family: Schuurmans
given: Dale
- family: Greiner
given: Russell
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 809-818
id: ravanbakhsh16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 809
lastpage: 818
published: 2016-05-02 00:00:00 +0000
- title: '(Bandit) Convex Optimization with Biased Noisy Gradient Oracles'
abstract: 'A popular class of algorithms for convex optimization and online learning with bandit feedback rely on constructing noisy gradient estimates, which are then used in place of the actual gradients in appropriately adjusted first-order algorithms. Depending on the properties of the function to be optimized and the nature of “noise” in the bandit feedback, the bias and variance of gradient estimates exhibit various tradeoffs. In this paper we propose a novel framework that replaces the specific gradient estimation methods with an abstract oracle model. With the help of the new framework we unify previous works, reproducing their results in a clean and concise fashion, while, perhaps more importantly, the framework also allows us to formally show that to achieve the optimal root-n rate either the algorithms that use existing gradient estimators, or the proofs have to go beyond what exists today.'
volume: 51
URL: http://proceedings.mlr.press/v51/hu16b.html
PDF: http://proceedings.mlr.press/v51/hu16b.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-hu16b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Hu
given: Xiaowei
- family: L.A.
given: Prashanth
- family: György
given: András
- family: Szepesvari
given: Csaba
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 819-828
id: hu16b
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 819
lastpage: 828
published: 2016-05-02 00:00:00 +0000
- title: 'Variational Gaussian Copula Inference'
abstract: 'We utilize copulas to constitute a unified framework for constructing and optimizing variational proposals in hierarchical Bayesian models. For models with continuous and non-Gaussian hidden variables, we propose a semiparametric and automated variational Gaussian copula approach, in which the parametric Gaussian copula family is able to preserve multivariate posterior dependence, and the nonparametric transformations based on Bernstein polynomials provide ample flexibility in characterizing the univariate marginal posteriors.'
volume: 51
URL: http://proceedings.mlr.press/v51/han16.html
PDF: http://proceedings.mlr.press/v51/han16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-han16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Han
given: Shaobo
- family: Liao
given: Xuejun
- family: Dunson
given: David
- family: Carin
given: Lawrence
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 829-838
id: han16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 829
lastpage: 838
published: 2016-05-02 00:00:00 +0000
- title: 'Low-Rank Approximation of Weighted Tree Automata'
abstract: 'We describe a technique to minimize weighted tree automata (WTA), a powerful formalisms that subsumes probabilistic context-free grammars (PCFGs) and latent-variable PCFGs. Our method relies on a singular value decomposition of the underlying Hankel matrix defined by the WTA. Our main theoretical result is an efficient algorithm for computing the SVD of an infinite Hankel matrix implicitly represented as a WTA. We evaluate our method on real-world data originating in newswire treebank. We show that the model achieves lower perplexity than previous methods for PCFG minimization, and also is much more stable due to the absence of local optima.'
volume: 51
URL: http://proceedings.mlr.press/v51/rabusseau16.html
PDF: http://proceedings.mlr.press/v51/rabusseau16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-rabusseau16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Rabusseau
given: Guillaume
- family: Balle
given: Borja
- family: Cohen
given: Shay
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 839-847
id: rabusseau16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 839
lastpage: 847
published: 2016-05-02 00:00:00 +0000
- title: 'Accelerating Online Convex Optimization via Adaptive Prediction'
abstract: 'We present a powerful general framework for designing data-dependent online convex optimization algorithms, building upon and unifying recent techniques in adaptive regularization, optimistic gradient predictions, and problem-dependent randomization. We first present a series of new regret guarantees that hold at any time and under very minimal assumptions, and then show how different relaxations recover existing algorithms, both basic as well as more recent sophisticated ones. Finally, we show how combining adaptivity, optimism, and problem-dependent randomization can guide the design of algorithms that benefit from more favorable guarantees than recent state-of-the-art methods.'
volume: 51
URL: http://proceedings.mlr.press/v51/mohri16.html
PDF: http://proceedings.mlr.press/v51/mohri16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-mohri16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Mohri
given: Mehryar
- family: Yang
given: Scott
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 848-856
id: mohri16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 848
lastpage: 856
published: 2016-05-02 00:00:00 +0000
- title: 'Scalable geometric density estimation'
abstract: 'It is standard to assume a low-dimensional structure in estimating a high-dimensional density. However, popular methods, such as probabilistic principal component analysis, scale poorly computationally. We introduce a novel empirical Bayes method that we term geometric density estimation (GEODE) and show that, with mild conditions and among all d-dimensional linear subspaces, the span of the d leading principal axes of the data maximizes the model posterior. With these axes pre-computed using fast singular value decomposition, GEODE easily scales to high dimensional problems while providing uncertainty characterization. The model is also capable of imputing missing data and dynamically deleting redundant dimensions. Finally, we generalize GEODE by mixing it across a dyadic clustering tree. Both simulation studies and real world data applications show superior performance of GEODE in terms of robustness and computational efficiency.'
volume: 51
URL: http://proceedings.mlr.press/v51/wang16e.html
PDF: http://proceedings.mlr.press/v51/wang16e.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-wang16e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Wang
given: Ye
- family: Canale
given: Antonio
- family: Dunson
given: David
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 857-865
id: wang16e
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 857
lastpage: 865
published: 2016-05-02 00:00:00 +0000
- title: 'Model-based Co-clustering for High Dimensional Sparse Data'
abstract: 'We propose a novel model based on the von Mises-Fisher (vMF) distribution for co-clustering high dimensional sparse matrices. While existing vMF-based models are only suitable for clustering along one dimension, our model acts simultaneously on both dimensions of a data matrix. Thereby it has the advantage of exploiting the inherent duality between rows and columns. Setting our model under the maximum likelihood (ML) approach and the classification ML (CML) approach, we derive two novel, hard and soft, co-clustering algorithms. Empirical results on numerous synthetic and real-world text datasets, demonstrate the effectiveness of our approach, for modelling high dimensional sparse data and co-clustering. Furthermore, thanks to our formulation, that performs an implicitly adaptive dimensionality reduction at each stage, our model alleviates the problem of high concentration parameters kappa’s, a well known difficulty in the classical vMF-based models.'
volume: 51
URL: http://proceedings.mlr.press/v51/salah16.html
PDF: http://proceedings.mlr.press/v51/salah16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-salah16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Salah
given: Aghiles
- family: Rogovschi
given: Nicoleta
- family: Nadif
given: Mohamed
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 866-874
id: salah16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 866
lastpage: 874
published: 2016-05-02 00:00:00 +0000
- title: 'DUAL-LOCO: Distributing Statistical Estimation Using Random Projections'
abstract: 'We present DUAL-LOCO, a communication-efficient algorithm for distributed statistical estimation. DUAL-LOCO assumes that the data is distributed across workers according to the features rather than the samples. It requires only a single round of communication where low-dimensional random projections are used to approximate the dependencies between features available to different workers. We show that DUAL-LOCO has bounded approximation error which only depends weakly on the number of workers. We compare DUAL-LOCO against a state-of-the-art distributed optimization method on a variety of real world datasets and show that it obtains better speedups while retaining good accuracy. In particular, DUAL-LOCO allows for fast cross validation as only part of the algorithm depends on the regularization parameter.'
volume: 51
URL: http://proceedings.mlr.press/v51/heinze16.html
PDF: http://proceedings.mlr.press/v51/heinze16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-heinze16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Heinze
given: Christina
- family: McWilliams
given: Brian
- family: Meinshausen
given: Nicolai
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 875-883
id: heinze16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 875
lastpage: 883
published: 2016-05-02 00:00:00 +0000
- title: 'High Dimensional Bayesian Optimization via Restricted Projection Pursuit Models'
abstract: 'Bayesian Optimization (BO) is commonly used to optimize blackbox objective functions which are expensive to evaluate. A common approach is based on using Gaussian Process (GP) to model the objective function. Applying GP to higher dimensional settings is generally difficult due to the curse of dimensionality for nonparametric regression. Existing works makes strong assumptions such as the function is low-dimensional embedding (Wang et al., 2013) or is axis-aligned additive (Kandasamy et al., 2015). In this pa- per, we generalize the existing assumption to a projected-additive assumption. Our generalization provides the benefits of i) greatly increasing the space of functions that can be modeled by our approach, which covers the previous works (Wang et al., 2013; Kandasamy et al., 2015) as special cases, and ii) efficiently handling the learning in a larger model space. We prove that the regret for projected-additive functions has only linear dependence on the number of dimensions in this general setting. Directly using projected-additive GP (Gilboa et al., 2013) to BO results in a non-box constraint, which is not easy to optimize. We tackle this problem by proposing a restricted-projection-pursuit GP for BO. We conduct experiments on synthetic examples and scientific and hyper-parameter tuning tasks in many cases. Our method outperforms existing approaches even when the function does not meet the projected additive assumption. Last, we study the validity of the additive and projected-additive assumption in practice.'
volume: 51
URL: http://proceedings.mlr.press/v51/li16e.html
PDF: http://proceedings.mlr.press/v51/li16e.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-li16e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Li
given: Chun-Liang
- family: Kandasamy
given: Kirthevasan
- family: Poczos
given: Barnabas
- family: Schneider
given: Jeff
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 884-892
id: li16e
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 884
lastpage: 892
published: 2016-05-02 00:00:00 +0000
- title: 'On the Use of Non-Stationary Strategies for Solving Two-Player Zero-Sum Markov Games'
abstract: 'The main contribution of this paper consists in extending several non-stationary Reinforcement Learning (RL) algorithms and their theoretical guarantees to the case of γ-discounted zero-sum Markov Games (MGs). As in the case of Markov Decision Processes (MDPs), non-stationary algorithms are shown to exhibit better performance bounds compared to their stationary counterparts. The obtained bounds are generically composed of three terms: 1) a dependency on γ(discount factor), 2) a concentrability coefficient and 3) a propagation error term. This error, depending on the algorithm, can be caused by a regression step, a policy evaluation step or a best-response evaluation step. As a second contribution, we empirically demonstrate, on generic MGs (called Garnets), that non-stationary algorithms outperform their stationary counterparts. In addition, it is shown that their performance mostly depends on the nature of the propagation error. Indeed, algorithms where the error is due to the evaluation of a best-response are penalized (even if they exhibit better concentrability coefficients and dependencies on γ) compared to those suffering from a regression error.'
volume: 51
URL: http://proceedings.mlr.press/v51/perolat16.html
PDF: http://proceedings.mlr.press/v51/perolat16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-perolat16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Pérolat
given: Julien
- family: Piot
given: Bilal
- family: Scherrer
given: Bruno
- family: Pietquin
given: Olivier
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 893-901
id: perolat16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 893
lastpage: 901
published: 2016-05-02 00:00:00 +0000
- title: 'Semi-Supervised Learning with Adaptive Spectral Transform'
abstract: 'This paper proposes a novel nonparametric framework for semi-supervised learning and for optimizing the Laplacian spectrum of the data manifold simultaneously. Our formulation leads to a convex optimization problem that can be efficiently solved via the bundle method, and can be interpreted as to asymptotically minimize the generalization error bound of semi-supervised learning with respect to the graph spectrum. Experiments over benchmark datasets in various domains show advantageous performance of the proposed method over strong baselines.'
volume: 51
URL: http://proceedings.mlr.press/v51/liu16.html
PDF: http://proceedings.mlr.press/v51/liu16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-liu16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Liu
given: Hanxiao
- family: Yang
given: Yiming
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 902-910
id: liu16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 902
lastpage: 910
published: 2016-05-02 00:00:00 +0000
- title: 'Pseudo-Marginal Slice Sampling'
abstract: 'Markov chain Monte Carlo (MCMC) methods asymptotically sample from complex probability distributions. The pseudo-marginal MCMC framework only requires an unbiased estimator of the unnormalized probability distribution function to construct a Markov chain. However, the resulting chains are harder to tune to a target distribution than conventional MCMC, and the types of updates available are limited. We describe a general way to clamp and update the random numbers used in a pseudo-marginal method’s unbiased estimator. In this framework we can use slice sampling and other adaptive methods. We obtain more robust Markov chains, which often mix more quickly.'
volume: 51
URL: http://proceedings.mlr.press/v51/murray16.html
PDF: http://proceedings.mlr.press/v51/murray16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-murray16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Murray
given: Iain
- family: Graham
given: Matthew
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 911-919
id: murray16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 911
lastpage: 919
published: 2016-05-02 00:00:00 +0000
- title: 'How to Learn a Graph from Smooth Signals'
abstract: 'We propose a framework to learn the graph structure underlying a set of smooth signals. Given X∈\mathbbR^m\times n whose rows reside on the vertices of an unknown graph, we learn the edge weights w∈\mathbbR_+^m(m-1)/2 under the smoothness assumption that \rm trX^⊤LX is small, where L is the graph Laplacian. We show that the problem is a weighted \ell-1 minimization that leads to naturally sparse solutions. We prove that the standard graph construction with Gaussian weights w_ij = \exp(-\frac1σ^2\|x_i-x_j\|^2) and the previous state of the art are special cases of our framework. We propose a new model and present efficient, scalable primal-dual based algorithms both for this and the previous state of the art, to evaluate their performance on artificial and real data. The new model performs best in most settings.'
volume: 51
URL: http://proceedings.mlr.press/v51/kalofolias16.html
PDF: http://proceedings.mlr.press/v51/kalofolias16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-kalofolias16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Kalofolias
given: Vassilis
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 920-929
id: kalofolias16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 920
lastpage: 929
published: 2016-05-02 00:00:00 +0000
- title: 'Ordered Weighted L1 Regularized Regression with Strongly Correlated Covariates: Theoretical Aspects'
abstract: 'This paper studies the ordered weighted L1 (OWL) family of regularizers for sparse linear regression with strongly correlated covariates. We prove sufficient conditions for clustering correlated covariates, extending and qualitatively strengthening previous results for a particular member of the OWL family: OSCAR (octagonal shrinkage and clustering algorithm for regression). We derive error bounds for OWL with correlated Gaussian covariates: for cases in which clusters of covariates are strongly (even perfectly) correlated, but covariates in different clusters are uncorrelated, we show that if the true p-dimensional signal involves only s clusters, then O(s \log p) samples suffice to accurately estimate it, regardless of the number of coefficients within the clusters. Since the estimation of s-sparse signals with completely independent covariates also requires O(s \log p) measurements, this shows that by using OWL regularization, we pay no price (in the number of measurements) for the presence of strongly correlated covariates.'
volume: 51
URL: http://proceedings.mlr.press/v51/figueiredo16.html
PDF: http://proceedings.mlr.press/v51/figueiredo16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-figueiredo16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Figueiredo
given: Mario
- family: Nowak
given: Robert
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 930-938
id: figueiredo16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 930
lastpage: 938
published: 2016-05-02 00:00:00 +0000
- title: 'Pareto Front Identification from Stochastic Bandit Feedback'
abstract: 'We consider the problem of identifying the Pareto front for multiple objectives from a finite set of operating points. Sampling an operating point gives a random vector where each coordinate corresponds to the value of one of the objectives. The Pareto front is the set of operating points that are not dominated by any other operating point in respect to all objectives (considering the mean of their objective values). We propose a confidence bound algorithm to approximate the Pareto front, and prove problem specific lower and upper bounds, showing that the sample complexity is characterized by some natural geometric properties of the operating points. Experiments confirm the reliability of our algorithm. For the problem of finding a sparse cover of the Pareto front, we propose an asymmetric covering algorithm of independent interest.'
volume: 51
URL: http://proceedings.mlr.press/v51/auer16.html
PDF: http://proceedings.mlr.press/v51/auer16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-auer16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Auer
given: Peter
- family: Chiang
given: Chao-Kai
- family: Ortner
given: Ronald
- family: Drugan
given: Madalina
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 939-947
id: auer16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 939
lastpage: 947
published: 2016-05-02 00:00:00 +0000
- title: 'Sketching, Embedding and Dimensionality Reduction in Information Theoretic Spaces'
abstract: 'In this paper we show how to embed information distances like the χ^2 and Jensen-Shannon divergences efficiently in low dimensional spaces while preserving all pairwise distances. We then prove a dimensionality reduction result for the Hellinger, Jensen–Shannon, and χ^2 divergences that preserves the information geometry of the distributions, specifically, by retaining the simplex structure of the space. While our first result already implies these divergences can be explicitly embedded in the Euclidean space, retaining the simplex structure is important because it allows us to do inferences in the reduced space. We also show that these divergences can be sketched efficiently (i.e., up to a multiplicative error in sublinear space) in the aggregate streaming model. This result is exponentially stronger than known upper bounds for sketching these distances in the strict turnstile streaming model.'
volume: 51
URL: http://proceedings.mlr.press/v51/abdullah16.html
PDF: http://proceedings.mlr.press/v51/abdullah16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-abdullah16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Abdullah
given: Amirali
- family: Kumar
given: Ravi
- family: McGregor
given: Andrew
- family: Vassilvitskii
given: Sergei
- family: Venkatasubramanian
given: Suresh
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 948-956
id: abdullah16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 948
lastpage: 956
published: 2016-05-02 00:00:00 +0000
- title: 'AdaDelay: Delay Adaptive Distributed Stochastic Optimization'
abstract: 'We develop distributed stochastic convex optimization algorithms under a delayed gradient model in which server nodes update parameters and worker nodes compute stochastic (sub)gradients. Our setup is motivated by the behavior of real-world distributed computation systems; in particular, we analyze a setting wherein worker nodes can be differently slow at different times. In contrast to existing approaches, we do not impose a worst-case bound on the delays experienced but rather allow the updates to be sensitive to the actual delays experienced. This sensitivity allows use of larger stepsizes, which can help speed up initial convergence without having to wait too long for slower machines; the global convergence rate is still preserved. We experiment with different delay patterns, and obtain noticeable improvements for large-scale real datasets with billions of examples and features.'
volume: 51
URL: http://proceedings.mlr.press/v51/sra16.html
PDF: http://proceedings.mlr.press/v51/sra16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-sra16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Sra
given: Suvrit
- family: Yu
given: Adams Wei
- family: Li
given: Mu
- family: Smola
given: Alex
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 957-965
id: sra16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 957
lastpage: 965
published: 2016-05-02 00:00:00 +0000
- title: 'Exponential Stochastic Cellular Automata for Massively Parallel Inference'
abstract: 'We propose an embarrassingly parallel, memory efficient inference algorithm for latent variable models in which the complete data likelihood is in the exponential family. The algorithm is a stochastic cellular automaton and converges to a valid maximum a posteriori fixed point. Applied to latent Dirichlet allocation we find that our algorithm is over an order or magnitude faster than the fastest current approaches. A simple C++/MPI implementation on a 20-node Amazon EC2 cluster samples at more than 1 billion tokens per second. We process 3 billion documents and achieve predictive power competitive with collapsed Gibbs sampling and variational inference.'
volume: 51
URL: http://proceedings.mlr.press/v51/zaheer16.html
PDF: http://proceedings.mlr.press/v51/zaheer16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-zaheer16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Zaheer
given: Manzil
- family: Wick
given: Michael
- family: Tristan
given: Jean-Baptiste
- family: Smola
given: Alex
- family: Steele
given: Guy
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 966-975
id: zaheer16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 966
lastpage: 975
published: 2016-05-02 00:00:00 +0000
- title: 'Globally Sparse Probabilistic PCA'
abstract: 'With the flourishing development of high-dimensional data, sparse versions of principal component analysis (PCA) have imposed themselves as simple, yet powerful ways of selecting relevant features in an unsupervised manner. However, when several sparse principal components are computed, the interpretation of the selected variables may be difficult since each axis has its own sparsity pattern and has to be interpreted separately. To overcome this drawback, we propose a Bayesian procedure that allows to obtain several sparse components with the same sparsity pattern. To this end, using Roweis’ probabilistic interpretation of PCA and an isotropic Gaussian prior on the loading matrix, we provide the first exact computation of the marginal likelihood of a Bayesian PCA model. In order to avoid the drawbacks of discrete model selection, we propose a simple relaxation of our framework which allows to find a path of models using a variational expectation-maximization algorithm. The exact marginal likelihood can eventually be maximized over this path, relying on Occam’s razor to select the relevant variables. Since the sparsity pattern is common to all components, we call this approach globally sparse probabilistic PCA (GSPPCA). Its usefulness is illustrated on synthetic data sets and on several real unsupervised feature selection problems.'
volume: 51
URL: http://proceedings.mlr.press/v51/mattei16.html
PDF: http://proceedings.mlr.press/v51/mattei16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-mattei16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Mattei
given: Pierre-Alexandre
- family: Bouveyron
given: Charles
- family: Latouche
given: Pierre
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 976-984
id: mattei16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 976
lastpage: 984
published: 2016-05-02 00:00:00 +0000
- title: 'Provable Bayesian Inference via Particle Mirror Descent'
abstract: 'Bayesian methods are appealing in their flexibility in modeling complex data and ability in capturing uncertainty in parameters. However, when Bayes’ rule does not result in tractable closed-form, most approximate inference algorithms lack either scalability or rigorous guarantees. To tackle this challenge, we propose a simple yet provable algorithm, Particle Mirror Descent (PMD), to iteratively approximate the posterior density. PMD is inspired by stochastic functional mirror descent where one descends in the density space using a small batch of data points at each iteration, and by particle filtering where one uses samples to approximate a function. We prove result of the first kind that, with m particles, PMD provides a posterior density estimator that converges in terms of KL-divergence to the true posterior in rate O(1/\sqrtm). We demonstrate competitive empirical performances of PMD compared to several approximate inference algorithms in mixture models, logistic regression, sparse Gaussian processes and latent Dirichlet allocation on large scale datasets.'
volume: 51
URL: http://proceedings.mlr.press/v51/dai16.html
PDF: http://proceedings.mlr.press/v51/dai16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-dai16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Dai
given: Bo
- family: He
given: Niao
- family: Dai
given: Hanjun
- family: Song
given: Le
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 985-994
id: dai16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 985
lastpage: 994
published: 2016-05-02 00:00:00 +0000
- title: 'Unsupervised Feature Selection by Preserving Stochastic Neighbors'
abstract: 'Feature selection is an important technique for alleviating the curse of dimensionality. Unsupervised feature selection is more challenging than its supervised counterpart due to the lack of labels. In this paper, we present an effective method, Stochastic Neighbor-preserving Feature Selection (SNFS), for selecting discriminative features in unsupervised setting. We employ the concept of stochastic neighbors and select the features that can best preserve such stochastic neighbors by minimizing the Kullback-Leibler (KL) Divergence between neighborhood distributions. The proposed approach measures feature utility jointly in a non-linear way and discriminative features can be selected due to its ’push-pull’ property. We develop an efficient algorithm for optimizing the objective function based on projected quasi-Newton method. Moreover, few existing methods provide ways for determining the optimal number of selected features and this hampers their utility in practice. Our approach is equipped with a guideline for choosing the number of features, which provides nearly optimal performance in our experiments. Experimental results show that the proposed method outperforms state-of-the-art methods significantly on several real-world datasets.'
volume: 51
URL: http://proceedings.mlr.press/v51/wei16.html
PDF: http://proceedings.mlr.press/v51/wei16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-wei16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Wei
given: Xiaokai
- family: Yu
given: Philip S.
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 995-1003
id: wei16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 995
lastpage: 1003
published: 2016-05-02 00:00:00 +0000
- title: 'Improved Learning Complexity in Combinatorial Pure Exploration Bandits'
abstract: 'We study the problem of combinatorial pure exploration in the stochastic multi-armed bandit problem. We first construct a new measure of complexity that provably characterizes the learning performance of the algorithms we propose for the fixed confidence and the fixed budget setting. We show that this complexity is never higher than the one in existing work and illustrate a number of configurations in which it can be significantly smaller. While in general this improvement comes at the cost of increased computational complexity, we provide a series of examples, including a planning problem, where this extra cost is not significant.'
volume: 51
URL: http://proceedings.mlr.press/v51/gabillon16.html
PDF: http://proceedings.mlr.press/v51/gabillon16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-gabillon16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Gabillon
given: Victor
- family: Lazaric
given: Alessandro
- family: Ghavamzadeh
given: Mohammad
- family: Ortner
given: Ronald
- family: Bartlett
given: Peter
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1004-1012
id: gabillon16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1004
lastpage: 1012
published: 2016-05-02 00:00:00 +0000
- title: 'Scalable Gaussian Processes for Characterizing Multidimensional Change Surfaces'
abstract: 'We present a scalable Gaussian process model for identifying and characterizing smooth multidimensional changepoints, and automatically learning changes in expressive covariance structure. We use Random Kitchen Sink features to flexibly define a change surface in combination with expressive spectral mixture kernels to capture the complex statistical structure. Finally, through the use of novel methods for additive non-separable kernels, we can scale the model to large datasets. We demonstrate the model on numerical and real world data, including a large spatio-temporal disease dataset where we identify previously unknown heterogeneous changes in space and time.'
volume: 51
URL: http://proceedings.mlr.press/v51/herlands16.html
PDF: http://proceedings.mlr.press/v51/herlands16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-herlands16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Herlands
given: William
- family: Wilson
given: Andrew
- family: Nickisch
given: Hannes
- family: Flaxman
given: Seth
- family: Neill
given: Daniel
- family: Van Panhuis
given: Wilbert
- family: Xing
given: Eric
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1013-1021
id: herlands16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1013
lastpage: 1021
published: 2016-05-02 00:00:00 +0000
- title: 'Optimization as Estimation with Gaussian Processes in Bandit Settings'
abstract: 'Recently, there has been rising interest in Bayesian optimization – the optimization of an unknown function with assumptions usually expressed by a Gaussian Process (GP) prior. We study an optimization strategy that directly uses an estimate of the argmax of the function. This strategy offers both practical and theoretical advantages: no tradeoff parameter needs to be selected, and, moreover, we establish close connections to the popular GP-UCB and GP-PI strategies. Our approach can be understood as automatically and adaptively trading off exploration and exploitation in GP-UCB and GP-PI. We illustrate the effects of this adaptive tuning via bounds on the regret as well as an extensive empirical evaluation on robotics and vision tasks, demonstrating the robustness of this strategy for a range of performance criteria.'
volume: 51
URL: http://proceedings.mlr.press/v51/wang16f.html
PDF: http://proceedings.mlr.press/v51/wang16f.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-wang16f.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Wang
given: Zi
- family: Zhou
given: Bolei
- family: Jegelka
given: Stefanie
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1022-1031
id: wang16f
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1022
lastpage: 1031
published: 2016-05-02 00:00:00 +0000
- title: 'A Convex Surrogate Operator for General Non-Modular Loss Functions'
abstract: 'Empirical risk minimization frequently employs convex surrogates to underlying discrete loss functions in order to achieve computational tractability during optimization. However, classical convex surrogates can only tightly bound modular loss functions, submodular functions or supermodular functions separately while maintaining polynomial time computation. In this work, a novel generic convex surrogate for general non-modular loss functions is introduced, which provides for the first time a tractable solution for loss functions that are neither supermodular nor submodular. This convex surrogate is based on a submodular-supermodular decomposition for which the existence and uniqueness is proven in this paper. It takes the sum of two convex surrogates that separately bound the supermodular component and the submodular component using slack-rescaling and the Lovasz hinge, respectively. It is further proven that this surrogate is convex, piecewise linear, an extension of the loss function, and for which subgradient computation is polynomial time. Empirical results are reported on a non-submodular loss based on the Sorensen-Dice difference function, and a real-world face track dataset with tens of thousands of frames, demonstrating the improved performance, efficiency, and scalability of the novel convex surrogate.'
volume: 51
URL: http://proceedings.mlr.press/v51/yu16.html
PDF: http://proceedings.mlr.press/v51/yu16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-yu16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Yu
given: Jiaqian
- family: Blaschko
given: Matthew
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1032-1041
id: yu16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1032
lastpage: 1041
published: 2016-05-02 00:00:00 +0000
- title: 'Inference for High-dimensional Exponential Family Graphical Models'
abstract: 'Probabilistic graphical models have been widely used to model complex systems and aid scientific discoveries. Most existing work on high-dimensional estimation of exponential family graphical models, including Gaussian graphical models and Ising models, is focused on consistent model selection. However, these results do not characterize uncertainty in the estimated structure and are of limited value to scientists who worry whether their findings will be reproducible and if the estimated edges are present in the model due to random chance. In this paper, we propose a novel estimator for edge parameters in an exponential family graphical models. We prove that the estimator is \sqrtn-consistent and asymptotically Normal. This result allows us to construct confidence intervals for edge parameters, as well as, hypothesis tests. We establish our results under conditions that are typically assumed in the literature for consistent estimation. However, we do not require that the estimator consistently recovers the graph structure. In particular, we prove that the asymptotic distribution of the estimator is robust to model selection mistakes and uniformly valid for a large number of data-generating processes. We illustrate validity of our estimator through extensive simulation studies.'
volume: 51
URL: http://proceedings.mlr.press/v51/wang16g.html
PDF: http://proceedings.mlr.press/v51/wang16g.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-wang16g.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Wang
given: Jialei
- family: Kolar
given: Mladen
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1042-1050
id: wang16g
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1042
lastpage: 1050
published: 2016-05-02 00:00:00 +0000
- title: 'Bridging the Gap between Stochastic Gradient MCMC and Stochastic Optimization'
abstract: 'Stochastic gradient Markov chain Monte Carlo (SG-MCMC) methods are Bayesian analogs to popular stochastic optimization methods; however, this connection is not well studied. We explore this relationship by applying simulated annealing to an SG-MCMC algorithm. Furthermore, we extend recent SG-MCMC methods with two key components: i) adaptive preconditioners (as in ADAgrad or RMSprop), and ii) adaptive element-wise momentum weights. The zero-temperature limit gives a novel stochastic optimization method with adaptive element-wise momentum weights, while conventional optimization methods only have a shared, static momentum weight. Under certain assumptions, our theoretical analysis suggests the proposed simulated annealing approach converges close to the global optima. Experiments on several deep neural network models show state-of-the-art results compared to related stochastic optimization algorithms.'
volume: 51
URL: http://proceedings.mlr.press/v51/chen16c.html
PDF: http://proceedings.mlr.press/v51/chen16c.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-chen16c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Chen
given: Changyou
- family: Carlson
given: David
- family: Gan
given: Zhe
- family: Li
given: Chunyuan
- family: Carin
given: Lawrence
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1051-1060
id: chen16c
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1051
lastpage: 1060
published: 2016-05-02 00:00:00 +0000
- title: 'Fitting Spectral Decay with the k-Support Norm'
abstract: 'The spectral k-support norm enjoys good estimation properties in low rank matrix learning problems, empirically outperforming the trace norm. Its unit ball is the convex hull of rank k matrices with unit Frobenius norm. In this paper we generalize the norm to the spectral (k,p)-support norm, whose additional parameter p can be used to tailor the norm to the decay of the spectrum of the underlying model. We characterize the unit ball and we explicitly compute the norm. We further provide a conditional gradient method to solve regularization problems with the norm, and we derive an efficient algorithm to compute the Euclidean projection on the unit ball in the case p=∞. In numerical experiments, we show that allowing p to vary significantly improves performance over the spectral k-support norm on various matrix completion benchmarks, and better captures the spectral decay of the underlying model.'
volume: 51
URL: http://proceedings.mlr.press/v51/mcdonald16.html
PDF: http://proceedings.mlr.press/v51/mcdonald16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-mcdonald16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: McDonald
given: Andrew
- family: Pontil
given: Massimiliano
- family: Stamos
given: Dimitris
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1061-1069
id: mcdonald16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1061
lastpage: 1069
published: 2016-05-02 00:00:00 +0000
- title: 'Early Stopping as Nonparametric Variational Inference'
abstract: 'We show that unconverged stochastic gradient descent can be interpreted as sampling from a nonparametric approximate posterior distribution. This distribution is implicitly defined by the transformation of an initial distribution by a sequence of optimization steps. By tracking the change in entropy of this distribution during optimization, we give a scalable, unbiased estimate of a variational lower bound on the log marginal likelihood. This bound can be used to optimize hyperparameters instead of cross-validation. This Bayesian interpretation of SGD also suggests new overfitting-resistant optimization procedures, and gives a theoretical foundation for early stopping and ensembling. We investigate the properties of this marginal likelihood estimator on neural network models.'
volume: 51
URL: http://proceedings.mlr.press/v51/duvenaud16.html
PDF: http://proceedings.mlr.press/v51/duvenaud16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-duvenaud16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Duvenaud
given: David
- family: Maclaurin
given: Dougal
- family: Adams
given: Ryan
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1070-1077
id: duvenaud16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1070
lastpage: 1077
published: 2016-05-02 00:00:00 +0000
- title: 'Bayesian Nonparametric Kernel-Learning'
abstract: 'Kernel methods are ubiquitous tools in machine learning. They have proven to be effective in many domains and tasks. Yet, kernel methods often require the user to select a predefined kernel to build an estimator with. However, there is often little reason for the common practice of selecting a kernel a priori. Even if a universal approximating kernel is selected, the quality of the finite sample estimator may be greatly affected by the choice of kernel. Furthermore, when directly applying kernel methods, one typically needs to compute a N \times N Gram matrix of pairwise kernel evaluations to work with a dataset of N instances. The computation of this Gram matrix precludes the direct application of kernel methods on large datasets, and makes kernel learning especially difficult. In this paper we introduce Bayesian nonparmetric kernel-learning (BaNK), a generic, data-driven framework for scalable learning of kernels. BaNK places a nonparametric prior on the spectral distribution of random frequencies allowing it to both learn kernels and scale to large datasets. We show that this framework can be used for large scale regression and classification tasks. Furthermore, we show that BaNK outperforms several other scalable approaches for kernel learning on a variety of real world datasets.'
volume: 51
URL: http://proceedings.mlr.press/v51/oliva16.html
PDF: http://proceedings.mlr.press/v51/oliva16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-oliva16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Oliva
given: Junier B.
- family: Dubey
given: Avinava
- family: Wilson
given: Andrew G.
- family: Poczos
given: Barnabas
- family: Schneider
given: Jeff
- family: Xing
given: Eric P.
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1078-1086
id: oliva16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1078
lastpage: 1086
published: 2016-05-02 00:00:00 +0000
- title: 'Tight Variational Bounds via Random Projections and I-Projections'
abstract: 'Information projections are the key building block of variational inference algorithms and are used to approximate a target probabilistic model by projecting it onto a family of tractable distributions. In general, there is no guarantee on the quality of the approximation obtained. To overcome this issue, we introduce a new class of random projections to reduce the dimensionality and hence the complexity of the original model. In the spirit of random projections, the projection preserves (with high probability) key properties of the target distribution. We show that information projections can be combined with random projections to obtain provable guarantees on the quality of the approximation obtained, regardless of the complexity of the original model. We demonstrate empirically that augmenting mean field with a random projection step dramatically improves partition function and marginal probability estimates, both on synthetic and real world data.'
volume: 51
URL: http://proceedings.mlr.press/v51/hsu16.html
PDF: http://proceedings.mlr.press/v51/hsu16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-hsu16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Hsu
given: Lun-Kai
- family: Achim
given: Tudor
- family: Ermon
given: Stefano
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1087-1095
id: hsu16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1087
lastpage: 1095
published: 2016-05-02 00:00:00 +0000
- title: 'Bethe Learning of Graphical Models via MAP Decoding'
abstract: 'Many machine learning tasks require fitting probabilistic models over structured objects, such as pixel grids, matchings, and graph edges. Maximum likelihood estimation (MLE) for such domains is challenging due to the intractability of computing partition functions. One can resort to approximate marginal inference in conjunction with gradient descent, but such algorithms require careful tuning. Alternatively, in frameworks such as the structured support vector machine (SVM-Struct), discriminative functions are learned by iteratively applying efficient maximum a posteriori (MAP) decoders. We introduce MLE-Struct, a method for learning discrete exponential family models using the Bethe approximation to the partition function. Remarkably, this problem can also be reduced to iterative (MAP) decoding. This connection emerges by combining the Bethe approximation with the Frank-Wolfe (FW) algorithm on a convex dual objective, which circumvents the intractable partition function. Our method can learn both generative and conditional models and is substantially faster and easier to implement than existing MLE approaches while still relying on the same black-box interface to MAP decoding as SVM-Struct. We perform competitively on problems in denoising, segmentation, matching, and new datasets of roommate assignments and news and financial time series.'
volume: 51
URL: http://proceedings.mlr.press/v51/tang16a.html
PDF: http://proceedings.mlr.press/v51/tang16a.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-tang16a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Tang
given: Kui
- family: Ruozzi
given: Nicholas
- family: Belanger
given: David
- family: Jebara
given: Tony
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1096-1104
id: tang16a
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1096
lastpage: 1104
published: 2016-05-02 00:00:00 +0000
- title: 'Determinantal Regularization for Ensemble Variable Selection'
abstract: 'Recent years have seen growing interest in deterministic search approaches to spike-and-slab Bayesian variable selection. Such methods have focused on the goal of finding a global mode to identify a “best model”. However, the report of a single model will be a misleading reflection of the model uncertainty inherent in a highly multimodal posterior. Motivated by non-parametric variational Bayes strategies, we move beyond this limitation by proposing an ensemble optimization approach to identify a collection of representative posterior modes. By deploying determinantal penalty functions as diversity regularizers, our approach performs regularization over multiple locations of the posterior. The key driver of these determinantal penalties is a kernel function that induces repulsion in the latent model space domain.'
volume: 51
URL: http://proceedings.mlr.press/v51/rockova16.html
PDF: http://proceedings.mlr.press/v51/rockova16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-rockova16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Rockova
given: Veronika
- family: Moran
given: Gemma
- family: George
given: Edward
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1105-1113
id: rockova16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1105
lastpage: 1113
published: 2016-05-02 00:00:00 +0000
- title: 'Scalable and Sound Low-Rank Tensor Learning'
abstract: 'Many real-world data arise naturally as tensors. Equipped with a low rank prior, learning algorithms can benefit from exploiting the rich dependency encoded in a tensor. Despite its prevalence in low-rank matrix learning, trace norm ceases to be tractable in tensors and therefore most existing works resort to matrix unfolding. Although some theoretical guarantees are available, these approaches may lose valuable structure information and are not scalable in general. To address this problem, we propose directly optimizing the tensor trace norm by approximating its dual spectral norm, and we show that the approximation bounds can be efficiently converted to the original problem via the generalized conditional gradient algorithm. The resulting approach is scalable to large datasets, and matches state-of-the-art recovery guarantees. Experimental results on tensor completion and multitask learning confirm the superiority of the proposed method.'
volume: 51
URL: http://proceedings.mlr.press/v51/cheng16.html
PDF: http://proceedings.mlr.press/v51/cheng16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-cheng16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Cheng
given: Hao
- family: Yu
given: Yaoliang
- family: Zhang
given: Xinhua
- family: Xing
given: Eric
- family: Schuurmans
given: Dale
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1114-1123
id: cheng16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1114
lastpage: 1123
published: 2016-05-02 00:00:00 +0000
- title: 'Non-negative Matrix Factorization for Discrete Data with Hierarchical Side-Information'
abstract: 'We present a probabilistic framework for efficient non-negative matrix factorization of discrete (count/binary) data with side-information. The side-information is given as a multi-level structure, taxonomy, or ontology, with nodes at each level being categorical-valued observations. For example, when modeling documents with a two-level side-information (documents being at level-zero), level-one may represent (one or more) authors associated with each document and level-two may represent affiliations of each author. The model easily generalizes to more than two levels (or taxonomy/ontology of arbitrary depth). Our model can learn embeddings of entities present at each level in the data/side-information hierarchy (e.g., documents, authors, affiliations, in the previous example), with appropriate sharing of information across levels. The model also enjoys full local conjugacy, facilitating efficient Gibbs sampling for model inference. Inference cost scales in the number of non-zero entries in the data matrix, which is especially appealing for real-world massive but sparse matrices. We demonstrate the effectiveness of the model on several real-world data sets.'
volume: 51
URL: http://proceedings.mlr.press/v51/hu16c.html
PDF: http://proceedings.mlr.press/v51/hu16c.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-hu16c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Hu
given: Changwei
- family: Rai
given: Piyush
- family: Carin
given: Lawrence
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1124-1132
id: hu16c
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1124
lastpage: 1132
published: 2016-05-02 00:00:00 +0000
- title: 'Topic-Based Embeddings for Learning from Large Knowledge Graphs'
abstract: 'We present a scalable probabilistic framework for learning from multi-relational data given in form of entity-relation-entity triplets, with a potentially massive number of entities and relations (e.g., in multi-relational networks, knowledge bases, etc.). We define each triplet via a relation-specific bilinear function of the embeddings of entities associated with it (these embeddings correspond to “topics”). To handle massive number of relations and the data sparsity problem (very few observations per relation), we also extend this model to allow sharing of parameters across relations, which leads to a substantial reduction in the number of parameters to be learned. In addition to yielding excellent predictive performance (e.g., for knowledge base completion tasks), the interpretability of our topic-based embedding framework enables easy qualitative analyses. Computational cost of our models scales in the number of positive triplets, which makes it easy to scale to massive real-world multi-relational data sets, which are usually extremely sparse. We develop simple-to-implement batch as well as online Gibbs sampling algorithms and demonstrate the effectiveness of our models on tasks such as multi-relational link-prediction, and learning from large knowledge bases.'
volume: 51
URL: http://proceedings.mlr.press/v51/hu16d.html
PDF: http://proceedings.mlr.press/v51/hu16d.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-hu16d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Hu
given: Changwei
- family: Rai
given: Piyush
- family: Carin
given: Lawrence
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1133-1141
id: hu16d
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1133
lastpage: 1141
published: 2016-05-02 00:00:00 +0000
- title: 'Consistently Estimating Markov Chains with Noisy Aggregate Data'
abstract: 'We address the problem of estimating the parameters of a time-homogeneous Markov chain given only noisy, aggregate data. This arises when a population of individuals behave independently according to a Markov chain, but individual sample paths cannot be observed due to limitations of the observation process or the need to protect privacy. Instead, only population-level counts of the number of individuals in each state at each time step are available. When these counts are exact, a conditional least squares (CLS) estimator is known to be consistent and asymptotically normal. We initiate the study of method of moments estimators for this problem to handle the more realistic case when observations are additionally corrupted by noise. We show that CLS can be interpreted as a simple “plug-in” method of moments estimator. However, when observations are noisy, it is not consistent because it fails to account for additional variance introduced by the noise. We develop a new, simpler method of moments estimator that bypasses this problem and is consistent under noisy observations.'
volume: 51
URL: http://proceedings.mlr.press/v51/bernstein16.html
PDF: http://proceedings.mlr.press/v51/bernstein16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-bernstein16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Bernstein
given: Garrett
- family: Sheldon
given: Daniel
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1142-1150
id: bernstein16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1142
lastpage: 1150
published: 2016-05-02 00:00:00 +0000
- title: 'Unwrapping ADMM: Efficient Distributed Computing via Transpose Reduction'
abstract: 'Recent approaches to distributed model fitting rely heavily on consensus ADMM, where each node solves small sub-problems using only local data. We propose iterative methods that solve global sub-problems over an entire distributed dataset. This is possible using transpose reduction strategies that allow a single node to solve least-squares over massive datasets without putting all the data in one place. This results in simple iterative methods that avoid the expensive inner loops required for consensus methods. We analyze the convergence rates of the proposed schemes and demonstrate the efficiency of this approach by fitting linear classifiers and sparse linear models to large datasets using a distributed implementation with up to 20,000 cores in far less time than previous approaches.'
volume: 51
URL: http://proceedings.mlr.press/v51/goldstein16.html
PDF: http://proceedings.mlr.press/v51/goldstein16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-goldstein16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Goldstein
given: Tom
- family: Taylor
given: Gavin
- family: Barabin
given: Kawika
- family: Sayre
given: Kent
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1151-1158
id: goldstein16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1151
lastpage: 1158
published: 2016-05-02 00:00:00 +0000
- title: 'Improper Deep Kernels'
abstract: 'Neural networks have recently re-emerged as a powerful hypothesis class, yielding impressive classification accuracy in multiple domains. However, their training is a non convex optimization problem. Here we address this difficulty by turning to "improper learning" of neural nets. In other words, we learn a classifier that is not a neural net but is competitive with the best neural net model given a sufficient number of training examples. Our approach relies on a novel kernel which integrates over the set of possible neural models. It turns out that the corresponding integral can be evaluated in closed form via a simple recursion. The learning problem is then an SVM with this kernel, and a global optimum can thus be found efficiently. We also provide sample complexity results which depend on the stability of the optimal neural net.'
volume: 51
URL: http://proceedings.mlr.press/v51/heinemann16.html
PDF: http://proceedings.mlr.press/v51/heinemann16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-heinemann16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Heinemann
given: Uri
- family: Livni
given: Roi
- family: Eban
given: Elad
- family: Elidan
given: Gal
- family: Globerson
given: Amir
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1159-1167
id: heinemann16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1159
lastpage: 1167
published: 2016-05-02 00:00:00 +0000
- title: 'Unbounded Bayesian Optimization via Regularization'
abstract: 'Bayesian optimization has recently emerged as a powerful and flexible tool in machine learning for hyperparameter tuning and more generally for the efficient global optimization of expensive black box functions. The established practice requires a user-defined bounded domain, which is assumed to contain the global optimizer. However, when little is known about the probed objective function, it can be difficult to prescribe such a domain. In this work, we modify the standard Bayesian optimization framework in a principled way to allow for unconstrained exploration of the search space. We introduce a new alternative method and compare it to a volume doubling baseline on two common synthetic benchmarking test functions. Finally, we apply our proposed methods on the task of tuning the stochastic gradient descent optimizer for both a multi-layered perceptron and a convolutional neural network on the MNIST dataset.'
volume: 51
URL: http://proceedings.mlr.press/v51/shahriari16.html
PDF: http://proceedings.mlr.press/v51/shahriari16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-shahriari16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Shahriari
given: Bobak
- family: Bouchard-Cote
given: Alexandre
- family: Freitas
given: Nando
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1168-1176
id: shahriari16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1168
lastpage: 1176
published: 2016-05-02 00:00:00 +0000
- title: 'Non-Gaussian Component Analysis with Log-Density Gradient Estimation'
abstract: 'Non-Gaussian component analysis (NGCA) is aimed at identifying a linear subspace such that the projected data follows a non-Gaussian distribution. In this paper, we propose a novel NGCA algorithm based on log-density gradient estimation. Unlike existing methods, the proposed NGCA algorithm identifies the linear subspace by using the eigenvalue decomposition without any iterative procedures, and thus is computationally reasonable. Furthermore, through theoretical analysis, we prove that the identified subspace converges to the true subspace at the optimal parametric rate. Finally, the practical performance of the proposed algorithm is demonstrated on both artificial and benchmark datasets.'
volume: 51
URL: http://proceedings.mlr.press/v51/sasaki16.html
PDF: http://proceedings.mlr.press/v51/sasaki16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-sasaki16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Sasaki
given: Hiroaki
- family: Niu
given: Gang
- family: Sugiyama
given: Masashi
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1177-1185
id: sasaki16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1177
lastpage: 1185
published: 2016-05-02 00:00:00 +0000
- title: 'Online Learning with Noisy Side Observations'
abstract: 'We propose a new partial-observability model for online learning problems where the learner, besides its own loss, also observes some noisy feedback about the other actions, depending on the underlying structure of the problem. We represent this structure by a weighted directed graph, where the edge weights are related to the quality of the feedback shared by the connected nodes. Our main contribution is an efficient algorithm that guarantees a regret of O(\sqrt(α^* T) after T rounds, where α^* is a novel graph property that we call the effective independence number. Our algorithm is completely parameter-free and does not require knowledge (or even estimation) of alpha^*. For the special case of binary edge weights, our setting reduces to the partial-observability models of Mannor & Shamir (2011) and Alon et al. (2013) and our algorithm recovers the near-optimal regret bounds.'
volume: 51
URL: http://proceedings.mlr.press/v51/kocak16.html
PDF: http://proceedings.mlr.press/v51/kocak16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-kocak16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Kocák
given: Tomáš
- family: Neu
given: Gergely
- family: Valko
given: Michal
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1186-1194
id: kocak16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1186
lastpage: 1194
published: 2016-05-02 00:00:00 +0000
- title: 'Black-Box Policy Search with Probabilistic Programs'
abstract: 'In this work we show how to represent policies as programs: that is, as stochastic simulators with tunable parameters. To learn the parameters of such policies we develop connections between black box variational inference and existing policy search approaches. We then explain how such learning can be implemented in a probabilistic programming system. Using our own novel implementation of such a system we demonstrate both conciseness of policy representation and automatic policy parameter learning for a set of canonical reinforcement learning problems.'
volume: 51
URL: http://proceedings.mlr.press/v51/vandemeent16.html
PDF: http://proceedings.mlr.press/v51/vandemeent16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-vandemeent16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Vandemeent
given: Jan-Willem
- family: Paige
given: Brooks
- family: Tolpin
given: David
- family: Wood
given: Frank
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1195-1204
id: vandemeent16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1195
lastpage: 1204
published: 2016-05-02 00:00:00 +0000
- title: 'Efficient Bregman Projections onto the Permutahedron and Related Polytopes'
abstract: 'The problem of projecting onto the permutahedron \mathbfPH(c) – the convex hull of all permutations of a fixed vector c – under a uniformly separable Bregman divergence is shown to be reducible to the Isotonic Optimization problem. This allows us to employ known fast algorithms to improve on several recent results on Bregman projections onto permutahedra. In addition, we present a new algorithm \rm mergepool that have better complexity when the number of distinct entries d in the vector c is small, the simplex being one such example, with c=(1,0,0,\dotsc,0)^T and d=2. \rm mergepool runs in O(n \log d) for certain popular Bregman divergence measures and requires O((n \log d) \log \fracU ε) to find ε-close solutions for general uniformly separable Bregman divergences, where U is a bound on the width of the interval containing the dual solution components. These estimates matches or improves best known bounds for all Bregman projection problems onto various permutahedra, including recent results for projection onto the simplex. The same complexity bounds apply to signed permutahedra, a class that includes the \ell_1-ball as a special case. In summary, this work describes a fast unified approach to this well-known class of problems.'
volume: 51
URL: http://proceedings.mlr.press/v51/lim16.html
PDF: http://proceedings.mlr.press/v51/lim16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-lim16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Lim
given: Cong Han
- family: Wright
given: Stephen J.
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1205-1213
id: lim16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1205
lastpage: 1213
published: 2016-05-02 00:00:00 +0000
- title: 'On Searching for Generalized Instrumental Variables'
abstract: 'Instrumental Variables are a popular way to identify the direct causal effect of a random variable X on a variable Y. Often no single instrumental variable exists, although it is still possible to find a set of generalized instrumental variables (GIVs) and identify the causal effect of all these variables at once. Till now it was not known how to find GIVs systematically or even test efficiently, if given variables satisfy GIV conditions. We provide fast algorithms for searching and testing restricted cases of GIVs. However, we prove that in the most general case it is NP-hard to verify if given variables fulfill the conditions of a general instrumental sets.'
volume: 51
URL: http://proceedings.mlr.press/v51/vanderzander16.html
PDF: http://proceedings.mlr.press/v51/vanderzander16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-vanderzander16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Zander
given: Benito
- family: Liśkiewicz
given: Maciej
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1214-1222
id: vanderzander16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1214
lastpage: 1222
published: 2016-05-02 00:00:00 +0000
- title: 'Provable Tensor Methods for Learning Mixtures of Generalized Linear Models'
abstract: 'We consider the problem of learning mixtures of generalized linear models (GLM) which arise in classification and regression problems. Typical learning approaches such as expectation maximization (EM) or variational Bayes can get stuck in spurious local optima. In contrast, we present a tensor decomposition method which is guaranteed to correctly recover the parameters. The key insight is to employ certain feature transformations of the input, which depend on the input generative model. Specifically, we employ score function tensors of the input and compute their cross-correlation with the response variable. We establish that the decomposition of this tensor consistently recovers the parameters, under mild non-degeneracy conditions. We demonstrate that the computational and sample complexity of our method is a low order polynomial of the input and the latent dimensions.'
volume: 51
URL: http://proceedings.mlr.press/v51/sedghi16.html
PDF: http://proceedings.mlr.press/v51/sedghi16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-sedghi16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Sedghi
given: Hanie
- family: Janzamin
given: Majid
- family: Anandkumar
given: Anima
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1223-1231
id: sedghi16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1223
lastpage: 1231
published: 2016-05-02 00:00:00 +0000
- title: 'Controlling Bias in Adaptive Data Analysis Using Information Theory'
abstract: 'Modern big data settings often involve messy, high-dimensional data, where it is not clear a priori what are the right questions to ask. To extract the most insights from a dataset, the analyst typically needs to engage in an iterative process of adaptive data analysis. The choice of analytics to be performed next depends on the results of the previous analyses on the same data. It is commonly recognized that such adaptivity (also called researcher degrees of freedom), even if well-intentioned, can lead to false discoveries, contributing to the crisis of reproducibility in science. In this paper, we propose a general information-theoretic framework to quantify and provably bound the bias of arbitrary adaptive analysis process. We prove that our mutual information based bound is tight in natural models. We show how this framework can give rigorous insights into when commonly used feature selection protocols (e.g. rank selection) do and do not lead to biased estimation. We also show how recent insights from differential privacy emerge from this framework when the analyst is assumed to be adversarial, though our bounds applies in more general settings. We illustrate our results with simple simulations.'
volume: 51
URL: http://proceedings.mlr.press/v51/russo16.html
PDF: http://proceedings.mlr.press/v51/russo16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-russo16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Russo
given: Daniel
- family: Zou
given: James
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1232-1240
id: russo16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1232
lastpage: 1240
published: 2016-05-02 00:00:00 +0000
- title: 'A Column Generation Bound Minimization Approach with PAC-Bayesian Generalization Guarantees'
abstract: 'The C-bound, introduced in Lacasse et al (2006), gives a tight upper bound on the risk of the majority vote classifier. Laviolette et al. (2011) designed a learning algorithm named MinCq that outputs a dense distribution on a finite set of base classifiers by minimizing the C-bound, together with a PAC-Bayesian generalization guarantee. In this work, we design a column generation algorithm that we call CqBoost, that optimizes the C-bound and outputs a sparse distribution on a possibly infinite set of voters. We also propose a PAC-Bayesian bound for CqBoost that holds for finite and two cases of continuous sets of base classifiers. Finally, compare the accuracy and the sparsity of CqBoost with MinCq and other state-of-the-art boosting algorithms.'
volume: 51
URL: http://proceedings.mlr.press/v51/roy16.html
PDF: http://proceedings.mlr.press/v51/roy16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-roy16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Roy
given: Jean-Francis
- family: Marchand
given: Mario
- family: Laviolette
given: François
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1241-1249
id: roy16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1241
lastpage: 1249
published: 2016-05-02 00:00:00 +0000
- title: 'Graph Sparsification Approaches for Laplacian Smoothing'
abstract: 'Given a statistical estimation problem where regularization is performed according to the structure of a large, dense graph G, we consider fitting the statistical estimate using a \it sparsified surrogate graph \mathbfG, which shares the vertices of G but has far fewer edges, and is thus more tractable to work with computationally. We examine three types of sparsification: spectral sparsification, which can be seen as the result of sampling edges from the graph with probabilities proportional to their effective resistances, and two simpler sparsifiers, which sample edges uniformly from the graph, either globally or locally. We provide strong theoretical and experimental results, demonstrating that sparsification before estimation can give statistically sensible solutions, with significant computational savings.'
volume: 51
URL: http://proceedings.mlr.press/v51/sadhanala16.html
PDF: http://proceedings.mlr.press/v51/sadhanala16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-sadhanala16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Sadhanala
given: Veeru
- family: Wang
given: Yu-Xiang
- family: Tibshirani
given: Ryan
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1250-1259
id: sadhanala16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1250
lastpage: 1259
published: 2016-05-02 00:00:00 +0000
- title: 'Scalable Exemplar Clustering and Facility Location via Augmented Block Coordinate Descent with Column Generation'
abstract: 'In recent years exemplar clustering has become a popular tool for applications in document and video summarization, active learning, and clustering with general similarity, where cluster centroids are required to be a subset of the data samples rather than their linear combinations. The problem is also well-known as facility location in the operations research literature. While the problem has well-developed convex relaxation with approximation and recovery guarantees, it has number of variables grows quadratically with the number of samples. Therefore, state-of-the-art methods can hardly handle more than 10^4 samples (i.e. 10^8 variables). In this work, we propose an Augmented-Lagrangian with Block Coordinate Descent (AL-BCD) algorithm that utilizes problem structure to obtain closed-form solution for each block sub-problem, and exploits low-rank representation of the dissimilarity matrix to search active columns without computing the entire matrix. Experiments show our approach to be orders of magnitude faster than existing approaches and can handle problems of up to 10^6 samples. We also demonstrate successful applications of the algorithm on world-scale facility location, document summarization and active learning.'
volume: 51
URL: http://proceedings.mlr.press/v51/yen16.html
PDF: http://proceedings.mlr.press/v51/yen16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-yen16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Yen
given: Ian En-Hsu
- family: Malioutov
given: Dmitry
- family: Kumar
given: Abhishek
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1260-1269
id: yen16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1260
lastpage: 1269
published: 2016-05-02 00:00:00 +0000
- title: 'Robust Covariate Shift Regression'
abstract: 'In many learning settings, the source data available to train a regression model differs from the target data it encounters when making predictions due to input distribution shift. Appropriately dealing with this situation remains an important challenge. Existing methods attempt to “reweight” the source data samples to better represent the target domain, but this introduces strong inductive biases that are highly extrapolative and can often err greatly in practice. We propose a robust approach for regression under covariate shift that embraces the uncertainty resulting from sample selection bias by producing regression models that are explicitly robust to it. We demonstrate the benefits of our approach on a number of regression tasks.'
volume: 51
URL: http://proceedings.mlr.press/v51/chen16d.html
PDF: http://proceedings.mlr.press/v51/chen16d.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-chen16d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Chen
given: Xiangli
- family: Monfort
given: Mathew
- family: Liu
given: Anqi
- family: Ziebart
given: Brian D.
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1270-1279
id: chen16d
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1270
lastpage: 1279
published: 2016-05-02 00:00:00 +0000
- title: 'On Lloyd’s Algorithm: New Theoretical Insights for Clustering in Practice'
abstract: 'We provide new analyses of Lloyd’s algorithm (1982), commonly known as the k-means clustering algorithm. Kumar and Kannan (2010) showed that running k-SVD followed by a constant approximation k-means algorithm, and then Lloyd’s algorithm, will correctly cluster nearly all of the dataset with respect to the optimal clustering, provided the dataset satisfies a deterministic clusterability assumption. This method is viewed as the "Swiss Army knife" for clustering problems, subsuming popular generative models such as Gaussian mixtures. However, it is tailored to high dimensional data, i.e., when d ≫k . We analyze Lloyd’s algorithm for general d without using the spectral projection, which leads to a weaker assumption in the case d < k. Surprisingly, we show that a simple and scalable heuristic that combines random sampling with Single-Linkage serves as a good seeding algorithm for Lloyd’s algorithm under this assumption. We then study stopping criteria for Lloyd’s algorithm under the lens of clusterability, accompanied by controlled simulations.'
volume: 51
URL: http://proceedings.mlr.press/v51/tang16b.html
PDF: http://proceedings.mlr.press/v51/tang16b.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-tang16b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Tang
given: Cheng
- family: Monteleoni
given: Claire
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1280-1289
id: tang16b
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1280
lastpage: 1289
published: 2016-05-02 00:00:00 +0000
- title: 'Towards Stability and Optimality in Stochastic Gradient Descent'
abstract: 'Iterative procedures for parameter estimation based on stochastic gradient descent (SGD) allow the estimation to scale to massive data sets. However, they typically suffer from numerical instability, while estimators based on SGD are statistically inefficient as they do not use all the information in the data set. To address these two issues we propose an iterative estimation procedure termed averaged implicit SGD (AI-SGD). For statistical efficiency AI-SGD employs averaging of the iterates, which achieves the Cramer-Rao bound under strong convexity, i.e., it is asymptotically an optimal unbiased estimator of the true parameter value. For numerical stability AI-SGD employs an implicit update at each iteration, which is similar to updates performed by proximal operators in optimization. In practice, AI-SGD achieves competitive performance with state-of-the-art procedures. Furthermore, it is more stable than averaging procedures that do not employ proximal updates, and is simple to implement as it requires fewer tunable hyperparameters than procedures that do employ proximal updates.'
volume: 51
URL: http://proceedings.mlr.press/v51/toulis16.html
PDF: http://proceedings.mlr.press/v51/toulis16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-toulis16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Toulis
given: Panos
- family: Tran
given: Dustin
- family: Airoldi
given: Edo
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1290-1298
id: toulis16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1290
lastpage: 1298
published: 2016-05-02 00:00:00 +0000
- title: 'Communication Efficient Distributed Agnostic Boosting'
abstract: 'We consider the problem of learning from distributed data in the agnostic setting, i.e., in the presence of arbitrary forms of noise. Our main contribution is a general distributed boosting-based procedure for learning an arbitrary concept space, that is simultaneously noise tolerant, communication efficient, and computationally efficient. This improves significantly over prior works that were either communication efficient only in noise-free scenarios or computationally prohibitive. Empirical results on large synthetic and real-world datasets demonstrate the effectiveness and scalability of the proposed approach.'
volume: 51
URL: http://proceedings.mlr.press/v51/chen16e.html
PDF: http://proceedings.mlr.press/v51/chen16e.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-chen16e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Chen
given: Shang-Tse
- family: Balcan
given: Maria-Florina
- family: Chau
given: Duen Horng
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1299-1307
id: chen16e
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1299
lastpage: 1307
published: 2016-05-02 00:00:00 +0000
- title: 'Private Causal Inference'
abstract: 'Causal inference deals with identifying which random variables ”cause” or control other random variables. Recent advances on the topic of causal inference based on tools from statistical estimation and machine learning have resulted in practical algorithms for causal inference. Causal inference has the potential to have significant impact on medical research, prevention and control of diseases, and identifying factors that impact economic changes to name just a few. However, these promising applications for causal inference are often ones that involve sensitive or personal data of users that need to be kept private (e.g., medical records, personal finances, etc). Therefore, there is a need for the development of causal inference methods that preserve data privacy. We study the problem of inferring causality using the current, popular causal inference framework, the additive noise model (ANM) while simultaneously ensuring privacy of the users. Our framework provides differential privacy guarantees for a variety of ANM variants. We run extensive experiments, and demonstrate that our techniques are practical and easy to implement.'
volume: 51
URL: http://proceedings.mlr.press/v51/kusner16.html
PDF: http://proceedings.mlr.press/v51/kusner16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-kusner16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Kusner
given: Matt J.
- family: Sun
given: Yu
- family: Sridharan
given: Karthik
- family: Weinberger
given: Kilian Q.
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1308-1317
id: kusner16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1308
lastpage: 1317
published: 2016-05-02 00:00:00 +0000
- title: 'Parallel Markov Chain Monte Carlo via Spectral Clustering'
abstract: 'As it has become common to use many computer cores in routine applications, finding good ways to parallelize popular algorithms has become increasingly important. In this paper, we present a parallelization scheme for Markov chain Monte Carlo (MCMC) methods based on spectral clustering of the underlying state space, generalizing earlier work on parallelization of MCMC methods by state space partitioning. We show empirically that this approach speeds up MCMC sampling for multimodal distributions and that it can be usefully applied in greater generality than several related algorithms. Our algorithm converges under reasonable conditions to an ‘optimal’ MCMC algorithm. We also show that our approach can be asymptotically far more efficient than naive parallelization, even in situations such as completely flat target distributions where no unique optimal algorithm exists. Finally, we combine theoretical and empirical bounds to provide practical guidance on the choice of tuning parameters.'
volume: 51
URL: http://proceedings.mlr.press/v51/basse16a.html
PDF: http://proceedings.mlr.press/v51/basse16a.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-basse16a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Basse
given: Guillaume
- family: Smith
given: Aaron
- family: Pillai
given: Natesh
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1318-1327
id: basse16a
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1318
lastpage: 1327
published: 2016-05-02 00:00:00 +0000
- title: 'Efficient Sampling for k-Determinantal Point Processes'
abstract: 'Determinantal Point Processes (DPPs) are elegant probabilistic models of repulsion and diversity over discrete sets of items. But their applicability to large sets is hindered by expensive cubic-complexity matrix operations for basic tasks such as sampling. In light of this, we propose a new method for approximate sampling from discrete k-DPPs. Our method takes advantage of the diversity property of subsets sampled from a DPP, and proceeds in two stages: first it constructs coresets for the ground set of items; thereafter, it efficiently samples subsets based on the constructed coresets. As opposed to previous approaches, our algorithm aims to minimize the total variation distance to the original distribution. Experiments on both synthetic and real datasets indicate that our sampling algorithm works efficiently on large data sets, and yields more accurate samples than previous approaches.'
volume: 51
URL: http://proceedings.mlr.press/v51/li16f.html
PDF: http://proceedings.mlr.press/v51/li16f.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-li16f.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Li
given: Chengtao
- family: Jegelka
given: Stefanie
- family: Sra
given: Suvrit
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1328-1337
id: li16f
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1328
lastpage: 1337
published: 2016-05-02 00:00:00 +0000
- title: 'A Fast and Reliable Policy Improvement Algorithm'
abstract: 'We introduce a simple, efficient method that improves stochastic policies for Markov decision processes. The computational complexity is the same as that of the value estimation problem. We prove that when the value estimation error is small, this method gives an improvement in performance that increases with certain variance properties of the initial policy and transition dynamics. Performance in numerical experiments compares favorably with previous policy improvement algorithms.'
volume: 51
URL: http://proceedings.mlr.press/v51/abbasi-yadkori16.html
PDF: http://proceedings.mlr.press/v51/abbasi-yadkori16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-abbasi-yadkori16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Abbasi-Yadkori
given: Yasin
- family: Bartlett
given: Peter L.
- family: Wright
given: Stephen J.
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1338-1346
id: abbasi-yadkori16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1338
lastpage: 1346
published: 2016-05-02 00:00:00 +0000
- title: 'Learning Sigmoid Belief Networks via Monte Carlo Expectation Maximization'
abstract: 'Belief networks are commonly used generative models of data, but require expensive posterior estimation to train and test the model. Learning typically proceeds by posterior sampling, variational approximations, or recognition networks, combined with stochastic optimization. We propose using an online Monte Carlo expectation-maximization (MCEM) algorithm to learn the maximum a posteriori (MAP) estimator of the generative model or optimize the variational lower bound of a recognition network. The E-step in this algorithm requires posterior samples, which are already generated in current learning schema. For the M-step, we augment with Polya-Gamma (PG) random variables to give an analytic updating scheme. We show relationships to standard learning approaches by deriving stochastic gradient ascent in the MCEM framework. We apply the proposed methods to both binary and count data. Experimental results show that MCEM improves the convergence speed and often improves hold-out performance over existing learning methods. Our approach is readily generalized to other recognition networks.'
volume: 51
URL: http://proceedings.mlr.press/v51/song16.html
PDF: http://proceedings.mlr.press/v51/song16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-song16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Song
given: Zhao
- family: Henao
given: Ricardo
- family: Carlson
given: David
- family: Carin
given: Lawrence
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1347-1355
id: song16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1347
lastpage: 1355
published: 2016-05-02 00:00:00 +0000
- title: 'Active Learning Algorithms for Graphical Model Selection'
abstract: 'The problem of learning the structure of a high dimensional graphical model from data has received considerable attention in recent years. In many applications such as sensor networks and proteomics it is often expensive to obtain samples from all the variables involved simultaneously. For instance, this might involve the synchronization of a large number of sensors or the tagging of a large number of proteins. To address this important issue, we initiate the study of a novel graphical model selection problem, where the goal is to optimize the total number of scalar samples obtained by allowing the collection of samples from only subsets of the variables. We propose a general paradigm for graphical model selection where feedback is used to guide the sampling to high degree vertices, while obtaining only few samples from the ones with the low degrees. We instantiate this framework with two specific active learning algorithms, one of which makes mild assumptions but is computationally expensive, while the other is computationally more efficient but requires stronger (nevertheless standard) assumptions. Whereas the sample complexity of passive algorithms is typically a function of the maximum degree of the graph, we show that the sample complexity of our algorithms is provably smaller and that it depends on a novel local complexity measure that is akin to the average degree of the graph. We finally demonstrate the efficacy of our framework via simulations.'
volume: 51
URL: http://proceedings.mlr.press/v51/dasarathy16.html
PDF: http://proceedings.mlr.press/v51/dasarathy16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-dasarathy16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Dasarathy
given: Gautamd
- family: Singh
given: Aarti
- family: Balcan
given: Maria-Florina
- family: Park
given: Jong H.
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1356-1364
id: dasarathy16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1356
lastpage: 1364
published: 2016-05-02 00:00:00 +0000
- title: 'Streaming Kernel Principal Component Analysis'
abstract: 'Kernel principal component analysis (KPCA) provides a concise set of basis vectors which capture non-linear structures within large data sets, and is a central tool in data analysis and learning. To allow for non-linear relations, typically a full n x n kernel matrix is constructed over n data points, but this requires too much space and time for large values of n. Techniques such as the Nystrom method and random feature maps can help towards this goal, but they do not explicitly maintain the basis vectors in a stream and take more space than desired. We propose a new approach for streaming KPCA which maintains a small set of basis elements in a stream, requiring space only logarithmic in n, and also improves the dependence on the error parameter. Our technique combines together random feature maps with recent advances in matrix sketching, it has guaranteed spectral norm error bounds with respect to the original kernel matrix, and it compares favorably in practice to state-of-the-art approaches.'
volume: 51
URL: http://proceedings.mlr.press/v51/ghashami16.html
PDF: http://proceedings.mlr.press/v51/ghashami16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-ghashami16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Ghashami
given: Mina
- family: Perry
given: Daniel J.
- family: Phillips
given: Jeff
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1365-1374
id: ghashami16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1365
lastpage: 1374
published: 2016-05-02 00:00:00 +0000
- title: 'Back to the Future: Radial Basis Function Networks Revisited'
abstract: 'Radial Basis Function (RBF) networks are a classical family of algorithms for supervised learning. The most popular approach for training RBF networks has relied on kernel methods using regularization based on a norm in a Reproducing Kernel Hilbert Space (RKHS), which is a principled and empirically successful framework. In this paper we aim to revisit some of the older approaches to training the RBF networks from a more modern perspective. Specifically, we analyze two common regularization procedures, one based on the square norm of the coefficients in the network and another on using centers obtained by k-means clustering. We show that both of these RBF methods can be recast as certain data-dependent kernels. We provide a theoretical analysis of these methods as well as a number of experimental results, pointing out very competitive experimental performance as well as certain advantages over the standard kernel methods in terms of both flexibility (incorporating of unlabeled data) and computational complexity. Finally, our results shed light on some impressive recent successes of using soft k-means features for image recognition and other tasks.'
volume: 51
URL: http://proceedings.mlr.press/v51/que16.html
PDF: http://proceedings.mlr.press/v51/que16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-que16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Que
given: Qichao
- family: Belkin
given: Mikhail
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1375-1383
id: que16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1375
lastpage: 1383
published: 2016-05-02 00:00:00 +0000
- title: 'Cut Pursuit: Fast Algorithms to Learn Piecewise Constant Functions'
abstract: 'We propose working-set/greedy algorithms to efficiently solve problems penalized respectively by the total variation and the Mumford Shah boundary size when the piecewise constant solutions has a small number of levelsets. Our algorithms exploit this structure by recursively splitting the level-sets using graph cuts. We obtain significant speed up on images that can be approximated with few levelsets compared to state-of-the-art algorithms.'
volume: 51
URL: http://proceedings.mlr.press/v51/landrieu16.html
PDF: http://proceedings.mlr.press/v51/landrieu16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-landrieu16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Landrieu
given: Loic
- family: Obozinski
given: Guillaume
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1384-1393
id: landrieu16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1384
lastpage: 1393
published: 2016-05-02 00:00:00 +0000
- title: 'Loss Bounds and Time Complexity for Speed Priors'
abstract: 'This paper establishes for the first time the predictive performance of speed priors and their computational complexity. A speed prior is essentially a probability distribution that puts low probability on strings that are not efficiently computable. We propose a variant to the original speed prior (Schmidhuber, 2002), and show that our prior can predict sequences drawn from probability measures that are estimable in polynomial time. Our speed prior is computable in doubly-exponential time, but not in polynomial time. On a polynomial time computable sequence our speed prior is computable in exponential time. We show that Schmidhuber’s speed prior has better complexity under the same conditions; however, the question of its predictive properties remains open.'
volume: 51
URL: http://proceedings.mlr.press/v51/filan16.html
PDF: http://proceedings.mlr.press/v51/filan16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-filan16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Filan
given: Daniel
- family: Leike
given: Jan
- family: Hutter
given: Marcus
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1394-1402
id: filan16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1394
lastpage: 1402
published: 2016-05-02 00:00:00 +0000
- title: 'NYTRO: When Subsampling Meets Early Stopping'
abstract: 'Early stopping is a well known approach to reduce the time complexity for performing training and model selection of large scale learning machines. On the other hand, memory/space (rather than time) complexity is the main constraint in many applications, and randomized subsampling techniques have been proposed to tackle this issue. In this paper we ask whether early stopping and subsampling ideas can be combined in a fruitful way. We consider the question in a least squares regression setting and propose a form of randomized iterative regularization based on early stopping and subsampling. In this context, we analyze the statistical and computational properties of the proposed method. Theoretical results are complemented and validated by a thorough experimental analysis.'
volume: 51
URL: http://proceedings.mlr.press/v51/camoriano16.html
PDF: http://proceedings.mlr.press/v51/camoriano16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-camoriano16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Camoriano
given: Raffaello
- family: Angles
given: Tomás
- family: Rudi
given: Alessandro
- family: Rosasco
given: Lorenzo
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1403-1411
id: camoriano16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1403
lastpage: 1411
published: 2016-05-02 00:00:00 +0000
- title: 'Randomization and The Pernicious Effects of Limited Budgets on Auction Experiments'
abstract: 'Buyers (e.g., advertisers) often have limited financial or processing resources, and so their participation in auctions is throttled. Changes to auctions may affect bids or throttling, and any change may affect what buyers pay. This paper shows that if an A/B experiment affects only bids, then the observed treatment effect is an unbiased estimator when all the bidders in the same auction are randomly assigned to A or B but the observed treatment effect can be severely biased otherwise, even in the absence of throttling. Experiments that affect throttling algorithms can also be badly biased, but the bias can be much reduced if separate budgets are maintained for the A and B arms of the experiment.'
volume: 51
URL: http://proceedings.mlr.press/v51/basse16b.html
PDF: http://proceedings.mlr.press/v51/basse16b.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-basse16b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Basse
given: Guillaume W.
- family: Azari Soufiani
given: Hossein
- family: Lambert
given: Diane
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1412-1420
id: basse16b
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1412
lastpage: 1420
published: 2016-05-02 00:00:00 +0000
- title: 'Spectral M-estimation with Applications to Hidden Markov Models'
abstract: 'Method of moment estimators exhibit appealing statistical properties, such as asymptotic unbiasedness, for nonconvex problems. However, they typically require a large number of samples and are extremely sensitive to model misspecification. In this paper, we apply the framework of M-estimation to develop both a generalized method of moments procedure and a principled method for regularization. Our proposed M-estimator obtains optimal sample efficiency rates (in the class of moment-based estimators) and the same well-known rates on prediction accuracy as other spectral estimators. It also makes it straightforward to incorporate regularization into the sample moment conditions. We demonstrate empirically the gains in sample efficiency from our approach on hidden Markov models.'
volume: 51
URL: http://proceedings.mlr.press/v51/tran16.html
PDF: http://proceedings.mlr.press/v51/tran16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-tran16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Tran
given: Dustin
- family: Kim
given: Minjae
- family: Doshi-Velez
given: Finale
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1421-1430
id: tran16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1421
lastpage: 1430
published: 2016-05-02 00:00:00 +0000
- title: 'Chained Gaussian Processes'
abstract: 'Gaussian process models are flexible, Bayesian non-parametric approaches to regression. Properties of multivariate Gaussians mean that they can be combined linearly in the manner of additive models and via a link function (like in generalized linear models) to handle non-Gaussian data. However, the link function formalism is restrictive, link functions are always invertible and must convert a parameter of interest to an linear combination of the underlying processes. There are many likelihoods and models where a non-linear combination is more appropriate. We term these more general models "Chained Gaussian Processes": the transformation of the GPs to the likelihood parameters will not generally be invertible, and that implies that linearisation would only be possible with multiple (localized) links, i.e a chain. We develop an approximate inference procedure for Chained GPs that is scalable and applicable to any factorized likelihood. We demonstrate the approximation on a range of likelihood functions.'
volume: 51
URL: http://proceedings.mlr.press/v51/saul16.html
PDF: http://proceedings.mlr.press/v51/saul16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-saul16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Saul
given: Alan D.
- family: Hensman
given: James
- family: Vehtari
given: Aki
- family: Lawrence
given: Neil D.
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1431-1440
id: saul16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1431
lastpage: 1440
published: 2016-05-02 00:00:00 +0000
- title: 'Multiresolution Matrix Compression'
abstract: 'Multiresolution Matrix Factorization (MMF) is a recently introduced method for finding multiscale structure and defining wavelets on graphs and matrices. MMF can also be used for matrix compression (sketching). However, the original MMF algorithm of (Kondor et al., 2014) scales with n^3 or worse (where n is the number of rows/columns in the matrix to be factorized) making it infeasible for large scale problems. In this paper we describe pMMF, a fast parallel MMF algorithm, which can scale to n in the range of millions. Our experimental results show that when used for matrix compression, pMMF often achieves much lower error than competing algorithms (especially on network data), yet for sparse matrices its running time scales close to linearly with n.'
volume: 51
URL: http://proceedings.mlr.press/v51/teneva16.html
PDF: http://proceedings.mlr.press/v51/teneva16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-teneva16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Teneva
given: Nedelina
- family: Mudrakarta
given: Pramod Kaushik
- family: Kondor
given: Risi
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1441-1449
id: teneva16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1441
lastpage: 1449
published: 2016-05-02 00:00:00 +0000
- title: 'Supervised Neighborhoods for Distributed Nonparametric Regression'
abstract: 'Techniques for nonparametric regression based on fitting small-scale local models at prediction time have long been studied in statistics and pattern recognition, but have received less attention in modern large-scale machine learning applications. In practice, such methods are generally applied to low-dimensional problems, but may falter with high-dimensional predictors if they use a Euclidean distance-based kernel. We propose a new method, SILO, for fitting prediction-time local models that uses supervised neighborhoods that adapt to the local shape of the regression surface. To learn such neighborhoods, we use a weight function between points derived from random forests. We prove the consistency of SILO, and demonstrate through simulations and real data that our method works well in both the serial and distributed settings. In the latter case, SILO learns the weighting function in a divide-and-conquer manner, entirely avoiding communication at training time.'
volume: 51
URL: http://proceedings.mlr.press/v51/bloniarz16.html
PDF: http://proceedings.mlr.press/v51/bloniarz16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-bloniarz16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Bloniarz
given: Adam
- family: Talwalkar
given: Ameet
- family: Yu
given: Bin
- family: Wu
given: Christopher
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1450-1459
id: bloniarz16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1450
lastpage: 1459
published: 2016-05-02 00:00:00 +0000
- title: 'Global Convergence of a Grassmannian Gradient Descent Algorithm for Subspace Estimation'
abstract: 'It has been observed in a variety of contexts that gradient descent methods have great success in solving low-rank matrix factorization problems, despite the relevant problem formulation being non-convex. We tackle a particular instance of this scenario, where we seek the d-dimensional subspace spanned by a streaming data matrix. We apply the natural first order incremental gradient descent method, constraining the gradient method to the Grassmannian. In this paper, we propose an adaptive step size scheme that is greedy for the noiseless case, that maximizes the improvement of our metric of convergence at each data index t, and yields an expected improvement for the noisy case. We show that, with noise-free data, this method converges from any random initialization to the global minimum of the problem. For noisy data, we provide the expected convergence rate of the proposed algorithm per iteration.'
volume: 51
URL: http://proceedings.mlr.press/v51/zhang16b.html
PDF: http://proceedings.mlr.press/v51/zhang16b.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-zhang16b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Zhang
given: Dejiao
- family: Balzano
given: Laura
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1460-1468
id: zhang16b
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1460
lastpage: 1468
published: 2016-05-02 00:00:00 +0000
- title: 'Online and Distributed Bayesian Moment Matching for Parameter Learning in Sum-Product Networks'
abstract: 'Probabilistic graphical models provide a general and flexible framework for reasoning about complex dependencies in noisy domains with many variables. Among the various types of probabilistic graphical models, sum-product networks (SPNs) have recently generated some interest because exact inference can always be done in linear time with respect to the size of the network. This is particularly attractive since it means that learning an SPN from data always yields a tractable model for inference. However, existing parameter learning algorithms for SPNs operate in batch mode and do not scale easily to large datasets. In this work, we explore online algorithms to ensure that parameter learning can also be done tractably with respect to the amount of data. More specifically, we propose a new Bayesian moment matching (BMM) algorithm that operates naturally in an online fashion and that can be easily distributed. We demonstrate the effectiveness and scalability of BMM in comparison to online extensions of gradient descent, exponentiated gradient and expectation maximization on 20 classic benchmarks and 4 large scale datasets.'
volume: 51
URL: http://proceedings.mlr.press/v51/rashwan16.html
PDF: http://proceedings.mlr.press/v51/rashwan16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-rashwan16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Rashwan
given: Abdullah
- family: Zhao
given: Han
- family: Poupart
given: Pascal
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1469-1477
id: rashwan16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1469
lastpage: 1477
published: 2016-05-02 00:00:00 +0000
- title: 'Mondrian Forests for Large-Scale Regression when Uncertainty Matters'
abstract: 'Many real-world regression problems demand a measure of the uncertainty associated with each prediction. Standard decision forests deliver efficient state-of-the-art predictive performance, but high-quality uncertainty estimates are lacking. Gaussian processes (GPs) deliver uncertainty estimates, but scaling GPs to large-scale data sets comes at the cost of approximating the uncertainty estimates. We extend Mondrian forests, first proposed by Lakshminarayanan et al. (2014) for classification problems, to the large-scale nonparametric regression setting. Using a novel hierarchical Gaussian prior that dovetails with the Mondrian forest framework, we obtain principled uncertainty estimates, while still retaining the computational advantages of decision forests. Through a combination of illustrative examples, real-world large-scale datasets and Bayesian optimization benchmarks, we demonstrate that Mondrian forests outperform approximate GPs on large-scale regression tasks and deliver better-calibrated uncertainty assessments than decision-forest-based methods.'
volume: 51
URL: http://proceedings.mlr.press/v51/lakshminarayanan16.html
PDF: http://proceedings.mlr.press/v51/lakshminarayanan16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-lakshminarayanan16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Lakshminarayanan
given: Balaji
- family: Roy
given: Daniel M.
- family: Teh
given: Yee Whye
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1478-1487
id: lakshminarayanan16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1478
lastpage: 1487
published: 2016-05-02 00:00:00 +0000
- title: 'Online (and Offline) Robust PCA: Novel Algorithms and Performance Guarantees'
abstract: 'In this work we develop and study a novel online robust principal components’ analysis (RPCA) algorithm based on the recently introduced ReProCS framework. Our algorithm significantly improves upon the original ReProCS algorithm and it also returns even more accurate offline estimates. The key contribution of this work is a correctness result for this algorithm under relatively mild assumptions. By using extra (but usually valid) assumptions we are able to remove one important limitation of batch RPCA results and two important limitations of a recent result for ReProCS for online RPCA. To the best of our knowledge, this work is among the first correctness results for online RPCA.'
volume: 51
URL: http://proceedings.mlr.press/v51/zhan16.html
PDF: http://proceedings.mlr.press/v51/zhan16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-zhan16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Zhan
given: Jinchun
- family: Lois
given: Brian
- family: Guo
given: Han
- family: Vaswani
given: Namrata
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1488-1496
id: zhan16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1488
lastpage: 1496
published: 2016-05-02 00:00:00 +0000
- title: 'Parallel Majorization Minimization with Dynamically Restricted Domains for Nonconvex Optimization'
abstract: 'We propose an optimization framework for nonconvex problems based on majorization-minimization that is particularity well-suited for parallel computing. It reduces the optimization of a high dimensional nonconvex objective function to successive optimizations of locally tight and convex upper bounds which are additively separable into low dimensional objectives. The original problem is then broken into simpler and parallel tasks, while guaranteeing the monotonic reduction of the original objective function and convergence to a local minimum. This framework also allows one to restrict the upper bound to a local dynamic convex domain, so that the bound is better matched to the local curvature of the objective function, resulting in accelerated convergence. We test the proposed framework on a nonconvex support vector machine based on a sigmoid loss function and on nonconvex penalized logistic regression.'
volume: 51
URL: http://proceedings.mlr.press/v51/kaganovsky16.html
PDF: http://proceedings.mlr.press/v51/kaganovsky16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-kaganovsky16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Kaganovsky
given: Yan
- family: Odinaka
given: Ikenna
- family: Carlson
given: David
- family: Carin
given: Lawrence
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1497-1505
id: kaganovsky16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1497
lastpage: 1505
published: 2016-05-02 00:00:00 +0000
- title: 'Discriminative Structure Learning of Arithmetic Circuits'
abstract: 'The biggest limitation of probabilistic graphical models is the complexity of inference, which is often intractable. An appealing alternative is to use tractable probabilistic models, such as arithmetic circuits (ACs) and sum-product networks (SPNs), in which marginal and conditional queries can be answered efficiently. In this paper, we present the first discriminative structure learning algorithm for ACs, DACLearn (Discriminative AC Learner). Like previous work on generative structure learning, DACLearn finds a log-linear model with conjunctive features, using the size of an equivalent AC representation as a learning bias. Unlike previous work, DACLearn optimizes conditional likelihood, resulting in a more accurate conditional distribution. DACLearn also learns much more compact ACs than generative methods, since it does not need to represent a consistent distribution over the evidence variables. To ensure efficiency, DACLearn uses novel initialization and search heuristics to drastically reduce the number of feature evaluations required to learn an accurate model. In experiments on 20 benchmark domains, we find that our DACLearn learns models that are more accurate and compact than other tractable generative and discriminative methods.'
volume: 51
URL: http://proceedings.mlr.press/v51/rooshenas16.html
PDF: http://proceedings.mlr.press/v51/rooshenas16.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-rooshenas16.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Rooshenas
given: Amirmohammad
- family: Lowd
given: Daniel
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1506-1514
id: rooshenas16
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1506
lastpage: 1514
published: 2016-05-02 00:00:00 +0000
- title: 'One Scan 1-Bit Compressed Sensing'
abstract: 'Based on α-stable random projections with small α, we develop a simple algorithm for compressed sensing (sparse signal recovery) by utilizing only the signs (i.e., 1-bit) of the measurements. Using only 1-bit information of the measurements results in substantial cost reduction in collection, storage, communication, and decoding for compressed sensing. The proposed algorithm is efficient in that the decoding procedure requires only one scan of the coordinates. Our analysis can precisely show that, for a K-sparse signal of length N, 12.3K\log N/δmeasurements (where δis the confidence) would be sufficient for recovering the support and the signs of the signal. While the method is highly robust against typical measurement noises, we also provide the analysis of the scheme under random flipping of the signs of measurements. Compared to the well-known work on 1-bit marginal regression (which can also be viewed as a one-scan method), the proposed algorithm requires orders of magnitude fewer measurements. Compared to 1-bit Iterative Hard Thresholding (IHT) (which is not a one-scan algorithm), our method is still significantly more accurate. Furthermore, the proposed method is reasonably robust against random sign flipping while IHT is known to be very sensitive to this type of noise.'
volume: 51
URL: http://proceedings.mlr.press/v51/li16g.html
PDF: http://proceedings.mlr.press/v51/li16g.pdf
edit: https://github.com/mlresearch/v51/edit/gh-pages/_posts/2016-05-02-li16g.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 19th International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Li
given: Ping
editor:
- family: Gretton
given: Arthur
- family: Robert
given: Christian C.
address: Cadiz, Spain
page: 1515-1523
id: li16g
issued:
date-parts:
- 2016
- 5
- 2
firstpage: 1515
lastpage: 1523
published: 2016-05-02 00:00:00 +0000