- title: 'Stochastic Optimization with Importance Sampling for Regularized Loss Minimization'
abstract: 'Uniform sampling of training data has been commonly used in traditional stochastic optimization algorithms such as Proximal Stochastic Mirror Descent (prox-SMD) and Proximal Stochastic Dual Coordinate Ascent (prox-SDCA). Although uniform sampling can guarantee that the sampled stochastic quantity is an unbiased estimate of the corresponding true quantity, the resulting estimator may have a rather high variance, which negatively affects the convergence of the underlying optimization procedure. In this paper we study stochastic optimization, including prox-SMD and prox-SDCA, with importance sampling, which improves the convergence rate by reducing the stochastic variance. We theoretically analyze the algorithms and empirically validate their effectiveness.'
volume: 37
URL: http://proceedings.mlr.press/v37/zhaoa15.html
PDF: http://proceedings.mlr.press/v37/zhaoa15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-zhaoa15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Zhao
given: Peilin
- family: Zhang
given: Tong
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1-9
id: zhaoa15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1
lastpage: 9
published: 2015-06-01 00:00:00 +0000
- title: 'Approval Voting and Incentives in Crowdsourcing'
abstract: 'The growing need for labeled training data has made crowdsourcing an important part of machine learning. The quality of crowdsourced labels is, however, adversely affected by three factors: (1) the workers are not experts; (2) the incentives of the workers are not aligned with those of the requesters; and (3) the interface does not allow workers to convey their knowledge accurately, by forcing them to make a single choice among a set of options. In this paper, we address these issues by introducing approval voting to utilize the expertise of workers who have partial knowledge of the true answer, and coupling it with a ("strictly proper") incentive-compatible compensation mechanism. We show rigorous theoretical guarantees of optimality of our mechanism together with a simple axiomatic characterization. We also conduct preliminary empirical studies on Amazon Mechanical Turk which validate our approach.'
volume: 37
URL: http://proceedings.mlr.press/v37/shaha15.html
PDF: http://proceedings.mlr.press/v37/shaha15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-shaha15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Shah
given: Nihar
- family: Zhou
given: Dengyong
- family: Peres
given: Yuval
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 10-19
id: shaha15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 10
lastpage: 19
published: 2015-06-01 00:00:00 +0000
- title: 'A low variance consistent test of relative dependency'
abstract: 'We describe a novel non-parametric statistical hypothesis test of relative dependence between a source variable and two candidate target variables. Such a test enables us to determine whether one source variable is significantly more dependent on a first target variable or a second. Dependence is measured via the Hilbert-Schmidt Independence Criterion (HSIC), resulting in a pair of empirical dependence measures (source-target 1, source-target 2). We test whether the first dependence measure is significantly larger than the second. Modeling the covariance between these HSIC statistics leads to a provably more powerful test than the construction of independent HSIC statistics by sub-sampling. The resulting test is consistent and unbiased, and (being based on U-statistics) has favorable convergence properties. The test can be computed in quadratic time, matching the computational complexity of standard empirical HSIC estimators. The effectiveness of the test is demonstrated on several real-world problems: we identify language groups from a multilingual corpus, and we prove that tumor location is more dependent on gene expression than chromosomal imbalances. Source code is available for download at https://github.com/wbounliphone/reldep/.'
volume: 37
URL: http://proceedings.mlr.press/v37/bounliphone15.html
PDF: http://proceedings.mlr.press/v37/bounliphone15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-bounliphone15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Bounliphone
given: Wacha
- family: Gretton
given: Arthur
- family: Tenenhaus
given: Arthur
- family: Blaschko
given: Matthew
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 20-29
id: bounliphone15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 20
lastpage: 29
published: 2015-06-01 00:00:00 +0000
- title: 'An Aligned Subtree Kernel for Weighted Graphs'
abstract: 'In this paper, we develop a new entropic matching kernel for weighted graphs by aligning depth-based representations. We demonstrate that this kernel can be seen as an aligned subtree kernel that incorporates explicit subtree correspondences, and thus addresses the drawback of neglecting the relative locations between substructures that arises in R-convolution kernels. Experiments on standard datasets demonstrate that our kernel can easily outperform state-of-the-art graph kernels in terms of classification accuracy.'
volume: 37
URL: http://proceedings.mlr.press/v37/bai15.html
PDF: http://proceedings.mlr.press/v37/bai15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-bai15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Bai
given: Lu
- family: Rossi
given: Luca
- family: Zhang
given: Zhihong
- family: Hancock
given: Edwin
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 30-39
id: bai15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 30
lastpage: 39
published: 2015-06-01 00:00:00 +0000
- title: 'Spectral Clustering via the Power Method - Provably'
abstract: 'Spectral clustering is one of the most important algorithms in data mining and machine intelligence; however, its computational complexity limits its application to truly large scale data analysis. The computational bottleneck in spectral clustering is computing a few of the top eigenvectors of the (normalized) Laplacian matrix corresponding to the graph representing the data to be clustered. One way to speed up the computation of these eigenvectors is to use the “power method” from the numerical linear algebra literature. Although the power method has been empirically used to speed up spectral clustering, the theory behind this approach, to the best of our knowledge, remains unexplored. This paper provides the first such rigorous theoretical justification, arguing that a small number of power iterations suffices to obtain near-optimal partitionings using the approximate eigenvectors. Specifically, we prove that solving the k-means clustering problem on the approximate eigenvectors obtained via the power method gives an additive-error approximation to solving the k-means problem on the optimal eigenvectors.'
volume: 37
URL: http://proceedings.mlr.press/v37/boutsidis15.html
PDF: http://proceedings.mlr.press/v37/boutsidis15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-boutsidis15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Boutsidis
given: Christos
- family: Kambadur
given: Prabhanjan
- family: Gittens
given: Alex
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 40-48
id: boutsidis15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 40
lastpage: 48
published: 2015-06-01 00:00:00 +0000
- title: 'Information Geometry and Minimum Description Length Networks'
abstract: 'We study parametric unsupervised mixture learning. We measure the loss of intrinsic information from the observations to complex mixture models, and then to simple mixture models. We present a geometric picture, where all these representations are regarded as free points in the space of probability distributions. Based on minimum description length, we derive a simple geometric principle to learn all these models together. We present a new learning machine with theories, algorithms, and simulations.'
volume: 37
URL: http://proceedings.mlr.press/v37/suna15.html
PDF: http://proceedings.mlr.press/v37/suna15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-suna15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Sun
given: Ke
- family: Wang
given: Jun
- family: Kalousis
given: Alexandros
- family: Marchand-Maillet
given: Stephan
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 49-58
id: suna15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 49
lastpage: 58
published: 2015-06-01 00:00:00 +0000
- title: 'Efficient Training of LDA on a GPU by Mean-for-Mode Estimation'
abstract: 'We introduce Mean-for-Mode estimation, a variant of an uncollapsed Gibbs sampler that we use to train LDA on a GPU. The algorithm combines benefits of both uncollapsed and collapsed Gibbs samplers. Like a collapsed Gibbs sampler — and unlike an uncollapsed Gibbs sampler — it has good statistical performance, and can use sampling complexity reduction techniques such as sparsity. Meanwhile, like an uncollapsed Gibbs sampler — and unlike a collapsed Gibbs sampler — it is embarrassingly parallel, and can use approximate counters.'
volume: 37
URL: http://proceedings.mlr.press/v37/tristan15.html
PDF: http://proceedings.mlr.press/v37/tristan15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-tristan15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Tristan
given: Jean-Baptiste
- family: Tassarotti
given: Joseph
- family: Steele
given: Guy
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 59-68
id: tristan15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 59
lastpage: 68
published: 2015-06-01 00:00:00 +0000
- title: 'Adaptive Stochastic Alternating Direction Method of Multipliers'
abstract: 'The Alternating Direction Method of Multipliers (ADMM) has been studied for years. Traditional ADMM algorithms need to compute, at each iteration, an (empirical) expected loss function on all training examples, resulting in a computational complexity proportional to the number of training examples. To reduce the complexity, stochastic ADMM algorithms were proposed to replace the expected loss function with a random loss function associated with one uniformly drawn example plus a Bregman divergence term. The Bregman divergence, however, is derived from a simple 2nd-order proximal function, i.e., the half squared norm, which could be a suboptimal choice. In this paper, we present a new family of stochastic ADMM algorithms with optimal 2nd-order proximal functions, which produce a new family of adaptive stochastic ADMM methods. We theoretically prove that the regret bounds are as good as the bounds which could be achieved by the best proximal function that can be chosen in hindsight. Encouraging empirical results on a variety of real-world datasets confirm the effectiveness and efficiency of the proposed algorithms.'
volume: 37
URL: http://proceedings.mlr.press/v37/zhaob15.html
PDF: http://proceedings.mlr.press/v37/zhaob15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-zhaob15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Zhao
given: Peilin
- family: Yang
given: Jinwei
- family: Zhang
given: Tong
- family: Li
given: Ping
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 69-77
id: zhaob15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 69
lastpage: 77
published: 2015-06-01 00:00:00 +0000
- title: 'A Lower Bound for the Optimization of Finite Sums'
abstract: 'This paper presents a lower bound for optimizing a finite sum of n functions, where each function is L-smooth and the sum is μ-strongly convex. We show that no algorithm can reach an error ε in minimizing all functions from this class in fewer than Ω(n + √(n(κ-1)) log(1/ε)) iterations, where κ = L/μ is a surrogate condition number. We then compare this lower bound to upper bounds for recently developed methods specializing to this setting. When the functions involved in this sum are not arbitrary, but based on i.i.d. random data, we further contrast these complexity results with those for optimal first-order methods that directly optimize the sum. We conclude that considerable caution is necessary for an accurate comparison, and we identify machine learning scenarios where the new methods help computationally.'
volume: 37
URL: http://proceedings.mlr.press/v37/agarwal15.html
PDF: http://proceedings.mlr.press/v37/agarwal15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-agarwal15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Agarwal
given: Alekh
- family: Bottou
given: Leon
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 78-86
id: agarwal15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 78
lastpage: 86
published: 2015-06-01 00:00:00 +0000
- title: 'Learning Word Representations with Hierarchical Sparse Coding'
abstract: 'We propose a new method for learning word representations using hierarchical regularization in sparse coding inspired by the linguistic study of word meanings. We show an efficient learning algorithm based on stochastic proximal methods that is significantly faster than previous approaches, making it possible to perform hierarchical sparse coding on a corpus of billions of word tokens. Experiments on various benchmark tasks—word similarity ranking, syntactic and semantic analogies, sentence completion, and sentiment analysis—demonstrate that the method outperforms or is competitive with state-of-the-art methods.'
volume: 37
URL: http://proceedings.mlr.press/v37/yogatama15.html
PDF: http://proceedings.mlr.press/v37/yogatama15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-yogatama15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Yogatama
given: Dani
- family: Faruqui
given: Manaal
- family: Dyer
given: Chris
- family: Smith
given: Noah
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 87-96
id: yogatama15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 87
lastpage: 96
published: 2015-06-01 00:00:00 +0000
- title: 'Learning Transferable Features with Deep Adaptation Networks'
abstract: 'Recent studies reveal that a deep neural network can learn transferable features which generalize well to novel tasks for domain adaptation. However, as deep features eventually transition from general to specific along the network, the feature transferability drops significantly in higher layers with increasing domain discrepancy. Hence, it is important to formally reduce the dataset bias and enhance the transferability in task-specific layers. In this paper, we propose a new Deep Adaptation Network (DAN) architecture, which generalizes deep convolutional neural networks to the domain adaptation scenario. In DAN, hidden representations of all task-specific layers are embedded in a reproducing kernel Hilbert space where the mean embeddings of different domain distributions can be explicitly matched. The domain discrepancy is further reduced using an optimal multi-kernel selection method for mean embedding matching. DAN can learn transferable features with statistical guarantees, and can scale linearly via an unbiased estimate of the kernel embedding. Extensive empirical evidence shows that the proposed architecture yields state-of-the-art image classification error rates on standard domain adaptation benchmarks.'
volume: 37
URL: http://proceedings.mlr.press/v37/long15.html
PDF: http://proceedings.mlr.press/v37/long15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-long15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Long
given: Mingsheng
- family: Cao
given: Yue
- family: Wang
given: Jianmin
- family: Jordan
given: Michael
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 97-105
id: long15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 97
lastpage: 105
published: 2015-06-01 00:00:00 +0000
- title: 'Robust partially observable Markov decision process'
abstract: 'We seek to find the robust policy that maximizes the expected cumulative reward for the worst case when a partially observable Markov decision process (POMDP) has uncertain parameters whose values are only known to be in a given region. We prove that the robust value function, which represents the expected cumulative reward that can be obtained with the robust policy, is convex with respect to the belief state. Based on the convexity, we design a value-iteration algorithm for finding the robust policy. We prove that our value iteration converges for an infinite horizon. We also design a point-based value iteration algorithm for finding the robust policy more efficiently, possibly with approximation. Numerical experiments show that our point-based value iteration can adequately find robust policies.'
volume: 37
URL: http://proceedings.mlr.press/v37/osogami15.html
PDF: http://proceedings.mlr.press/v37/osogami15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-osogami15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Osogami
given: Takayuki
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 106-115
id: osogami15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 106
lastpage: 115
published: 2015-06-01 00:00:00 +0000
- title: 'On the Relationship between Sum-Product Networks and Bayesian Networks'
abstract: 'In this paper, we establish some theoretical connections between Sum-Product Networks (SPNs) and Bayesian Networks (BNs). We prove that every SPN can be converted into a BN in linear time and space in terms of the network size. The key insight is to use Algebraic Decision Diagrams (ADDs) to compactly represent the local conditional probability distributions at each node in the resulting BN by exploiting context-specific independence (CSI). The generated BN has a simple directed bipartite graphical structure. We show that by applying the Variable Elimination algorithm (VE) to the generated BN with ADD representations, we can recover the original SPN, where the SPN can be viewed as a history record or caching of the VE inference process. To help state the proof clearly, we introduce the notion of a normal SPN and present a theoretical analysis of the consistency and decomposability properties. We conclude the paper with some discussion of the implications of the proof and establish a connection between the depth of an SPN and a lower bound on the tree-width of its corresponding BN.'
volume: 37
URL: http://proceedings.mlr.press/v37/zhaoc15.html
PDF: http://proceedings.mlr.press/v37/zhaoc15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-zhaoc15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Zhao
given: Han
- family: Melibari
given: Mazen
- family: Poupart
given: Pascal
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 116-124
id: zhaoc15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 116
lastpage: 124
published: 2015-06-01 00:00:00 +0000
- title: 'Learning from Corrupted Binary Labels via Class-Probability Estimation'
abstract: 'Many supervised learning problems involve learning from samples whose labels are corrupted in some way. For example, each sample may have some constant probability of being incorrectly labelled (learning with label noise), or one may have a pool of unlabelled samples in lieu of negative samples (learning from positive and unlabelled data). This paper uses class-probability estimation to study these and other corruption processes belonging to the mutually contaminated distributions framework (Scott et al., 2013), with three conclusions. First, one can optimise balanced error and AUC without knowledge of the corruption process parameters. Second, given estimates of the corruption parameters, one can minimise a range of classification risks. Third, one can estimate the corruption parameters using only corrupted data. Experiments confirm the efficacy of class-probability estimation in learning from corrupted labels.'
volume: 37
URL: http://proceedings.mlr.press/v37/menon15.html
PDF: http://proceedings.mlr.press/v37/menon15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-menon15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Menon
given: Aditya
- family: Rooyen
given: Brendan Van
- family: Ong
given: Cheng Soon
- family: Williamson
given: Bob
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 125-134
id: menon15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 125
lastpage: 134
published: 2015-06-01 00:00:00 +0000
- title: 'An Explicit Sampling Dependent Spectral Error Bound for Column Subset Selection'
abstract: 'In this paper, we consider the problem of column subset selection. We present a novel analysis of the spectral norm reconstruction for a simple randomized algorithm and establish a new bound that depends explicitly on the sampling probabilities. The sampling dependent error bound (i) allows us to better understand the tradeoff in the reconstruction error due to sampling probabilities, (ii) exhibits more insights than existing error bounds that exploit specific probability distributions, and (iii) implies better sampling distributions. In particular, we show that a sampling distribution with probabilities proportional to the square root of the statistical leverage scores is better than uniform sampling, and is better than leverage-based sampling when the statistical leverage scores are very nonuniform. Moreover, by solving a constrained optimization problem related to the error bound via an efficient bisection search, we are able to achieve better performance than using either the leverage-based distribution or the one proportional to the square root of the statistical leverage scores. Numerical simulations demonstrate the benefits of the new sampling distributions for low-rank matrix approximation and least squares approximation compared to state-of-the-art algorithms.'
volume: 37
URL: http://proceedings.mlr.press/v37/yanga15.html
PDF: http://proceedings.mlr.press/v37/yanga15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-yanga15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Yang
given: Tianbao
- family: Zhang
given: Lijun
- family: Jin
given: Rong
- family: Zhu
given: Shenghuo
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 135-143
id: yanga15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 135
lastpage: 143
published: 2015-06-01 00:00:00 +0000
- title: 'A Stochastic PCA and SVD Algorithm with an Exponential Convergence Rate'
abstract: 'We describe and analyze a simple algorithm for principal component analysis and singular value decomposition, VR-PCA, which uses computationally cheap stochastic iterations, yet converges exponentially fast to the optimal solution. In contrast, existing algorithms suffer either from slow convergence, or computationally intensive iterations whose runtime scales with the data size. The algorithm builds on a recent variance-reduced stochastic gradient technique, which was previously analyzed for strongly convex optimization, whereas here we apply it to an inherently non-convex problem, using a very different analysis.'
volume: 37
URL: http://proceedings.mlr.press/v37/shamir15.html
PDF: http://proceedings.mlr.press/v37/shamir15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-shamir15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Shamir
given: Ohad
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 144-152
id: shamir15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 144
lastpage: 152
published: 2015-06-01 00:00:00 +0000
- title: 'Attribute Efficient Linear Regression with Distribution-Dependent Sampling'
abstract: 'We consider a budgeted learning setting, where the learner can only choose and observe a small subset of the attributes of each training example. We develop efficient algorithms for Ridge and Lasso linear regression, which utilize the geometry of the data via a novel distribution-dependent sampling scheme, and have excess risk bounds which are better by a factor of up to O(d/k) than the state-of-the-art, where d is the dimension and k+1 is the number of observed attributes per example. Moreover, under reasonable assumptions, our algorithms are the first in our setting which can provably use *fewer* attributes than full-information algorithms, which is the main concern in budgeted learning. We complement our theoretical analysis with experiments which support our claims.'
volume: 37
URL: http://proceedings.mlr.press/v37/kukliansky15.html
PDF: http://proceedings.mlr.press/v37/kukliansky15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-kukliansky15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Kukliansky
given: Doron
- family: Shamir
given: Ohad
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 153-161
id: kukliansky15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 153
lastpage: 161
published: 2015-06-01 00:00:00 +0000
- title: 'Learning Local Invariant Mahalanobis Distances'
abstract: 'For many tasks and data types, there are natural transformations to which the data should be invariant or insensitive. For instance, in visual recognition, natural images should be insensitive to rotation and translation. This requirement and its implications have been important in many machine learning applications, and tolerance for image transformations was primarily achieved by using robust feature vectors. In this paper we propose a novel and computationally efficient way to learn a local Mahalanobis metric per datum, and show how we can learn a local invariant metric to any transformation in order to improve performance.'
volume: 37
URL: http://proceedings.mlr.press/v37/fetaya15.html
PDF: http://proceedings.mlr.press/v37/fetaya15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-fetaya15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Fetaya
given: Ethan
- family: Ullman
given: Shimon
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 162-168
id: fetaya15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 162
lastpage: 168
published: 2015-06-01 00:00:00 +0000
- title: 'Finding Linear Structure in Large Datasets with Scalable Canonical Correlation Analysis'
abstract: 'Canonical Correlation Analysis (CCA) is a widely used spectral technique for finding correlation structures in multi-view datasets. In this paper, we tackle the problem of large scale CCA, where classical algorithms, usually requiring the computation of the product of two huge matrices and huge matrix decompositions, are computationally and storage expensive. We recast CCA from a novel perspective and propose a scalable and memory efficient Augmented Approximate Gradient (AppGrad) scheme for finding the top-k-dimensional canonical subspace, which only involves multiplying a large matrix by a thin matrix of width k and a small matrix decomposition of dimension k × k. Further, AppGrad achieves optimal storage complexity O(k(p_1+p_2)), compared with classical algorithms which usually require O(p_1^2+p_2^2) space to store two dense whitening matrices. The proposed scheme naturally generalizes to the stochastic optimization regime, which is especially efficient for huge datasets where batch algorithms are prohibitive. The online property of stochastic AppGrad is also well suited to the streaming scenario, where data comes sequentially. To the best of our knowledge, it is the first stochastic algorithm for CCA. Experiments on four real data sets are provided to show the effectiveness of the proposed methods.'
volume: 37
URL: http://proceedings.mlr.press/v37/maa15.html
PDF: http://proceedings.mlr.press/v37/maa15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-maa15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Ma
given: Zhuang
- family: Lu
given: Yichao
- family: Foster
given: Dean
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 169-178
id: maa15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 169
lastpage: 178
published: 2015-06-01 00:00:00 +0000
- title: 'Abstraction Selection in Model-based Reinforcement Learning'
abstract: 'State abstractions are often used to reduce the complexity of model-based reinforcement learning when only limited quantities of data are available. However, choosing the appropriate level of abstraction is an important problem in practice. Existing approaches have theoretical guarantees only under strong assumptions on the domain or asymptotically large amounts of data, but in this paper we propose a simple algorithm based on statistical hypothesis testing that comes with a finite-sample guarantee under assumptions on candidate abstractions. Our algorithm trades off the low approximation error of finer abstractions against the low estimation error of coarser abstractions, resulting in a loss bound that depends only on the quality of the best available abstraction and is polynomial in planning horizon.'
volume: 37
URL: http://proceedings.mlr.press/v37/jiang15.html
PDF: http://proceedings.mlr.press/v37/jiang15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-jiang15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Jiang
given: Nan
- family: Kulesza
given: Alex
- family: Singh
given: Satinder
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 179-188
id: jiang15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 179
lastpage: 188
published: 2015-06-01 00:00:00 +0000
- title: 'Surrogate Functions for Maximizing Precision at the Top'
abstract: 'The problem of maximizing precision at the top of a ranked list, often dubbed Precision@k (prec@k), finds relevance in myriad learning applications such as ranking, multi-label classification, and learning with severe label imbalance. However, despite its popularity, there exist significant gaps in our understanding of this problem and its associated performance measure. The most notable of these is the lack of a convex upper bounding surrogate for prec@k. We also lack scalable perceptron and stochastic gradient descent algorithms for optimizing this performance measure. In this paper we make key contributions in these directions. At the heart of our results is a family of truly upper bounding surrogates for prec@k. These surrogates are motivated in a principled manner and enjoy attractive properties such as consistency to prec@k under various natural margin/noise conditions. These surrogates are then used to design a class of novel perceptron algorithms for optimizing prec@k with provable mistake bounds. We also devise scalable stochastic gradient descent style methods for this problem with provable convergence bounds. Our proofs rely on novel uniform convergence bounds which require an in-depth analysis of the structural properties of prec@k and its surrogates. We conclude with experimental results comparing our algorithms with state-of-the-art cutting plane and stochastic gradient algorithms for maximizing prec@k.'
volume: 37
URL: http://proceedings.mlr.press/v37/kar15.html
PDF: http://proceedings.mlr.press/v37/kar15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-kar15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Kar
given: Purushottam
- family: Narasimhan
given: Harikrishna
- family: Jain
given: Prateek
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 189-198
id: kar15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 189
lastpage: 198
published: 2015-06-01 00:00:00 +0000
- title: 'Optimizing Non-decomposable Performance Measures: A Tale of Two Classes'
abstract: 'Modern classification problems frequently present mild to severe label imbalance as well as specific requirements on classification characteristics, and require optimizing performance measures that are non-decomposable over the dataset, such as F-measure. Such measures have spurred much interest and pose specific challenges to learning algorithms since their non-additive nature precludes a direct application of well-studied large scale optimization methods such as stochastic gradient descent. In this paper we reveal that for two large families of performance measures that can be expressed as functions of true positive/negative rates, it is indeed possible to implement point stochastic updates. The families we consider are concave and pseudo-linear functions of TPR, TNR which cover several popularly used performance measures such as F-measure, G-mean and H-mean. Our core contribution is an adaptive linearization scheme for these families, using which we develop optimization techniques that enable truly point-based stochastic updates. For concave performance measures we propose SPADE, a stochastic primal dual solver; for pseudo-linear measures we propose STAMP, a stochastic alternate maximization procedure. Both methods have crisp convergence guarantees, demonstrate significant speedups over existing methods - often by an order of magnitude or more, and give similar or more accurate predictions on test data.'
volume: 37
URL: http://proceedings.mlr.press/v37/narasimhana15.html
PDF: http://proceedings.mlr.press/v37/narasimhana15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-narasimhana15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Narasimhan
given: Harikrishna
- family: Kar
given: Purushottam
- family: Jain
given: Prateek
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 199-208
id: narasimhana15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 199
lastpage: 208
published: 2015-06-01 00:00:00 +0000
- title: 'Coresets for Nonparametric Estimation - the Case of DP-Means'
abstract: 'Scalable training of Bayesian nonparametric models is a notoriously difficult challenge. We explore the use of coresets - a data summarization technique originating from computational geometry - for this task. Coresets are weighted subsets of the data such that models trained on these coresets are provably competitive with models trained on the full dataset. Coresets sublinear in the dataset size allow for fast approximate inference with provable guarantees. Existing constructions, however, are limited to parametric problems. Using novel techniques in coreset construction we show the existence of coresets for DP-Means - a prototypical nonparametric clustering problem - and provide a practical construction algorithm. We empirically demonstrate that our algorithm allows us to efficiently trade off computation time and approximation error and thus scale DP-Means to large datasets. For instance, with coresets we can obtain a computational speedup of 45x at an approximation error of only 2.4% compared to solving on the full data set. In contrast, for the same subsample size, the “naive” approach of uniformly subsampling the data incurs an approximation error of 22.5%.'
volume: 37
URL: http://proceedings.mlr.press/v37/bachem15.html
PDF: http://proceedings.mlr.press/v37/bachem15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-bachem15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Bachem
given: Olivier
- family: Lucic
given: Mario
- family: Krause
given: Andreas
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 209-217
id: bachem15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 209
lastpage: 217
published: 2015-06-01 00:00:00 +0000
- title: 'A Relative Exponential Weighing Algorithm for Adversarial Utility-based Dueling Bandits'
abstract: 'We study the K-armed dueling bandit problem, a variation of the classical Multi-Armed Bandit (MAB) problem in which the learner receives only relative feedback about the selected pairs of arms. We propose a new algorithm called Relative Exponential-weight algorithm for Exploration and Exploitation (REX3) to handle the adversarial utility-based formulation of this problem. This algorithm is a non-trivial extension of the Exponential-weight algorithm for Exploration and Exploitation (EXP3) algorithm. We prove a finite-time expected regret upper bound of order O(sqrt(K ln(K)T)) for this algorithm and a general lower bound of order Omega(sqrt(KT)). Finally, we provide experimental results using real data from information retrieval applications.'
volume: 37
URL: http://proceedings.mlr.press/v37/gajane15.html
PDF: http://proceedings.mlr.press/v37/gajane15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-gajane15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Gajane
given: Pratik
- family: Urvoy
given: Tanguy
- family: Clérot
given: Fabrice
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 218-227
id: gajane15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 218
lastpage: 227
published: 2015-06-01 00:00:00 +0000
- title: 'Functional Subspace Clustering with Application to Time Series'
abstract: 'Functional data, where samples are random functions, are increasingly common and important in a variety of applications, such as health care and traffic analysis. They are naturally high dimensional and lie along complex manifolds. These properties warrant use of the subspace assumption, but most state-of-the-art subspace learning algorithms are limited to linear or other simple settings. To address these challenges, we propose a new framework called Functional Subspace Clustering (FSC). FSC assumes that functional samples lie in deformed linear subspaces and formulates the subspace learning problem as a sparse regression over operators. The resulting problem can be efficiently solved via greedy variable selection, given access to a fast deformation oracle. We provide theoretical guarantees for FSC and show how it can be applied to time series with warped alignments. Experimental results on both synthetic data and real clinical time series show that FSC outperforms both standard time series clustering and state-of-the-art subspace clustering.'
volume: 37
URL: http://proceedings.mlr.press/v37/bahadori15.html
PDF: http://proceedings.mlr.press/v37/bahadori15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-bahadori15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Bahadori
given: Mohammad Taha
- family: Kale
given: David
- family: Fan
given: Yingying
- family: Liu
given: Yan
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 228-237
id: bahadori15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 228
lastpage: 237
published: 2015-06-01 00:00:00 +0000
- title: 'Accelerated Online Low Rank Tensor Learning for Multivariate Spatiotemporal Streams'
abstract: 'Low-rank tensor learning has many applications in machine learning. A series of batch learning algorithms have achieved great successes. However, in many emerging applications, such as climate data analysis, we are confronted with large-scale tensor streams, which pose significant challenges to existing solutions in terms of computational costs and limited response time. In this paper, we propose an online accelerated low-rank tensor learning algorithm (ALTO) to solve the problem. At each iteration, we project the current tensor to the subspace of low-rank tensors in order to perform efficient tensor decomposition, then recover the decomposition of the new tensor. By randomly glancing at additional subspaces, we successfully avoid local optima at negligible extra computational cost. We evaluate our method on two tasks in streaming multivariate spatio-temporal analysis, online forecasting and multi-model ensemble, showing that our method achieves comparable predictive accuracy with a significant boost in run time.'
volume: 37
URL: http://proceedings.mlr.press/v37/yua15.html
PDF: http://proceedings.mlr.press/v37/yua15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-yua15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Yu
given: Rose
- family: Cheng
given: Dehua
- family: Liu
given: Yan
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 238-247
id: yua15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 238
lastpage: 247
published: 2015-06-01 00:00:00 +0000
- title: 'Atomic Spatial Processes'
abstract: 'The emergence of compact GPS systems and the establishment of open data initiatives has resulted in widespread availability of spatial data for many urban centres. These data can be leveraged to develop data-driven intelligent resource allocation systems for urban issues such as policing, sanitation, and transportation. We employ techniques from Bayesian non-parametric statistics to develop a process which captures a common characteristic of urban spatial datasets. Specifically, our new spatial process framework models events which occur repeatedly at discrete spatial points, the number and locations of which are unknown a priori. We develop a representation of our spatial process which facilitates posterior simulation, resulting in an interpretable and computationally tractable model. The framework’s superiority over both empirical grid-based models and Dirichlet process mixture models is demonstrated by fitting, interpreting, and comparing models of graffiti prevalence for both downtown Vancouver and Manhattan.'
volume: 37
URL: http://proceedings.mlr.press/v37/jewell15.html
PDF: http://proceedings.mlr.press/v37/jewell15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-jewell15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Jewell
given: Sean
- family: Spencer
given: Neil
- family: Bouchard-Côté
given: Alexandre
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 248-256
id: jewell15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 248
lastpage: 256
published: 2015-06-01 00:00:00 +0000
- title: 'Classification with Low Rank and Missing Data'
abstract: 'We consider classification and regression tasks where we have missing data and assume that the (clean) data resides in a low-rank subspace. Finding a hidden subspace is known to be computationally hard. Nevertheless, using a non-proper formulation we give an efficient agnostic algorithm that classifies as well as the best linear classifier coupled with the best low-dimensional subspace in which the data resides. A direct implication is that our algorithm can linearly (and non-linearly through kernels) classify provably as well as the best classifier that has access to the full data.'
volume: 37
URL: http://proceedings.mlr.press/v37/hazan15.html
PDF: http://proceedings.mlr.press/v37/hazan15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-hazan15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Hazan
given: Elad
- family: Livni
given: Roi
- family: Mansour
given: Yishay
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 257-266
id: hazan15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 257
lastpage: 266
published: 2015-06-01 00:00:00 +0000
- title: 'Dynamic Sensing: Better Classification under Acquisition Constraints'
abstract: 'In many machine learning applications the quality of the data is limited by resource constraints (be it power, bandwidth, memory, ...). In such cases, the constraints are on the average resources allocated, so there is some control over each sample’s quality. In most cases this option remains unused and the data’s quality is uniform over the samples. In this paper we propose to actively allocate resources to each sample such that resources are used optimally overall. We propose a method to compute the optimal resource allocation. We further derive generalization bounds for the case where the problem’s model is unknown. We demonstrate the potential benefit of this approach on both simulated and real-life problems.'
volume: 37
URL: http://proceedings.mlr.press/v37/richman15.html
PDF: http://proceedings.mlr.press/v37/richman15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-richman15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Richman
given: Oran
- family: Mannor
given: Shie
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 267-275
id: richman15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 267
lastpage: 275
published: 2015-06-01 00:00:00 +0000
- title: 'A Modified Orthant-Wise Limited Memory Quasi-Newton Method with Convergence Analysis'
abstract: 'The Orthant-Wise Limited memory Quasi-Newton (OWL-QN) method has been demonstrated to be very effective in solving the \ell_1-regularized sparse learning problem. OWL-QN extends the L-BFGS from solving unconstrained smooth optimization problems to \ell_1-regularized (non-smooth) sparse learning problems. At each iteration, OWL-QN does not involve any \ell_1-regularized quadratic optimization subproblem and only requires matrix-vector multiplications without an explicit use of the (inverse) Hessian matrix, which enables OWL-QN to tackle large-scale problems efficiently. Although many empirical studies have demonstrated that OWL-QN works quite well in practice, several recent papers point out that the existing convergence proof of OWL-QN is flawed and a rigorous convergence analysis for OWL-QN still remains to be established. In this paper, we propose a modified Orthant-Wise Limited memory Quasi-Newton (mOWL-QN) algorithm by slightly modifying the OWL-QN algorithm. As the main technical contribution of this paper, we establish a rigorous convergence proof for the mOWL-QN algorithm. To the best of our knowledge, our work fills the theoretical gap by providing the first rigorous convergence proof for the OWL-QN-type algorithm on solving \ell_1-regularized sparse learning problems. We also provide empirical studies to show that mOWL-QN works well and is as efficient as OWL-QN.'
volume: 37
URL: http://proceedings.mlr.press/v37/gonga15.html
PDF: http://proceedings.mlr.press/v37/gonga15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-gonga15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Gong
given: Pinghua
- family: Ye
given: Jieping
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 276-284
id: gonga15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 276
lastpage: 284
published: 2015-06-01 00:00:00 +0000
- title: 'Telling cause from effect in deterministic linear dynamical systems'
abstract: 'Telling a cause from its effect using observed time series data is a major challenge in natural and social sciences. Assuming the effect is generated by the cause through a linear system, we propose a new approach based on the hypothesis that nature chooses the “cause” and the “mechanism generating the effect from the cause” independently of each other. Specifically we postulate that the power spectrum of the “cause” time series is uncorrelated with the square of the frequency response of the linear filter (system) generating the effect. While most causal discovery methods for time series mainly rely on the noise, our method relies on asymmetries of the power spectral density properties that exist even in deterministic systems. We describe mathematical assumptions in a deterministic model under which the causal direction is identifiable. In particular, we show a scenario where the method works but Granger causality fails. Experiments show encouraging results on synthetic as well as real-world data. Overall, this suggests that the postulate of Independence of Cause and Mechanism is a promising principle for causal inference on observed time series.'
volume: 37
URL: http://proceedings.mlr.press/v37/shajarisales15.html
PDF: http://proceedings.mlr.press/v37/shajarisales15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-shajarisales15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Shajarisales
given: Naji
- family: Janzing
given: Dominik
- family: Schoelkopf
given: Bernhard
- family: Besserve
given: Michel
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 285-294
id: shajarisales15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 285
lastpage: 294
published: 2015-06-01 00:00:00 +0000
- title: 'High Dimensional Bayesian Optimisation and Bandits via Additive Models'
abstract: 'Bayesian Optimisation (BO) is a technique used in optimising a D-dimensional function which is typically expensive to evaluate. While there have been many successes for BO in low dimensions, scaling it to high dimensions has been notoriously difficult. The existing literature on the topic operates under very restrictive settings. In this paper, we identify two key challenges in this endeavour. We tackle these challenges by assuming an additive structure for the function. This setting is substantially more expressive and contains a richer class of functions than previous work. We prove that, for additive functions, the regret has only linear dependence on D even though the function depends on all D dimensions. We also demonstrate several other statistical and computational benefits in our framework. Via synthetic examples, a scientific simulation and a face detection problem we demonstrate that our method outperforms naive BO on additive functions and on several examples where the function is not additive.'
volume: 37
URL: http://proceedings.mlr.press/v37/kandasamy15.html
PDF: http://proceedings.mlr.press/v37/kandasamy15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-kandasamy15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Kandasamy
given: Kirthevasan
- family: Schneider
given: Jeff
- family: Poczos
given: Barnabas
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 295-304
id: kandasamy15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 295
lastpage: 304
published: 2015-06-01 00:00:00 +0000
- title: 'Theory of Dual-sparse Regularized Randomized Reduction'
abstract: 'In this paper, we study randomized reduction methods, which reduce high-dimensional features into low-dimensional space by randomized methods (e.g., random projection, random hashing), for large-scale high-dimensional classification. Previous theoretical results on randomized reduction methods hinge on strong assumptions about the data, e.g., low rank of the data matrix or a large separable margin of classification, which hinder their applicability in broad domains. To address these limitations, we propose dual-sparse regularized randomized reduction methods that introduce a sparse regularizer into the reduced dual problem. Under a mild condition that the original dual solution is a (nearly) sparse vector, we show that the resulting dual solution is close to the original dual solution and concentrates on its support set. In numerical experiments, we present an empirical study to support the analysis and we also present a novel application of the dual-sparse randomized reduction methods to reducing the communication cost of distributed learning from large-scale high-dimensional data.'
volume: 37
URL: http://proceedings.mlr.press/v37/yangb15.html
PDF: http://proceedings.mlr.press/v37/yangb15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-yangb15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Yang
given: Tianbao
- family: Zhang
given: Lijun
- family: Jin
given: Rong
- family: Zhu
given: Shenghuo
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 305-314
id: yangb15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 305
lastpage: 314
published: 2015-06-01 00:00:00 +0000
- title: 'Generalization error bounds for learning to rank: Does the length of document lists matter?'
abstract: 'We consider the generalization ability of algorithms for learning to rank at a query level, a problem also called subset ranking. Existing generalization error bounds necessarily degrade as the size of the document list associated with a query increases. We show that such a degradation is not intrinsic to the problem. For several loss functions, including the cross-entropy loss used in the well known ListNet method, there is no degradation in generalization ability as document lists become longer. We also provide novel generalization error bounds under \ell_1 regularization and faster convergence rates if the loss function is smooth.'
volume: 37
URL: http://proceedings.mlr.press/v37/tewari15.html
PDF: http://proceedings.mlr.press/v37/tewari15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-tewari15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Tewari
given: Ambuj
- family: Chaudhuri
given: Sougata
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 315-323
id: tewari15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 315
lastpage: 323
published: 2015-06-01 00:00:00 +0000
- title: 'PeakSeg: constrained optimal segmentation and supervised penalty learning for peak detection in count data'
abstract: 'Peak detection is a central problem in genomic data analysis, and current algorithms for this task are unsupervised and mostly effective for a single data type and pattern (e.g. H3K4me3 data with a sharp peak pattern). We propose PeakSeg, a new constrained maximum likelihood segmentation model for peak detection with an efficient inference algorithm: constrained dynamic programming. We investigate unsupervised and supervised learning of penalties for the critical model selection problem. We show that the supervised method has state-of-the-art peak detection across all data sets in a benchmark that includes both sharp H3K4me3 and broad H3K36me3 patterns.'
volume: 37
URL: http://proceedings.mlr.press/v37/hocking15.html
PDF: http://proceedings.mlr.press/v37/hocking15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-hocking15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Hocking
given: Toby
- family: Rigaill
given: Guillem
- family: Bourque
given: Guillaume
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 324-332
id: hocking15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 324
lastpage: 332
published: 2015-06-01 00:00:00 +0000
- title: 'Mind the duality gap: safer rules for the Lasso'
abstract: 'Screening rules allow early discarding of irrelevant variables from the optimization in Lasso problems, or their derivatives, making solvers faster. In this paper, we propose new versions of the so-called \textit{safe} rules for the Lasso. Based on duality gap considerations, our new rules create safe test regions whose diameters converge to zero, provided that one relies on a converging solver. This property helps screening out more variables, for a wider range of regularization parameter values. In addition to faster convergence, we prove that we correctly identify the active sets (supports) of the solutions in finite time. While our proposed strategy can cope with any solver, its performance is demonstrated using a coordinate descent algorithm particularly adapted to machine learning use cases. Significant computing time reductions are obtained with respect to previous safe rules.'
volume: 37
URL: http://proceedings.mlr.press/v37/fercoq15.html
PDF: http://proceedings.mlr.press/v37/fercoq15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-fercoq15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Fercoq
given: Olivier
- family: Gramfort
given: Alexandre
- family: Salmon
given: Joseph
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 333-342
id: fercoq15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 333
lastpage: 342
published: 2015-06-01 00:00:00 +0000
- title: 'A General Analysis of the Convergence of ADMM'
abstract: 'We provide a new proof of the linear convergence of the alternating direction method of multipliers (ADMM) when one of the objective terms is strongly convex. Our proof is based on a framework for analyzing optimization algorithms introduced in Lessard et al. (2014), reducing algorithm convergence to verifying the stability of a dynamical system. This approach generalizes a number of existing results and obviates any assumptions about specific choices of algorithm parameters. On a numerical example, we demonstrate that minimizing the derived bound on the convergence rate provides a practical approach to selecting algorithm parameters for particular ADMM instances. We complement our upper bound by constructing a nearly-matching lower bound on the worst-case rate of convergence.'
volume: 37
URL: http://proceedings.mlr.press/v37/nishihara15.html
PDF: http://proceedings.mlr.press/v37/nishihara15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-nishihara15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Nishihara
given: Robert
- family: Lessard
given: Laurent
- family: Recht
given: Ben
- family: Packard
given: Andrew
- family: Jordan
given: Michael
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 343-352
id: nishihara15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 343
lastpage: 352
published: 2015-06-01 00:00:00 +0000
- title: 'Stochastic Primal-Dual Coordinate Method for Regularized Empirical Risk Minimization'
abstract: 'We consider a generic convex optimization problem associated with regularized empirical risk minimization of linear predictors. The problem structure allows us to reformulate it as a convex-concave saddle point problem. We propose a stochastic primal-dual coordinate method, which alternates between maximizing over one (or more) randomly chosen dual variable and minimizing over the primal variable. We also develop an extension to non-smooth and non-strongly convex loss functions, and an extension with better convergence rate on unnormalized data. Both theoretically and empirically, we show that the SPDC method has comparable or better performance than several state-of-the-art optimization methods.'
volume: 37
URL: http://proceedings.mlr.press/v37/zhanga15.html
PDF: http://proceedings.mlr.press/v37/zhanga15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-zhanga15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Zhang
given: Yuchen
- family: Lin
given: Xiao
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 353-361
id: zhanga15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 353
lastpage: 361
published: 2015-06-01 00:00:00 +0000
- title: 'DiSCO: Distributed Optimization for Self-Concordant Empirical Loss'
abstract: 'We propose a new distributed algorithm for empirical risk minimization in machine learning. The algorithm is based on an inexact damped Newton method, where the inexact Newton steps are computed by a distributed preconditioned conjugate gradient method. We analyze its iteration complexity and communication efficiency for minimizing self-concordant empirical loss functions, and discuss the results for distributed ridge regression, logistic regression and binary classification with a smoothed hinge loss. In a standard setting for supervised learning, where the n data points are i.i.d. sampled and the regularization parameter scales as 1/\sqrt{n}, we show that the proposed algorithm is communication efficient: the required number of communication rounds does not increase with the sample size n, and only grows slowly with the number of machines.'
volume: 37
URL: http://proceedings.mlr.press/v37/zhangb15.html
PDF: http://proceedings.mlr.press/v37/zhangb15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-zhangb15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Zhang
given: Yuchen
- family: Lin
given: Xiao
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 362-370
id: zhangb15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 362
lastpage: 370
published: 2015-06-01 00:00:00 +0000
- title: 'Spectral MLE: Top-K Rank Aggregation from Pairwise Comparisons'
abstract: 'This paper explores the preference-based top-K rank aggregation problem. Suppose that a collection of items is repeatedly compared in pairs, and one wishes to recover a consistent ordering that emphasizes the top-K ranked items, based on partially revealed preferences. We focus on the Bradley-Terry-Luce (BTL) model that postulates a set of latent preference scores underlying all items, where the odds of paired comparisons depend only on the relative scores of the items involved. We characterize the minimax limits on identifiability of top-K ranked items, in the presence of random and non-adaptive sampling. Our results highlight a separation measure that quantifies the gap of preference scores between the K-th and (K+1)-th ranked items. The minimum sample complexity required for reliable top-K ranking scales inversely with the separation measure irrespective of other preference distribution metrics. To approach this minimax limit, we propose a nearly linear-time ranking scheme, called Spectral MLE, that returns the indices of the top-K items in accordance with a careful score estimate. In a nutshell, Spectral MLE starts with an initial score estimate with minimal squared loss (obtained via a spectral method), and then successively refines each component with the assistance of coordinate-wise MLEs. Encouragingly, Spectral MLE allows perfect top-K item identification under minimal sample complexity. The practical applicability of Spectral MLE is further corroborated by numerical experiments.'
volume: 37
URL: http://proceedings.mlr.press/v37/chena15.html
PDF: http://proceedings.mlr.press/v37/chena15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-chena15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Chen
given: Yuxin
- family: Suh
given: Changho
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 371-380
id: chena15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 371
lastpage: 380
published: 2015-06-01 00:00:00 +0000
- title: 'Paired-Dual Learning for Fast Training of Latent Variable Hinge-Loss MRFs'
abstract: 'Latent variables allow probabilistic graphical models to capture nuance and structure in important domains such as network science, natural language processing, and computer vision. Naive approaches to learning such complex models can be prohibitively expensive—because they require repeated inferences to update beliefs about latent variables—so lifting this restriction for useful classes of models is an important problem. Hinge-loss Markov random fields (HL-MRFs) are graphical models that allow highly scalable inference and learning in structured domains, in part by representing structured problems with continuous variables. However, this representation leads to challenges when learning with latent variables. We introduce paired-dual learning, a framework that greatly speeds up training by using tractable entropy surrogates and avoiding repeated inferences. Paired-dual learning optimizes an objective with a pair of dual inference problems. This allows fast, joint optimization of parameters and dual variables. We evaluate on social-group detection, trust prediction in social networks, and image reconstruction, finding that paired-dual learning trains models as accurate as those trained by traditional methods in much less time, often before traditional methods make even a single parameter update.'
volume: 37
URL: http://proceedings.mlr.press/v37/bach15.html
PDF: http://proceedings.mlr.press/v37/bach15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-bach15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Bach
given: Stephen
- family: Huang
given: Bert
- family: Boyd-Graber
given: Jordan
- family: Getoor
given: Lise
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 381-390
id: bach15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 381
lastpage: 390
published: 2015-06-01 00:00:00 +0000
- title: 'Structural Maxent Models'
abstract: 'We present a new class of density estimation models, Structural Maxent models, with feature functions selected from possibly very complex families. The design of our models is motivated by data-dependent convergence bounds and benefits from new data-dependent learning bounds expressed in terms of the Rademacher complexities of the sub-families composing the family of features considered. We prove a duality theorem, which we use to derive our Structural Maxent algorithm. We give a full description of our algorithm, including the details of its derivation, and report the results of several experiments demonstrating that its performance compares favorably to that of existing regularized Maxent algorithms. We similarly define conditional Structural Maxent models for multi-class classification problems. These are conditional probability models making use of possibly complex feature families. We also prove a duality theorem for these models, which shows the connection between these models and existing binary and multi-class deep boosting algorithms.'
volume: 37
URL: http://proceedings.mlr.press/v37/cortes15.html
PDF: http://proceedings.mlr.press/v37/cortes15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-cortes15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Cortes
given: Corinna
- family: Kuznetsov
given: Vitaly
- family: Mohri
given: Mehryar
- family: Syed
given: Umar
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 391-399
id: cortes15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 391
lastpage: 399
published: 2015-06-01 00:00:00 +0000
- title: 'A Provable Generalized Tensor Spectral Method for Uniform Hypergraph Partitioning'
abstract: 'Matrix spectral methods play an important role in statistics and machine learning, and most often the word ‘matrix’ is dropped as, by default, one assumes that similarities or affinities are measured between two points, thereby resulting in similarity matrices. However, recent challenges in computer vision and text mining have necessitated the use of multi-way affinities in the learning methods, and this has led to a considerable interest in hypergraph partitioning methods in the machine learning community. A plethora of “higher-order” algorithms have been proposed in the past decade, but their theoretical guarantees are not well-studied. In this paper, we develop a unified approach for partitioning uniform hypergraphs by means of a tensor trace optimization problem involving the affinity tensor, and a number of existing higher-order methods turn out to be special cases of the proposed formulation. We further propose an algorithm to solve the proposed trace optimization problem, and prove that it is consistent under a planted hypergraph model. We also provide experimental results to validate our theoretical findings.'
volume: 37
URL: http://proceedings.mlr.press/v37/ghoshdastidar15.html
PDF: http://proceedings.mlr.press/v37/ghoshdastidar15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-ghoshdastidar15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Ghoshdastidar
given: Debarghya
- family: Dukkipati
given: Ambedkar
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 400-409
id: ghoshdastidar15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 400
lastpage: 409
published: 2015-06-01 00:00:00 +0000
- title: 'The Benefits of Learning with Strongly Convex Approximate Inference'
abstract: 'We explore the benefits of strongly convex free energies in variational inference, providing both theoretical motivation and a new meta-algorithm. Using the duality between strong convexity and stability, we prove a high-probability bound on the error of learned marginals that is inversely proportional to the modulus of convexity of the free energy, thereby motivating free energies whose moduli are constant with respect to the size of the graph. We identify sufficient conditions for Ω(1)-strong convexity in two popular variational techniques: tree-reweighted and counting number entropies. Our insights for the latter suggest a novel counting number optimization framework, which guarantees strong convexity for any given modulus. Our experiments demonstrate that learning with a strongly convex free energy, using our optimization framework to guarantee a given modulus, results in substantially more accurate marginal probabilities, thereby validating our theoretical claims and the effectiveness of our framework.'
volume: 37
URL: http://proceedings.mlr.press/v37/london15.html
PDF: http://proceedings.mlr.press/v37/london15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-london15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: London
given: Ben
- family: Huang
given: Bert
- family: Getoor
given: Lise
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 410-418
id: london15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 410
lastpage: 418
published: 2015-06-01 00:00:00 +0000
- title: 'Pushing the Limits of Affine Rank Minimization by Adapting Probabilistic PCA'
abstract: 'Many applications require recovering a matrix of minimal rank within an affine constraint set, with matrix completion a notable special case. Because the problem is NP-hard in general, it is common to replace the matrix rank with the nuclear norm, which acts as a convenient convex surrogate. While elegant theoretical conditions elucidate when this replacement is likely to be successful, they are highly restrictive and convex algorithms fail when the ambient rank is too high or when the constraint set is poorly structured. Non-convex alternatives fare somewhat better when carefully tuned; however, convergence to locally optimal solutions remains a continuing source of failure. Against this backdrop we derive a deceptively simple and parameter-free probabilistic PCA-like algorithm that is capable, over a wide battery of empirical tests, of successful recovery even at the theoretical limit where the number of measurements equals the degrees of freedom in the unknown low-rank matrix. Somewhat surprisingly, this is possible even when the affine constraint set is highly ill-conditioned. While proving general recovery guarantees remains elusive for non-convex algorithms, Bayesian-inspired or otherwise, we nonetheless show conditions whereby the underlying cost function has a unique stationary point located at the global optimum; no existing cost function we are aware of satisfies this property. The algorithm has also been successfully deployed on a computer vision application involving image rectification and a standard collaborative filtering benchmark.'
volume: 37
URL: http://proceedings.mlr.press/v37/xin15.html
PDF: http://proceedings.mlr.press/v37/xin15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-xin15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Xin
given: Bo
- family: Wipf
given: David
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 419-427
id: xin15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 419
lastpage: 427
published: 2015-06-01 00:00:00 +0000
- title: 'Budget Allocation Problem with Multiple Advertisers: A Game Theoretic View'
abstract: 'In marketing planning, advertisers seek to maximize the number of customers by allocating given budgets to each media channel effectively. The budget allocation problem with a bipartite influence model captures this scenario; however, the model is problematic because it assumes there is only one advertiser in the market. In reality, there are many advertisers whose advertising interests conflict; thus we must extend the model to such a case. By extending the budget allocation problem with a bipartite influence model, we propose a game-theoretic model that considers many advertisers. By simulating our model, we can analyze the behavior of a media channel market, e.g., we can estimate which media channels are allocated by an advertiser, and which customers are influenced by an advertiser. Our model has many attractive features. First, our model is a potential game; therefore, it has a pure Nash equilibrium. Second, any Nash equilibrium of our game has 2-optimal social utility, i.e., the price of anarchy is 2. Finally, the proposed model can be simulated very efficiently; thus it can be used to analyze large markets.'
volume: 37
URL: http://proceedings.mlr.press/v37/maehara15.html
PDF: http://proceedings.mlr.press/v37/maehara15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-maehara15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Maehara
given: Takanori
- family: Yabe
given: Akihiro
- family: Kawarabayashi
given: Ken-ichi
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 428-437
id: maehara15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 428
lastpage: 437
published: 2015-06-01 00:00:00 +0000
- title: 'Tracking Approximate Solutions of Parameterized Optimization Problems over Multi-Dimensional (Hyper-)Parameter Domains'
abstract: 'Many machine learning methods are given as parameterized optimization problems. Important examples of such parameters are regularization- and kernel hyperparameters. These parameters have to be tuned carefully since the choice of their values can have a significant impact on the statistical performance of the learning methods. In most cases the parameter space does not carry much structure and parameter tuning essentially boils down to exploring the whole parameter space. The case when there is only one parameter has received quite some attention over the years. First, algorithms for tracking an optimal solution for several machine learning optimization problems over regularization- and hyperparameter intervals had been developed, but since these algorithms can suffer from numerical problems, more robust and efficient approximate path tracking algorithms have been devised and analyzed recently. By now approximate path tracking algorithms are known for regularization- and kernel hyperparameter paths with optimal path complexities that depend only on the prescribed approximation error. Here we extend the work on approximate path tracking algorithms with approximation guarantees to multi-dimensional parameter domains. We show a lower bound on the complexity of approximately exploring a multi-dimensional parameter domain that is the product of the corresponding path complexities. We also show a matching upper bound that can be turned into a theoretically and practically efficient algorithm. Experimental results for kernelized support vector machines and the elastic net confirm the theoretical complexity analysis.'
volume: 37
URL: http://proceedings.mlr.press/v37/blechschmidt15.html
PDF: http://proceedings.mlr.press/v37/blechschmidt15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-blechschmidt15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Blechschmidt
given: Katharina
- family: Giesen
given: Joachim
- family: Laue
given: Soeren
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 438-447
id: blechschmidt15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 438
lastpage: 447
published: 2015-06-01 00:00:00 +0000
- title: 'Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift'
abstract: 'Training Deep Neural Networks is complicated by the fact that the distribution of each layer’s inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization, and in some cases eliminates the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.82% top-5 test error, exceeding the accuracy of human raters.'
volume: 37
URL: http://proceedings.mlr.press/v37/ioffe15.html
PDF: http://proceedings.mlr.press/v37/ioffe15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-ioffe15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Ioffe
given: Sergey
- family: Szegedy
given: Christian
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 448-456
id: ioffe15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 448
lastpage: 456
published: 2015-06-01 00:00:00 +0000
- title: 'Distributed Estimation of Generalized Matrix Rank: Efficient Algorithms and Lower Bounds'
abstract: 'We study the following generalized matrix rank estimation problem: given an n-by-n matrix and a constant c > 0, estimate the number of eigenvalues that are greater than c. In the distributed setting, the matrix of interest is the sum of m matrices held by separate machines. We show that any deterministic algorithm solving this problem must communicate Ω(n^2) bits, which is order-equivalent to transmitting the whole matrix. In contrast, we propose a randomized algorithm that communicates only O(n) bits. The upper bound is matched by an Ω(n) lower bound on the randomized communication complexity. We demonstrate the practical effectiveness of the proposed algorithm with some numerical experiments.'
volume: 37
URL: http://proceedings.mlr.press/v37/zhangc15.html
PDF: http://proceedings.mlr.press/v37/zhangc15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-zhangc15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Zhang
given: Yuchen
- family: Wainwright
given: Martin
- family: Jordan
given: Michael
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 457-465
id: zhangc15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 457
lastpage: 465
published: 2015-06-01 00:00:00 +0000
- title: 'Landmarking Manifolds with Gaussian Processes'
abstract: 'We present an algorithm for finding landmarks along a manifold. These landmarks provide a small set of locations spaced out along the manifold such that they capture the low-dimensional non-linear structure of the data embedded in the high-dimensional space. The approach does not select points directly from the dataset, but instead we optimize each landmark by moving along the continuous manifold space (as approximated by the data) according to the gradient of an objective function. We borrow ideas from active learning with Gaussian processes to define the objective, which has the property that a new landmark is "repelled" by those currently selected, allowing for exploration of the manifold. We derive a stochastic algorithm for learning with large datasets and show results on several datasets, including the Million Song Dataset and articles from the New York Times.'
volume: 37
URL: http://proceedings.mlr.press/v37/liang15.html
PDF: http://proceedings.mlr.press/v37/liang15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-liang15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Liang
given: Dawen
- family: Paisley
given: John
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 466-474
id: liang15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 466
lastpage: 474
published: 2015-06-01 00:00:00 +0000
- title: 'Markov Mixed Membership Models'
abstract: 'We present a Markov mixed membership model (Markov M3) for grouped data that learns a fully connected graph structure among mixing components. A key feature of Markov M3 is that it interprets the mixed membership assignment as a Markov random walk over this graph of nodes. This is in contrast to tree-structured models in which the assignment is done according to a tree structure on the mixing components. The Markov structure results in a simple parametric model that can learn a complex dependency structure between nodes, while still maintaining full conjugacy for closed-form stochastic variational inference. Empirical results demonstrate that Markov M3 performs well compared with tree-structured topic models, and can learn meaningful dependency structure between topics.'
volume: 37
URL: http://proceedings.mlr.press/v37/zhangd15.html
PDF: http://proceedings.mlr.press/v37/zhangd15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-zhangd15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Zhang
given: Aonan
- family: Paisley
given: John
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 475-483
id: zhangd15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 475
lastpage: 483
published: 2015-06-01 00:00:00 +0000
- title: 'A Unified Framework for Outlier-Robust PCA-like Algorithms'
abstract: 'We propose a unified framework for making a wide range of PCA-like algorithms – including standard PCA, sparse PCA and non-negative sparse PCA – robust when facing a constant fraction of arbitrarily corrupted outliers. Our theoretical analysis establishes solid performance guarantees of the proposed framework: its estimation error is upper bounded by a term depending on the intrinsic parameters of the data model, the selected PCA-like algorithm and the fraction of outliers. Comprehensive experiments on synthetic and real-world datasets demonstrate that the outlier-robust PCA-like algorithms derived from our framework have outstanding performance.'
volume: 37
URL: http://proceedings.mlr.press/v37/yangc15.html
PDF: http://proceedings.mlr.press/v37/yangc15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-yangc15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Yang
given: Wenzhuo
- family: Xu
given: Huan
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 484-493
id: yangc15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 484
lastpage: 493
published: 2015-06-01 00:00:00 +0000
- title: 'Streaming Sparse Principal Component Analysis'
abstract: 'This paper considers estimating the leading k principal components with at most s non-zero attributes from p-dimensional samples collected sequentially in memory limited environments. We develop and analyze two memory- and computation-efficient algorithms called streaming sparse PCA and streaming sparse ECA for analyzing data generated according to the spike model and the elliptical model respectively. In particular, the proposed algorithms have memory complexity O(pk), computational complexity O(pk \min\{k, s\log p\}) and sample complexity Θ(s\log p). We provide their finite sample performance guarantees, which imply statistical consistency in the high dimensional regime. Numerical experiments on synthetic and real-world datasets demonstrate good empirical performance of the proposed algorithms.'
volume: 37
URL: http://proceedings.mlr.press/v37/yangd15.html
PDF: http://proceedings.mlr.press/v37/yangd15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-yangd15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Yang
given: Wenzhuo
- family: Xu
given: Huan
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 494-503
id: yangd15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 494
lastpage: 503
published: 2015-06-01 00:00:00 +0000
- title: 'A Divide and Conquer Framework for Distributed Graph Clustering'
abstract: 'Graph clustering is about identifying clusters of closely connected nodes, and is a fundamental technique of data analysis with many applications including community detection, VLSI network partitioning, collaborative filtering, and many others. In order to improve the scalability of existing graph clustering algorithms, we propose a novel divide and conquer framework for graph clustering, and establish theoretical guarantees of exact recovery of the clusters. One additional advantage of the proposed framework is that it can identify small clusters – the size of the smallest cluster can be o(\sqrt{n}), in contrast to the Ω(\sqrt{n}) required by standard methods. Extensive experiments on synthetic and real-world datasets demonstrate the efficiency and effectiveness of our framework.'
volume: 37
URL: http://proceedings.mlr.press/v37/yange15.html
PDF: http://proceedings.mlr.press/v37/yange15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-yange15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Yang
given: Wenzhuo
- family: Xu
given: Huan
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 504-513
id: yange15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 504
lastpage: 513
published: 2015-06-01 00:00:00 +0000
- title: 'How Can Deep Rectifier Networks Achieve Linear Separability and Preserve Distances?'
abstract: 'This paper investigates how hidden layers of deep rectifier networks are capable of transforming two or more pattern sets to be linearly separable while preserving the distances with a guaranteed degree, and proves the universal classification power of such distance preserving rectifier networks. Through the nearly isometric nonlinear transformation in the hidden layers, the margin of the linear separating plane in the output layer and the margin of the nonlinear separating boundary in the original data space can be closely related so that the maximum margin classification in the input data space can be achieved approximately via the maximum margin linear classifiers in the output layer. The generalization performance of such distance preserving deep rectifier neural networks can be well justified by the distance-preserving properties of their hidden layers and the maximum margin property of the linear classifiers in the output layer.'
volume: 37
URL: http://proceedings.mlr.press/v37/an15.html
PDF: http://proceedings.mlr.press/v37/an15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-an15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: An
given: Senjian
- family: Boussaid
given: Farid
- family: Bennamoun
given: Mohammed
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 514-523
id: an15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 514
lastpage: 523
published: 2015-06-01 00:00:00 +0000
- title: 'Improved Regret Bounds for Undiscounted Continuous Reinforcement Learning'
abstract: 'We consider the problem of undiscounted reinforcement learning in continuous state space. Regret bounds in this setting usually hold under various assumptions on the structure of the reward and transition function. Under the assumption that the rewards and transition probabilities are Lipschitz, for 1-dimensional state space a regret bound of O(T^{3/4}) after any T steps has been given by Ortner and Ryabko (2012). Here we improve upon this result by using non-parametric kernel density estimation for estimating the transition probability distributions, and obtain regret bounds that depend on the smoothness of the transition probability distributions. In particular, under the assumption that the transition probability functions are smoothly differentiable, the regret bound is shown to be O(T^{2/3}) asymptotically for reinforcement learning in 1-dimensional state space. Finally, we also derive improved regret bounds for higher dimensional state space.'
volume: 37
URL: http://proceedings.mlr.press/v37/lakshmanan15.html
PDF: http://proceedings.mlr.press/v37/lakshmanan15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-lakshmanan15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Lakshmanan
given: K.
- family: Ortner
given: Ronald
- family: Ryabko
given: Daniil
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 524-532
id: lakshmanan15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 524
lastpage: 532
published: 2015-06-01 00:00:00 +0000
- title: 'The Fundamental Incompatibility of Scalable Hamiltonian Monte Carlo and Naive Data Subsampling'
abstract: 'Leveraging the coherent exploration of Hamiltonian flow, Hamiltonian Monte Carlo produces computationally efficient Monte Carlo estimators, even with respect to complex and high-dimensional target distributions. When confronted with data-intensive applications, however, the algorithm may be too expensive to implement, leaving us to consider the utility of approximations such as data subsampling. In this paper I demonstrate how data subsampling fundamentally compromises the scalability of Hamiltonian Monte Carlo.'
volume: 37
URL: http://proceedings.mlr.press/v37/betancourt15.html
PDF: http://proceedings.mlr.press/v37/betancourt15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-betancourt15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Betancourt
given: Michael
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 533-540
id: betancourt15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 533
lastpage: 540
published: 2015-06-01 00:00:00 +0000
- title: 'Faster Rates for the Frank-Wolfe Method over Strongly-Convex Sets'
abstract: 'The Frank-Wolfe method (a.k.a. conditional gradient algorithm) for smooth optimization has regained much interest in recent years in the context of large scale optimization and machine learning. A key advantage of the method is that it avoids projections - the computational bottleneck in many applications - replacing them by a linear optimization step. Despite this advantage, the known convergence rates of the FW method fall behind standard first order methods for most settings of interest. It is an active line of research to derive faster linear optimization-based algorithms for various settings of convex optimization. In this paper we consider the special case of optimization over strongly convex sets, for which we prove that the vanilla FW method converges at a rate of \frac{1}{t^2}. This gives a quadratic improvement in convergence rate compared to the general case, in which convergence is of the order \frac{1}{t}, and known to be tight. We show that various balls induced by \ell_p norms, Schatten norms and group norms are strongly convex on one hand and on the other hand, linear optimization over these sets is straightforward and admits a closed-form solution. We further show how several previous fast-rate results for the FW method follow easily from our analysis.'
volume: 37
URL: http://proceedings.mlr.press/v37/garbera15.html
PDF: http://proceedings.mlr.press/v37/garbera15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-garbera15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Garber
given: Dan
- family: Hazan
given: Elad
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 541-549
id: garbera15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 541
lastpage: 549
published: 2015-06-01 00:00:00 +0000
- title: 'Ordered Stick-Breaking Prior for Sequential MCMC Inference of Bayesian Nonparametric Models'
abstract: 'This paper introduces the ordered stick-breaking process (OSBP), where the atoms in a stick-breaking process (SBP) appear in order. The choice of weights on the atoms of OSBP ensures that (1) the probability of adding new atoms decreases exponentially, and (2) OSBP, though non-exchangeable, admits predictive probability functions (PPFs). In a Bayesian nonparametric (BNP) setting, OSBP serves as a natural prior over sequential mini-batches, facilitating exchange of relevant statistical information by sharing the atoms of OSBP. One of the major contributions of this paper is SUMO, an MCMC algorithm for solving the inference problem arising from applying OSBP to BNP models. SUMO uses the PPFs of OSBP to obtain a Gibbs-sampling based truncation-free algorithm which applies generally to BNP models. For large scale inference problems, existing algorithms such as particle filtering (PF) are not practical, and variational procedures such as TSVI (Wang & Blei, 2012) are the only alternative. For the Dirichlet process mixture model (DPMM), SUMO outperforms TSVI on perplexity by 33% on 3 datasets with millions of data points, which are beyond the scope of PF, using only 3GB RAM.'
volume: 37
URL: http://proceedings.mlr.press/v37/das15.html
PDF: http://proceedings.mlr.press/v37/das15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-das15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Das
given: Mrinal
- family: Bansal
given: Trapit
- family: Bhattacharyya
given: Chiranjib
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 550-559
id: das15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 550
lastpage: 559
published: 2015-06-01 00:00:00 +0000
- title: 'Online Learning of Eigenvectors'
abstract: 'Computing the leading eigenvector of a symmetric real matrix is a fundamental primitive of numerical linear algebra with numerous applications. We consider a natural online extension of the leading eigenvector problem: a sequence of matrices is presented and the goal is to predict for each matrix a unit vector, with the overall goal of competing with the leading eigenvector of the cumulative matrix. Existing regret-minimization algorithms for this problem either require computing an eigen-decomposition every iteration, or suffer from a large dependency of the regret bound on the dimension. In both cases the algorithms are not practical for large scale applications. In this paper we present new algorithms that avoid both issues. On one hand they do not require any expensive matrix decompositions and on the other, they guarantee regret rates with at most a mild dependence on the dimension. In contrast to previous algorithms, our algorithms also admit implementations that enable leveraging sparsity in the data to further reduce computation. We extend our results to also handle non-symmetric matrices.'
volume: 37
URL: http://proceedings.mlr.press/v37/garberb15.html
PDF: http://proceedings.mlr.press/v37/garberb15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-garberb15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Garber
given: Dan
- family: Hazan
given: Elad
- family: Ma
given: Tengyu
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 560-568
id: garberb15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 560
lastpage: 568
published: 2015-06-01 00:00:00 +0000
- title: 'A Unifying Framework of Anytime Sparse Gaussian Process Regression Models with Stochastic Variational Inference for Big Data'
abstract: 'This paper presents a novel unifying framework of anytime sparse Gaussian process regression (SGPR) models that can produce good predictive performance fast and improve their predictive performance over time. Our proposed unifying framework reverses the variational inference procedure to theoretically construct a non-trivial, concave functional that is maximized at the predictive distribution of any SGPR model of our choice. As a result, a stochastic natural gradient ascent method can be derived that involves iteratively following the stochastic natural gradient of the functional to improve its estimate of the predictive distribution of the chosen SGPR model and is guaranteed to achieve asymptotic convergence to it. Interestingly, we show that if the predictive distribution of the chosen SGPR model satisfies certain decomposability conditions, then the stochastic natural gradient is an unbiased estimator of the exact natural gradient and can be computed in constant time (i.e., independent of data size) at each iteration. We empirically evaluate the trade-off between the predictive performance vs. time efficiency of the anytime SGPR models on two real-world million-sized datasets.'
volume: 37
URL: http://proceedings.mlr.press/v37/hoang15.html
PDF: http://proceedings.mlr.press/v37/hoang15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-hoang15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Hoang
given: Trong Nghia
- family: Hoang
given: Quang Minh
- family: Low
given: Bryan Kian Hsiang
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 569-578
id: hoang15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 569
lastpage: 578
published: 2015-06-01 00:00:00 +0000
- title: 'Yinyang K-Means: A Drop-In Replacement of the Classic K-Means with Consistent Speedup'
abstract: 'This paper presents Yinyang K-means, a new algorithm for K-means clustering. By clustering the centers in the initial stage, and leveraging efficiently maintained lower and upper bounds between a point and centers, it more effectively avoids unnecessary distance calculations than prior algorithms. It significantly outperforms classic K-means and prior alternative K-means algorithms consistently across all experimented data sets, cluster numbers, and machine configurations. The consistent, superior performance—plus its simplicity, user-control of overheads, and guarantee of producing the same clustering results as the standard K-means does—makes Yinyang K-means a drop-in replacement of the classic K-means with an order of magnitude higher performance.'
volume: 37
URL: http://proceedings.mlr.press/v37/ding15.html
PDF: http://proceedings.mlr.press/v37/ding15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-ding15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Ding
given: Yufei
- family: Zhao
given: Yue
- family: Shen
given: Xipeng
- family: Musuvathi
given: Madanlal
- family: Mytkowicz
given: Todd
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 579-587
id: ding15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 579
lastpage: 587
published: 2015-06-01 00:00:00 +0000
- title: 'Ordinal Mixed Membership Models'
abstract: 'We present a novel class of mixed membership models for joint distributions of groups of observations that co-occur with ordinal response variables for each group for learning statistical associations between the ordinal response variables and the observation groups. The class of proposed models addresses a requirement for predictive and diagnostic methods in a wide range of practical contemporary applications. In this work, by way of illustration, we apply the models to a collection of consumer-generated reviews of mobile software applications, where each review contains unstructured text data accompanied with an ordinal rating, and demonstrate that the models infer useful and meaningful recurring patterns of consumer feedback. We also compare the developed models to relevant existing works, which rely on improper statistical assumptions for ordinal variables, showing significant improvements both in predictive ability and knowledge extraction.'
volume: 37
URL: http://proceedings.mlr.press/v37/virtanen15.html
PDF: http://proceedings.mlr.press/v37/virtanen15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-virtanen15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Virtanen
given: Seppo
- family: Girolami
given: Mark
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 588-596
id: virtanen15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 588
lastpage: 596
published: 2015-06-01 00:00:00 +0000
- title: 'Online Tracking by Learning Discriminative Saliency Map with Convolutional Neural Network'
abstract: 'We propose an online visual tracking algorithm that learns a discriminative saliency map using a Convolutional Neural Network (CNN). Given a CNN pre-trained offline on a large-scale image repository, our algorithm takes outputs from hidden layers of the network as feature descriptors, since they show excellent representation performance in various general visual recognition problems. The features are used to learn discriminative target appearance models using an online Support Vector Machine (SVM). In addition, we construct a target-specific saliency map by back-projecting CNN features with the guidance of the SVM, and obtain the final tracking result in each frame based on the appearance model generatively constructed with the saliency map. Since the saliency map effectively reveals the spatial configuration of the target, it improves target localization accuracy and enables us to achieve pixel-level target segmentation. We verify the effectiveness of our tracking algorithm through extensive experiments on a challenging benchmark, where our method shows outstanding performance compared to state-of-the-art tracking algorithms.'
volume: 37
URL: http://proceedings.mlr.press/v37/hong15.html
PDF: http://proceedings.mlr.press/v37/hong15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-hong15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Hong
given: Seunghoon
- family: You
given: Tackgeun
- family: Kwak
given: Suha
- family: Han
given: Bohyung
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 597-606
id: hong15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 597
lastpage: 606
published: 2015-06-01 00:00:00 +0000
- title: 'Fast Kronecker Inference in Gaussian Processes with non-Gaussian Likelihoods'
abstract: 'Gaussian processes (GPs) are a flexible class of methods with state of the art performance on spatial statistics applications. However, GPs require O(n^3) computations and O(n^2) storage, and popular GP kernels are typically limited to smoothing and interpolation. To address these difficulties, Kronecker methods have been used to exploit structure in the GP covariance matrix for scalability, while allowing for expressive kernel learning (Wilson et al., 2014). However, fast Kronecker methods have been confined to Gaussian likelihoods. We propose new scalable Kronecker methods for Gaussian processes with non-Gaussian likelihoods, using a Laplace approximation which involves linear conjugate gradients for inference, and a lower bound on the GP marginal likelihood for kernel learning. Our approach has near linear scaling, requiring O(Dn^{(D+1)/D}) operations and O(Dn^{2/D}) storage, for n training data-points on a dense D > 1 dimensional grid. Moreover, we introduce a log Gaussian Cox process, with highly expressive kernels, for modelling spatiotemporal count processes, and apply it to a point pattern (n = 233,088) of a decade of crime events in Chicago. Using our model, we discover spatially varying multiscale seasonal trends and produce highly accurate long-range local area forecasts.'
volume: 37
URL: http://proceedings.mlr.press/v37/flaxman15.html
PDF: http://proceedings.mlr.press/v37/flaxman15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-flaxman15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Flaxman
given: Seth
- family: Wilson
given: Andrew
- family: Neill
given: Daniel
- family: Nickisch
given: Hannes
- family: Smola
given: Alex
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 607-616
id: flaxman15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 607
lastpage: 616
published: 2015-06-01 00:00:00 +0000
- title: 'Statistical and Algorithmic Perspectives on Randomized Sketching for Ordinary Least-Squares'
abstract: 'We consider statistical and algorithmic aspects of solving large-scale least-squares (LS) problems using randomized sketching algorithms. Prior results show that, from an algorithmic perspective, when using sketching matrices constructed from random projections and leverage-score sampling, if the number of samples r is much smaller than the original sample size n, then the worst-case (WC) error is the same as solving the original problem, up to a very small relative error. From a statistical perspective, one typically considers the mean-squared error performance of randomized sketching algorithms, when data are generated according to a statistical linear model. In this paper, we provide a rigorous comparison of both perspectives leading to insights on how they differ. To do this, we first develop a framework for assessing, in a unified manner, algorithmic and statistical aspects of randomized sketching methods. We then consider the statistical prediction efficiency (PE) and the statistical residual efficiency (RE) of the sketched LS estimator; and we use our framework to provide upper bounds for several types of random projection and random sampling algorithms. Among other results, we show that the RE can be upper bounded when r is much smaller than n, while the PE typically requires the number of samples r to be substantially larger. Lower bounds developed in subsequent work show that our upper bounds on PE cannot be improved.'
volume: 37
URL: http://proceedings.mlr.press/v37/raskutti15.html
PDF: http://proceedings.mlr.press/v37/raskutti15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-raskutti15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Raskutti
given: Garvesh
- family: Mahoney
given: Michael
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 617-625
id: raskutti15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 617
lastpage: 625
published: 2015-06-01 00:00:00 +0000
- title: 'On TD(0) with function approximation: Concentration bounds and a centered variant with exponential convergence'
abstract: 'We provide non-asymptotic bounds for the well-known temporal difference learning algorithm TD(0) with linear function approximators. These include high-probability bounds as well as bounds in expectation. Our analysis suggests that a step-size inversely proportional to the number of iterations cannot guarantee optimal rate of convergence unless we assume (partial) knowledge of the stationary distribution for the Markov chain underlying the policy considered. We also provide bounds for the iterate averaged TD(0) variant, which gets rid of the step-size dependency while exhibiting the optimal rate of convergence. Furthermore, we propose a variant of TD(0) with linear approximators that incorporates a centering sequence, and establish that it exhibits an exponential rate of convergence in expectation. We demonstrate the usefulness of our bounds on two synthetic experimental settings.'
volume: 37
URL: http://proceedings.mlr.press/v37/korda15.html
PDF: http://proceedings.mlr.press/v37/korda15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-korda15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Korda
given: Nathaniel
- family: La
given: Prashanth
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 626-634
id: korda15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 626
lastpage: 634
published: 2015-06-01 00:00:00 +0000
- title: 'Learning Parametric-Output HMMs with Two Aliased States'
abstract: 'In various applications involving hidden Markov models (HMMs), some of the hidden states are aliased, having identical output distributions. The minimality, identifiability and learnability of such aliased HMMs have been long standing problems, with only partial solutions provided thus far. In this paper we focus on parametric-output HMMs, whose output distributions come from a parametric family, and that have exactly two aliased states. For this class, we present a complete characterization of their minimality and identifiability. Furthermore, for a large family of parametric output distributions, we derive computationally efficient and statistically consistent algorithms to detect the presence of aliasing and learn the aliased HMM transition and emission parameters. We illustrate our theoretical analysis by several simulations.'
volume: 37
URL: http://proceedings.mlr.press/v37/weiss15.html
PDF: http://proceedings.mlr.press/v37/weiss15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-weiss15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Weiss
given: Roi
- family: Nadler
given: Boaz
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 635-644
id: weiss15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 635
lastpage: 644
published: 2015-06-01 00:00:00 +0000
- title: 'Latent Gaussian Processes for Distribution Estimation of Multivariate Categorical Data'
abstract: 'Multivariate categorical data occur in many applications of machine learning. One of the main difficulties with these vectors of categorical variables is sparsity. The number of possible observations grows exponentially with vector length, but dataset diversity might be poor in comparison. Recent models have gained significant improvement in supervised tasks with this data. These models embed observations in a continuous space to capture similarities between them. Building on these ideas we propose a Bayesian model for the unsupervised task of distribution estimation of multivariate categorical data. We model vectors of categorical variables as generated from a non-linear transformation of a continuous latent space. Non-linearity captures multi-modality in the distribution. The continuous representation addresses sparsity. Our model ties together many existing models, linking the linear categorical latent Gaussian model, the Gaussian process latent variable model, and Gaussian process classification. We derive inference for our model based on recent developments in sampling based variational inference. We show empirically that the model outperforms its linear and discrete counterparts in imputation tasks of sparse data.'
volume: 37
URL: http://proceedings.mlr.press/v37/gala15.html
PDF: http://proceedings.mlr.press/v37/gala15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-gala15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Gal
given: Yarin
- family: Chen
given: Yutian
- family: Ghahramani
given: Zoubin
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 645-654
id: gala15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 645
lastpage: 654
published: 2015-06-01 00:00:00 +0000
- title: 'Improving the Gaussian Process Sparse Spectrum Approximation by Representing Uncertainty in Frequency Inputs'
abstract: 'Standard sparse pseudo-input approximations to the Gaussian process (GP) cannot handle complex functions well. Sparse spectrum alternatives attempt to answer this but are known to over-fit. We suggest the use of variational inference for the sparse spectrum approximation to avoid both issues. We model the covariance function with a finite Fourier series approximation and treat it as a random variable. The random covariance function has a posterior, on which a variational distribution is placed. The variational distribution transforms the random covariance function to fit the data. We study the properties of our approximate inference, compare it to alternative ones, and extend it to the distributed and stochastic domains. Our approximation captures complex functions better than standard approaches and avoids over-fitting.'
volume: 37
URL: http://proceedings.mlr.press/v37/galb15.html
PDF: http://proceedings.mlr.press/v37/galb15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-galb15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Gal
given: Yarin
- family: Turner
given: Richard
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 655-664
id: galb15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 655
lastpage: 664
published: 2015-06-01 00:00:00 +0000
- title: 'Ranking from Stochastic Pairwise Preferences: Recovering Condorcet Winners and Tournament Solution Sets at the Top'
abstract: 'We consider the problem of ranking n items from stochastically sampled pairwise preferences. It was shown recently that when the underlying pairwise preferences are acyclic, several algorithms including the Rank Centrality algorithm, the Matrix Borda algorithm, and the SVM-RankAggregation algorithm succeed in recovering a ranking that minimizes a global pairwise disagreement error (Rajkumar and Agarwal, 2014). In this paper, we consider settings where pairwise preferences can contain cycles. In such settings, one may still like to be able to recover ‘good’ items at the top of the ranking. For example, if a Condorcet winner exists that beats every other item, it is natural to ask that this be ranked at the top. More generally, several tournament solution concepts such as the top cycle, Copeland set, Markov set and others have been proposed in the social choice literature for choosing a set of winners in the presence of cycles. We show that existing algorithms can fail to perform well in terms of ranking Condorcet winners and various natural tournament solution sets at the top. We then give alternative ranking algorithms that provably rank Condorcet winners, top cycles, and other tournament solution sets of interest at the top. In all cases, we give finite sample complexity bounds for our algorithms to recover such winners. As a by-product of our analysis, we also obtain an improved sample complexity bound for the Rank Centrality algorithm to recover an optimal ranking under a Bradley-Terry-Luce (BTL) condition, which answers an open question of Rajkumar and Agarwal (2014).'
volume: 37
URL: http://proceedings.mlr.press/v37/rajkumar15.html
PDF: http://proceedings.mlr.press/v37/rajkumar15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-rajkumar15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Rajkumar
given: Arun
- family: Ghoshal
given: Suprovat
- family: Lim
given: Lek-Heng
- family: Agarwal
given: Shivani
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 665-673
id: rajkumar15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 665
lastpage: 673
published: 2015-06-01 00:00:00 +0000
- title: 'Stochastic Dual Coordinate Ascent with Adaptive Probabilities'
abstract: 'This paper introduces AdaSDCA: an adaptive variant of stochastic dual coordinate ascent (SDCA) for solving regularized empirical risk minimization problems. Our modification consists in allowing the method to adaptively change the probability distribution over the dual variables throughout the iterative process. AdaSDCA achieves a provably better complexity bound than SDCA with the best fixed probability distribution, known as importance sampling. However, it is of a theoretical character, as it is expensive to implement. We also propose AdaSDCA+: a practical variant which in our experiments outperforms existing non-adaptive methods.'
volume: 37
URL: http://proceedings.mlr.press/v37/csiba15.html
PDF: http://proceedings.mlr.press/v37/csiba15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-csiba15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Csiba
given: Dominik
- family: Qu
given: Zheng
- family: Richtarik
given: Peter
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 674-683
id: csiba15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 674
lastpage: 683
published: 2015-06-01 00:00:00 +0000
- title: 'Vector-Space Markov Random Fields via Exponential Families'
abstract: 'We present Vector-Space Markov Random Fields (VS-MRFs), a novel class of undirected graphical models where each variable can belong to an arbitrary vector space. VS-MRFs generalize a recent line of work on scalar-valued, uni-parameter exponential family and mixed graphical models, thereby greatly broadening the class of exponential families available (e.g., allowing multinomial and Dirichlet distributions). Specifically, VS-MRFs are the joint graphical model distributions where the node-conditional distributions belong to generic exponential families with general vector space domains. We also present a sparsistent M-estimator for learning our class of MRFs that recovers the correct set of edges with high probability. We validate our approach via a set of synthetic data experiments as well as a real-world case study of over four million foods from the popular diet tracking app MyFitnessPal. Our results demonstrate that our algorithm performs well empirically and that VS-MRFs are capable of capturing and highlighting interesting structure in complex, real-world data. All code for our algorithm is open source and publicly available.'
volume: 37
URL: http://proceedings.mlr.press/v37/tansey15.html
PDF: http://proceedings.mlr.press/v37/tansey15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-tansey15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Tansey
given: Wesley
- family: Padilla
given: Oscar Hernan Madrid
- family: Suggala
given: Arun Sai
- family: Ravikumar
given: Pradeep
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 684-692
id: tansey15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 684
lastpage: 692
published: 2015-06-01 00:00:00 +0000
- title: 'JUMP-Means: Small-Variance Asymptotics for Markov Jump Processes'
abstract: 'Markov jump processes (MJPs) are used to model a wide range of phenomena, from disease progression to RNA path folding. However, existing methods suffer from a number of shortcomings: degenerate trajectories in the case of ML estimation of parametric models and poor inferential performance in the case of nonparametric models. We take a small-variance asymptotics (SVA) approach to overcome these limitations. We derive the small-variance asymptotics for parametric and nonparametric MJPs for both directly observed and hidden state models. In the parametric case we obtain a novel objective function which leads to non-degenerate trajectories. To derive the nonparametric version we introduce the gamma-gamma process, a novel extension to the gamma-exponential process. We propose algorithms for each of these formulations, which we call JUMP-means. Our experiments demonstrate that JUMP-means is competitive with or outperforms widely used MJP inference approaches in terms of both speed and reconstruction accuracy.'
volume: 37
URL: http://proceedings.mlr.press/v37/hugginsa15.html
PDF: http://proceedings.mlr.press/v37/hugginsa15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-hugginsa15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Huggins
given: Jonathan
- family: Narasimhan
given: Karthik
- family: Saeedi
given: Ardavan
- family: Mansinghka
given: Vikash
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 693-701
id: hugginsa15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 693
lastpage: 701
published: 2015-06-01 00:00:00 +0000
- title: 'Low Rank Approximation using Error Correcting Coding Matrices'
abstract: 'Low-rank matrix approximation is an integral component of tools such as principal component analysis (PCA), as well as an important instrument used in applications like web search models, text mining and computer vision, e.g., face recognition. Recently, randomized algorithms were proposed to effectively construct low rank approximations of large matrices. In this paper, we show how matrices from error correcting codes can be used to find such low rank approximations. The benefits of using these code matrices are the following: (i) they are easy to generate and they reduce randomness significantly; (ii) code matrices have low coherence and have a better chance of preserving the geometry of an entire subspace of vectors; (iii) unlike Fourier transforms or Hadamard matrices, which require sampling O(k \log k) columns for a rank-k approximation, the log factor is not necessary in the case of code matrices; (iv) under certain conditions, the approximation errors can be better, and the singular values obtained can be more accurate, than those obtained using Gaussian random matrices and other structured random matrices.'
volume: 37
URL: http://proceedings.mlr.press/v37/ubaru15.html
PDF: http://proceedings.mlr.press/v37/ubaru15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-ubaru15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Ubaru
given: Shashanka
- family: Mazumdar
given: Arya
- family: Saad
given: Yousef
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 702-710
id: ubaru15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 702
lastpage: 710
published: 2015-06-01 00:00:00 +0000
- title: 'Off-policy Model-based Learning under Unknown Factored Dynamics'
abstract: 'Off-policy learning in dynamic decision problems is essential for providing strong evidence that a new policy is better than the one in use. But how can we prove superiority without testing the new policy? To answer this question, we introduce the G-SCOPE algorithm that evaluates a new policy based on data generated by the existing policy. Our algorithm is both computationally and sample efficient because it greedily learns to exploit factored structure in the dynamics of the environment. We present a finite sample analysis of our approach and show through experiments that the algorithm scales well on high-dimensional problems with few samples.'
volume: 37
URL: http://proceedings.mlr.press/v37/hallak15.html
PDF: http://proceedings.mlr.press/v37/hallak15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-hallak15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Hallak
given: Assaf
- family: Schnitzler
given: Francois
- family: Mann
given: Timothy
- family: Mannor
given: Shie
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 711-719
id: hallak15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 711
lastpage: 719
published: 2015-06-01 00:00:00 +0000
- title: 'Log-Euclidean Metric Learning on Symmetric Positive Definite Manifold with Application to Image Set Classification'
abstract: 'The manifold of Symmetric Positive Definite (SPD) matrices has been successfully used for data representation in image set classification. By endowing the SPD manifold with the Log-Euclidean Metric, existing methods typically work on vector-forms of SPD matrix logarithms. This, however, not only inevitably distorts the geometrical structure of the space of SPD matrix logarithms but also brings low efficiency, especially when the dimensionality of the SPD matrices is high. To overcome this limitation, we propose a novel metric learning approach that works directly on logarithms of SPD matrices. Specifically, our method aims to learn a tangent map that can directly transform the matrix logarithms from the original tangent space to a new, more discriminative tangent space. Under the tangent map framework, the novel metric learning can then be formulated as an optimization problem of seeking a Mahalanobis-like matrix, which can take advantage of traditional metric learning techniques. Extensive evaluations on several image set classification tasks demonstrate the effectiveness of our proposed metric learning method.'
volume: 37
URL: http://proceedings.mlr.press/v37/huanga15.html
PDF: http://proceedings.mlr.press/v37/huanga15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-huanga15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Huang
given: Zhiwu
- family: Wang
given: Ruiping
- family: Shan
given: Shiguang
- family: Li
given: Xianqiu
- family: Chen
given: Xilin
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 720-729
id: huanga15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 720
lastpage: 729
published: 2015-06-01 00:00:00 +0000
- title: 'Asymmetric Transfer Learning with Deep Gaussian Processes'
abstract: 'We introduce a novel Gaussian process based Bayesian model for asymmetric transfer learning. We adopt a two-layer feed-forward deep Gaussian process as the task learner of source and target domains. The first layer projects the data onto a separate non-linear manifold for each task. We perform knowledge transfer by projecting the target data also onto the source domain and linearly combining its representations on the source and target domain manifolds. Our approach achieves the state-of-the-art in a benchmark real-world image categorization task, and improves on it in cross-tissue tumor detection from histopathology tissue slide images.'
volume: 37
URL: http://proceedings.mlr.press/v37/kandemir15.html
PDF: http://proceedings.mlr.press/v37/kandemir15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-kandemir15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Kandemir
given: Melih
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 730-738
id: kandemir15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 730
lastpage: 738
published: 2015-06-01 00:00:00 +0000
- title: 'Towards a Lower Sample Complexity for Robust One-bit Compressed Sensing'
abstract: 'In this paper, we propose a novel algorithm based on a nonconvex sparsity-inducing penalty for one-bit compressed sensing. We prove that our algorithm has a sample complexity of O(s/ε^2) for strong signals, and O(s \log d/ε^2) for weak signals, where s is the number of nonzero entries in the signal vector, d is the signal dimension, and ε is the recovery error. For general signals, the sample complexity of our algorithm lies between O(s/ε^2) and O(s \log d/ε^2). This is a remarkable improvement over the existing best sample complexity O(s \log d/ε^2). Furthermore, we show that our algorithm achieves exact support recovery with high probability for strong signals. Our theory is verified by extensive numerical experiments, which clearly illustrate the superiority of our algorithm for both approximate signal and support recovery in the noisy setting.'
volume: 37
URL: http://proceedings.mlr.press/v37/zhua15.html
PDF: http://proceedings.mlr.press/v37/zhua15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-zhua15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Zhu
given: Rongda
- family: Gu
given: Quanquan
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 739-747
id: zhua15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 739
lastpage: 747
published: 2015-06-01 00:00:00 +0000
- title: 'BilBOWA: Fast Bilingual Distributed Representations without Word Alignments'
abstract: 'We introduce BilBOWA (Bilingual Bag-of-Words without Alignments), a simple and computationally-efficient model for learning bilingual distributed representations of words which can scale to large monolingual datasets and does not require word-aligned parallel training data. Instead it trains directly on monolingual data and extracts a bilingual signal from a smaller set of raw-text sentence-aligned data. This is achieved using a novel sampled bag-of-words cross-lingual objective, which is used to regularize two noise-contrastive language models for efficient cross-lingual feature learning. We show that bilingual embeddings learned using the proposed model outperform state-of-the-art methods on a cross-lingual document classification task as well as a lexical translation task on the WMT11 data.'
volume: 37
URL: http://proceedings.mlr.press/v37/gouws15.html
PDF: http://proceedings.mlr.press/v37/gouws15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-gouws15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Gouws
given: Stephan
- family: Bengio
given: Yoshua
- family: Corrado
given: Greg
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 748-756
id: gouws15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 748
lastpage: 756
published: 2015-06-01 00:00:00 +0000
- title: 'Multi-view Sparse Co-clustering via Proximal Alternating Linearized Minimization'
abstract: 'When multiple views of data are available for a set of subjects, co-clustering aims to identify subject clusters that agree across the different views. We explore the problem of co-clustering when the underlying clusters exist in different subspaces of each view. We propose a proximal alternating linearized minimization algorithm that simultaneously decomposes multiple data matrices into sparse row and column vectors. This approach is able to group subjects consistently across the views and simultaneously identify the subset of features in each view that are associated with the clusters. The proposed algorithm can globally converge to a critical point of the problem. A simulation study validates that the proposed algorithm can identify the hypothesized clusters and their associated features. Comparison with several recent multi-view co-clustering methods on benchmark datasets demonstrates the superior performance of the proposed approach.'
volume: 37
URL: http://proceedings.mlr.press/v37/sunb15.html
PDF: http://proceedings.mlr.press/v37/sunb15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-sunb15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Sun
given: Jiangwen
- family: Lu
given: Jin
- family: Xu
given: Tingyang
- family: Bi
given: Jinbo
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 757-766
id: sunb15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 757
lastpage: 766
published: 2015-06-01 00:00:00 +0000
- title: 'Cascading Bandits: Learning to Rank in the Cascade Model'
abstract: 'A search engine usually outputs a list of K web pages. The user examines this list, from the first web page to the last, and chooses the first attractive page. This model of user behavior is known as the cascade model. In this paper, we propose cascading bandits, a learning variant of the cascade model where the objective is to identify the K most attractive items. We formulate our problem as a stochastic combinatorial partial monitoring problem. We propose two algorithms for solving it, CascadeUCB1 and CascadeKL-UCB. We also prove gap-dependent upper bounds on the regret of these algorithms and derive a lower bound on the regret in cascading bandits. The lower bound matches the upper bound of CascadeKL-UCB up to a logarithmic factor. We experiment with our algorithms on several problems. The algorithms perform surprisingly well even when our modeling assumptions are violated.'
volume: 37
URL: http://proceedings.mlr.press/v37/kveton15.html
PDF: http://proceedings.mlr.press/v37/kveton15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-kveton15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Kveton
given: Branislav
- family: Szepesvari
given: Csaba
- family: Wen
given: Zheng
- family: Ashkan
given: Azin
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 767-776
id: kveton15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 767
lastpage: 776
published: 2015-06-01 00:00:00 +0000
- title: 'Latent Topic Networks: A Versatile Probabilistic Programming Framework for Topic Models'
abstract: 'Topic models have become increasingly prominent text-analytic machine learning tools for research in the social sciences and the humanities. In particular, custom topic models can be developed to answer specific research questions. The design of these models requires a non-trivial amount of effort and expertise, motivating general-purpose topic modeling frameworks. In this paper we introduce latent topic networks, a flexible class of richly structured topic models designed to facilitate applied research. Custom models can straightforwardly be developed in our framework with an intuitive first-order logical probabilistic programming language. Latent topic networks admit scalable training via a parallelizable EM algorithm which leverages ADMM in the M-step. We demonstrate the broad applicability of the models with case studies on modeling influence in citation networks, and U.S. Presidential State of the Union addresses.'
volume: 37
URL: http://proceedings.mlr.press/v37/foulds15.html
PDF: http://proceedings.mlr.press/v37/foulds15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-foulds15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Foulds
given: James
- family: Kumar
given: Shachi
- family: Getoor
given: Lise
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 777-786
id: foulds15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 777
lastpage: 786
published: 2015-06-01 00:00:00 +0000
- title: 'Random Coordinate Descent Methods for Minimizing Decomposable Submodular Functions'
abstract: 'Submodular function minimization is a fundamental optimization problem that arises in several applications in machine learning and computer vision. The problem is known to be solvable in polynomial time, but general purpose algorithms have high running times and are unsuitable for large-scale problems. Recent work has used convex optimization techniques to obtain very practical algorithms for minimizing functions that are sums of “simple” functions. In this paper, we use random coordinate descent methods to obtain algorithms with faster \emph{linear} convergence rates and cheaper iteration costs. Compared to alternating projection methods, our algorithms do not rely on full-dimensional vector operations and they converge in significantly fewer iterations.'
volume: 37
URL: http://proceedings.mlr.press/v37/ene15.html
PDF: http://proceedings.mlr.press/v37/ene15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-ene15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Ene
given: Alina
- family: Nguyen
given: Huy
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 787-795
id: ene15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 787
lastpage: 795
published: 2015-06-01 00:00:00 +0000
- title: 'Alpha-Beta Divergences Discover Micro and Macro Structures in Data'
abstract: 'Although recent work in non-linear dimensionality reduction investigates multiple choices of divergence measure during optimization \cite{yang2013icml,bunte2012neuro}, little work discusses the direct effects that divergence measures have on visualization. We study this relationship, theoretically and through an empirical analysis over 10 datasets. Our work shows how the α and β parameters of the generalized alpha-beta divergence can be chosen to discover hidden macro-structures (categories, e.g. birds) or micro-structures (fine-grained classes, e.g. toucans). Our method, which generalizes t-SNE \cite{tsne}, allows us to discover such structure without extensive grid searches over (α, β) due to our theoretical analysis: such structure is apparent with particular choices of (α, β) that generalize across datasets. We also discuss efficient parallel CPU and GPU schemes which are non-trivial due to the tree-structures employed in optimization and the large datasets that do not fully fit into GPU memory. Our method runs 20x faster than the fastest published code \cite{fmm}. We conclude with detailed case studies on the following very large datasets: ILSVRC 2012, a standard computer vision dataset with 1.2M images; SUSY, a particle physics dataset with 5M instances; and HIGGS, another particle physics dataset with 11M instances. This represents the largest published visualization attained by SNE methods. We have open-sourced our visualization code: \texttt{http://rll.berkeley.edu/absne/}.'
volume: 37
URL: http://proceedings.mlr.press/v37/narayan15.html
PDF: http://proceedings.mlr.press/v37/narayan15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-narayan15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Narayan
given: Karthik
- family: Punjani
given: Ali
- family: Abbeel
given: Pieter
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 796-804
id: narayan15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 796
lastpage: 804
published: 2015-06-01 00:00:00 +0000
- title: 'Fictitious Self-Play in Extensive-Form Games'
abstract: 'Fictitious play is a popular game-theoretic model of learning in games. However, it has received little attention in practical applications to large problems. This paper introduces two variants of fictitious play that are implemented in behavioural strategies of an extensive-form game. The first variant is a full-width process that is realization equivalent to its normal-form counterpart and therefore inherits its convergence guarantees. However, its computational requirements are linear in time and space rather than exponential. The second variant, Fictitious Self-Play, is a machine learning framework that implements fictitious play in a sample-based fashion. Experiments in imperfect-information poker games compare our approaches and demonstrate their convergence to approximate Nash equilibria.'
volume: 37
URL: http://proceedings.mlr.press/v37/heinrich15.html
PDF: http://proceedings.mlr.press/v37/heinrich15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-heinrich15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Heinrich
given: Johannes
- family: Lanctot
given: Marc
- family: Silver
given: David
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 805-813
id: heinrich15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 805
lastpage: 813
published: 2015-06-01 00:00:00 +0000
- title: 'Counterfactual Risk Minimization: Learning from Logged Bandit Feedback'
abstract: 'We develop a learning principle and an efficient algorithm for batch learning from logged bandit feedback. This learning setting is ubiquitous in online systems (e.g., ad placement, web search, recommendation), where an algorithm makes a prediction (e.g., ad ranking) for a given input (e.g., query) and observes bandit feedback (e.g., user clicks on presented ads). We first address the counterfactual nature of the learning problem through propensity scoring. Next, we prove generalization error bounds that account for the variance of the propensity-weighted empirical risk estimator. These constructive bounds give rise to the Counterfactual Risk Minimization (CRM) principle. We show how CRM can be used to derive a new learning method – called Policy Optimizer for Exponential Models (POEM) – for learning stochastic linear rules for structured output prediction. We present a decomposition of the POEM objective that enables efficient stochastic gradient optimization. POEM is evaluated on several multi-label classification problems showing substantially improved robustness and generalization performance compared to the state-of-the-art.'
volume: 37
URL: http://proceedings.mlr.press/v37/swaminathan15.html
PDF: http://proceedings.mlr.press/v37/swaminathan15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-swaminathan15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Swaminathan
given: Adith
- family: Joachims
given: Thorsten
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 814-823
id: swaminathan15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 814
lastpage: 823
published: 2015-06-01 00:00:00 +0000
- title: 'The Hedge Algorithm on a Continuum'
abstract: 'We consider an online optimization problem on a subset S of R^n (not necessarily convex), in which a decision maker chooses, at each iteration t, a probability distribution x^(t) over S, and seeks to minimize a cumulative expected loss, where each loss is a Lipschitz function revealed at the end of iteration t. Building on previous work, we propose a generalized Hedge algorithm and show an O(\sqrt{t \log t}) bound on the regret when the losses are uniformly Lipschitz and S is uniformly fat (a weaker condition than convexity). Finally, we propose a generalization to the dual averaging method on the set of Lebesgue-continuous distributions over S.'
volume: 37
URL: http://proceedings.mlr.press/v37/krichene15.html
PDF: http://proceedings.mlr.press/v37/krichene15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-krichene15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Krichene
given: Walid
- family: Balandat
given: Maximilian
- family: Tomlin
given: Claire
- family: Bayen
given: Alexandre
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 824-832
id: krichene15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 824
lastpage: 832
published: 2015-06-01 00:00:00 +0000
- title: 'A Linear Dynamical System Model for Text'
abstract: 'Low dimensional representations of words allow accurate NLP models to be trained on limited annotated data. While most representations ignore words’ local context, a natural way to induce context-dependent representations is to perform inference in a probabilistic latent-variable sequence model. Given the recent success of continuous vector space word representations, we provide such an inference procedure for continuous states, where words’ representations are given by the posterior mean of a linear dynamical system. Here, efficient inference can be performed using Kalman filtering. Our learning algorithm is extremely scalable, operating on simple co-occurrence counts for both parameter initialization using the method of moments and subsequent iterations of EM. In our experiments, we employ our inferred word embeddings as features in standard tagging tasks, obtaining significant accuracy improvements. Finally, the Kalman filter updates can be seen as a linear recurrent neural network. We demonstrate that using the parameters of our model to initialize a non-linear recurrent neural network language model reduces its training time by a day and yields lower perplexity.'
volume: 37
URL: http://proceedings.mlr.press/v37/belanger15.html
PDF: http://proceedings.mlr.press/v37/belanger15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-belanger15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Belanger
given: David
- family: Kakade
given: Sham
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 833-842
id: belanger15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 833
lastpage: 842
published: 2015-06-01 00:00:00 +0000
- title: 'Unsupervised Learning of Video Representations using LSTMs'
abstract: 'We use Long Short Term Memory (LSTM) networks to learn representations of video sequences. Our model uses an encoder LSTM to map an input sequence into a fixed length representation. This representation is decoded using single or multiple decoder LSTMs to perform different tasks, such as reconstructing the input sequence, or predicting the future sequence. We experiment with two kinds of input sequences – patches of image pixels and high-level representations (“percepts”) of video frames extracted using a pretrained convolutional net. We explore different design choices such as whether the decoder LSTMs should condition on the generated output. We analyze the outputs of the model qualitatively to see how well the model can extrapolate the learned video representation into the future and into the past. We further evaluate the representations by finetuning them for a supervised learning problem – human action recognition on the UCF-101 and HMDB-51 datasets. We show that the representations help improve classification accuracy, especially when there are only few training examples. Even models pretrained on unrelated datasets (300 hours of YouTube videos) can help action recognition performance.'
volume: 37
URL: http://proceedings.mlr.press/v37/srivastava15.html
PDF: http://proceedings.mlr.press/v37/srivastava15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-srivastava15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Srivastava
given: Nitish
- family: Mansimov
given: Elman
- family: Salakhudinov
given: Ruslan
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 843-852
id: srivastava15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 843
lastpage: 852
published: 2015-06-01 00:00:00 +0000
- title: 'Message Passing for Collective Graphical Models'
abstract: 'Collective graphical models (CGMs) are a formalism for inference and learning about a population of independent and identically distributed individuals when only noisy aggregate data are available. We highlight a close connection between approximate MAP inference in CGMs and marginal inference in standard graphical models. The connection leads us to derive a novel Belief Propagation (BP) style algorithm for collective graphical models. Mathematically, the algorithm is a strict generalization of BP—it can be viewed as an extension to minimize the Bethe free energy plus additional energy terms that are non-linear functions of the marginals. For CGMs, the algorithm is much more efficient than previous approaches to inference. We demonstrate its performance on two synthetic experiments concerning bird migration and collective human mobility.'
volume: 37
URL: http://proceedings.mlr.press/v37/sunc15.html
PDF: http://proceedings.mlr.press/v37/sunc15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-sunc15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Sun
given: Tao
- family: Sheldon
given: Dan
- family: Kumar
given: Akshat
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 853-861
id: sunc15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 853
lastpage: 861
published: 2015-06-01 00:00:00 +0000
- title: 'DP-space: Bayesian Nonparametric Subspace Clustering with Small-variance Asymptotics'
abstract: 'Subspace clustering separates data points approximately lying on a union of affine subspaces into several clusters. This paper presents a novel nonparametric Bayesian subspace clustering model that infers both the number of subspaces and the dimension of each subspace from the observed data. Though the posterior inference is hard, our model leads to a very efficient deterministic algorithm, DP-space, which retains the nonparametric ability under a small-variance asymptotic analysis. DP-space monotonically minimizes an intuitive objective with an explicit tradeoff between data fitness and model complexity. Experimental results demonstrate that DP-space outperforms various competitors in terms of clustering accuracy and at the same time it is highly efficient.'
volume: 37
URL: http://proceedings.mlr.press/v37/wanga15.html
PDF: http://proceedings.mlr.press/v37/wanga15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-wanga15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Wang
given: Yining
- family: Zhu
given: Jun
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 862-870
id: wanga15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 862
lastpage: 870
published: 2015-06-01 00:00:00 +0000
- title: 'HawkesTopic: A Joint Model for Network Inference and Topic Modeling from Text-Based Cascades'
abstract: 'Understanding the diffusion of information in social networks and social media requires modeling the text diffusion process. In this work, we develop the HawkesTopic model (HTM) for analyzing text-based cascades, such as "retweeting a post" or "publishing a follow-up blog post". HTM combines Hawkes processes and topic modeling to simultaneously reason about the information diffusion pathways and the topics characterizing the observed textual information. We show how to jointly infer them with a mean-field variational inference algorithm and validate our approach on both synthetic and real-world data sets, including a news media dataset for modeling information diffusion, and an ArXiv publication dataset for modeling scientific influence. The results show that HTM is significantly more accurate than several baselines for both tasks.'
volume: 37
URL: http://proceedings.mlr.press/v37/he15.html
PDF: http://proceedings.mlr.press/v37/he15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-he15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: He
given: Xinran
- family: Rekatsinas
given: Theodoros
- family: Foulds
given: James
- family: Getoor
given: Lise
- family: Liu
given: Yan
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 871-880
id: he15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 871
lastpage: 880
published: 2015-06-01 00:00:00 +0000
- title: 'MADE: Masked Autoencoder for Distribution Estimation'
abstract: 'There has been a lot of recent interest in designing neural network models to estimate a distribution from a set of examples. We introduce a simple modification for autoencoder neural networks that yields powerful generative models. Our method masks the autoencoder’s parameters to respect autoregressive constraints: each input is reconstructed only from previous inputs in a given ordering. Constrained this way, the autoencoder outputs can be interpreted as a set of conditional probabilities, and their product, the full joint probability. We can also train a single network that can decompose the joint probability in multiple different orderings. Our simple framework can be applied to multiple architectures, including deep ones. Vectorized implementations, such as on GPUs, are simple and fast. Experiments demonstrate that this approach is competitive with state-of-the-art tractable distribution estimators. At test time, the method is significantly faster and scales better than other autoregressive estimators.'
volume: 37
URL: http://proceedings.mlr.press/v37/germain15.html
PDF: http://proceedings.mlr.press/v37/germain15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-germain15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Germain
given: Mathieu
- family: Gregor
given: Karol
- family: Murray
given: Iain
- family: Larochelle
given: Hugo
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 881-889
id: germain15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 881
lastpage: 889
published: 2015-06-01 00:00:00 +0000
- title: 'An Online Learning Algorithm for Bilinear Models'
abstract: 'We investigate the bilinear model, a linear model in matrix form with a rank-1 constraint. A new online learning algorithm is proposed to train the model parameters. Our algorithm runs in the manner of online mirror descent, and its gradients are computed by power iteration. To analyze it, we give a new second-order approximation of the squared spectral norm, which helps us obtain a regret bound. Experiments on two sequential labelling tasks give positive results.'
volume: 37
URL: http://proceedings.mlr.press/v37/wua15.html
PDF: http://proceedings.mlr.press/v37/wua15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-wua15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Wu
given: Yuanbin
- family: Sun
given: Shiliang
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 890-898
id: wua15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 890
lastpage: 898
published: 2015-06-01 00:00:00 +0000
- title: 'Adaptive Belief Propagation'
abstract: 'Graphical models are widely used in inference problems. In practice, one may construct a single large-scale model to explain a phenomenon of interest, which may be utilized in a variety of settings. The latent variables of interest, which can differ in each setting, may only represent a small subset of all variables. The marginals of variables of interest may change after the addition of measurements at different time points. In such adaptive settings, naive algorithms, such as standard belief propagation (BP), may perform many unnecessary computations by propagating messages over the entire graph. Here, we formulate an efficient inference procedure, termed adaptive BP (AdaBP), suitable for adaptive inference settings. We show that it gives exact results for trees in discrete and Gaussian Markov Random Fields (MRFs), and provide an extension to Gaussian loopy graphs. We also provide extensions on finding the most likely sequence of the entire latent graph. Lastly, we compare the proposed method to standard BP and to that of (Sumer et al., 2011), which tackles the same problem. We show in synthetic and real experiments that it outperforms standard BP by orders of magnitude and explore the settings in which it is advantageous over (Sumer et al., 2011).'
volume: 37
URL: http://proceedings.mlr.press/v37/papachristoudis15.html
PDF: http://proceedings.mlr.press/v37/papachristoudis15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-papachristoudis15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Papachristoudis
given: Georgios
- family: Fisher
given: John
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 899-907
id: papachristoudis15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 899
lastpage: 907
published: 2015-06-01 00:00:00 +0000
- title: 'Large-scale log-determinant computation through stochastic Chebyshev expansions'
abstract: 'Logarithms of determinants of large positive definite matrices appear ubiquitously in machine learning applications including Gaussian graphical and Gaussian process models, partition functions of discrete graphical models, minimum-volume ellipsoids and metric and kernel learning. Log-determinant computation involves the Cholesky decomposition at a cost cubic in the number of variables (i.e., the matrix dimension), which makes it prohibitive for large-scale applications. We propose a linear-time randomized algorithm to approximate log-determinants for very large-scale positive definite and general non-singular matrices using a stochastic trace approximation, called the Hutchinson method, coupled with Chebyshev polynomial expansions that both rely on efficient matrix-vector multiplications. We establish rigorous additive and multiplicative approximation error bounds depending on the condition number of the input matrix. In our experiments, the proposed algorithm provides very-high-accuracy solutions orders of magnitude faster than the Cholesky decomposition and Schur complement, and enables us to compute log-determinants of matrices involving tens of millions of variables.'
volume: 37
URL: http://proceedings.mlr.press/v37/hana15.html
PDF: http://proceedings.mlr.press/v37/hana15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-hana15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Han
given: Insu
- family: Malioutov
given: Dmitry
- family: Shin
given: Jinwoo
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 908-917
id: hana15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 908
lastpage: 917
published: 2015-06-01 00:00:00 +0000
- title: 'Differentially Private Bayesian Optimization'
abstract: 'Bayesian optimization is a powerful tool for fine-tuning the hyper-parameters of a wide variety of machine learning models. The success of machine learning has led practitioners in diverse real-world settings to learn classifiers for practical problems. As machine learning becomes commonplace, Bayesian optimization becomes an attractive method for practitioners to automate the process of classifier hyper-parameter tuning. A key observation is that the data used for tuning models in these settings is often sensitive. Certain data such as genetic predisposition, personal email statistics, and car accident history, if not properly protected, may be at risk of being inferred from Bayesian optimization outputs. To address this, we introduce methods for releasing the best hyper-parameters and classifier accuracy privately. Leveraging the strong theoretical guarantees of differential privacy and known Bayesian optimization convergence bounds, we prove that under a GP assumption these private quantities are often near-optimal. Finally, even if this assumption is not satisfied, we can use different smoothness guarantees to protect privacy.'
volume: 37
URL: http://proceedings.mlr.press/v37/kusnera15.html
PDF: http://proceedings.mlr.press/v37/kusnera15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-kusnera15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Kusner
given: Matt
- family: Gardner
given: Jacob
- family: Garnett
given: Roman
- family: Weinberger
given: Kilian
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 918-927
id: kusnera15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 918
lastpage: 927
published: 2015-06-01 00:00:00 +0000
- title: 'A Nearly-Linear Time Framework for Graph-Structured Sparsity'
abstract: 'We introduce a framework for sparsity structures defined via graphs. Our approach is flexible and generalizes several previously studied sparsity models. Moreover, we provide efficient projection algorithms for our sparsity model that run in nearly-linear time. In the context of sparse recovery, we show that our framework achieves an information-theoretically optimal sample complexity for a wide range of parameters. We complement our theoretical analysis with experiments demonstrating that our algorithms improve on prior work also in practice.'
volume: 37
URL: http://proceedings.mlr.press/v37/hegde15.html
PDF: http://proceedings.mlr.press/v37/hegde15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-hegde15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Hegde
given: Chinmay
- family: Indyk
given: Piotr
- family: Schmidt
given: Ludwig
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 928-937
id: hegde15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 928
lastpage: 937
published: 2015-06-01 00:00:00 +0000
- title: 'Support Matrix Machines'
abstract: 'In many classification problems such as electroencephalogram (EEG) classification and image classification, the input features are naturally represented as matrices rather than vectors or scalars. In general, the structure information of the original feature matrix is useful and informative for data analysis tasks such as classification. One typical kind of structure information is the correlation between columns or rows in the feature matrix. To leverage this kind of structure information, we propose a new classification method that we call support matrix machine (SMM). Specifically, SMM is defined as a hinge loss plus a so-called spectral elastic net penalty, which is a spectral extension of the conventional elastic net over a matrix. The spectral elastic net enjoys a grouping-effect property, i.e., strongly correlated columns or rows tend to be selected together or not at all. Since the optimization problem for SMM is convex, we devise an alternating direction method of multipliers algorithm for solving the problem. Experimental results on EEG and face image classification data show that our model is more robust and efficient than the state-of-the-art methods.'
volume: 37
URL: http://proceedings.mlr.press/v37/luo15.html
PDF: http://proceedings.mlr.press/v37/luo15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-luo15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Luo
given: Luo
- family: Xie
given: Yubo
- family: Zhang
given: Zhihua
- family: Li
given: Wu-Jun
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 938-947
id: luo15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 938
lastpage: 947
published: 2015-06-01 00:00:00 +0000
- title: 'Rademacher Observations, Private Data, and Boosting'
abstract: 'The minimization of the logistic loss is a popular approach to batch supervised learning. Our paper starts from the surprising observation that, when fitting linear classifiers, the minimization of the logistic loss is equivalent to the minimization of an exponential rado-loss computed (i) over transformed data that we call Rademacher observations (rados), and (ii) over the same classifier as that of the logistic loss. Thus, a classifier learnt from rados can be directly used to classify observations. We provide a learning algorithm over rados with boosting-compliant convergence rates on the logistic loss (computed over examples). Experiments on domains with up to millions of examples, backed up by theoretical arguments, display that learning over a small set of random rados can challenge the state of the art that learns over the complete set of examples. We show that rados comply with various privacy requirements that make them good candidates for machine learning in a privacy framework. We give several algebraic, geometric and computational hardness results on reconstructing examples from rados. We also show how it is possible to craft, and efficiently learn from, rados in a differential privacy framework. Tests reveal that learning from differentially private rados brings non-trivial privacy vs accuracy tradeoffs.'
volume: 37
URL: http://proceedings.mlr.press/v37/nock15.html
PDF: http://proceedings.mlr.press/v37/nock15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-nock15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Nock
given: Richard
- family: Patrini
given: Giorgio
- family: Friedman
given: Arik
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 948-956
id: nock15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 948
lastpage: 956
published: 2015-06-01 00:00:00 +0000
- title: 'From Word Embeddings To Document Distances'
abstract: 'We present the Word Mover’s Distance (WMD), a novel distance function between text documents. Our work is based on recent results in word embeddings that learn semantically meaningful representations for words from local co-occurrences in sentences. The WMD measures the dissimilarity between two text documents as the minimum amount of distance that the embedded words of one document need to "travel" to reach the embedded words of another document. We show that this distance metric can be cast as an instance of the Earth Mover’s Distance, a well-studied transportation problem for which several highly efficient solvers have been developed. Our metric has no hyperparameters and is straightforward to implement. Further, we demonstrate on eight real-world document classification data sets, in comparison with seven state-of-the-art baselines, that the WMD metric leads to unprecedentedly low k-nearest neighbor document classification error rates.'
volume: 37
URL: http://proceedings.mlr.press/v37/kusnerb15.html
PDF: http://proceedings.mlr.press/v37/kusnerb15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-kusnerb15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Kusner
given: Matt
- family: Sun
given: Yu
- family: Kolkin
given: Nicholas
- family: Weinberger
given: Kilian
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 957-966
id: kusnerb15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 957
lastpage: 966
published: 2015-06-01 00:00:00 +0000
- title: 'Bayesian and Empirical Bayesian Forests'
abstract: 'We derive ensembles of decision trees through a nonparametric Bayesian model, allowing us to view such ensembles as samples from a posterior distribution. This insight motivates a class of Bayesian Forest (BF) algorithms that provide small gains in performance and large gains in interpretability. Based on the BF framework, we are able to show that high-level tree hierarchy is stable in large samples. This motivates an empirical Bayesian Forest (EBF) algorithm for building approximate BFs on massive distributed datasets and we show that EBFs outperform sub-sampling based alternatives by a large margin.'
volume: 37
URL: http://proceedings.mlr.press/v37/matthew15.html
PDF: http://proceedings.mlr.press/v37/matthew15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-matthew15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Taddy
given: Matt
- family: Chen
given: Chun-Sheng
- family: Yu
given: Jun
- family: Wyle
given: Mitch
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 967-976
id: matthew15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 967
lastpage: 976
published: 2015-06-01 00:00:00 +0000
- title: 'Inferring Graphs from Cascades: A Sparse Recovery Framework'
abstract: 'In the Graph Inference problem, one seeks to recover the edges of an unknown graph from the observations of cascades propagating over this graph. In this paper, we approach this problem from the sparse recovery perspective. We introduce a general model of cascades, including the voter model and the independent cascade model, for which we provide the first algorithm which recovers the graph’s edges with high probability and O(s log m) measurements where s is the maximum degree of the graph and m is the number of nodes. Furthermore, we show that our algorithm also recovers the edge weights (the parameters of the diffusion process) and is robust in the context of approximate sparsity. Finally we prove an almost matching lower bound of Ω(s log m/s) and validate our approach empirically on synthetic graphs.'
volume: 37
URL: http://proceedings.mlr.press/v37/pouget-abadie15.html
PDF: http://proceedings.mlr.press/v37/pouget-abadie15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-pouget-abadie15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Pouget-Abadie
given: Jean
- family: Horel
given: Thibaut
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 977-986
id: pouget-abadie15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 977
lastpage: 986
published: 2015-06-01 00:00:00 +0000
- title: 'Distributed Box-Constrained Quadratic Optimization for Dual Linear SVM'
abstract: 'Training machine learning models sometimes needs to be done on large amounts of data that exceed the capacity of a single machine, motivating recent works on developing algorithms that train in a distributed fashion. This paper proposes an efficient box-constrained quadratic optimization algorithm for distributedly training linear support vector machines (SVMs) with large data. Our key technical contribution is an analytical solution to the problem of computing the optimal step size at each iteration, using an efficient method that requires only O(1) communication cost to ensure fast convergence. With this optimal step size, our approach is superior to other methods by possessing global linear convergence, or, equivalently, O(log(1/ε)) iteration complexity for an ε-accurate solution, for distributedly solving the non-strongly-convex linear SVM dual problem. Experiments also show that our method is significantly faster than state-of-the-art distributed linear SVM algorithms including DSVM-AVE, DisDCA and TRON.'
volume: 37
URL: http://proceedings.mlr.press/v37/leea15.html
PDF: http://proceedings.mlr.press/v37/leea15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-leea15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Lee
given: Ching-Pei
- family: Roth
given: Dan
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 987-996
id: leea15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 987
lastpage: 996
published: 2015-06-01 00:00:00 +0000
- title: 'Safe Exploration for Optimization with Gaussian Processes'
abstract: 'We consider sequential decision problems under uncertainty, where we seek to optimize an unknown function from noisy samples. This requires balancing exploration (learning about the objective) and exploitation (localizing the maximum), a problem well-studied in the multi-armed bandit literature. In many applications, however, we require that the sampled function values exceed some prespecified "safety" threshold, a requirement that existing algorithms fail to meet. Examples include medical applications where patient comfort must be guaranteed, recommender systems aiming to avoid user dissatisfaction, and robotic control, where one seeks to avoid controls causing physical harm to the platform. We tackle this novel, yet rich, set of problems under the assumption that the unknown function satisfies regularity conditions expressed via a Gaussian process prior. We develop an efficient algorithm called SafeOpt, and theoretically guarantee its convergence to a natural notion of optimum reachable under safety constraints. We evaluate SafeOpt on synthetic data, as well as two real applications: movie recommendation, and therapeutic spinal cord stimulation.'
volume: 37
URL: http://proceedings.mlr.press/v37/sui15.html
PDF: http://proceedings.mlr.press/v37/sui15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-sui15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Sui
given: Yanan
- family: Gotovos
given: Alkis
- family: Burdick
given: Joel
- family: Krause
given: Andreas
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 997-1005
id: sui15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 997
lastpage: 1005
published: 2015-06-01 00:00:00 +0000
- title: 'The Ladder: A Reliable Leaderboard for Machine Learning Competitions'
abstract: 'The organizer of a machine learning competition faces the problem of maintaining an accurate leaderboard that faithfully represents the quality of the best submission of each competing team. What makes this estimation problem particularly challenging is its sequential and adaptive nature. As participants are allowed to repeatedly evaluate their submissions on the leaderboard, they may begin to overfit to the holdout data that supports the leaderboard. Few theoretical results give actionable advice on how to design a reliable leaderboard. Existing approaches therefore often resort to poorly understood heuristics such as limiting the bit precision of answers and the rate of re-submission. In this work, we introduce a notion of leaderboard accuracy tailored to the format of a competition. We introduce a natural algorithm called the Ladder and demonstrate that it simultaneously supports strong theoretical guarantees in a fully adaptive model of estimation, withstands practical adversarial attacks, and achieves high utility on real submission files from a Kaggle competition. Notably, we are able to sidestep a powerful recent hardness result for adaptive risk estimation that rules out algorithms such as ours under a seemingly very similar notion of accuracy. On a practical note, we provide a completely parameter-free variant of our algorithm that can be deployed in a real competition with no tuning required whatsoever.'
volume: 37
URL: http://proceedings.mlr.press/v37/blum15.html
PDF: http://proceedings.mlr.press/v37/blum15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-blum15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Blum
given: Avrim
- family: Hardt
given: Moritz
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1006-1014
id: blum15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1006
lastpage: 1014
published: 2015-06-01 00:00:00 +0000
- title: 'Enabling scalable stochastic gradient-based inference for Gaussian processes by employing the Unbiased LInear System SolvEr (ULISSE)'
abstract: 'In applications of Gaussian processes where quantification of uncertainty is of primary interest, it is necessary to accurately characterize the posterior distribution over covariance parameters. This paper proposes an adaptation of the Stochastic Gradient Langevin Dynamics algorithm to draw samples from the posterior distribution over covariance parameters with negligible bias and without the need to compute the marginal likelihood. In Gaussian process regression, this has the enormous advantage that stochastic gradients can be computed by solving linear systems only. A novel unbiased linear systems solver based on parallelizable covariance matrix-vector products is developed to accelerate the unbiased estimation of gradients. The results demonstrate that scalable and exact (in a Monte Carlo sense) quantification of uncertainty in Gaussian processes is possible without imposing any special structure on the covariance or reducing the number of input vectors.'
volume: 37
URL: http://proceedings.mlr.press/v37/filippone15.html
PDF: http://proceedings.mlr.press/v37/filippone15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-filippone15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Filippone
given: Maurizio
- family: Engler
given: Raphael
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1015-1024
id: filippone15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1015
lastpage: 1024
published: 2015-06-01 00:00:00 +0000
- title: 'Finding Galaxies in the Shadows of Quasars with Gaussian Processes'
abstract: 'We develop an automated technique for detecting damped Lyman-α absorbers (DLAs) along spectroscopic sightlines to quasi-stellar objects (QSOs or quasars). The detection of DLAs in large-scale spectroscopic surveys such as SDSS–III is critical to address outstanding cosmological questions, such as the nature of galaxy formation. We use nearly 50000 QSO spectra to learn a tailored Gaussian process model for quasar emission spectra, which we apply to the DLA detection problem via Bayesian model selection. We demonstrate our method’s effectiveness with a large-scale validation experiment on over 100000 spectra, with excellent performance.'
volume: 37
URL: http://proceedings.mlr.press/v37/garnett15.html
PDF: http://proceedings.mlr.press/v37/garnett15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-garnett15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Garnett
given: Roman
- family: Ho
given: Shirley
- family: Schneider
given: Jeff
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1025-1033
id: garnett15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1025
lastpage: 1033
published: 2015-06-01 00:00:00 +0000
- title: 'Following the Perturbed Leader for Online Structured Learning'
abstract: 'We investigate a new Follow the Perturbed Leader (FTPL) algorithm for online structured prediction problems. We show a regret bound that is comparable to the state of the art for FTPL algorithms and, in some cases, to the best possible regret. To better understand FTPL algorithms for online structured learning, we present a lower bound on the regret for a large and natural class of FTPL algorithms that use logconcave perturbations. We complete our investigation with an online shortest path experiment and empirically show that our algorithm is both statistically and computationally efficient.'
volume: 37
URL: http://proceedings.mlr.press/v37/cohena15.html
PDF: http://proceedings.mlr.press/v37/cohena15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-cohena15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Cohen
given: Alon
- family: Hazan
given: Tamir
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1034-1042
id: cohena15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1034
lastpage: 1042
published: 2015-06-01 00:00:00 +0000
- title: 'Reified Context Models'
abstract: 'A classic tension exists between exact inference in a simple model and approximate inference in a complex model. The latter offers expressivity and thus accuracy, but the former provides coverage of the space, an important property for confidence estimation and learning with indirect supervision. In this work, we introduce a new approach, reified context models, to reconcile this tension. Specifically, we let the choice of factors in a graphical model (the contexts) be random variables inside the model itself. In this sense, the contexts are reified and can be chosen in a data-dependent way. Empirically, we show that our approach obtains expressivity and coverage on three sequence modeling tasks.'
volume: 37
URL: http://proceedings.mlr.press/v37/steinhardta15.html
PDF: http://proceedings.mlr.press/v37/steinhardta15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-steinhardta15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Steinhardt
given: Jacob
- family: Liang
given: Percy
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1043-1052
id: steinhardta15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1043
lastpage: 1052
published: 2015-06-01 00:00:00 +0000
- title: 'Large-Scale Markov Decision Problems with KL Control Cost and its Application to Crowdsourcing'
abstract: 'We study average and total cost Markov decision problems with large state spaces. Since the computational and statistical costs of finding the optimal policy scale with the size of the state space, we focus on searching for near-optimality in a low-dimensional family of policies. In particular, we show that for problems with a Kullback-Leibler divergence cost function, we can reduce policy optimization to a convex optimization and solve it approximately using a stochastic subgradient algorithm. We show that the performance of the resulting policy is close to the best in the low-dimensional family. We demonstrate the efficacy of our approach by controlling the important crowdsourcing application of budget allocation in crowd labeling.'
volume: 37
URL: http://proceedings.mlr.press/v37/abbasi-yadkori15.html
PDF: http://proceedings.mlr.press/v37/abbasi-yadkori15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-abbasi-yadkori15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Abbasi-Yadkori
given: Yasin
- family: Bartlett
given: Peter
- family: Chen
given: Xi
- family: Malek
given: Alan
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1053-1062
id: abbasi-yadkori15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1053
lastpage: 1062
published: 2015-06-01 00:00:00 +0000
- title: 'Learning Fast-Mixing Models for Structured Prediction'
abstract: 'Markov Chain Monte Carlo (MCMC) algorithms are often used for approximate inference inside learning, but their slow mixing can be difficult to diagnose and the resulting approximate gradients can seriously degrade learning. To alleviate these issues, we define a new model family using strong Doeblin Markov chains, whose mixing times can be precisely controlled by a parameter. We also develop an algorithm to learn such models, which involves maximizing the data likelihood under the induced stationary distribution of these chains. We show empirical improvements on two challenging inference tasks.'
volume: 37
URL: http://proceedings.mlr.press/v37/steinhardtb15.html
PDF: http://proceedings.mlr.press/v37/steinhardtb15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-steinhardtb15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Steinhardt
given: Jacob
- family: Liang
given: Percy
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1063-1072
id: steinhardtb15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1063
lastpage: 1072
published: 2015-06-01 00:00:00 +0000
- title: 'A Probabilistic Model for Dirty Multi-task Feature Selection'
abstract: 'Multi-task feature selection methods often make the hypothesis that learning tasks share relevant and irrelevant features. However, this hypothesis may be too restrictive in practice. For example, there may be a few tasks with specific relevant and irrelevant features (outlier tasks). Similarly, a few of the features may be relevant for only some of the tasks (outlier features). To account for this, we propose a model for multi-task feature selection based on a robust prior distribution that introduces a set of binary latent variables to identify outlier tasks and outlier features. Expectation propagation can be used for efficient approximate inference under the proposed prior. Several experiments show that a model based on the new robust prior provides better predictive performance than other benchmark methods.'
volume: 37
URL: http://proceedings.mlr.press/v37/hernandez-lobatoa15.html
PDF: http://proceedings.mlr.press/v37/hernandez-lobatoa15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-hernandez-lobatoa15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Hernandez-Lobato
given: Daniel
- family: Hernandez-Lobato
given: Jose Miguel
- family: Ghahramani
given: Zoubin
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1073-1082
id: hernandez-lobatoa15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1073
lastpage: 1082
published: 2015-06-01 00:00:00 +0000
- title: 'On Deep Multi-View Representation Learning'
abstract: 'We consider learning representations (features) in the setting in which we have access to multiple unlabeled views of the data for representation learning while only one view is available at test time. Previous work on this problem has proposed several techniques based on deep neural networks, typically involving either autoencoder-like networks with a reconstruction objective or paired feedforward networks with a correlation-based objective. We analyze several techniques based on prior work, as well as new variants, and compare them experimentally on visual, speech, and language domains. To our knowledge this is the first head-to-head comparison of a variety of such techniques on multiple tasks. We find an advantage for correlation-based representation learning, while the best results on most tasks are obtained with our new variant, deep canonically correlated autoencoders (DCCAE).'
volume: 37
URL: http://proceedings.mlr.press/v37/wangb15.html
PDF: http://proceedings.mlr.press/v37/wangb15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-wangb15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Wang
given: Weiran
- family: Arora
given: Raman
- family: Livescu
given: Karen
- family: Bilmes
given: Jeff
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1083-1092
id: wangb15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1083
lastpage: 1092
published: 2015-06-01 00:00:00 +0000
- title: 'Learning Program Embeddings to Propagate Feedback on Student Code'
abstract: 'Providing feedback, both assessing final work and giving hints to stuck students, is difficult for open-ended assignments in massive online classes which can range from thousands to millions of students. We introduce a neural network method to encode programs as a linear mapping from an embedded precondition space to an embedded postcondition space and propose an algorithm for feedback at scale using these linear maps as features. We apply our algorithm to assessments from the Code.org Hour of Code and Stanford University’s CS1 course, where we propagate human comments on student assignments to orders of magnitude more submissions.'
volume: 37
URL: http://proceedings.mlr.press/v37/piech15.html
PDF: http://proceedings.mlr.press/v37/piech15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-piech15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Piech
given: Chris
- family: Huang
given: Jonathan
- family: Nguyen
given: Andy
- family: Phulsuksombati
given: Mike
- family: Sahami
given: Mehran
- family: Guibas
given: Leonidas
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1093-1102
id: piech15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1093
lastpage: 1102
published: 2015-06-01 00:00:00 +0000
- title: 'Safe Subspace Screening for Nuclear Norm Regularized Least Squares Problems'
abstract: 'Nuclear norm regularization has been shown to be very promising for pursuing a low rank matrix solution in various machine learning problems. Many efforts have been devoted to developing efficient algorithms for solving the optimization problem in nuclear norm regularization. Solving it for large-scale matrix variables, however, is still a challenging task since the complexity grows fast with the size of the matrix variable. In this work, we propose a novel method called safe subspace screening (SSS), to improve the efficiency of the solver for nuclear norm regularized least squares problems. Motivated by the fact that the low rank solution can be represented by a few subspaces, the proposed method accurately discards a predominant percentage of inactive subspaces prior to solving the problem to reduce problem size. Consequently, only a much smaller problem needs to be solved, making it more efficient than optimizing the original problem. The proposed SSS is safe, in that its solution is identical to the solution from the solver. In addition, the proposed SSS can be used together with any existing nuclear norm solver since it is independent of the solver. Extensive results on several synthetic and real data sets show that the proposed SSS is very effective in inactive subspace screening.'
volume: 37
URL: http://proceedings.mlr.press/v37/zhoua15.html
PDF: http://proceedings.mlr.press/v37/zhoua15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-zhoua15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Zhou
given: Qiang
- family: Zhao
given: Qi
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1103-1112
id: zhoua15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1103
lastpage: 1112
published: 2015-06-01 00:00:00 +0000
- title: 'Efficient Learning in Large-Scale Combinatorial Semi-Bandits'
abstract: 'A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to combinatorial constraints, and then observes stochastic weights of these items and receives their sum as a payoff. In this paper, we consider efficient learning in large-scale combinatorial semi-bandits with linear generalization, and as a solution, propose two learning algorithms called Combinatorial Linear Thompson Sampling (CombLinTS) and Combinatorial Linear UCB (CombLinUCB). Both algorithms are computationally efficient as long as the offline version of the combinatorial problem can be solved efficiently. We establish that CombLinTS and CombLinUCB are also provably statistically efficient under reasonable assumptions, by developing regret bounds that are independent of the problem scale (number of items) and sublinear in time. We also evaluate CombLinTS on a variety of problems with thousands of items. Our experimental results demonstrate that CombLinTS is scalable, robust to the choice of algorithm parameters, and significantly outperforms the best of our baselines.'
volume: 37
URL: http://proceedings.mlr.press/v37/wen15.html
PDF: http://proceedings.mlr.press/v37/wen15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-wen15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Wen
given: Zheng
- family: Kveton
given: Branislav
- family: Ashkan
given: Azin
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1113-1122
id: wen15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1113
lastpage: 1122
published: 2015-06-01 00:00:00 +0000
- title: 'Swept Approximate Message Passing for Sparse Estimation'
abstract: 'Approximate Message Passing (AMP) has been shown to be a superior method for inference problems, such as the recovery of signals from sets of noisy, lower-dimensionality measurements, both in terms of reconstruction accuracy and in computational efficiency. However, AMP suffers from serious convergence issues in contexts that do not exactly match its assumptions. We propose a new approach to stabilizing AMP in these contexts by applying AMP updates to individual coefficients rather than in parallel. Our results show that this change to the AMP iteration can provide theoretically expected, but hitherto unobtainable, performance for problems on which the standard AMP iteration diverges. Additionally, we find that the computational costs of this swept coefficient update scheme are not unduly burdensome, allowing it to be applied efficiently to signals of large dimensionality.'
volume: 37
URL: http://proceedings.mlr.press/v37/manoel15.html
PDF: http://proceedings.mlr.press/v37/manoel15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-manoel15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Manoel
given: Andre
- family: Krzakala
given: Florent
- family: Tramel
given: Eric
- family: Zdeborová
given: Lenka
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1123-1132
id: manoel15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1123
lastpage: 1132
published: 2015-06-01 00:00:00 +0000
- title: 'Simple regret for infinitely many armed bandits'
abstract: 'We consider a stochastic bandit problem with infinitely many arms. In this setting, the learner has no chance of trying all the arms even once and has to dedicate its limited number of samples only to a certain number of arms. All previous algorithms for this setting were designed for minimizing the cumulative regret of the learner. In this paper, we propose an algorithm aiming at minimizing the simple regret. As in the cumulative regret setting of infinitely many armed bandits, the rate of the simple regret will depend on a parameter β characterizing the distribution of the near-optimal arms. We prove that depending on β, our algorithm is minimax optimal either up to a multiplicative constant or up to a \log(n) factor. We also provide extensions to several important cases: when β is unknown, in a natural setting where the near-optimal arms have a small variance, and in the case of unknown time horizon.'
volume: 37
URL: http://proceedings.mlr.press/v37/carpentier15.html
PDF: http://proceedings.mlr.press/v37/carpentier15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-carpentier15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Carpentier
given: Alexandra
- family: Valko
given: Michal
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1133-1141
id: carpentier15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1133
lastpage: 1141
published: 2015-06-01 00:00:00 +0000
- title: 'Exponential Integration for Hamiltonian Monte Carlo'
abstract: 'We investigate numerical integration of ordinary differential equations (ODEs) for Hamiltonian Monte Carlo (HMC). High-quality integration is crucial for designing efficient and effective proposals for HMC. While the standard method is leapfrog (Störmer-Verlet) integration, we propose the use of an exponential integrator, which is robust to stiff ODEs with highly-oscillatory components. This oscillation is difficult to reproduce using leapfrog integration, even with carefully selected integration parameters and preconditioning. Concretely, we use a Gaussian distribution approximation to segregate stiff components of the ODE. We integrate this term analytically for stability and account for deviation from the approximation using variation of constants. We consider various ways to derive Gaussian approximations and conduct extensive empirical studies applying the proposed “exponential HMC” to several benchmarked learning problems. We compare to state-of-the-art methods for improving leapfrog HMC and demonstrate the advantages of our method in generating many effective samples with high acceptance rates in short running times.'
volume: 37
URL: http://proceedings.mlr.press/v37/chao15.html
PDF: http://proceedings.mlr.press/v37/chao15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-chao15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Chao
given: Wei-Lun
- family: Solomon
given: Justin
- family: Michels
given: Dominik
- family: Sha
given: Fei
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1142-1151
id: chao15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1142
lastpage: 1151
published: 2015-06-01 00:00:00 +0000
- title: 'Optimal Regret Analysis of Thompson Sampling in Stochastic Multi-armed Bandit Problem with Multiple Plays'
abstract: 'We discuss a multiple-play multi-armed bandit (MAB) problem in which several arms are selected at each round. Recently, Thompson sampling (TS), a randomized algorithm with a Bayesian spirit, has attracted much attention for its empirically excellent performance, and it is revealed to have an optimal regret bound in the standard single-play MAB problem. In this paper, we propose the multiple-play Thompson sampling (MP-TS) algorithm, an extension of TS to the multiple-play MAB problem, and discuss its regret analysis. We prove that MP-TS has the optimal regret upper bound that matches the regret lower bound provided by Anantharam et al. (1987). Therefore, MP-TS is the first computationally efficient algorithm with optimal regret. A set of computer simulations was also conducted, which compared MP-TS with state-of-the-art algorithms. We also propose a modification of MP-TS, which is shown to have better empirical performance.'
volume: 37
URL: http://proceedings.mlr.press/v37/komiyama15.html
PDF: http://proceedings.mlr.press/v37/komiyama15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-komiyama15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Komiyama
given: Junpei
- family: Honda
given: Junya
- family: Nakagawa
given: Hiroshi
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1152-1161
id: komiyama15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1152
lastpage: 1161
published: 2015-06-01 00:00:00 +0000
- title: 'Faster cover trees'
abstract: 'The cover tree data structure speeds up exact nearest neighbor queries over arbitrary metric spaces. This paper makes cover trees even faster. In particular, we provide (1) a simpler definition of the cover tree that reduces the number of nodes from O(n) to exactly n, (2) an additional invariant that makes queries faster in practice, (3) algorithms for constructing and querying the tree in parallel on multiprocessor systems, and (4) a more cache efficient memory layout. On standard benchmark datasets, we reduce the number of distance computations by 10–50%. On a large-scale bioinformatics dataset, we reduce the number of distance computations by 71%. On a large-scale image dataset, our parallel algorithm with 16 cores reduces tree construction time from 3.5 hours to 12 minutes.'
volume: 37
URL: http://proceedings.mlr.press/v37/izbicki15.html
PDF: http://proceedings.mlr.press/v37/izbicki15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-izbicki15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Izbicki
given: Mike
- family: Shelton
given: Christian
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1162-1170
id: izbicki15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1162
lastpage: 1170
published: 2015-06-01 00:00:00 +0000
- title: 'Blitz: A Principled Meta-Algorithm for Scaling Sparse Optimization'
abstract: 'By reducing optimization to a sequence of small subproblems, working set methods achieve fast convergence times for many challenging problems. Despite excellent performance, theoretical understanding of working sets is limited, and implementations often resort to heuristics to determine subproblem size, makeup, and stopping criteria. We propose Blitz, a fast working set algorithm accompanied by useful guarantees. Making no assumptions on data, our theory relates subproblem size to progress toward convergence. This result motivates methods for optimizing algorithmic parameters and discarding irrelevant variables as iterations progress. Applied to L1-regularized learning, Blitz convincingly outperforms existing solvers in sequential, limited-memory, and distributed settings. Blitz is not specific to L1-regularized learning, making the algorithm relevant to many applications involving sparsity or constraints.'
volume: 37
URL: http://proceedings.mlr.press/v37/johnson15.html
PDF: http://proceedings.mlr.press/v37/johnson15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-johnson15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Johnson
given: Tyler
- family: Guestrin
given: Carlos
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1171-1179
id: johnson15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1171
lastpage: 1179
published: 2015-06-01 00:00:00 +0000
- title: 'Unsupervised Domain Adaptation by Backpropagation'
abstract: 'Top-performing deep architectures are trained on massive amounts of labeled data. In the absence of labeled data for a certain task, domain adaptation often provides an attractive option given that labeled data of similar nature but from a different domain (e.g. synthetic images) are available. Here, we propose a new approach to domain adaptation in deep architectures that can be trained on large amounts of labeled data from the source domain and large amounts of unlabeled data from the target domain (no labeled target-domain data is necessary). As the training progresses, the approach promotes the emergence of "deep" features that are (i) discriminative for the main learning task on the source domain and (ii) invariant with respect to the shift between the domains. We show that this adaptation behaviour can be achieved in almost any feed-forward model by augmenting it with a few standard layers and a simple new gradient reversal layer. The resulting augmented architecture can be trained using standard backpropagation. Overall, the approach can be implemented with little effort using any of the deep-learning packages. The method performs very well in a series of image classification experiments, achieving an adaptation effect in the presence of big domain shifts and outperforming the previous state of the art on the Office datasets.'
volume: 37
URL: http://proceedings.mlr.press/v37/ganin15.html
PDF: http://proceedings.mlr.press/v37/ganin15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-ganin15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Ganin
given: Yaroslav
- family: Lempitsky
given: Victor
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1180-1189
id: ganin15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1180
lastpage: 1189
published: 2015-06-01 00:00:00 +0000
- title: 'Non-Linear Cross-Domain Collaborative Filtering via Hyper-Structure Transfer'
abstract: 'The Cross Domain Collaborative Filtering (CDCF) exploits the rating matrices from multiple domains to make better recommendations. Existing CDCF methods adopt the sub-structure sharing technique that can only transfer linearly correlated knowledge between domains. In this paper, we propose the notion of Hyper-Structure Transfer (HST) that requires the rating matrices to be explained by the projections of some more complex structure, called the hyper-structure, shared by all domains, and thus allows the non-linearly correlated knowledge between domains to be identified and transferred. Extensive experiments are conducted and the results demonstrate the effectiveness of our HST models empirically.'
volume: 37
URL: http://proceedings.mlr.press/v37/liua15.html
PDF: http://proceedings.mlr.press/v37/liua15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-liua15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Liu
given: Yan-Fu
- family: Hsu
given: Cheng-Yu
- family: Wu
given: Shan-Hung
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1190-1198
id: liua15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1190
lastpage: 1198
published: 2015-06-01 00:00:00 +0000
- title: 'Manifold-valued Dirichlet Processes'
abstract: 'Statistical models for manifold-valued data permit capturing the intrinsic nature of the curved spaces in which the data lie and have been a topic of research for several decades. Typically, these formulations use geodesic curves and distances defined locally for most cases; this makes it hard to design parametric models globally on smooth manifolds. Thus, most (manifold specific) parametric models available today assume that the data lie in a small neighborhood on the manifold. To address this "locality" problem, we propose a novel nonparametric model which unifies multivariate general linear models (MGLMs) using multiple tangent spaces. Our framework generalizes existing work on (both Euclidean and non-Euclidean) general linear models providing a recipe to globally extend the locally-defined parametric models (using a mixture of local models). By grouping observations into sub-populations at multiple tangent spaces, our method provides insights into the hidden structure (geodesic relationships) in the data. This yields a framework to group observations and discover geodesic relationships between covariates X and manifold-valued responses Y, which we call Dirichlet process mixtures of multivariate general linear models (DP-MGLM) on Riemannian manifolds. Finally, we present proof of concept experiments to validate our model.'
volume: 37
URL: http://proceedings.mlr.press/v37/kim15.html
PDF: http://proceedings.mlr.press/v37/kim15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-kim15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Kim
given: Hyunwoo
- family: Xu
given: Jia
- family: Vemuri
given: Baba
- family: Singh
given: Vikas
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1199-1208
id: kim15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1199
lastpage: 1208
published: 2015-06-01 00:00:00 +0000
- title: 'Multi-Task Learning for Subspace Segmentation'
abstract: 'Subspace segmentation is the process of clustering a set of data points that are assumed to lie on the union of multiple linear or affine subspaces, and is increasingly being recognized as a fundamental tool for data analysis in high dimensional settings. Arguably one of the most successful approaches is based on the observation that the sparsest representation of a given point with respect to a dictionary formed by the others involves nonzero coefficients associated with points originating in the same subspace. Such sparse representations are computed independently for each data point via \ell_1-norm minimization and then combined into an affinity matrix for use by a final spectral clustering step. The downside of this procedure is two-fold. First, unlike canonical compressive sensing scenarios with ideally-randomized dictionaries, the data-dependent dictionaries here are unavoidably highly structured, disrupting many of the favorable properties of the \ell_1 norm. Secondly, by treating each data point independently, we ignore useful relationships between points that can be leveraged for jointly computing such sparse representations. Consequently, we motivate a multi-task learning-based framework for learning coupled sparse representations leading to a segmentation pipeline that is both robust against correlation structure and tailored to generate an optimal affinity matrix. Theoretical analysis and empirical tests are provided to support these claims.'
volume: 37
URL: http://proceedings.mlr.press/v37/wangc15.html
PDF: http://proceedings.mlr.press/v37/wangc15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-wangc15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Wang
given: Yu
- family: Wipf
given: David
- family: Ling
given: Qing
- family: Chen
given: Wei
- family: Wassell
given: Ian
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1209-1217
id: wangc15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1209
lastpage: 1217
published: 2015-06-01 00:00:00 +0000
- title: 'Markov Chain Monte Carlo and Variational Inference: Bridging the Gap'
abstract: 'Recent advances in stochastic gradient variational inference have made it possible to perform variational Bayesian inference with posterior approximations containing auxiliary random variables. This enables us to explore a new synthesis of variational inference and Monte Carlo methods where we incorporate one or more steps of MCMC into our variational approximation. By doing so we obtain a rich class of inference algorithms bridging the gap between variational methods and MCMC, and offering the best of both worlds: fast posterior approximation through the maximization of an explicit objective, with the option of trading off additional computation for additional accuracy. We describe the theoretical foundations that make this possible and show some promising first results.'
volume: 37
URL: http://proceedings.mlr.press/v37/salimans15.html
PDF: http://proceedings.mlr.press/v37/salimans15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-salimans15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Salimans
given: Tim
- family: Kingma
given: Diederik
- family: Welling
given: Max
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1218-1226
id: salimans15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1218
lastpage: 1226
published: 2015-06-01 00:00:00 +0000
- title: 'Scalable Model Selection for Large-Scale Factorial Relational Models'
abstract: 'With a growing need to understand large-scale networks, factorial relational models, such as binary matrix factorization models (BMFs), have become important in many applications. Although BMFs have a natural capability to uncover overlapping group structures behind network data, existing inference techniques have issues of either high computational cost or lack of model selection capability, and this limits their applicability. For scalable model selection of BMFs, this paper proposes stochastic factorized asymptotic Bayesian (sFAB) inference that combines concepts in two recently-developed techniques: stochastic variational inference (SVI) and FAB inference. sFAB is a highly-efficient algorithm, having both scalability and an inherent model selection capability in a single inference framework. Empirical results show the superiority of sFAB/BMF in both accuracy and scalability over state-of-the-art inference methods for overlapping relational models.'
volume: 37
URL: http://proceedings.mlr.press/v37/liub15.html
PDF: http://proceedings.mlr.press/v37/liub15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-liub15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Liu
given: Chunchen
- family: Feng
given: Lu
- family: Fujimaki
given: Ryohei
- family: Muraoka
given: Yusuke
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1227-1235
id: liub15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1227
lastpage: 1235
published: 2015-06-01 00:00:00 +0000
- title: 'The Power of Randomization: Distributed Submodular Maximization on Massive Datasets'
abstract: 'A wide variety of problems in machine learning, including exemplar clustering, document summarization, and sensor placement, can be cast as constrained submodular maximization problems. Unfortunately, the resulting submodular optimization problems are often too large to be solved on a single machine. We consider a distributed, greedy algorithm that combines previous approaches with randomization. The result is an algorithm that is embarrassingly parallel and achieves provable, constant factor, worst-case approximation guarantees. In our experiments, we demonstrate its efficiency in large problems with different kinds of constraints with objective values always close to what is achievable in the centralized setting.'
volume: 37
URL: http://proceedings.mlr.press/v37/barbosa15.html
PDF: http://proceedings.mlr.press/v37/barbosa15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-barbosa15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Barbosa
given: Rafael
- family: Ene
given: Alina
- family: Nguyen
given: Huy
- family: Ward
given: Justin
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1236-1244
id: barbosa15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1236
lastpage: 1244
published: 2015-06-01 00:00:00 +0000
- title: 'Dealing with small data: On the generalization of context trees'
abstract: 'Context trees (CTs) are a widely used tool in machine learning for representing context-specific independences in conditional probability distributions. Parsimonious context trees (PCTs) are a recently proposed generalization of CTs that can enable statistically more efficient learning due to a higher structural flexibility, which is particularly useful for small-data settings. However, this comes at the cost of a computationally expensive structure learning algorithm, which is feasible only for domains with small alphabets and tree depths. In this work, we investigate to which degree CTs can be generalized to increase statistical efficiency while still keeping the learning computationally feasible. Approaching this goal from two different angles, we (i) propose algorithmic improvements to the PCT learning algorithm, and (ii) study further generalizations of CTs, which are inspired by PCTs, but trade structural flexibility for computational efficiency. By empirical studies both on simulated and real-world data, we demonstrate that the synergy of combining both orthogonal approaches yields a substantial improvement in obtaining statistically efficient and computationally feasible generalizations of CTs.'
volume: 37
URL: http://proceedings.mlr.press/v37/eggeling15.html
PDF: http://proceedings.mlr.press/v37/eggeling15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-eggeling15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Eggeling
given: Ralf
- family: Koivisto
given: Mikko
- family: Grosse
given: Ivo
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1245-1253
id: eggeling15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1245
lastpage: 1253
published: 2015-06-01 00:00:00 +0000
- title: 'Non-Gaussian Discriminative Factor Models via the Max-Margin Rank-Likelihood'
abstract: 'We consider the problem of discriminative factor analysis for data that are in general non-Gaussian. A Bayesian model based on the ranks of the data is proposed. We first introduce a max-margin version of the rank-likelihood. A discriminative factor model is then developed, integrating the new max-margin rank-likelihood and (linear) Bayesian support vector machines, which are also built on the max-margin principle. The discriminative factor model is further extended to the nonlinear case through mixtures of local linear classifiers, via Dirichlet processes. Fully local conjugacy of the model yields efficient inference with both Markov Chain Monte Carlo and variational Bayes approaches. Extensive experiments on benchmark and real data demonstrate superior performance of the proposed model and its potential for applications in computational biology.'
volume: 37
URL: http://proceedings.mlr.press/v37/yuan15.html
PDF: http://proceedings.mlr.press/v37/yuan15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-yuan15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Yuan
given: Xin
- family: Henao
given: Ricardo
- family: Tsalik
given: Ephraim
- family: Langley
given: Raymond
- family: Carin
given: Lawrence
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1254-1263
id: yuan15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1254
lastpage: 1263
published: 2015-06-01 00:00:00 +0000
- title: 'A Bayesian nonparametric procedure for comparing algorithms'
abstract: 'A fundamental task in machine learning is to compare the performance of multiple algorithms. This is typically performed by frequentist tests (usually the Friedman test followed by a series of multiple pairwise comparisons). This implies dealing with null hypothesis significance tests and p-values, although the shortcomings of such methods are well known. First, we propose a nonparametric Bayesian version of the Friedman test using a Dirichlet process (DP) based prior. Our derivations show that, from a Bayesian perspective, the Friedman test is an inference for a multivariate mean based on an ellipsoid inclusion test. Second, we derive a joint procedure for the analysis of the multiple comparisons which accounts for their dependencies and which is based on the posterior probability computed through the DP. The proposed approach allows verifying the null hypothesis, not only rejecting it. Third, we apply our test to perform algorithms racing, i.e., the problem of identifying the best algorithm among a large set of candidates. We show by simulation that our approach is competitive both in terms of accuracy and speed in identifying the best algorithm.'
volume: 37
URL: http://proceedings.mlr.press/v37/benavoli15.html
PDF: http://proceedings.mlr.press/v37/benavoli15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-benavoli15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Benavoli
given: Alessio
- family: Corani
given: Giorgio
- family: Mangili
given: Francesca
- family: Zaffalon
given: Marco
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1264-1272
id: benavoli15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1264
lastpage: 1272
published: 2015-06-01 00:00:00 +0000
- title: 'Convergence rate of Bayesian tensor estimator and its minimax optimality'
abstract: 'We investigate the statistical convergence rate of a Bayesian low-rank tensor estimator, and derive the minimax optimal rate for learning a low-rank tensor. Our problem setting is the regression problem where the regression coefficient forms a tensor structure. This problem setting occurs in many practical applications, such as collaborative filtering, multi-task learning, and spatio-temporal data analysis. The convergence rate of the Bayes tensor estimator is analyzed in terms of both in-sample and out-of-sample predictive accuracies. It is shown that a fast learning rate is achieved without any strong convexity of the observation. Moreover, we show that the method has adaptivity to the unknown rank of the true tensor, that is, the near optimal rate depending on the true rank is achieved even if it is not known a priori. Finally, we show the minimax optimal learning rate for the tensor estimation problem, and thus show that the derived bound of the Bayes estimator is tight and actually near minimax optimal.'
volume: 37
URL: http://proceedings.mlr.press/v37/suzuki15.html
PDF: http://proceedings.mlr.press/v37/suzuki15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-suzuki15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Suzuki
given: Taiji
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1273-1282
id: suzuki15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1273
lastpage: 1282
published: 2015-06-01 00:00:00 +0000
- title: 'On Identifying Good Options under Combinatorially Structured Feedback in Finite Noisy Environments'
abstract: 'We consider the problem of identifying a good option out of a finite set of options under combinatorially structured, noisy feedback about the quality of the options in a sequential process: In each round, a subset of the options, from an available set of subsets, can be selected to receive noisy information about the quality of the options in the chosen subset. The goal is to identify the highest quality option, or a group of options of the highest quality, with a small error probability, while using the smallest number of measurements. The problem generalizes best-arm identification problems. By extending previous work, we design new algorithms that are shown to be able to exploit the combinatorial structure of the problem in a nontrivial fashion, while being unimprovable in special cases. The algorithms call a set multi-covering oracle, hence their performance and efficiency are strongly tied to whether the associated set multi-covering problem can be efficiently solved.'
volume: 37
URL: http://proceedings.mlr.press/v37/wub15.html
PDF: http://proceedings.mlr.press/v37/wub15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-wub15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Wu
given: Yifan
- family: Gyorgy
given: Andras
- family: Szepesvari
given: Csaba
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1283-1291
id: wub15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1283
lastpage: 1291
published: 2015-06-01 00:00:00 +0000
- title: 'Nested Sequential Monte Carlo Methods'
abstract: 'We propose nested sequential Monte Carlo (NSMC), a methodology to sample from sequences of probability distributions, even where the random variables are high-dimensional. NSMC generalises the SMC framework by requiring only approximate, properly weighted, samples from the SMC proposal distribution, while still resulting in a correct SMC algorithm. Furthermore, NSMC can in itself be used to produce such properly weighted samples. Consequently, one NSMC sampler can be used to construct an efficient high-dimensional proposal distribution for another NSMC sampler, and this nesting of the algorithm can be done to an arbitrary degree. This allows us to consider complex and high-dimensional models using SMC. We show results that demonstrate the efficacy of our approach on several filtering problems with dimensions on the order of 100 to 1000.'
volume: 37
URL: http://proceedings.mlr.press/v37/naesseth15.html
PDF: http://proceedings.mlr.press/v37/naesseth15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-naesseth15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Naesseth
given: Christian
- family: Lindsten
given: Fredrik
- family: Schon
given: Thomas
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1292-1301
id: naesseth15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1292
lastpage: 1301
published: 2015-06-01 00:00:00 +0000
- title: 'Sparse Variational Inference for Generalized GP Models'
abstract: 'Gaussian processes (GP) provide an attractive machine learning model due to their non-parametric form, their flexibility to capture many types of observation data, and their generic inference procedures. Sparse GP inference algorithms address the cubic complexity of GPs by focusing on a small set of pseudo-samples. To date, such approaches have focused on the simple case of Gaussian observation likelihoods. This paper develops a variational sparse solution for GPs under general likelihoods by providing a new characterization of the gradients required for inference in terms of individual observation likelihood terms. In addition, we propose a simple new approach for optimizing the sparse variational approximation using a fixed point computation. We demonstrate experimentally that the fixed point operator acts as a contraction in many cases and therefore leads to fast convergence. An experimental evaluation for count regression, classification, and ordinal regression illustrates the generality and advantages of the new approach.'
volume: 37
URL: http://proceedings.mlr.press/v37/sheth15.html
PDF: http://proceedings.mlr.press/v37/sheth15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-sheth15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Sheth
given: Rishit
- family: Wang
given: Yuyang
- family: Khardon
given: Roni
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1302-1311
id: sheth15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1302
lastpage: 1311
published: 2015-06-01 00:00:00 +0000
- title: 'Universal Value Function Approximators'
abstract: 'Value functions are a core component of reinforcement learning. The main idea is to construct a single function approximator V(s; θ) that estimates the long-term reward from any state s, using parameters θ. In this paper we introduce universal value function approximators (UVFAs) V(s, g; θ) that generalise not just over states s but also over goals g. We develop an efficient technique for supervised learning of UVFAs, by factoring observed values into separate embedding vectors for state and goal, and then learning a mapping from s and g to these factored embedding vectors. We show how this technique may be incorporated into a reinforcement learning algorithm that updates the UVFA solely from observed rewards. Finally, we demonstrate that a UVFA can successfully generalise to previously unseen goals.'
volume: 37
URL: http://proceedings.mlr.press/v37/schaul15.html
PDF: http://proceedings.mlr.press/v37/schaul15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-schaul15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Schaul
given: Tom
- family: Horgan
given: Daniel
- family: Gregor
given: Karol
- family: Silver
given: David
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1312-1320
id: schaul15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1312
lastpage: 1320
published: 2015-06-01 00:00:00 +0000
- title: 'Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games'
abstract: 'This paper provides an analysis of error propagation in Approximate Dynamic Programming applied to zero-sum two-player Stochastic Games. We provide a novel and unified error propagation analysis in L_p-norm of three well-known algorithms adapted to Stochastic Games (namely Approximate Value Iteration, Approximate Policy Iteration and Approximate Generalized Policy Iteration). We show that we can achieve a stationary policy which is \frac{2γ}{(1 - γ)^2}ε + \frac{1}{(1 - γ)^2}ε’-optimal, where ε is the value function approximation error and ε’ is the approximate greedy operator error. In addition, we provide a practical algorithm (AGPI-Q) to solve infinite horizon γ-discounted two-player zero-sum stochastic games in a batch setting. It is an extension of the Fitted-Q algorithm (which solves Markov Decision Processes in a batch setting) and can be non-parametric. Finally, we demonstrate experimentally the performance of AGPI-Q on a simultaneous two-player game, namely Alesia.'
volume: 37
URL: http://proceedings.mlr.press/v37/perolat15.html
PDF: http://proceedings.mlr.press/v37/perolat15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-perolat15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Perolat
given: Julien
- family: Scherrer
given: Bruno
- family: Piot
given: Bilal
- family: Pietquin
given: Olivier
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1321-1329
id: perolat15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1321
lastpage: 1329
published: 2015-06-01 00:00:00 +0000
- title: 'On Greedy Maximization of Entropy'
abstract: 'Submodular function maximization is one of the key problems that arise in many machine learning tasks. Greedy selection algorithms are the proven choice to solve such problems, where prior theoretical work guarantees a (1 - 1/e) approximation ratio. However, it has been empirically observed that greedy selection provides almost optimal solutions in practice. The main goal of this paper is to explore and answer why greedy selection does significantly better than the theoretical guarantee of (1 - 1/e). Applications include, but are not limited to, sensor selection tasks which use both entropy and mutual information as maximization criteria. We give a theoretical justification for the nearly optimal approximation ratio via detailed analysis of the curvature of these objective functions for Gaussian RBF kernels.'
volume: 37
URL: http://proceedings.mlr.press/v37/sharma15.html
PDF: http://proceedings.mlr.press/v37/sharma15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-sharma15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Sharma
given: Dravyansh
- family: Kapoor
given: Ashish
- family: Deshpande
given: Amit
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1330-1338
id: sharma15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1330
lastpage: 1338
published: 2015-06-01 00:00:00 +0000
- title: 'Metadata Dependent Mondrian Processes'
abstract: 'Stochastic partition processes in a product space play an important role in modeling relational data. Recent studies on the Mondrian process have introduced more flexibility into the block structure in relational models. A side-effect of such high flexibility is that, in data sparsity scenarios, the model is prone to overfit. In reality, relational entities are always associated with meta information, such as user profiles in a social network. In this paper, we propose a metadata dependent Mondrian process (MDMP) to incorporate meta information into the stochastic partition process in the product space and the entity allocation process on the resulting block structure. MDMP can not only encourage homogeneous relational interactions within blocks but also discourage meta-label diversity within blocks. Regularized by meta information, MDMP becomes more robust in data sparsity scenarios and converges more easily in posterior inference. We apply MDMP to link prediction and rating prediction and demonstrate that MDMP is more effective than the baseline models in prediction accuracy with a more parsimonious model structure.'
volume: 37
URL: http://proceedings.mlr.press/v37/wangd15.html
PDF: http://proceedings.mlr.press/v37/wangd15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-wangd15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Wang
given: Yi
- family: Li
given: Bin
- family: Wang
given: Yang
- family: Chen
given: Fang
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1339-1347
id: wangd15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1339
lastpage: 1347
published: 2015-06-01 00:00:00 +0000
- title: 'Complex Event Detection using Semantic Saliency and Nearly-Isotonic SVM'
abstract: 'We aim to detect complex events in long Internet videos that may last for hours. A major challenge in this setting is that only a few shots in a long video are relevant to the event of interest while others are irrelevant or even misleading. Instead of indifferently pooling the shots, we first define a novel notion of semantic saliency that assesses the relevance of each shot with the event of interest. We then prioritize the shots according to their saliency scores since shots that are semantically more salient are expected to contribute more to the final event detector. Next, we propose a new isotonic regularizer that is able to exploit the semantic ordering information. The resulting nearly-isotonic SVM classifier exhibits higher discriminative power. Computationally, we develop an efficient implementation using the proximal gradient algorithm, and we prove new, closed-form proximal steps. We conduct extensive experiments on three real-world video datasets and confirm the effectiveness of the proposed approach.'
volume: 37
URL: http://proceedings.mlr.press/v37/changa15.html
PDF: http://proceedings.mlr.press/v37/changa15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-changa15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Chang
given: Xiaojun
- family: Yang
given: Yi
- family: Xing
given: Eric
- family: Yu
given: Yaoliang
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1348-1357
id: changa15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1348
lastpage: 1357
published: 2015-06-01 00:00:00 +0000
- title: 'Rebuilding Factorized Information Criterion: Asymptotically Accurate Marginal Likelihood'
abstract: 'Factorized information criterion (FIC) is a recently developed approximation technique for the marginal log-likelihood, which provides an automatic model selection framework for a few latent variable models (LVMs) with tractable inference algorithms. This paper reconsiders FIC and fills theoretical gaps of previous FIC studies. First, we reveal the core idea of FIC that allows generalization for a broader class of LVMs, including continuous LVMs, in contrast to previous FICs, which are applicable only to binary LVMs. Second, we investigate the model selection mechanism of the generalized FIC. Our analysis provides a formal justification of FIC as a model selection criterion for LVMs and also a systematic procedure for pruning redundant latent variables that have been removed heuristically in previous studies. Third, we provide an interpretation of FIC as a variational free energy and uncover their previously unknown relationship. A demonstrative study on Bayesian principal component analysis is provided and numerical experiments support our theoretical results.'
volume: 37
URL: http://proceedings.mlr.press/v37/hayashi15.html
PDF: http://proceedings.mlr.press/v37/hayashi15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-hayashi15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Hayashi
given: Kohei
- family: Maeda
given: Shin-ichi
- family: Fujimaki
given: Ryohei
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1358-1366
id: hayashi15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1358
lastpage: 1366
published: 2015-06-01 00:00:00 +0000
- title: 'Double Nyström Method: An Efficient and Accurate Nyström Scheme for Large-Scale Data Sets'
abstract: 'The Nyström method has been one of the most effective techniques for kernel-based approach that scales well to large data sets. Since its introduction, there has been a large body of work that improves the approximation accuracy while maintaining computational efficiency. In this paper, we present a novel Nyström method that improves both accuracy and efficiency based on a new theoretical analysis. We first provide a generalized sampling scheme, CAPS, that minimizes a novel error bound based on the subspace distance. We then present our double Nyström method that reduces the size of the decomposition in two stages. We show that our method is highly efficient and accurate compared to other state-of-the-art Nyström methods by evaluating them on a number of real data sets.'
volume: 37
URL: http://proceedings.mlr.press/v37/lima15.html
PDF: http://proceedings.mlr.press/v37/lima15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-lima15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Lim
given: Woosang
- family: Kim
given: Minhwan
- family: Park
given: Haesun
- family: Jung
given: Kyomin
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1367-1375
id: lima15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1367
lastpage: 1375
published: 2015-06-01 00:00:00 +0000
- title: 'The Composition Theorem for Differential Privacy'
abstract: 'Interactive querying of a database degrades the privacy level. In this paper we answer the fundamental question of characterizing the level of privacy degradation as a function of the number of adaptive interactions and the differential privacy levels maintained by the individual queries. Our solution is complete: the privacy degradation guarantee is true for every privacy mechanism, and further, we demonstrate a sequence of privacy mechanisms that do degrade in the characterized manner. The key innovation is the introduction of an operational interpretation (involving hypothesis testing) to differential privacy and the use of the corresponding data processing inequalities. Our result improves over the state of the art and has immediate applications to several problems studied in the literature.'
volume: 37
URL: http://proceedings.mlr.press/v37/kairouz15.html
PDF: http://proceedings.mlr.press/v37/kairouz15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-kairouz15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Kairouz
given: Peter
- family: Oh
given: Sewoong
- family: Viswanath
given: Pramod
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1376-1385
id: kairouz15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1376
lastpage: 1385
published: 2015-06-01 00:00:00 +0000
- title: 'Convex Formulation for Learning from Positive and Unlabeled Data'
abstract: 'We discuss binary classification only from positive and unlabeled data (PU classification), which is conceivable in various real-world machine learning problems. Since unlabeled data consists of both positive and negative data, simply separating positive and unlabeled data yields a biased solution. Recently, it was shown that the bias can be canceled by using a particular non-convex loss such as the ramp loss. However, classifier training with a non-convex loss is not straightforward in practice. In this paper, we discuss a convex formulation for PU classification that can still cancel the bias. The key idea is to use different loss functions for positive and unlabeled samples. However, in this setup, the hinge loss is not permissible. As an alternative, we propose the double hinge loss. Theoretically, we prove that the estimators converge to the optimal solutions at the optimal parametric rate. Experimentally, we demonstrate that PU classification with the double hinge loss performs as accurately as the non-convex method, with a much lower computational cost.'
volume: 37
URL: http://proceedings.mlr.press/v37/plessis15.html
PDF: http://proceedings.mlr.press/v37/plessis15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-plessis15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Plessis
given: Marthinus Du
- family: Niu
given: Gang
- family: Sugiyama
given: Masashi
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1386-1394
id: plessis15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1386
lastpage: 1394
published: 2015-06-01 00:00:00 +0000
- title: 'Threshold Influence Model for Allocating Advertising Budgets'
abstract: 'We propose a new influence model for allocating budgets to advertising channels. Our model captures a customer’s sensitivity to advertisements as a threshold behavior; a customer is expected to be influenced if the influence he receives exceeds his threshold. Over the threshold model, we discuss two optimization problems. The first one is the budget-constrained influence maximization. We propose two greedy algorithms based on different strategies, and analyze the performance when the influence is submodular. We then introduce a new characteristic to measure the cost-effectiveness of a marketing campaign, that is, the proportion of the resulting influence to the cost spent. We design an almost linear-time approximation algorithm to maximize the cost-effectiveness. Furthermore, we design a better-approximation algorithm based on linear programming for a special case. We conduct thorough experiments to confirm that our algorithms outperform baseline algorithms.'
volume: 37
URL: http://proceedings.mlr.press/v37/miyauchi15.html
PDF: http://proceedings.mlr.press/v37/miyauchi15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-miyauchi15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Miyauchi
given: Atsushi
- family: Iwamasa
given: Yuni
- family: Fukunaga
given: Takuro
- family: Kakimura
given: Naonori
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1395-1404
id: miyauchi15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1395
lastpage: 1404
published: 2015-06-01 00:00:00 +0000
- title: 'Strongly Adaptive Online Learning'
abstract: 'Strongly adaptive algorithms are algorithms whose performance on every time interval is close to optimal. We present a reduction that can transform standard low-regret algorithms into strongly adaptive ones. As a consequence, we derive simple, yet efficient, strongly adaptive algorithms for a handful of problems.'
volume: 37
URL: http://proceedings.mlr.press/v37/daniely15.html
PDF: http://proceedings.mlr.press/v37/daniely15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-daniely15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Daniely
given: Amit
- family: Gonen
given: Alon
- family: Shalev-Shwartz
given: Shai
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1405-1411
id: daniely15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1405
lastpage: 1411
published: 2015-06-01 00:00:00 +0000
- title: 'CUR Algorithm for Partially Observed Matrices'
abstract: 'CUR matrix decomposition computes the low-rank approximation of a given matrix by using the actual rows and columns of the matrix. It has been a very useful tool for handling large matrices. One limitation of the existing algorithms for CUR matrix decomposition is that they cannot deal with entries in a partially observed matrix, while incomplete matrices are found in many real-world applications. In this work, we alleviate this limitation by developing a CUR decomposition algorithm for partially observed matrices. In particular, the proposed algorithm computes the low-rank approximation of the target matrix based on (i) the randomly sampled rows and columns, and (ii) a subset of observed entries that are randomly sampled from the matrix. Our analysis shows the relative error bound, measured by spectral norm, for the proposed algorithm when the target matrix is of full rank. We also show that only O(n r \ln r) observed entries are needed by the proposed algorithm to perfectly recover a rank-r matrix of size n \times n, which improves the sample complexity of the existing algorithms for matrix completion. Empirical studies on both synthetic and real-world datasets verify our theoretical claims and demonstrate the effectiveness of the proposed algorithm.'
volume: 37
URL: http://proceedings.mlr.press/v37/xua15.html
PDF: http://proceedings.mlr.press/v37/xua15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-xua15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Xu
given: Miao
- family: Jin
given: Rong
- family: Zhou
given: Zhi-Hua
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1412-1421
id: xua15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1412
lastpage: 1421
published: 2015-06-01 00:00:00 +0000
- title: 'A Deterministic Analysis of Noisy Sparse Subspace Clustering for Dimensionality-reduced Data'
abstract: 'Subspace clustering groups data into several low-rank subspaces. In this paper, we propose a theoretical framework to analyze a popular optimization-based algorithm, Sparse Subspace Clustering (SSC), when the data dimension is compressed via some random projection algorithms. We show SSC provably succeeds if the random projection is a subspace embedding, which includes random Gaussian projection, uniform row sampling, FJLT, sketching, etc. Our analysis applies to the most general deterministic setting and is able to handle both adversarial and stochastic noise. It also results in the first algorithm for privacy-preserved subspace clustering.'
volume: 37
URL: http://proceedings.mlr.press/v37/wange15.html
PDF: http://proceedings.mlr.press/v37/wange15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-wange15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Wang
given: Yining
- family: Wang
given: Yu-Xiang
- family: Singh
given: Aarti
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1422-1431
id: wange15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1422
lastpage: 1431
published: 2015-06-01 00:00:00 +0000
- title: 'MRA-based Statistical Learning from Incomplete Rankings'
abstract: 'Statistical analysis of rank data describing preferences over small and variable subsets of a potentially large ensemble of items 1, ..., n is a very challenging problem. It is motivated by a wide variety of modern applications, such as recommender systems or search engines. However, very few inference methods have been documented in the literature to learn a ranking model from such incomplete rank data. The goal of this paper is twofold: it develops a rigorous mathematical framework for the problem of learning a ranking model from incomplete rankings and introduces a novel general statistical method to address it. Based on an original concept of multi-resolution analysis (MRA) of incomplete rankings, it finely adapts to any observation setting, leading to a statistical accuracy and an algorithmic complexity that depend directly on the complexity of the observed data. Beyond theoretical guarantees, we also provide experimental results that show its statistical performance.'
volume: 37
URL: http://proceedings.mlr.press/v37/sibony15.html
PDF: http://proceedings.mlr.press/v37/sibony15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-sibony15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Sibony
given: Eric
- family: Clemençon
given: Stéphan
- family: Jakubowicz
given: Jérémie
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1432-1441
id: sibony15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1432
lastpage: 1441
published: 2015-06-01 00:00:00 +0000
- title: 'Risk and Regret of Hierarchical Bayesian Learners'
abstract: 'Common statistical practice has shown that the full power of Bayesian methods is not realized until hierarchical priors are used, as these allow for greater “robustness” and the ability to “share statistical strength.” Yet it is an ongoing challenge to provide a learning-theoretically sound formalism of such notions that: offers practical guidance concerning when and how best to utilize hierarchical models; provides insights into what makes for a good hierarchical prior; and, when the form of the prior has been chosen, can guide the choice of hyperparameter settings. We present a set of analytical tools for understanding hierarchical priors in both the online and batch learning settings. We provide regret bounds under log-loss, which show how certain hierarchical models compare, in retrospect, to the best single model in the model class. We also show how to convert a Bayesian log-loss regret bound into a Bayesian risk bound for any bounded loss, a result which may be of independent interest. Risk and regret bounds for Student’s t and hierarchical Gaussian priors allow us to formalize the concepts of “robustness” and “sharing statistical strength.” Priors for feature selection are investigated as well. Our results suggest that the learning-theoretic benefits of using hierarchical priors can often come at little cost on practical problems.'
volume: 37
URL: http://proceedings.mlr.press/v37/hugginsb15.html
PDF: http://proceedings.mlr.press/v37/hugginsb15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-hugginsb15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Huggins
given: Jonathan
- family: Tenenbaum
given: Josh
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1442-1451
id: hugginsb15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1442
lastpage: 1451
published: 2015-06-01 00:00:00 +0000
- title: 'Towards a Learning Theory of Cause-Effect Inference'
abstract: 'We pose causal inference as the problem of learning to classify probability distributions. In particular, we assume access to a collection \{(S_i, l_i)\}_{i=1}^n, where each S_i is a sample drawn from the probability distribution of X_i \times Y_i, and l_i is a binary label indicating whether “X_i \to Y_i” or “X_i \gets Y_i”. Given these data, we build a causal inference rule in two steps. First, we featurize each S_i using the kernel mean embedding associated with some characteristic kernel. Second, we train a binary classifier on such embeddings to distinguish between causal directions. We present generalization bounds showing the statistical consistency and learning rates of the proposed approach, and provide a simple implementation that achieves state-of-the-art cause-effect inference. Furthermore, we extend our ideas to infer causal relationships between more than two variables.'
volume: 37
URL: http://proceedings.mlr.press/v37/lopez-paz15.html
PDF: http://proceedings.mlr.press/v37/lopez-paz15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-lopez-paz15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Lopez-Paz
given: David
- family: Muandet
given: Krikamol
- family: Schölkopf
given: Bernhard
- family: Tolstikhin
given: Iliya
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1452-1461
id: lopez-paz15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1452
lastpage: 1461
published: 2015-06-01 00:00:00 +0000
- title: 'DRAW: A Recurrent Neural Network For Image Generation'
abstract: 'This paper introduces the Deep Recurrent Attentive Writer (DRAW) architecture for image generation with neural networks. DRAW networks combine a novel spatial attention mechanism that mimics the foveation of the human eye, with a sequential variational auto-encoding framework that allows for the iterative construction of complex images. The system substantially improves on the state of the art for generative models on MNIST, and, when trained on the Street View House Numbers dataset, it is able to generate images that are indistinguishable from real data with the naked eye.'
volume: 37
URL: http://proceedings.mlr.press/v37/gregor15.html
PDF: http://proceedings.mlr.press/v37/gregor15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-gregor15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Gregor
given: Karol
- family: Danihelka
given: Ivo
- family: Graves
given: Alex
- family: Rezende
given: Danilo
- family: Wierstra
given: Daan
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1462-1471
id: gregor15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1462
lastpage: 1471
published: 2015-06-01 00:00:00 +0000
- title: 'Multiview Triplet Embedding: Learning Attributes in Multiple Maps'
abstract: 'For humans, it is usually easier to make statements about the similarity of objects in relative, rather than absolute terms. Moreover, subjective comparisons of objects can be based on a number of different and independent attributes. For example, objects can be compared based on their shape, color, etc. In this paper, we consider the problem of uncovering these hidden attributes given a set of relative distance judgments in the form of triplets. The attribute that was used to generate a particular triplet in this set is unknown. Such data occurs, e.g., in crowdsourcing applications where the triplets are collected from a large group of workers. We propose the Multiview Triplet Embedding (MVTE) algorithm that produces a number of low-dimensional maps, each corresponding to one of the hidden attributes. The method can be used to assess how many different attributes were used to create the triplets, as well as to assess the difficulty of a distance comparison task, and find objects that have multiple interpretations in relation to the other objects.'
volume: 37
URL: http://proceedings.mlr.press/v37/amid15.html
PDF: http://proceedings.mlr.press/v37/amid15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-amid15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Amid
given: Ehsan
- family: Ukkonen
given: Antti
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1472-1480
id: amid15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1472
lastpage: 1480
published: 2015-06-01 00:00:00 +0000
- title: 'Distributed Gaussian Processes'
abstract: 'To scale Gaussian processes (GPs) to large data sets we introduce the robust Bayesian Committee Machine (rBCM), a practical and scalable product-of-experts model for large-scale distributed GP regression. Unlike state-of-the-art sparse GP approximations, the rBCM is conceptually simple and does not rely on inducing or variational parameters. The key idea is to recursively distribute computations to independent computational units and, subsequently, recombine them to form an overall result. Efficient closed-form inference allows for straightforward parallelisation and distributed computations with a small memory footprint. The rBCM is independent of the computational graph and can be used on heterogeneous computing infrastructures, ranging from laptops to clusters. With sufficient computing resources our distributed GP model can handle arbitrarily large data sets.'
volume: 37
URL: http://proceedings.mlr.press/v37/deisenroth15.html
PDF: http://proceedings.mlr.press/v37/deisenroth15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-deisenroth15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Deisenroth
given: Marc
- family: Ng
given: Jun Wei
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1481-1490
id: deisenroth15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1481
lastpage: 1490
published: 2015-06-01 00:00:00 +0000
- title: 'Guaranteed Tensor Decomposition: A Moment Approach'
abstract: 'We develop a theoretical and computational framework to perform guaranteed tensor decomposition, which also has the potential to accomplish other tensor tasks such as tensor completion and denoising. We formulate tensor decomposition as a problem of measure estimation from moments. By constructing a dual polynomial, we demonstrate that measure optimization returns the correct CP decomposition under an incoherence condition on the rank-one factors. To address the computational challenge, we present a hierarchy of semidefinite programs based on sums-of-squares relaxations of the measure optimization problem. By showing that the constructed dual polynomial is a sum-of-squares modulo the sphere, we prove that the smallest SDP in the relaxation hierarchy is exact and the decomposition can be extracted from the solution under the same incoherence condition. One implication is that the tensor nuclear norm can be computed exactly using the smallest SDP as long as the rank-one factors of the tensor are incoherent. Numerical experiments are conducted to test the performance of the moment approach.'
volume: 37
URL: http://proceedings.mlr.press/v37/tanga15.html
PDF: http://proceedings.mlr.press/v37/tanga15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-tanga15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Tang
given: Gongguo
- family: Shah
given: Parikshit
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1491-1500
id: tanga15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1491
lastpage: 1500
published: 2015-06-01 00:00:00 +0000
- title: '\ell_{1,p}-Norm Regularization: Error Bounds and Convergence Rate Analysis of First-Order Methods'
abstract: 'Recently, \ell_{1,p}-regularization has been widely used to induce structured sparsity in the solutions to various optimization problems. Motivated by the desire to analyze the convergence rate of first-order methods, we show that for a large class of \ell_{1,p}-regularized problems, an error bound condition is satisfied when p ∈ [1,2] or p = ∞, but fails to hold for any p ∈ (2,∞). Based on this result, we show that many first-order methods enjoy an asymptotic linear rate of convergence when applied to \ell_{1,p}-regularized linear or logistic regression with p ∈ [1,2] or p = ∞. By contrast, numerical experiments suggest that for the same class of problems with p ∈ (2,∞), the aforementioned methods may not converge linearly.'
volume: 37
URL: http://proceedings.mlr.press/v37/zhoub15.html
PDF: http://proceedings.mlr.press/v37/zhoub15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-zhoub15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Zhou
given: Zirui
- family: Zhang
given: Qi
- family: So
given: Anthony Man-Cho
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1501-1510
id: zhoub15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1501
lastpage: 1510
published: 2015-06-01 00:00:00 +0000
- title: 'Consistent estimation of dynamic and multi-layer block models'
abstract: 'Significant progress has been made recently on theoretical analysis of estimators for the stochastic block model (SBM). In this paper, we consider the multi-graph SBM, which serves as a foundation for many application settings including dynamic and multi-layer networks. We explore the asymptotic properties of two estimators for the multi-graph SBM, namely spectral clustering and the maximum-likelihood estimate (MLE), as the number of layers of the multi-graph increases. We derive sufficient conditions for consistency of both estimators and propose a variational approximation to the MLE that is computationally feasible for large networks. We verify the sufficient conditions via simulation and demonstrate that they are practical. In addition, we apply the model to two real data sets: a dynamic social network and a multi-layer social network with several types of relations.'
volume: 37
URL: http://proceedings.mlr.press/v37/hanb15.html
PDF: http://proceedings.mlr.press/v37/hanb15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-hanb15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Han
given: Qiuyi
- family: Xu
given: Kevin
- family: Airoldi
given: Edoardo
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1511-1520
id: hanb15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1511
lastpage: 1520
published: 2015-06-01 00:00:00 +0000
- title: 'On the Rate of Convergence and Error Bounds for LSTD(λ)'
abstract: 'We consider LSTD(λ), the least-squares temporal-difference algorithm with eligibility traces proposed by Boyan (2002). It computes a linear approximation of the value function of a fixed policy in a large Markov Decision Process. Under a β-mixing assumption, we derive, for any value of λ ∈ (0,1), a high-probability bound on the rate of convergence of this algorithm to its limit. We deduce a high-probability bound on the error of this algorithm that extends (and slightly improves) the one derived by Lazaric et al. (2012) in the specific case where λ = 0. In the context of temporal-difference algorithms with value function approximation, this analysis is to our knowledge the first to provide insight into the choice of the eligibility-trace parameter λ with respect to the approximation quality of the space and the number of samples.'
volume: 37
URL: http://proceedings.mlr.press/v37/tagorti15.html
PDF: http://proceedings.mlr.press/v37/tagorti15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-tagorti15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Tagorti
given: Manel
- family: Scherrer
given: Bruno
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1521-1529
id: tagorti15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1521
lastpage: 1529
published: 2015-06-01 00:00:00 +0000
- title: 'Variational Inference with Normalizing Flows'
abstract: 'The choice of the approximate posterior distribution is one of the core problems in variational inference. Most applications of variational inference employ simple families of posterior approximations in order to allow for efficient inference, focusing on mean-field or other simple structured approximations. This restriction has a significant impact on the quality of inferences made using variational methods. We introduce a new approach for specifying flexible, arbitrarily complex and scalable approximate posterior distributions. Our approximations are distributions constructed through a normalizing flow, whereby a simple initial density is transformed into a more complex one by applying a sequence of invertible transformations until a desired level of complexity is attained. We use this view of normalizing flows to develop categories of finite and infinitesimal flows and provide a unified view of approaches for constructing rich posterior approximations. We demonstrate that the theoretical advantages of having posteriors that better match the true posterior, combined with the scalability of amortized variational approaches, provides a clear improvement in performance and applicability of variational inference.'
volume: 37
URL: http://proceedings.mlr.press/v37/rezende15.html
PDF: http://proceedings.mlr.press/v37/rezende15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-rezende15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Rezende
given: Danilo
- family: Mohamed
given: Shakir
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1530-1538
id: rezende15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1530
lastpage: 1538
published: 2015-06-01 00:00:00 +0000
- title: 'Controversy in mechanistic modelling with Gaussian processes'
abstract: 'Parameter inference in mechanistic models based on non-affine differential equations is computationally onerous, and various faster alternatives based on gradient matching have been proposed. A particularly promising approach is based on nonparametric Bayesian modelling with Gaussian processes, which exploits the fact that a Gaussian process is closed under differentiation. However, two alternative paradigms have been proposed. The first paradigm, proposed at NIPS 2008 and AISTATS 2013, is based on a product-of-experts approach and a marginalization over the derivatives of the state variables. The second paradigm, proposed at ICML 2014, is based on a probabilistic generative model and a marginalization over the state variables. The claim has been made that this leads to better inference results. In the present article, we offer a new interpretation of the second paradigm, which highlights the underlying assumptions, approximations and limitations. In particular, we show that the second paradigm suffers from an intrinsic identifiability problem, from which the first paradigm does not suffer.'
volume: 37
URL: http://proceedings.mlr.press/v37/macdonald15.html
PDF: http://proceedings.mlr.press/v37/macdonald15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-macdonald15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Macdonald
given: Benn
- family: Higham
given: Catherine
- family: Husmeier
given: Dirk
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1539-1547
id: macdonald15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1539
lastpage: 1547
published: 2015-06-01 00:00:00 +0000
- title: 'Convex Learning of Multiple Tasks and their Structure'
abstract: 'Reducing the amount of human supervision is a key problem in machine learning, and a natural approach is to exploit the relations (structure) among different tasks. This is the idea at the core of multi-task learning. In this context a fundamental question is how to incorporate the task structure in the learning problem. We tackle this question by studying a general computational framework that allows encoding a priori knowledge of the task structure in the form of a convex penalty; in this setting a variety of previously proposed methods can be recovered as special cases, including linear and non-linear approaches. Within this framework, we show that the tasks and their structure can be efficiently learned by considering a convex optimization problem that can be approached by means of block coordinate methods such as alternating minimization, and for which we prove convergence to the global minimum.'
volume: 37
URL: http://proceedings.mlr.press/v37/ciliberto15.html
PDF: http://proceedings.mlr.press/v37/ciliberto15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-ciliberto15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Ciliberto
given: Carlo
- family: Mroueh
given: Youssef
- family: Poggio
given: Tomaso
- family: Rosasco
given: Lorenzo
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1548-1557
id: ciliberto15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1548
lastpage: 1557
published: 2015-06-01 00:00:00 +0000
- title: 'K-hyperplane Hinge-Minimax Classifier'
abstract: 'We explore a novel approach to upper bound the misclassification error for problems with data comprising a small number of positive samples and a large number of negative samples. We assign the hinge-loss to upper bound the misclassification error of the positive examples and use the minimax risk to upper bound the misclassification error with respect to the worst-case distribution that generates the negative examples. This approach is computationally appealing since the majority of training examples (belonging to the negative class) are represented by the statistics of their distribution, in contrast to kernel SVM, which produces a very large number of support vectors in such settings. We derive empirical risk bounds for linear and non-linear classification and show that they are dimensionally independent and decay as 1/\sqrt{m} for m samples. We propose an efficient algorithm for training an intersection of a finite number of hyperplanes and demonstrate its effectiveness on real data, including letter and scene recognition.'
volume: 37
URL: http://proceedings.mlr.press/v37/osadchy15.html
PDF: http://proceedings.mlr.press/v37/osadchy15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-osadchy15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Osadchy
given: Margarita
- family: Hazan
given: Tamir
- family: Keren
given: Daniel
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1558-1566
id: osadchy15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1558
lastpage: 1566
published: 2015-06-01 00:00:00 +0000
- title: 'Non-Stationary Approximate Modified Policy Iteration'
abstract: 'We consider the infinite-horizon γ-discounted optimal control problem formalized by Markov Decision Processes. Running any instance of Modified Policy Iteration—a family of algorithms that can interpolate between Value and Policy Iteration—with an error ε at each iteration is known to lead to stationary policies that are at least \frac{2γε}{(1-γ)^2}-optimal. Variations of Value and Policy Iteration, which build \ell-periodic non-stationary policies, have recently been shown to display a better \frac{2γε}{(1-γ)(1-γ^\ell)}-optimality guarantee. Our first contribution is to describe a new algorithmic scheme, Non-Stationary Modified Policy Iteration, a family of algorithms parameterized by two integers m \ge 0 and \ell \ge 1 that generalizes all the above-mentioned algorithms. While m allows interpolation between Value-Iteration-style and Policy-Iteration-style updates, \ell specifies the period of the non-stationary policy that is output. We show that this new family of algorithms also enjoys the improved \frac{2γε}{(1-γ)(1-γ^\ell)}-optimality guarantee. Perhaps more importantly, we show, by exhibiting an original problem instance, that this guarantee is tight for all m and \ell; this tightness was to our knowledge only proved for two specific cases, Value Iteration (m=0, \ell=1) and Policy Iteration (m=∞, \ell=1).'
volume: 37
URL: http://proceedings.mlr.press/v37/lesner15.html
PDF: http://proceedings.mlr.press/v37/lesner15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-lesner15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Lesner
given: Boris
- family: Scherrer
given: Bruno
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1567-1575
id: lesner15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1567
lastpage: 1575
published: 2015-06-01 00:00:00 +0000
- title: 'Entropy evaluation based on confidence intervals of frequency estimates: Application to the learning of decision trees'
abstract: 'Entropy gain is widely used for learning decision trees. However, as we go deeper down the tree, the examples become rarer and the faithfulness of entropy decreases. Thus, misleading choices and over-fitting may occur, and the tree has to be adjusted by using an early-stop criterion or post-pruning algorithms. However, these methods still depend on the choices previously made, which may be unsatisfactory. We propose a new cumulative entropy function based on confidence intervals on frequency estimates that jointly considers the entropy of the probability distribution and the uncertainty around the estimation of its parameters. This function takes advantage of the ability of a possibility distribution to upper bound a family of probabilities previously estimated from a limited set of examples, and of the link between possibilistic specificity order and entropy. The proposed measure has several advantages over the classical one. It performs significant choices of split and provides a statistically relevant stopping criterion that allows the learning of trees whose size is well-suited w.r.t. the available data. On top of that, it also provides a reasonable estimator of the performance of a decision tree. Finally, we show that it can be used for designing a simple and efficient online learning algorithm.'
volume: 37
URL: http://proceedings.mlr.press/v37/serrurier15.html
PDF: http://proceedings.mlr.press/v37/serrurier15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-serrurier15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Serrurier
given: Mathieu
- family: Prade
given: Henri
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1576-1584
id: serrurier15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1576
lastpage: 1584
published: 2015-06-01 00:00:00 +0000
- title: 'Geometric Conditions for Subspace-Sparse Recovery'
abstract: 'Given a dictionary $\Pi$ and a signal $\xi = \Pi \mathbf{x}$ generated by a few \textit{linearly independent} columns of $\Pi$, classical sparse recovery theory deals with the problem of uniquely recovering the sparse representation $\mathbf{x}$ of $\xi$. In this work, we consider the more general case where $\xi$ lies in a low-dimensional subspace spanned by a few columns of $\Pi$, which are possibly \textit{linearly dependent}. In this case, $\mathbf{x}$ may not be unique, and the goal is to recover any subset of the columns of $\Pi$ that spans the subspace containing $\xi$. We call such a representation $\mathbf{x}$ \textit{subspace-sparse}. We study conditions under which existing pursuit methods recover a subspace-sparse representation. Such conditions reveal important geometric insights and have implications for the theory of classical sparse recovery as well as subspace clustering.'
volume: 37
URL: http://proceedings.mlr.press/v37/you15.html
PDF: http://proceedings.mlr.press/v37/you15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-you15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: You
given: Chong
- family: Vidal
given: Rene
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1585-1593
id: you15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1585
lastpage: 1593
published: 2015-06-01 00:00:00 +0000
- title: 'An Empirical Study of Stochastic Variational Inference Algorithms for the Beta Bernoulli Process'
abstract: 'Stochastic variational inference (SVI) is emerging as the most promising candidate for scaling inference in Bayesian probabilistic models to large datasets. However, the performance of these methods has been assessed primarily in the context of Bayesian topic models, particularly latent Dirichlet allocation (LDA). Deriving several new algorithms, and using synthetic, image and genomic datasets, we investigate whether the understanding gleaned from LDA applies in the setting of sparse latent factor models, specifically beta process factor analysis (BPFA). We demonstrate that the big picture is consistent: using Gibbs sampling within SVI to maintain certain posterior dependencies is extremely effective. However, we also show that different posterior dependencies are important in BPFA relative to LDA.'
volume: 37
URL: http://proceedings.mlr.press/v37/shahb15.html
PDF: http://proceedings.mlr.press/v37/shahb15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-shahb15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Shah
given: Amar
- family: Knowles
given: David
- family: Ghahramani
given: Zoubin
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1594-1603
id: shahb15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1594
lastpage: 1603
published: 2015-06-01 00:00:00 +0000
- title: 'Long Short-Term Memory Over Recursive Structures'
abstract: 'The chain-structured long short-term memory (LSTM) has been shown to be effective in a wide range of problems such as speech recognition and machine translation. In this paper, we propose to extend it to tree structures, in which a memory cell can reflect the history memories of multiple child cells or multiple descendant cells in a recursive process. We call the model S-LSTM, which provides a principled way of considering long-distance interaction over hierarchies, e.g., language or image parse structures. We leverage the models for semantic composition to understand the meaning of text, a fundamental problem in natural language understanding, and show that it outperforms a state-of-the-art recursive model by replacing its composition layers with the S-LSTM memory blocks. We also show that utilizing the given structures is helpful in achieving a performance better than that without considering the structures.'
volume: 37
URL: http://proceedings.mlr.press/v37/zhub15.html
PDF: http://proceedings.mlr.press/v37/zhub15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-zhub15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Zhu
given: Xiaodan
- family: Sobihani
given: Parinaz
- family: Guo
given: Hongyu
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1604-1612
id: zhub15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1604
lastpage: 1612
published: 2015-06-01 00:00:00 +0000
- title: 'Weight Uncertainty in Neural Network'
abstract: 'We introduce a new, efficient, principled and backpropagation-compatible algorithm for learning a probability distribution on the weights of a neural network, called Bayes by Backprop. It regularises the weights by minimising a compression cost, known as the variational free energy or the expected lower bound on the marginal likelihood. We show that this principled kind of regularisation yields comparable performance to dropout on MNIST classification. We then demonstrate how the learnt uncertainty in the weights can be used to improve generalisation in non-linear regression problems, and how this weight uncertainty can be used to drive the exploration-exploitation trade-off in reinforcement learning.'
volume: 37
URL: http://proceedings.mlr.press/v37/blundell15.html
PDF: http://proceedings.mlr.press/v37/blundell15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-blundell15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Blundell
given: Charles
- family: Cornebise
given: Julien
- family: Kavukcuoglu
given: Koray
- family: Wierstra
given: Daan
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1613-1622
id: blundell15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1613
lastpage: 1622
published: 2015-06-01 00:00:00 +0000
- title: 'Learning Submodular Losses with the Lovasz Hinge'
abstract: 'Learning with non-modular losses is an important problem when sets of predictions are made simultaneously. The main tools for constructing convex surrogate loss functions for set prediction are margin rescaling and slack rescaling. In this work, we show that these strategies lead to tight convex surrogates iff the underlying loss function is increasing in the number of incorrect predictions. However, gradient or cutting-plane computation for these functions is NP-hard for non-supermodular loss functions. We propose instead a novel convex surrogate loss function for submodular losses, the Lovasz hinge, which leads to O(p log p) complexity with O(p) oracle accesses to the loss function to compute a gradient or cutting-plane. As a result, we have developed the first tractable convex surrogates in the literature for submodular losses. We demonstrate the utility of this novel convex surrogate through a real world image labeling task.'
volume: 37
URL: http://proceedings.mlr.press/v37/yub15.html
PDF: http://proceedings.mlr.press/v37/yub15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-yub15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Yu
given: Jiaqian
- family: Blaschko
given: Matthew
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1623-1631
id: yub15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1623
lastpage: 1631
published: 2015-06-01 00:00:00 +0000
- title: 'Coordinate Descent Converges Faster with the Gauss-Southwell Rule Than Random Selection'
abstract: 'There has been significant recent work on the theory and application of randomized coordinate descent algorithms, beginning with the work of Nesterov [SIAM J. Optim., 22(2), 2012], who showed that a random-coordinate selection rule achieves the same convergence rate as the Gauss-Southwell selection rule. This result suggests that we should never use the Gauss-Southwell rule, as it is typically much more expensive than random selection. However, the empirical behaviours of these algorithms contradict this theoretical result: in applications where the computational costs of the selection rules are comparable, the Gauss-Southwell selection rule tends to perform substantially better than random coordinate selection. We give a simple analysis of the Gauss-Southwell rule showing that, except in extreme cases, its convergence rate is faster than choosing random coordinates. Further, in this work we (i) show that exact coordinate optimization improves the convergence rate for certain sparse problems, (ii) propose a Gauss-Southwell-Lipschitz rule that gives an even faster convergence rate given knowledge of the Lipschitz constants of the partial derivatives, (iii) analyze the effect of approximate Gauss-Southwell rules, and (iv) analyze proximal-gradient variants of the Gauss-Southwell rule.'
volume: 37
URL: http://proceedings.mlr.press/v37/nutini15.html
PDF: http://proceedings.mlr.press/v37/nutini15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-nutini15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Nutini
given: Julie
- family: Schmidt
given: Mark
- family: Laradji
given: Issam
- family: Friedlander
given: Michael
- family: Koepke
given: Hoyt
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1632-1641
id: nutini15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1632
lastpage: 1641
published: 2015-06-01 00:00:00 +0000
- title: 'Hashing for Distributed Data'
abstract: 'Recently, hashing-based approximate nearest neighbors search has attracted much attention. Numerous centralized hashing algorithms have been proposed and have achieved promising performance. However, due to the large scale of many applications, the data is often stored or even collected in a distributed manner. Learning hash functions by aggregating all the data into a fusion center is infeasible because of the prohibitively expensive communication and computation overhead. In this paper, we develop a novel hashing model to learn hash functions in a distributed setting. We cast a centralized hashing model as a set of subproblems with consensus constraints. We find these subproblems can be analytically solved in parallel on the distributed compute nodes. Since no training data is transmitted across the nodes in the learning process, the communication cost of our model is independent of the data size. Extensive experiments on several large scale datasets containing up to 100 million samples demonstrate the efficacy of our method.'
volume: 37
URL: http://proceedings.mlr.press/v37/leng15.html
PDF: http://proceedings.mlr.press/v37/leng15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-leng15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Leng
given: Cong
- family: Wu
given: Jiaxiang
- family: Cheng
given: Jian
- family: Zhang
given: Xi
- family: Lu
given: Hanqing
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1642-1650
id: leng15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1642
lastpage: 1650
published: 2015-06-01 00:00:00 +0000
- title: 'Large-scale Distributed Dependent Nonparametric Trees'
abstract: 'Practical applications of Bayesian nonparametric (BNP) models have been limited, due to their high computational complexity and poor scaling on large data. In this paper, we consider dependent nonparametric trees (DNTs), a powerful infinite model that captures time-evolving hierarchies, and develop a large-scale distributed training system. Our major contributions include: (1) an effective memoized variational inference for DNTs, with a novel birth-merge strategy for exploring the unbounded tree space; (2) a model-parallel scheme for concurrent tree growing/pruning and efficient model alignment, through conflict-free model partitioning and lightweight synchronization; (3) a data-parallel scheme for variational parameter updates that allows distributed processing of massive data. Using 64 cores in 36 hours, our system learns a 10K-node DNT topic model on 8M documents that captures both high-frequency and long-tail topics. Our data and model scales are orders-of-magnitude larger than recent results on the hierarchical Dirichlet process, and the near-linear scalability indicates great potential for even bigger problem sizes.'
volume: 37
URL: http://proceedings.mlr.press/v37/hu15.html
PDF: http://proceedings.mlr.press/v37/hu15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-hu15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Hu
given: Zhiting
- family: Ho
given: Qirong
- family: Dubey
given: Avinava
- family: Xing
given: Eric
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1651-1659
id: hu15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1651
lastpage: 1659
published: 2015-06-01 00:00:00 +0000
- title: 'Qualitative Multi-Armed Bandits: A Quantile-Based Approach'
abstract: 'We formalize and study the multi-armed bandit (MAB) problem in a generalized stochastic setting, in which rewards are not assumed to be numerical. Instead, rewards are measured on a qualitative scale that allows for comparison but invalidates arithmetic operations such as averaging. Correspondingly, instead of characterizing an arm in terms of the mean of the underlying distribution, we opt for using a quantile of that distribution as a representative value. We address the problem of quantile-based online learning both for the case of a finite (pure exploration) and infinite time horizon (cumulative regret minimization). For both cases, we propose suitable algorithms and analyze their properties. These properties are also illustrated by means of first experimental studies.'
volume: 37
URL: http://proceedings.mlr.press/v37/szorenyi15.html
PDF: http://proceedings.mlr.press/v37/szorenyi15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-szorenyi15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Szorenyi
given: Balazs
- family: Busa-Fekete
given: Robert
- family: Weng
given: Paul
- family: Hüllermeier
given: Eyke
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1660-1668
id: szorenyi15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1660
lastpage: 1668
published: 2015-06-01 00:00:00 +0000
- title: 'Deep Edge-Aware Filters'
abstract: 'There are many edge-aware filters varying in their construction forms and filtering properties. It seems impossible to uniformly represent and accelerate them in a single framework. We made the attempt to learn a big and important family of edge-aware operators from data. Our method is based on a deep convolutional neural network with a gradient domain training procedure, which gives rise to a powerful tool to approximate various filters without knowing the original models and implementation details. The only difference among these operators in our system becomes merely the learned parameters. Our system enables fast approximation for complex edge-aware filters and achieves up to 200x acceleration, regardless of their originally very different implementations. Fast speed can also be achieved when creating new effects using spatially varying filters or filter combinations, bearing out the effectiveness of our deep edge-aware filters.'
volume: 37
URL: http://proceedings.mlr.press/v37/xub15.html
PDF: http://proceedings.mlr.press/v37/xub15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-xub15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Xu
given: Li
- family: Ren
given: Jimmy
- family: Yan
given: Qiong
- family: Liao
given: Renjie
- family: Jia
given: Jiaya
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1669-1678
id: xub15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1669
lastpage: 1678
published: 2015-06-01 00:00:00 +0000
- title: 'A Convex Optimization Framework for Bi-Clustering'
abstract: 'We present a framework for biclustering and clustering where the observations are general labels. Our approach is based on the maximum likelihood estimator and its convex relaxation, and generalizes recent works in graph clustering to the biclustering setting. In addition to standard biclustering setting where one seeks to discover clustering structure simultaneously in two domain sets, we show that the same algorithm can be as effective when clustering structure only occurs in one domain. This allows for an alternative approach to clustering that is more natural in some scenarios. We present theoretical results that provide sufficient conditions for the recovery of the true underlying clusters under a generalized stochastic block model. These are further validated by our empirical results on both synthetic and real data.'
volume: 37
URL: http://proceedings.mlr.press/v37/limb15.html
PDF: http://proceedings.mlr.press/v37/limb15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-limb15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Lim
given: Shiau Hong
- family: Chen
given: Yudong
- family: Xu
given: Huan
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1679-1688
id: limb15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1679
lastpage: 1688
published: 2015-06-01 00:00:00 +0000
- title: 'Is Feature Selection Secure against Training Data Poisoning?'
abstract: 'Learning in adversarial settings is becoming an important task for application domains where attackers may inject malicious data into the training set to subvert normal operation of data-driven technologies. Feature selection has been widely used in machine learning for security applications to improve generalization and computational efficiency, although it is not clear whether its use may be beneficial or even counterproductive when training data are poisoned by intelligent attackers. In this work, we shed light on this issue by providing a framework to investigate the robustness of popular feature selection methods, including LASSO, ridge regression and the elastic net. Our results on malware detection show that feature selection methods can be significantly compromised under attack (we can reduce LASSO to almost random choices of feature sets by careful insertion of less than 5% poisoned training samples), highlighting the need for specific countermeasures.'
volume: 37
URL: http://proceedings.mlr.press/v37/xiao15.html
PDF: http://proceedings.mlr.press/v37/xiao15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-xiao15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Xiao
given: Huang
- family: Biggio
given: Battista
- family: Brown
given: Gavin
- family: Fumera
given: Giorgio
- family: Eckert
given: Claudia
- family: Roli
given: Fabio
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1689-1698
id: xiao15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1689
lastpage: 1698
published: 2015-06-01 00:00:00 +0000
- title: 'Predictive Entropy Search for Bayesian Optimization with Unknown Constraints'
abstract: 'Unknown constraints arise in many types of expensive black-box optimization problems. Several methods have been proposed recently for performing Bayesian optimization with constraints, based on the expected improvement (EI) heuristic. However, EI can lead to pathologies when used with constraints. For example, in the case of decoupled constraints—i.e., when one can independently evaluate the objective or the constraints—EI can encounter a pathology that prevents exploration. Additionally, computing EI requires a current best solution, which may not exist if none of the data collected so far satisfy the constraints. By contrast, information-based approaches do not suffer from these failure modes. In this paper, we present a new information-based method called Predictive Entropy Search with Constraints (PESC). We analyze the performance of PESC and show that it compares favorably to EI-based approaches on synthetic and benchmark problems, as well as several real-world examples. We demonstrate that PESC is an effective algorithm that provides a promising direction towards a unified solution for constrained Bayesian optimization.'
volume: 37
URL: http://proceedings.mlr.press/v37/hernandez-lobatob15.html
PDF: http://proceedings.mlr.press/v37/hernandez-lobatob15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-hernandez-lobatob15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Hernandez-Lobato
given: Jose Miguel
- family: Gelbart
given: Michael
- family: Hoffman
given: Matthew
- family: Adams
given: Ryan
- family: Ghahramani
given: Zoubin
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1699-1707
id: hernandez-lobatob15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1699
lastpage: 1707
published: 2015-06-01 00:00:00 +0000
- title: 'A Theoretical Analysis of Metric Hypothesis Transfer Learning'
abstract: 'We consider the problem of transferring some a priori knowledge in the context of supervised metric learning approaches. While this setting has been successfully applied in some empirical contexts, no theoretical evidence exists to justify this approach. In this paper, we provide a theoretical justification based on the notion of algorithmic stability adapted to the regularized metric learning setting. We propose an on-average-replace-two-stability model allowing us to prove fast generalization rates when an auxiliary source metric is used to bias the regularizer. Moreover, we prove a consistency result from which we show the interest of considering biased weighted regularized formulations and we provide a solution to estimate the associated weight. We also present some experiments illustrating the interest of the approach in standard metric learning tasks and in a transfer learning problem where few labelled data are available.'
volume: 37
URL: http://proceedings.mlr.press/v37/perrot15.html
PDF: http://proceedings.mlr.press/v37/perrot15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-perrot15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Perrot
given: Michaël
- family: Habrard
given: Amaury
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1708-1717
id: perrot15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1708
lastpage: 1717
published: 2015-06-01 00:00:00 +0000
- title: 'Generative Moment Matching Networks'
abstract: 'We consider the problem of learning deep generative models from data. We formulate a method that generates an independent sample via a single feedforward pass through a multilayer perceptron, as in the recently proposed generative adversarial networks (Goodfellow et al., 2014). Training a generative adversarial network, however, requires careful optimization of a difficult minimax program. Instead, we utilize a technique from statistical hypothesis testing known as maximum mean discrepancy (MMD), which leads to a simple objective that can be interpreted as matching all orders of statistics between a dataset and samples from the model, and can be trained by backpropagation. We further boost the performance of this approach by combining our generative network with an auto-encoder network, using MMD to learn to generate codes that can then be decoded to produce samples. We show that the combination of these techniques yields excellent generative models compared to baseline approaches as measured on MNIST and the Toronto Face Database.'
volume: 37
URL: http://proceedings.mlr.press/v37/li15.html
PDF: http://proceedings.mlr.press/v37/li15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-li15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Li
given: Yujia
- family: Swersky
given: Kevin
- family: Zemel
given: Rich
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1718-1727
id: li15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1718
lastpage: 1727
published: 2015-06-01 00:00:00 +0000
- title: 'Stay on path: PCA along graph paths'
abstract: 'We introduce a variant of (sparse) PCA in which the set of feasible support sets is determined by a graph. In particular, we consider the following setting: given a directed acyclic graph G on p vertices corresponding to variables, the non-zero entries of the extracted principal component must coincide with vertices lying along a path in G. From a statistical perspective, information on the underlying network may potentially reduce the number of observations required to recover the population principal component. We consider the canonical estimator which optimally exploits the prior knowledge by solving a non-convex quadratic maximization on the empirical covariance. We introduce a simple network and analyze the estimator under the spiked covariance model for sparse PCA. We show that side information potentially improves the statistical complexity. We propose two algorithms to approximate the solution of the constrained quadratic maximization, and recover a component with the desired properties. We empirically evaluate our schemes on synthetic and real datasets.'
volume: 37
URL: http://proceedings.mlr.press/v37/asteris15.html
PDF: http://proceedings.mlr.press/v37/asteris15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-asteris15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Asteris
given: Megasthenis
- family: Kyrillidis
given: Anastasios
- family: Dimakis
given: Alex
- family: Yi
given: Han-Gyol
- family: Chandrasekaran
given: Bharath
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1728-1736
id: asteris15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1728
lastpage: 1736
published: 2015-06-01 00:00:00 +0000
- title: 'Deep Learning with Limited Numerical Precision'
abstract: 'Training of large-scale deep neural networks is often constrained by the available computational resources. We study the effect of limited precision data representation and computation on neural network training. Within the context of low-precision fixed-point computations, we observe the rounding scheme to play a crucial role in determining the network’s behavior during training. Our results show that deep networks can be trained using only 16-bit wide fixed-point number representation when using stochastic rounding, and incur little to no degradation in the classification accuracy. We also demonstrate an energy-efficient hardware accelerator that implements low-precision fixed-point arithmetic with stochastic rounding.'
volume: 37
URL: http://proceedings.mlr.press/v37/gupta15.html
PDF: http://proceedings.mlr.press/v37/gupta15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-gupta15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Gupta
given: Suyog
- family: Agrawal
given: Ankur
- family: Gopalakrishnan
given: Kailash
- family: Narayanan
given: Pritish
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1737-1746
id: gupta15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1737
lastpage: 1746
published: 2015-06-01 00:00:00 +0000
- title: 'Safe Screening for Multi-Task Feature Learning with Multiple Data Matrices'
abstract: 'Multi-task feature learning (MTFL) is a powerful technique for boosting predictive performance by learning multiple related classification/regression/clustering tasks simultaneously. However, solving the MTFL problem remains challenging when the feature dimension is extremely large. In this paper, we propose a novel screening rule, based on the dual projection onto convex sets (DPC), to quickly identify the inactive features, i.e., those that have zero coefficients in the solution vectors across all tasks. One of the appealing features of DPC is that it is safe, in the sense that the detected inactive features are guaranteed to have zero coefficients in the solution vectors across all tasks. Thus, by removing the inactive features from the training phase, we may have substantial savings in the computational cost and memory usage without sacrificing accuracy. To the best of our knowledge, it is the first screening rule that is applicable to sparse models with multiple data matrices. A key challenge in deriving DPC is to solve a nonconvex problem. We show that we can solve for the global optimum efficiently via a properly chosen parametrization of the constraint set. Moreover, DPC has very low computational cost and can be integrated with any existing solvers. We have evaluated the proposed DPC rule on both synthetic and real data sets. The experiments indicate that DPC is very effective in identifying the inactive features, especially for high-dimensional data, which leads to a speedup of up to several orders of magnitude.'
volume: 37
URL: http://proceedings.mlr.press/v37/wangf15.html
PDF: http://proceedings.mlr.press/v37/wangf15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-wangf15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Wang
given: Jie
- family: Ye
given: Jieping
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1747-1756
id: wangf15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1747
lastpage: 1756
published: 2015-06-01 00:00:00 +0000
- title: 'Harmonic Exponential Families on Manifolds'
abstract: 'In a range of fields including the geosciences, molecular biology, robotics and computer vision, one encounters problems that involve random variables on manifolds. Currently, there is a lack of flexible probabilistic models on manifolds that are fast and easy to train. We define an extremely flexible class of exponential family distributions on manifolds such as the torus, sphere, and rotation groups, and show that for these distributions the gradient of the log-likelihood can be computed efficiently using a non-commutative generalization of the Fast Fourier Transform (FFT). We discuss applications to Bayesian camera motion estimation (where harmonic exponential families serve as conjugate priors), and modelling of the spatial distribution of earthquakes on the surface of the earth. Our experimental results show that harmonic densities yield a significantly higher likelihood than the best competing method, while being orders of magnitude faster to train.'
volume: 37
URL: http://proceedings.mlr.press/v37/cohenb15.html
PDF: http://proceedings.mlr.press/v37/cohenb15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-cohenb15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Cohen
given: Taco
- family: Welling
given: Max
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1757-1765
id: cohenb15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1757
lastpage: 1765
published: 2015-06-01 00:00:00 +0000
- title: 'Training Deep Convolutional Neural Networks to Play Go'
abstract: 'Mastering the game of Go has remained a long-standing challenge to the field of AI. Modern computer Go programs rely on processing millions of possible future positions to play well, but intuitively a stronger and more ‘humanlike’ way to play the game would be to rely on pattern recognition rather than brute force computation. Following this sentiment, we train deep convolutional neural networks to play Go by training them to predict the moves made by expert Go players. To solve this problem we introduce a number of novel techniques, including a method of tying weights in the network to ‘hard code’ symmetries that are expected to exist in the target function, and demonstrate in an ablation study that they considerably improve performance. Our final networks are able to achieve move prediction accuracies of 41.1% and 44.4% on two different Go datasets, surpassing previous state of the art on this task by significant margins. Additionally, while previous move prediction systems have not yielded strong Go playing programs, we show that the networks trained in this work acquired high levels of skill. Our convolutional neural networks can consistently defeat the well known Go program GNU Go and win some games against the state of the art Go playing program Fuego while using a fraction of the play time.'
volume: 37
URL: http://proceedings.mlr.press/v37/clark15.html
PDF: http://proceedings.mlr.press/v37/clark15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-clark15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Clark
given: Christopher
- family: Storkey
given: Amos
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1766-1774
id: clark15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1766
lastpage: 1774
published: 2015-06-01 00:00:00 +0000
- title: 'Kernel Interpolation for Scalable Structured Gaussian Processes (KISS-GP)'
abstract: 'We introduce a new structured kernel interpolation (SKI) framework, which generalises and unifies inducing point methods for scalable Gaussian processes (GPs). SKI methods produce kernel approximations for fast computations through kernel interpolation. The SKI framework clarifies how the quality of an inducing point approach depends on the number of inducing (aka interpolation) points, interpolation strategy, and GP covariance kernel. SKI also provides a mechanism to create new scalable kernel methods, through choosing different kernel interpolation strategies. Using SKI, with local cubic kernel interpolation, we introduce KISS-GP, which is 1) more scalable than inducing point alternatives, 2) naturally enables Kronecker and Toeplitz algebra for substantial additional gains in scalability, without requiring any grid data, and 3) can be used for fast and expressive kernel learning. KISS-GP costs O(n) time and storage for GP inference. We evaluate KISS-GP for kernel matrix approximation, kernel learning, and natural sound modelling.'
volume: 37
URL: http://proceedings.mlr.press/v37/wilson15.html
PDF: http://proceedings.mlr.press/v37/wilson15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-wilson15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Wilson
given: Andrew
- family: Nickisch
given: Hannes
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1775-1784
id: wilson15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1775
lastpage: 1784
published: 2015-06-01 00:00:00 +0000
- title: 'Learning Deep Structured Models'
abstract: 'Many problems in real-world applications involve predicting several random variables that are statistically related. Markov random fields (MRFs) are a great mathematical tool to encode such dependencies. The goal of this paper is to combine MRFs with deep learning to estimate complex representations while taking into account the dependencies between the output random variables. Towards this goal, we propose a training algorithm that is able to learn structured models jointly with deep features that form the MRF potentials. Our approach is efficient as it blends learning and inference and makes use of GPU acceleration. We demonstrate the effectiveness of our algorithm in the tasks of predicting words from noisy images, as well as tagging of Flickr photographs. We show that joint learning of the deep features and the MRF parameters results in significant performance gains.'
volume: 37
URL: http://proceedings.mlr.press/v37/chenb15.html
PDF: http://proceedings.mlr.press/v37/chenb15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-chenb15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Chen
given: Liang-Chieh
- family: Schwing
given: Alexander
- family: Yuille
given: Alan
- family: Urtasun
given: Raquel
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1785-1794
id: chenb15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1785
lastpage: 1794
published: 2015-06-01 00:00:00 +0000
- title: 'Community Detection Using Time-Dependent Personalized PageRank'
abstract: 'Local graph diffusions have proven to be valuable tools for solving various graph clustering problems. As such, there has been much interest recently in efficient local algorithms for computing them. We present an efficient local algorithm for approximating a graph diffusion that generalizes both the celebrated personalized PageRank and its recent competitor/companion, the heat kernel. Our algorithm is based on writing the diffusion vector as the solution of an initial value problem, and then using a waveform relaxation approach to approximate the solution. Our experimental results suggest that it produces rankings that are distinct and competitive with the ones produced by high quality implementations of personalized PageRank and localized heat kernel, and that our algorithm is a useful addition to the toolset of localized graph diffusions.'
volume: 37
URL: http://proceedings.mlr.press/v37/avron15.html
PDF: http://proceedings.mlr.press/v37/avron15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-avron15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Avron
given: Haim
- family: Horesh
given: Lior
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1795-1803
id: avron15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1795
lastpage: 1803
published: 2015-06-01 00:00:00 +0000
- title: 'Scalable Variational Inference in Log-supermodular Models'
abstract: 'We consider the problem of approximate Bayesian inference in log-supermodular models. These models encompass regular pairwise MRFs with binary variables, but additionally capture high-order interactions, which are intractable for existing approximate inference techniques such as belief propagation, mean field and variants. We show that a recently proposed variational approach to inference in log-supermodular models – L-Field – reduces to the widely studied minimum norm problem for submodular minimization. This insight allows us to leverage powerful existing tools, and to solve the variational problem orders of magnitude more efficiently than previously possible. We then provide another natural interpretation of L-Field, demonstrating that it exactly minimizes a specific type of Rényi divergence measure. This insight sheds light on the nature of the variational approximations produced by L-Field. Furthermore, we show how to perform parallel inference as message passing in a suitable factor graph at a linear convergence rate, without having to sum over all the configurations of the factor. Finally, we apply our approach to a challenging image segmentation task. Our experiments confirm the scalability of our approach, the high quality of the marginals, and the benefit of incorporating higher order potentials.'
volume: 37
URL: http://proceedings.mlr.press/v37/djolonga15.html
PDF: http://proceedings.mlr.press/v37/djolonga15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-djolonga15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Djolonga
given: Josip
- family: Krause
given: Andreas
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1804-1813
id: djolonga15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1804
lastpage: 1813
published: 2015-06-01 00:00:00 +0000
- title: 'Variational Inference for Gaussian Process Modulated Poisson Processes'
abstract: 'We present the first fully variational Bayesian inference scheme for continuous Gaussian-process-modulated Poisson processes. Such point processes are used in a variety of domains, including neuroscience, geo-statistics and astronomy, but their use is hindered by the computational cost of existing inference schemes. Our scheme: requires no discretisation of the domain; scales linearly in the number of observed events; and is many orders of magnitude faster than previous sampling based approaches. The resulting algorithm is shown to outperform standard methods on synthetic examples, coal mining disaster data and in the prediction of Malaria incidences in Kenya.'
volume: 37
URL: http://proceedings.mlr.press/v37/lloyd15.html
PDF: http://proceedings.mlr.press/v37/lloyd15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-lloyd15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Lloyd
given: Chris
- family: Gunter
given: Tom
- family: Osborne
given: Michael
- family: Roberts
given: Stephen
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1814-1822
id: lloyd15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1814
lastpage: 1822
published: 2015-06-01 00:00:00 +0000
- title: 'Scalable Deep Poisson Factor Analysis for Topic Modeling'
abstract: 'A new framework for topic modeling is developed, based on deep graphical models, where interactions between topics are inferred through deep latent binary hierarchies. The proposed multi-layer model employs a deep sigmoid belief network or restricted Boltzmann machine, the bottom binary layer of which selects topics for use in a Poisson factor analysis model. Under this setting, topics live on the bottom layer of the model, while the deep specification serves as a flexible prior for revealing topic structure. Scalable inference algorithms are derived by applying a Bayesian conditional density filtering algorithm, in addition to extending recently proposed work on stochastic gradient thermostats. Experimental results on several corpora show that the proposed approach readily handles very large collections of text documents, infers structured topic representations, and obtains superior test perplexities when compared with related models.'
volume: 37
URL: http://proceedings.mlr.press/v37/gan15.html
PDF: http://proceedings.mlr.press/v37/gan15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-gan15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Gan
given: Zhe
- family: Chen
given: Changyou
- family: Henao
given: Ricardo
- family: Carlson
given: David
- family: Carin
given: Lawrence
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1823-1832
id: gan15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1823
lastpage: 1832
published: 2015-06-01 00:00:00 +0000
- title: 'Hidden Markov Anomaly Detection'
abstract: 'We introduce a new anomaly detection methodology for data with latent dependency structure. As a particular instantiation, we derive a hidden Markov anomaly detector that extends the regular one-class support vector machine. We optimize the approach, which is non-convex, via a DC (difference of convex functions) algorithm, and show that the parameter ν can be conveniently used to control the number of outliers in the model. The empirical evaluation on artificial and real data from the domains of computational biology and computational sustainability shows that the approach can achieve significantly higher anomaly detection performance than the regular one-class SVM.'
volume: 37
URL: http://proceedings.mlr.press/v37/goernitz15.html
PDF: http://proceedings.mlr.press/v37/goernitz15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-goernitz15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Goernitz
given: Nico
- family: Braun
given: Mikio
- family: Kloft
given: Marius
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1833-1842
id: goernitz15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1833
lastpage: 1842
published: 2015-06-01 00:00:00 +0000
- title: 'Robust Estimation of Transition Matrices in High Dimensional Heavy-tailed Vector Autoregressive Processes'
abstract: 'Gaussian vector autoregressive (VAR) processes have been extensively studied in the literature. However, Gaussian assumptions are stringent for the heavy-tailed time series that frequently arise in finance and economics. In this paper, we develop a unified framework for modeling and estimating heavy-tailed VAR processes. In particular, we generalize the Gaussian VAR model to an elliptical VAR model that naturally accommodates heavy-tailed time series. Under this model, we develop a quantile-based robust estimator for the transition matrix of the VAR process. We show that the proposed estimator achieves parametric rates of convergence in high dimensions. This is the first work analyzing heavy-tailed high dimensional VAR processes. As an application of the proposed framework, we investigate Granger causality in the elliptical VAR process, and show that the robust transition matrix estimator induces sign-consistent estimators of Granger causality. The empirical performance of the proposed methodology is demonstrated on both synthetic and real data. We show that the proposed estimator is robust to heavy tails, and exhibits superior performance in stock price prediction.'
volume: 37
URL: http://proceedings.mlr.press/v37/qiu15.html
PDF: http://proceedings.mlr.press/v37/qiu15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-qiu15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Qiu
given: Huitong
- family: Xu
given: Sheng
- family: Han
given: Fang
- family: Liu
given: Han
- family: Caffo
given: Brian
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1843-1851
id: qiu15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1843
lastpage: 1851
published: 2015-06-01 00:00:00 +0000
- title: 'Convex Calibrated Surrogates for Hierarchical Classification'
abstract: 'Hierarchical classification problems are multiclass supervised learning problems with a pre-defined hierarchy over the set of class labels. In this work, we study the consistency of hierarchical classification algorithms with respect to a natural loss, namely the tree distance metric on the hierarchy tree of class labels, via the use of calibrated surrogates. We first show that the Bayes optimal classifier for this loss classifies an instance according to the deepest node in the hierarchy such that the total conditional probability of the subtree rooted at the node is greater than \frac{1}{2}. We exploit this insight to develop a new consistent algorithm for hierarchical classification, which makes use of an algorithm known to be consistent for the “multiclass classification with reject option (MCRO)” problem as a sub-routine. Our experiments on a number of benchmark datasets show that the resulting algorithm, which we term OvA-Cascade, gives improved performance over other state-of-the-art hierarchical classification algorithms.'
volume: 37
URL: http://proceedings.mlr.press/v37/ramaswamy15.html
PDF: http://proceedings.mlr.press/v37/ramaswamy15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-ramaswamy15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Ramaswamy
given: Harish
- family: Tewari
given: Ambuj
- family: Agarwal
given: Shivani
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1852-1860
id: ramaswamy15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1852
lastpage: 1860
published: 2015-06-01 00:00:00 +0000
- title: 'Probabilistic Backpropagation for Scalable Learning of Bayesian Neural Networks'
abstract: 'Large multilayer neural networks trained with backpropagation have recently achieved state-of-the-art results in a wide range of problems. However, using backprop for neural net learning still has some disadvantages, e.g., having to tune a large number of hyperparameters to the data, lack of calibrated probabilistic predictions, and a tendency to overfit the training data. In principle, the Bayesian approach to learning neural networks does not have these problems. However, existing Bayesian techniques lack scalability to large dataset and network sizes. In this work we present a novel scalable method for learning Bayesian neural networks, called probabilistic backpropagation (PBP). Similar to classical backpropagation, PBP works by computing a forward propagation of probabilities through the network and then doing a backward computation of gradients. A series of experiments on ten real-world datasets show that PBP is significantly faster than other techniques, while offering competitive predictive abilities. Our experiments also show that PBP provides accurate estimates of the posterior variance on the network weights.'
volume: 37
URL: http://proceedings.mlr.press/v37/hernandez-lobatoc15.html
PDF: http://proceedings.mlr.press/v37/hernandez-lobatoc15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-hernandez-lobatoc15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Hernandez-Lobato
given: Jose Miguel
- family: Adams
given: Ryan
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1861-1869
id: hernandez-lobatoc15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1861
lastpage: 1869
published: 2015-06-01 00:00:00 +0000
- title: 'Active Nearest Neighbors in Changing Environments'
abstract: 'While classic machine learning paradigms assume training and test data are generated from the same process, domain adaptation addresses the more realistic setting in which the learner has large quantities of labeled data from some source task but limited or no labeled data from the target task it is attempting to learn. In this work, we give the first formal analysis showing that using active learning for domain adaptation yields a way to address the statistical challenges inherent in this setting. We propose a novel nonparametric algorithm, ANDA, that combines an active nearest neighbor querying strategy with nearest neighbor prediction. We provide analyses of its querying behavior and of finite sample convergence rates of the resulting classifier under covariate shift. Our experiments show that ANDA successfully corrects for dataset bias in multi-class image categorization.'
volume: 37
URL: http://proceedings.mlr.press/v37/berlind15.html
PDF: http://proceedings.mlr.press/v37/berlind15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-berlind15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Berlind
given: Christopher
- family: Urner
given: Ruth
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1870-1879
id: berlind15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1870
lastpage: 1879
published: 2015-06-01 00:00:00 +0000
- title: 'Bipartite Edge Prediction via Transductive Learning over Product Graphs'
abstract: 'This paper addresses the problem of predicting the missing edges of a bipartite graph where each side of the vertices has its own intrinsic structure. We propose a new optimization framework to map the two sides of the intrinsic structures onto the manifold structure of the edges via a graph product, and to reduce the original problem to vertex label propagation over the product graph. This framework enjoys flexible choices in the formulation of graph products, and supports a rich family of graph transduction schemes with scalable inference. Experiments on benchmark datasets for collaborative filtering, citation network analysis and prerequisite prediction of online courses show advantageous performance of the proposed approach over other state-of-the-art methods.'
volume: 37
URL: http://proceedings.mlr.press/v37/liuc15.html
PDF: http://proceedings.mlr.press/v37/liuc15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-liuc15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Liu
given: Hanxiao
- family: Yang
given: Yiming
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1880-1888
id: liuc15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1880
lastpage: 1888
published: 2015-06-01 00:00:00 +0000
- title: 'Trust Region Policy Optimization'
abstract: 'In this article, we describe a method for optimizing control policies, with guaranteed monotonic improvement. By making several approximations to the theoretically-justified scheme, we develop a practical algorithm, called Trust Region Policy Optimization (TRPO). This algorithm is effective for optimizing large nonlinear policies such as neural networks. Our experiments demonstrate its robust performance on a wide variety of tasks: learning simulated robotic swimming, hopping, and walking gaits; and playing Atari games using images of the screen as input. Despite its approximations that deviate from the theory, TRPO tends to give monotonic improvement, with little tuning of hyperparameters.'
volume: 37
URL: http://proceedings.mlr.press/v37/schulman15.html
PDF: http://proceedings.mlr.press/v37/schulman15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-schulman15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Schulman
given: John
- family: Levine
given: Sergey
- family: Abbeel
given: Pieter
- family: Jordan
given: Michael
- family: Moritz
given: Philipp
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1889-1897
id: schulman15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1889
lastpage: 1897
published: 2015-06-01 00:00:00 +0000
- title: 'Discovering Temporal Causal Relations from Subsampled Data'
abstract: 'Granger causal analysis has been an important tool for causal analysis of time series in various fields, including neuroscience and economics, and recently it has been extended to include instantaneous effects between the time series to explain the contemporaneous dependence in the residuals. In this paper, we assume that the time series at the true causal frequency follow the vector autoregressive model. We show that when the data resolution becomes lower due to subsampling, neither the original Granger causal analysis nor the extended one is able to discover the underlying causal relations. We then aim to answer the following question: can we estimate the temporal causal relations at the right causal frequency from the subsampled data? Traditionally this suffers from identifiability problems: under the Gaussianity assumption on the data, the solutions are generally not unique. We prove, however, that if the noise terms are non-Gaussian, the underlying model for the high frequency data is identifiable from subsampled data under mild conditions. We then propose an Expectation-Maximization (EM) approach and a variational inference approach to recover temporal causal relations from such subsampled data. Experimental results on both simulated and real data are reported to illustrate the performance of the proposed approaches.'
volume: 37
URL: http://proceedings.mlr.press/v37/gongb15.html
PDF: http://proceedings.mlr.press/v37/gongb15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-gongb15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Gong
given: Mingming
- family: Zhang
given: Kun
- family: Schoelkopf
given: Bernhard
- family: Tao
given: Dacheng
- family: Geiger
given: Philipp
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1898-1906
id: gongb15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1898
lastpage: 1906
published: 2015-06-01 00:00:00 +0000
- title: 'Preference Completion: Large-scale Collaborative Ranking from Pairwise Comparisons'
abstract: 'In this paper we consider the collaborative ranking setting: a pool of users each provides a set of pairwise preferences over a small subset of the set of d possible items; from these we need to predict each user’s preferences for items s/he has not yet seen. We do so via fitting a rank r score matrix to the pairwise data, and provide two main contributions: (a) We show that an algorithm based on convex optimization provides good generalization guarantees once each user provides as few as O(r \log^2 d) pairwise comparisons — essentially matching the sample complexity required in the related matrix completion setting (which uses actual numerical as opposed to pairwise information), and also matching a lower bound we establish here. (b) We develop a large-scale non-convex implementation, which we call AltSVM, which trains a factored form of the matrix via alternating minimization (which we show reduces to alternating SVM problems), and scales and parallelizes very well to large problem settings. It also outperforms common baselines on many moderately large popular collaborative filtering datasets in both NDCG and other measures of ranking performance.'
volume: 37
URL: http://proceedings.mlr.press/v37/park15.html
PDF: http://proceedings.mlr.press/v37/park15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-park15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Park
given: Dohyung
- family: Neeman
given: Joe
- family: Zhang
given: Jin
- family: Sanghavi
given: Sujay
- family: Dhillon
given: Inderjit
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1907-1916
id: park15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1907
lastpage: 1916
published: 2015-06-01 00:00:00 +0000
- title: 'Causal Inference by Identification of Vector Autoregressive Processes with Hidden Components'
abstract: 'A widely applied approach to causal inference from a time series X, often referred to as “(linear) Granger causal analysis”, is to simply regress present on past and interpret the regression matrix \hat{B} causally. However, if there is an unmeasured time series Z that influences X, then this approach can lead to wrong causal conclusions, i.e., distinct from those one would draw if one had additional information such as Z. In this paper we take a different approach: We assume that X together with some hidden Z forms a first order vector autoregressive (VAR) process with transition matrix A, and argue why it is more valid to interpret A causally instead of \hat{B}. Then we examine under which conditions the most important parts of A are identifiable or almost identifiable from only X. Essentially, sufficient conditions are (1) non-Gaussian, independent noise or (2) no influence from X to Z. We present two estimation algorithms that are tailored towards conditions (1) and (2), respectively, and evaluate them on synthetic and real-world data. We discuss how to check the model using X.'
volume: 37
URL: http://proceedings.mlr.press/v37/geiger15.html
PDF: http://proceedings.mlr.press/v37/geiger15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-geiger15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Geiger
given: Philipp
- family: Zhang
given: Kun
- family: Schoelkopf
given: Bernhard
- family: Gong
given: Mingming
- family: Janzing
given: Dominik
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1917-1925
id: geiger15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1917
lastpage: 1925
published: 2015-06-01 00:00:00 +0000
- title: 'On Symmetric and Asymmetric LSHs for Inner Product Search'
abstract: 'We consider the problem of designing locality sensitive hashes (LSH) for inner product similarity, and of the power of asymmetric hashes in this context. Shrivastava and Li (2014a) argue that there is no symmetric LSH for the problem and propose an asymmetric LSH based on different mappings for query and database points. However, we show there does exist a simple symmetric LSH that enjoys stronger guarantees and better empirical performance than the asymmetric LSH they suggest. We also show a variant of the setting where asymmetry is in fact needed, but there a different asymmetric LSH is required.'
volume: 37
URL: http://proceedings.mlr.press/v37/neyshabur15.html
PDF: http://proceedings.mlr.press/v37/neyshabur15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-neyshabur15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Neyshabur
given: Behnam
- family: Srebro
given: Nathan
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1926-1934
id: neyshabur15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1926
lastpage: 1934
published: 2015-06-01 00:00:00 +0000
- title: 'The Kendall and Mallows Kernels for Permutations'
abstract: 'We show that the widely used Kendall tau correlation coefficient is a positive definite kernel for permutations. It offers a computationally attractive alternative to more complex kernels on the symmetric group to learn from rankings, or to learn to rank. We show how to extend it to partial rankings or rankings with uncertainty, and demonstrate promising results on high-dimensional classification problems in biomedical applications.'
volume: 37
URL: http://proceedings.mlr.press/v37/jiao15.html
PDF: http://proceedings.mlr.press/v37/jiao15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-jiao15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Jiao
given: Yunlong
- family: Vert
given: Jean-Philippe
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1935-1944
id: jiao15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1935
lastpage: 1944
published: 2015-06-01 00:00:00 +0000
- title: 'Bayesian Multiple Target Localization'
abstract: 'We consider the problem of quickly localizing multiple targets by asking questions of the form “How many targets are within this set?” while obtaining noisy answers. This setting is a generalization to multiple targets of the game of 20 questions in which only a single target is queried. We assume that the targets are points on the real line, or in a two dimensional plane for the experiments, drawn independently from a known distribution. We evaluate the performance of a policy using the expected entropy of the posterior distribution after a fixed number of questions with noisy answers. We derive a lower bound for the value of this problem and study a specific policy, named the dyadic policy. We show that this policy achieves a value which is no more than twice this lower bound when answers are noise-free, and show a more general constant factor approximation guarantee for the noisy setting. We present an empirical evaluation of this policy on simulated data for the problem of detecting multiple instances of the same object in an image. Finally, we present experiments on localizing multiple faces simultaneously on real images.'
volume: 37
URL: http://proceedings.mlr.press/v37/rajan15.html
PDF: http://proceedings.mlr.press/v37/rajan15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-rajan15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Rajan
given: Purnima
- family: Han
given: Weidong
- family: Sznitman
given: Raphael
- family: Frazier
given: Peter
- family: Jedynak
given: Bruno
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1945-1953
id: rajan15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1945
lastpage: 1953
published: 2015-06-01 00:00:00 +0000
- title: 'Submodularity in Data Subset Selection and Active Learning'
abstract: 'We study the problem of selecting a subset of big data to train a classifier while incurring minimal performance loss. We show the connection of submodularity to the data likelihood functions for Naive Bayes (NB) and Nearest Neighbor (NN) classifiers, and formulate the data subset selection problems for these classifiers as constrained submodular maximization. Furthermore, we apply this framework to active learning and propose a novel scheme filtering active submodular selection (FASS), where we combine the uncertainty sampling method with a submodular data subset selection framework. We extensively evaluate the proposed framework on text categorization and handwritten digit recognition tasks with four different classifiers, including Deep Neural Network (DNN) based classifiers. Empirical results indicate that the proposed framework yields significant improvement over the state-of-the-art algorithms on all classifiers.'
volume: 37
URL: http://proceedings.mlr.press/v37/wei15.html
PDF: http://proceedings.mlr.press/v37/wei15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-wei15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Wei
given: Kai
- family: Iyer
given: Rishabh
- family: Bilmes
given: Jeff
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1954-1963
id: wei15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1954
lastpage: 1963
published: 2015-06-01 00:00:00 +0000
- title: 'Variational Generative Stochastic Networks with Collaborative Shaping'
abstract: 'We develop an approach to training generative models based on unrolling a variational auto-encoder into a Markov chain, and shaping the chain’s trajectories using a technique inspired by recent work in Approximate Bayesian computation. We show that the global minimizer of the resulting objective is achieved when the generative model reproduces the target distribution. To allow finer control over the behavior of the models, we add a regularization term inspired by techniques used for regularizing certain types of policy search in reinforcement learning. We present empirical results on the MNIST and TFD datasets which show that our approach offers state-of-the-art performance, both quantitatively and from a qualitative point of view.'
volume: 37
URL: http://proceedings.mlr.press/v37/bachman15.html
PDF: http://proceedings.mlr.press/v37/bachman15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-bachman15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Bachman
given: Philip
- family: Precup
given: Doina
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1964-1972
id: bachman15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1964
lastpage: 1972
published: 2015-06-01 00:00:00 +0000
- title: 'Adding vs. Averaging in Distributed Primal-Dual Optimization'
abstract: 'Distributed optimization methods for large-scale machine learning suffer from a communication bottleneck. It is difficult to reduce this bottleneck while still efficiently and accurately aggregating partial work from different machines. In this paper, we present a novel generalization of the recent communication-efficient primal-dual framework (COCOA) for distributed optimization. Our framework, COCOA+, allows for additive combination of local updates to the global parameters at each iteration, whereas previous schemes only allow conservative averaging. We give stronger (primal-dual) convergence rate guarantees for both COCOA as well as our new variants, and generalize the theory for both methods to cover non-smooth convex loss functions. We provide an extensive experimental comparison that shows the markedly improved performance of COCOA+ on several real-world distributed datasets, especially when scaling up the number of machines.'
volume: 37
URL: http://proceedings.mlr.press/v37/mab15.html
PDF: http://proceedings.mlr.press/v37/mab15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-mab15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Ma
given: Chenxin
- family: Smith
given: Virginia
- family: Jaggi
given: Martin
- family: Jordan
given: Michael
- family: Richtarik
given: Peter
- family: Takac
given: Martin
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1973-1982
id: mab15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1973
lastpage: 1982
published: 2015-06-01 00:00:00 +0000
- title: 'Feature-Budgeted Random Forest'
abstract: 'We seek decision rules for prediction-time cost reduction, where complete data is available for training, but during prediction-time, each feature can only be acquired for an additional cost. We propose a novel random forest algorithm to minimize prediction error for a user-specified average feature acquisition budget. While random forests yield strong generalization performance, they do not explicitly account for feature costs and furthermore require low correlation among trees, which amplifies costs. Our random forest grows trees with low acquisition cost and high strength based on greedy minimax cost-weighted-impurity splits. Theoretically, we establish near-optimal acquisition cost guarantees for our algorithm. Empirically, on a number of benchmark datasets we demonstrate competitive accuracy-cost curves against state-of-the-art prediction-time algorithms.'
volume: 37
URL: http://proceedings.mlr.press/v37/nan15.html
PDF: http://proceedings.mlr.press/v37/nan15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-nan15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Nan
given: Feng
- family: Wang
given: Joseph
- family: Saligrama
given: Venkatesh
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1983-1991
id: nan15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1983
lastpage: 1991
published: 2015-06-01 00:00:00 +0000
- title: 'Entropic Graph-based Posterior Regularization'
abstract: 'Graph smoothness objectives have achieved great success in semi-supervised learning but have not yet been applied extensively to unsupervised generative models. We define a new class of entropic graph-based posterior regularizers that augment a probabilistic model by encouraging pairs of nearby variables in a regularization graph to have similar posterior distributions. We present a three-way alternating optimization algorithm with closed-form updates for performing inference on this joint model and learning its parameters. This method admits updates linear in the degree of the regularization graph, exhibits monotone convergence and is easily parallelizable. We are motivated by applications in computational biology in which temporal models such as hidden Markov models are used to learn a human-interpretable representation of genomic data. On a synthetic problem, we show that our method outperforms existing methods for graph-based regularization and a comparable strategy for incorporating long-range interactions using existing methods for approximate inference. Using genome-scale functional genomics data, we integrate genome 3D interaction data into existing models for genome annotation and demonstrate significant improvements in predicting genomic activity.'
volume: 37
URL: http://proceedings.mlr.press/v37/libbrecht15.html
PDF: http://proceedings.mlr.press/v37/libbrecht15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-libbrecht15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Libbrecht
given: Maxwell
- family: Hoffman
given: Michael
- family: Bilmes
given: Jeff
- family: Noble
given: William
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 1992-2001
id: libbrecht15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 1992
lastpage: 2001
published: 2015-06-01 00:00:00 +0000
- title: 'Unsupervised Riemannian Metric Learning for Histograms Using Aitchison Transformations'
abstract: 'Many applications in machine learning handle bags of features or histograms rather than simple vectors. In that context, defining a proper geometry to compare histograms can be crucial for many machine learning algorithms. While one might be tempted to use a default metric such as the Euclidean metric, empirical evidence shows this may not be the best choice when dealing with observations that lie in the probability simplex. Additionally, it might be desirable to choose a metric adaptively based on data. We consider in this paper the problem of learning a Riemannian metric on the simplex given unlabeled histogram data. We follow the approach of Lebanon (2006), who proposed to estimate such a metric within a parametric family by maximizing the inverse volume of a given data set of points under that metric. The metrics we consider on the multinomial simplex are pull-back metrics of the Fisher information parameterized by operations within the simplex known as Aitchison (1982) transformations. We propose an algorithmic approach to maximize inverse volumes using sampling and contrastive divergences. We provide experimental evidence that the metric obtained under our proposal outperforms alternative approaches.'
volume: 37
URL: http://proceedings.mlr.press/v37/le15.html
PDF: http://proceedings.mlr.press/v37/le15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-le15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Le
given: Tam
- family: Cuturi
given: Marco
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2002-2011
id: le15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2002
lastpage: 2011
published: 2015-06-01 00:00:00 +0000
- title: 'Low-Rank Matrix Recovery from Row-and-Column Affine Measurements'
abstract: 'We propose and study a row-and-column affine measurement scheme for low-rank matrix recovery. Each measurement is a linear combination of elements in one row or one column of a matrix X. This setting arises naturally in applications from different domains. However, current algorithms developed for standard matrix recovery problems do not perform well in our case, hence the need for developing new algorithms and theory for our problem. We propose a simple algorithm for the problem based on Singular Value Decomposition (SVD) and least-squares (LS), which we term alg. We prove that (a simplified version of) our algorithm can recover X exactly with the minimum possible number of measurements in the noiseless case. In the general noisy case, we prove performance guarantees on the reconstruction accuracy under the Frobenius norm. In simulations, our row-and-column design and alg algorithm show improved speed, and comparable and in some cases better accuracy compared to standard measurement designs and algorithms. Our theoretical and experimental results suggest that the proposed row-and-column affine measurement scheme, together with our recovery algorithm, may provide a powerful framework for affine matrix reconstruction.'
volume: 37
URL: http://proceedings.mlr.press/v37/zuk15.html
PDF: http://proceedings.mlr.press/v37/zuk15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-zuk15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Zuk
given: Or
- family: Wagner
given: Avishai
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2012-2020
id: zuk15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2012
lastpage: 2020
published: 2015-06-01 00:00:00 +0000
- title: 'Algorithms for the Hard Pre-Image Problem of String Kernels and the General Problem of String Prediction'
abstract: 'We address the pre-image problem encountered in structured output prediction and the one of finding a string maximizing the prediction function of various kernel-based classifiers and regressors. We demonstrate that these problems reduce to a common combinatorial problem valid for many string kernels. For this problem, we propose an upper bound on the prediction function which has low computational complexity and which can be used in a branch and bound search algorithm to obtain optimal solutions. We also show that for many string kernels, the complexity of the problem increases significantly when the kernel is normalized. On the optical word recognition task, the exact solution of the pre-image problem is shown to significantly improve the prediction accuracy in comparison with an approximation found by the best known heuristic. On the task of finding a string maximizing the prediction function of kernel-based classifiers and regressors, we highlight that existing methods can be biased toward long strings that contain many repeated symbols. We demonstrate that this bias is removed when using normalized kernels. Finally, we present results for the discovery of lead compounds in drug discovery. The source code can be found at https://github.com/a-ro/preimage'
volume: 37
URL: http://proceedings.mlr.press/v37/giguere15.html
PDF: http://proceedings.mlr.press/v37/giguere15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-giguere15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Giguère
given: Sébastien
- family: Rolland
given: Amélie
- family: Laviolette
given: Francois
- family: Marchand
given: Mario
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2021-2029
id: giguere15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2021
lastpage: 2029
published: 2015-06-01 00:00:00 +0000
- title: 'A Multitask Point Process Predictive Model'
abstract: 'Point process data are commonly observed in fields like healthcare and social science. Designing predictive models for such event streams is an under-explored problem, due to often scarce training data. In this work we propose a multitask point process model, leveraging information from all tasks via a hierarchical Gaussian process (GP). Nonparametric learning functions implemented by a GP, which map from past events to future rates, allow analysis of flexible arrival patterns. To facilitate efficient inference, we propose a sparse construction for this hierarchical model, and derive a variational Bayes method for learning and inference. Experimental results are shown on both synthetic data and an application on real electronic health records.'
volume: 37
URL: http://proceedings.mlr.press/v37/lian15.html
PDF: http://proceedings.mlr.press/v37/lian15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-lian15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Lian
given: Wenzhao
- family: Henao
given: Ricardo
- family: Rao
given: Vinayak
- family: Lucas
given: Joseph
- family: Carin
given: Lawrence
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2030-2038
id: lian15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2030
lastpage: 2038
published: 2015-06-01 00:00:00 +0000
- title: 'A Hybrid Approach for Probabilistic Inference using Random Projections'
abstract: 'We introduce a new meta-algorithm for probabilistic inference in graphical models based on random projections. The key idea is to use approximate inference algorithms for an (exponentially) large number of samples, obtained by randomly projecting the original statistical model using universal hash functions. In the case where the approximate inference algorithm is a variational approximation, this approach can be viewed as interpolating between sampling-based and variational techniques. The number of samples used controls the trade-off between the accuracy of the approximate inference algorithm and the variance of the estimator. We show empirically that by using random projections, we can improve the accuracy of common approximate inference algorithms.'
volume: 37
URL: http://proceedings.mlr.press/v37/zhuc15.html
PDF: http://proceedings.mlr.press/v37/zhuc15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-zhuc15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Zhu
given: Michael
- family: Ermon
given: Stefano
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2039-2047
id: zhuc15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2039
lastpage: 2047
published: 2015-06-01 00:00:00 +0000
- title: 'Show, Attend and Tell: Neural Image Caption Generation with Visual Attention'
abstract: 'Inspired by recent work in machine translation and object detection, we introduce an attention based model that automatically learns to describe the content of images. We describe how we can train this model in a deterministic manner using standard backpropagation techniques and stochastically by maximizing a variational lower bound. We also show through visualization how the model is able to automatically learn to fix its gaze on salient objects while generating the corresponding words in the output sequence. We validate the use of attention with state-of-the-art performance on three benchmark datasets: Flickr8k, Flickr30k and MS COCO.'
volume: 37
URL: http://proceedings.mlr.press/v37/xuc15.html
PDF: http://proceedings.mlr.press/v37/xuc15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-xuc15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Xu
given: Kelvin
- family: Ba
given: Jimmy
- family: Kiros
given: Ryan
- family: Cho
given: Kyunghyun
- family: Courville
given: Aaron
- family: Salakhudinov
given: Ruslan
- family: Zemel
given: Rich
- family: Bengio
given: Yoshua
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2048-2057
id: xuc15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2048
lastpage: 2057
published: 2015-06-01 00:00:00 +0000
- title: 'Learning to Search Better than Your Teacher'
abstract: 'Methods for learning to search for structured prediction typically imitate a reference policy, with existing theoretical guarantees demonstrating low regret compared to that reference. This is unsatisfactory in many applications where the reference policy is suboptimal and the goal of learning is to improve upon it. Can learning to search work even when the reference is poor? We provide a new learning to search algorithm, LOLS, which does well relative to the reference policy, but additionally guarantees low regret compared to deviations from the learned policy: a local-optimality guarantee. Consequently, LOLS can improve upon the reference policy, unlike previous algorithms. This enables us to develop structured contextual bandits, a partial information structured prediction setting with many potential applications.'
volume: 37
URL: http://proceedings.mlr.press/v37/changb15.html
PDF: http://proceedings.mlr.press/v37/changb15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-changb15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Chang
given: Kai-Wei
- family: Krishnamurthy
given: Akshay
- family: Agarwal
given: Alekh
- family: Daume
given: Hal
- family: Langford
given: John
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2058-2066
id: changb15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2058
lastpage: 2066
published: 2015-06-01 00:00:00 +0000
- title: 'Gated Feedback Recurrent Neural Networks'
abstract: 'In this work, we propose a novel recurrent neural network (RNN) architecture. The proposed RNN, gated-feedback RNN (GF-RNN), extends the existing approach of stacking multiple recurrent layers by allowing and controlling signals flowing from upper recurrent layers to lower layers using a global gating unit for each pair of layers. The recurrent signals exchanged between layers are gated adaptively based on the previous hidden states and the current input. We evaluated the proposed GF-RNN with different types of recurrent units, such as tanh, long short-term memory and gated recurrent units, on the tasks of character-level language modeling and Python program evaluation. Our empirical evaluation of different RNN units revealed that in both tasks, the GF-RNN outperforms the conventional approaches to building deep stacked RNNs. We suggest that the improvement arises because the GF-RNN can adaptively assign different layers to different timescales and layer-to-layer interactions (including the top-down ones which are not usually present in a stacked RNN) by learning to gate these interactions.'
volume: 37
URL: http://proceedings.mlr.press/v37/chung15.html
PDF: http://proceedings.mlr.press/v37/chung15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-chung15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Chung
given: Junyoung
- family: Gulcehre
given: Caglar
- family: Cho
given: Kyunghyun
- family: Bengio
given: Yoshua
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2067-2075
id: chung15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2067
lastpage: 2075
published: 2015-06-01 00:00:00 +0000
- title: 'Context-based Unsupervised Data Fusion for Decision Making'
abstract: 'Big Data received from sources such as social media, in-stream monitoring systems, networks, and markets is often mined for discovering patterns, detecting anomalies, and making decisions or predictions. In distributed learning and real-time processing of Big Data, ensemble-based systems in which a fusion center (FC) is used to combine the local decisions of several classifiers have been shown to be superior to single expert systems. However, optimal design of the FC requires knowledge of the accuracy of the individual classifiers which, in many cases, is not available. Moreover, in many applications supervised training of the FC is not feasible since the true labels of the data set are not available. In this paper, we propose an unsupervised joint estimation-detection scheme to estimate the accuracies of the local classifiers as functions of data context and to fuse the local decisions of the classifiers. Numerical results show the dramatic improvement of the proposed method as compared with the state of the art approaches.'
volume: 37
URL: http://proceedings.mlr.press/v37/soltanmohammadi15.html
PDF: http://proceedings.mlr.press/v37/soltanmohammadi15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-soltanmohammadi15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Soltanmohammadi
given: Erfan
- family: Naraghi-Pour
given: Mort
- family: Schaar
given: Mihaela
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2076-2084
id: soltanmohammadi15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2076
lastpage: 2084
published: 2015-06-01 00:00:00 +0000
- title: 'Phrase-based Image Captioning'
abstract: 'Generating a novel textual description of an image is an interesting problem that connects computer vision and natural language processing. In this paper, we present a simple model that is able to generate descriptive sentences given a sample image. This model has a strong focus on the syntax of the descriptions. We train a purely linear model to embed an image representation (generated from a previously trained Convolutional Neural Network) into a multimodal space that is common to the images and the phrases that are used to describe them. The system is then able to infer phrases from a given image sample. Based on the sentence description statistics, we propose a simple language model that can produce relevant descriptions for a given test image using the phrases inferred. Our approach, which is considerably simpler than state-of-the-art models, achieves comparable results in two popular datasets for the task: Flickr30k and the recently proposed Microsoft COCO.'
volume: 37
URL: http://proceedings.mlr.press/v37/lebret15.html
PDF: http://proceedings.mlr.press/v37/lebret15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-lebret15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Lebret
given: Remi
- family: Pinheiro
given: Pedro
- family: Collobert
given: Ronan
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2085-2094
id: lebret15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2085
lastpage: 2094
published: 2015-06-01 00:00:00 +0000
- title: 'Celeste: Variational inference for a generative model of astronomical images'
abstract: 'We present a new, fully generative model of optical telescope image sets, along with a variational procedure for inference. Each pixel intensity is treated as a Poisson random variable, with a rate parameter dependent on latent properties of stars and galaxies. Key latent properties are themselves random, with scientific prior distributions constructed from large ancillary data sets. We check our approach on synthetic images. We also run it on images from a major sky survey, where it exceeds the performance of the current state-of-the-art method for locating celestial bodies and measuring their colors.'
volume: 37
URL: http://proceedings.mlr.press/v37/regier15.html
PDF: http://proceedings.mlr.press/v37/regier15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-regier15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Regier
given: Jeffrey
- family: Miller
given: Andrew
- family: McAuliffe
given: Jon
- family: Adams
given: Ryan
- family: Hoffman
given: Matt
- family: Lang
given: Dustin
- family: Schlegel
given: David
- family: Prabhat
given: Mr
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2095-2103
id: regier15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2095
lastpage: 2103
published: 2015-06-01 00:00:00 +0000
- title: 'Distributional Rank Aggregation, and an Axiomatic Analysis'
abstract: 'The rank aggregation problem has been studied with varying desiderata in varied communities such as Theoretical Computer Science, Statistics, Information Retrieval and Social Welfare Theory. We introduce a variant of this problem we call distributional rank aggregation, where the ranking data is only available via the induced distribution over the set of all permutations. We provide a novel translation of the usual social welfare theory axioms to this setting. As we show, this allows for a more quantitative characterization of these axioms, which are then not only less prone to misinterpretation, but also allow simpler proofs for some key impossibility theorems. Most importantly, these quantitative characterizations lead to natural and novel relaxations of these axioms, which, as we show, allow us to get around celebrated impossibility results in social choice theory. We are able to completely characterize the class of positional scoring rules with respect to our axioms and show that Borda Count is optimal in a certain sense.'
volume: 37
URL: http://proceedings.mlr.press/v37/prasad15.html
PDF: http://proceedings.mlr.press/v37/prasad15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-prasad15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Prasad
given: Adarsh
- family: Pareek
given: Harsh
- family: Ravikumar
given: Pradeep
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2104-2112
id: prasad15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2104
lastpage: 2112
published: 2015-06-01 00:00:00 +0000
- title: 'Gradient-based Hyperparameter Optimization through Reversible Learning'
abstract: 'Tuning hyperparameters of learning algorithms is hard because gradients are usually unavailable. We compute exact gradients of cross-validation performance with respect to all hyperparameters by chaining derivatives backwards through the entire training procedure. These gradients allow us to optimize thousands of hyperparameters, including step-size and momentum schedules, weight initialization distributions, richly parameterized regularization schemes, and neural network architectures. We compute hyperparameter gradients by exactly reversing the dynamics of stochastic gradient descent with momentum.'
volume: 37
URL: http://proceedings.mlr.press/v37/maclaurin15.html
PDF: http://proceedings.mlr.press/v37/maclaurin15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-maclaurin15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Maclaurin
given: Dougal
- family: Duvenaud
given: David
- family: Adams
given: Ryan
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2113-2122
id: maclaurin15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2113
lastpage: 2122
published: 2015-06-01 00:00:00 +0000
- title: 'Bimodal Modelling of Source Code and Natural Language'
abstract: 'We consider the problem of building probabilistic models that jointly model short natural language utterances and source code snippets. The aim is to bring together recent work on statistical modelling of source code and work on bimodal models of images and natural language. The resulting models are useful for a variety of tasks that involve natural language and source code. We demonstrate their performance on two retrieval tasks: retrieving source code snippets given a natural language query, and retrieving natural language descriptions given a source code query (i.e., source code captioning). The experiments show that there is promise in this direction, and that modelling the structure of source code helps with the retrieval tasks.'
volume: 37
URL: http://proceedings.mlr.press/v37/allamanis15.html
PDF: http://proceedings.mlr.press/v37/allamanis15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-allamanis15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Allamanis
given: Miltos
- family: Tarlow
given: Daniel
- family: Gordon
given: Andrew
- family: Wei
given: Yi
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2123-2132
id: allamanis15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2123
lastpage: 2132
published: 2015-06-01 00:00:00 +0000
- title: 'Cheap Bandits'
abstract: 'We consider stochastic sequential learning problems where the learner can observe the average reward of several actions. Such a setting is interesting in many applications involving monitoring and surveillance, where the set of actions to observe represents some (geographical) area. The importance of this setting is that in these applications, it is actually cheaper to observe the average reward of a group of actions rather than the reward of a single action. We show that when the reward is smooth over a given graph representing the neighboring actions, we can maximize the cumulative reward of learning while minimizing the sensing cost. In this paper we propose CheapUCB, an algorithm that matches the regret guarantees of the known algorithms for this setting and at the same time guarantees a linear cost gain over them. As a by-product of our analysis, we establish a Ω(\sqrt{dT}) lower bound on the cumulative regret of spectral bandits for a class of graphs with effective dimension d.'
volume: 37
URL: http://proceedings.mlr.press/v37/hanawal15.html
PDF: http://proceedings.mlr.press/v37/hanawal15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-hanawal15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Hanawal
given: Manjesh
- family: Saligrama
given: Venkatesh
- family: Valko
given: Michal
- family: Munos
given: Remi
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2133-2142
id: hanawal15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2133
lastpage: 2142
published: 2015-06-01 00:00:00 +0000
- title: 'Subsampling Methods for Persistent Homology'
abstract: 'Persistent homology is a multiscale method for analyzing the shape of sets and functions from point cloud data arising from an unknown distribution supported on those sets. When the size of the sample is large, direct computation of the persistent homology is prohibitive due to the combinatorial nature of the existing algorithms. We propose to compute the persistent homology of several subsamples of the data and then combine the resulting estimates. We study the risk of two estimators and we prove that the subsampling approach carries stable topological information while achieving a great reduction in computational complexity.'
volume: 37
URL: http://proceedings.mlr.press/v37/chazal15.html
PDF: http://proceedings.mlr.press/v37/chazal15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-chazal15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Chazal
given: Frederic
- family: Fasy
given: Brittany
- family: Lecci
given: Fabrizio
- family: Michel
given: Bertrand
- family: Rinaldo
given: Alessandro
- family: Wasserman
given: Larry
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2143-2151
id: chazal15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2143
lastpage: 2151
published: 2015-06-01 00:00:00 +0000
- title: 'An embarrassingly simple approach to zero-shot learning'
abstract: 'Zero-shot learning consists in learning how to recognize new concepts by just having a description of them. Many sophisticated approaches have been proposed to address the challenges this problem comprises. In this paper we describe a zero-shot learning approach that can be implemented in just one line of code, yet is able to outperform state-of-the-art approaches on standard datasets. The approach is based on a more general framework which models the relationships between features, attributes, and classes as a two-layer linear network, where the weights of the top layer are not learned but are given by the environment. We further provide a learning bound on the generalization error of this kind of approach, by casting it as a domain adaptation method. In experiments carried out on three standard real datasets, we found that our approach performs significantly better than the state of the art on all of them, obtaining a ratio of improvement of up to 17%.'
volume: 37
URL: http://proceedings.mlr.press/v37/romera-paredes15.html
PDF: http://proceedings.mlr.press/v37/romera-paredes15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-romera-paredes15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Romera-Paredes
given: Bernardino
- family: Torr
given: Philip
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2152-2161
id: romera-paredes15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2152
lastpage: 2161
published: 2015-06-01 00:00:00 +0000
- title: 'Binary Embedding: Fundamental Limits and Fast Algorithm'
abstract: 'Binary embedding is a nonlinear dimension reduction methodology where high dimensional data are embedded into the Hamming cube while preserving the structure of the original space. Specifically, for an arbitrary set of N distinct points in \mathbb{S}^{p-1}, our goal is to encode each point using m-dimensional binary strings such that we can reconstruct their geodesic distance up to δ-uniform distortion. Existing binary embedding algorithms either lack theoretical guarantees or suffer from running time O(mp). We make three contributions: (1) we establish a lower bound showing that any binary embedding oblivious to the set of points requires m = Ω(\frac{1}{δ^2}\log N) bits, and a similar lower bound for non-oblivious embeddings into Hamming distance; (2) we propose a novel fast binary embedding algorithm with provably optimal bit complexity m = O(\frac{1}{δ^2}\log N) and near-linear running time O(p \log p) whenever \log N ≪ δ\sqrt{p}, with a slightly worse running time for larger \log N; (3) we also provide an analytic result about embedding a general set of points K ⊆ \mathbb{S}^{p-1} of even infinite size. Our theoretical findings are supported through experiments on both synthetic and real data sets.'
volume: 37
URL: http://proceedings.mlr.press/v37/yi15.html
PDF: http://proceedings.mlr.press/v37/yi15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-yi15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Yi
given: Xinyang
- family: Caramanis
given: Constantine
- family: Price
given: Eric
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2162-2170
id: yi15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2162
lastpage: 2170
published: 2015-06-01 00:00:00 +0000
- title: 'Scalable Bayesian Optimization Using Deep Neural Networks'
abstract: 'Bayesian optimization is an effective methodology for the global optimization of functions with expensive evaluations. It relies on querying a distribution over functions defined by a relatively cheap surrogate model. An accurate model for this distribution over functions is critical to the effectiveness of the approach, and is typically fit using Gaussian processes (GPs). However, since GPs scale cubically with the number of observations, it has been challenging to handle objectives whose optimization requires many evaluations, and as such, massively parallelizing the optimization. In this work, we explore the use of neural networks as an alternative to GPs to model distributions over functions. We show that performing adaptive basis function regression with a neural network as the parametric form performs competitively with state-of-the-art GP-based approaches, but scales linearly with the number of data rather than cubically. This allows us to achieve a previously intractable degree of parallelism, which we apply to large scale hyperparameter optimization, rapidly finding competitive models on benchmark object recognition tasks using convolutional networks, and image caption generation using neural language models.'
volume: 37
URL: http://proceedings.mlr.press/v37/snoek15.html
PDF: http://proceedings.mlr.press/v37/snoek15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-snoek15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Snoek
given: Jasper
- family: Rippel
given: Oren
- family: Swersky
given: Kevin
- family: Kiros
given: Ryan
- family: Satish
given: Nadathur
- family: Sundaram
given: Narayanan
- family: Patwary
given: Mostofa
- family: Prabhat
given: Mr
- family: Adams
given: Ryan
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2171-2180
id: snoek15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2171
lastpage: 2180
published: 2015-06-01 00:00:00 +0000
- title: 'How Hard is Inference for Structured Prediction?'
abstract: 'Structured prediction tasks in machine learning involve the simultaneous prediction of multiple labels. This is often done by maximizing a score function on the space of labels, which decomposes as a sum of pairwise elements, each depending on two specific labels. The goal of this paper is to develop a theoretical explanation of the empirical effectiveness of heuristic inference algorithms for solving such structured prediction problems. We study the minimum-achievable expected Hamming error in such problems, highlighting the case of 2D grid graphs, which are common in machine vision applications. Our main theorems provide tight upper and lower bounds on this error, as well as a polynomial-time algorithm that achieves the bound.'
volume: 37
URL: http://proceedings.mlr.press/v37/globerson15.html
PDF: http://proceedings.mlr.press/v37/globerson15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-globerson15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Globerson
given: Amir
- family: Roughgarden
given: Tim
- family: Sontag
given: David
- family: Yildirim
given: Cafer
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2181-2190
id: globerson15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2181
lastpage: 2190
published: 2015-06-01 00:00:00 +0000
- title: 'Online Time Series Prediction with Missing Data'
abstract: 'We consider the problem of time series prediction in the presence of missing data. We cast the problem as an online learning problem in which the goal of the learner is to minimize prediction error. We then devise an efficient algorithm for the problem, which is based on an autoregressive model and does not assume any structure on the missing data nor on the mechanism that generates the time series. We show that our algorithm’s performance asymptotically approaches the performance of the best AR predictor in hindsight, and corroborate the theoretical results with an empirical study on synthetic and real-world data.'
volume: 37
URL: http://proceedings.mlr.press/v37/anava15.html
PDF: http://proceedings.mlr.press/v37/anava15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-anava15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Anava
given: Oren
- family: Hazan
given: Elad
- family: Zeevi
given: Assaf
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2191-2199
id: anava15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2191
lastpage: 2199
published: 2015-06-01 00:00:00 +0000
- title: 'Proteins, Particles, and Pseudo-Max-Marginals: A Submodular Approach'
abstract: 'Variants of max-product (MP) belief propagation effectively find modes of many complex graphical models, but are limited to discrete distributions. Diverse particle max-product (D-PMP) robustly approximates max-product updates in continuous MRFs using stochastically sampled particles, but previous work was specialized to tree-structured models. Motivated by the challenging problem of protein side chain prediction, we extend D-PMP in several key ways to create a generic MAP inference algorithm for loopy models. We define a modified diverse particle selection objective that is provably submodular, leading to an efficient greedy algorithm with rigorous optimality guarantees, and corresponding max-marginal error bounds. We further incorporate tree-reweighted variants of the MP algorithm to allow provable verification of global MAP recovery in many models. Our general-purpose Matlab library is applicable to a wide range of pairwise graphical models, and we validate our approach using optical flow benchmarks. We further demonstrate superior side chain prediction accuracy compared to baseline algorithms from the state-of-the-art Rosetta package.'
volume: 37
URL: http://proceedings.mlr.press/v37/pacheco15.html
PDF: http://proceedings.mlr.press/v37/pacheco15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-pacheco15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Pacheco
given: Jason
- family: Sudderth
given: Erik
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2200-2208
id: pacheco15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2200
lastpage: 2208
published: 2015-06-01 00:00:00 +0000
- title: 'A Fast Variational Approach for Learning Markov Random Field Language Models'
abstract: 'Language modelling is a fundamental building block of natural language processing. However, in practice the size of the vocabulary limits the distributions applicable for this task: specifically, one has to either resort to local optimization methods, such as those used in neural language models, or work with heavily constrained distributions. In this work, we take a step towards overcoming these difficulties. We present a method for global-likelihood optimization of a Markov random field language model exploiting long-range contexts in time independent of the corpus size. We take a variational approach to optimizing the likelihood and exploit underlying symmetries to greatly simplify learning. We demonstrate the efficiency of this method both for language modelling and for part-of-speech tagging.'
volume: 37
URL: http://proceedings.mlr.press/v37/jernite15.html
PDF: http://proceedings.mlr.press/v37/jernite15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-jernite15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Jernite
given: Yacine
- family: Rush
given: Alexander
- family: Sontag
given: David
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2209-2217
id: jernite15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2209
lastpage: 2217
published: 2015-06-01 00:00:00 +0000
- title: 'Removing systematic errors for exoplanet search via latent causes'
abstract: 'We describe a method for removing the effect of confounders in order to reconstruct a latent quantity of interest. The method, referred to as \emph{half-sibling regression}, is inspired by recent work in causal inference using additive noise models. We provide a theoretical justification and illustrate the potential of the method in a challenging astronomy application.'
volume: 37
URL: http://proceedings.mlr.press/v37/scholkopf15.html
PDF: http://proceedings.mlr.press/v37/scholkopf15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-scholkopf15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Schölkopf
given: Bernhard
- family: Hogg
given: David
- family: Wang
given: Dun
- family: Foreman-Mackey
given: Dan
- family: Janzing
given: Dominik
- family: Simon-Gabriel
given: Carl-Johann
- family: Peters
given: Jonas
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2218-2226
id: scholkopf15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2218
lastpage: 2226
published: 2015-06-01 00:00:00 +0000
- title: 'Scalable Nonparametric Bayesian Inference on Point Processes with Gaussian Processes'
abstract: 'In this paper we propose an efficient, scalable nonparametric Gaussian process model for inference on Poisson point processes. Our model does not resort to gridding the domain or to introducing latent thinning points. Unlike competing models that scale as O(n^3) over n data points, our model has complexity O(nk^2), where k ≪ n. We propose an MCMC sampler and show that the resulting model is faster, more accurate and generates less correlated samples than competing approaches on both synthetic and real-life data. Finally, we show that our model easily handles data sizes not considered thus far by alternative approaches.'
volume: 37
URL: http://proceedings.mlr.press/v37/samo15.html
PDF: http://proceedings.mlr.press/v37/samo15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-samo15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Samo
given: Yves-Laurent Kom
- family: Roberts
given: Stephen
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2227-2236
id: samo15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2227
lastpage: 2236
published: 2015-06-01 00:00:00 +0000
- title: 'Correlation Clustering in Data Streams'
abstract: 'In this paper, we address the problem of \emph{correlation clustering} in the dynamic data stream model. The stream consists of updates to the edge weights of a graph on n nodes, and the goal is to find a node-partition such that the end-points of negative-weight edges are typically in different clusters whereas the end-points of positive-weight edges are typically in the same cluster. We present polynomial-time, O(n⋅\text{polylog}\,n)-space approximation algorithms for natural problems that arise. We first develop data structures based on linear sketches that allow the “quality” of a given node-partition to be measured. We then combine these data structures with convex programming and sampling techniques to solve the relevant approximation problem. However, the standard LP and SDP formulations are not obviously solvable in O(n⋅\text{polylog}\,n) space. Our work presents space-efficient algorithms for the convex programming required, as well as approaches to reduce the adaptivity of the sampling. Note that the improved space and running-time bounds achieved from streaming algorithms are also useful for offline settings such as MapReduce models.'
volume: 37
URL: http://proceedings.mlr.press/v37/ahn15.html
PDF: http://proceedings.mlr.press/v37/ahn15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-ahn15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Ahn
given: KookJin
- family: Cormode
given: Graham
- family: Guha
given: Sudipto
- family: McGregor
given: Andrew
- family: Wirth
given: Anthony
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2237-2246
id: ahn15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2237
lastpage: 2246
published: 2015-06-01 00:00:00 +0000
- title: 'Learning Scale-Free Networks by Dynamic Node Specific Degree Prior'
abstract: 'Learning the network structure underlying data is an important problem in machine learning. This paper presents a novel degree prior for the inference of scale-free networks, which are widely used to model social and biological networks. In particular, we formulate scale-free network inference as a Gaussian graphical model (GGM) regularized by a node-degree prior. Our degree prior not only promotes a desirable global degree distribution, but also exploits the estimated degree of an individual node and the relative strength of all the edges of a single node. To this end, we propose a ranking-based method to dynamically estimate the degree of a node, which makes the resulting optimization problem challenging to solve. To deal with this, we present a novel ADMM (alternating direction method of multipliers) procedure. Our experimental results on both synthetic and real data show that our prior not only yields a scale-free network, but also produces many more correctly predicted edges than the existing scale-free-inducing prior, the hub-inducing prior, and the ℓ_1 norm.'
volume: 37
URL: http://proceedings.mlr.press/v37/tangb15.html
PDF: http://proceedings.mlr.press/v37/tangb15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-tangb15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Tang
given: Qingming
- family: Sun
given: Siqi
- family: Xu
given: Jinbo
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2247-2255
id: tangb15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2247
lastpage: 2255
published: 2015-06-01 00:00:00 +0000
- title: 'Deep Unsupervised Learning using Nonequilibrium Thermodynamics'
abstract: 'A central problem in machine learning involves modeling complex data-sets using highly flexible families of probability distributions in which learning, sampling, inference, and evaluation are still analytically or computationally tractable. Here, we develop an approach that simultaneously achieves both flexibility and tractability. The essential idea, inspired by non-equilibrium statistical physics, is to systematically and slowly destroy structure in a data distribution through an iterative forward diffusion process. We then learn a reverse diffusion process that restores structure in data, yielding a highly flexible and tractable generative model of the data. This approach allows us to rapidly learn, sample from, and evaluate probabilities in deep generative models with thousands of layers or time steps, as well as to compute conditional and posterior probabilities under the learned model. We additionally release an open source reference implementation of the algorithm.'
volume: 37
URL: http://proceedings.mlr.press/v37/sohl-dickstein15.html
PDF: http://proceedings.mlr.press/v37/sohl-dickstein15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-sohl-dickstein15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Sohl-Dickstein
given: Jascha
- family: Weiss
given: Eric
- family: Maheswaranathan
given: Niru
- family: Ganguli
given: Surya
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2256-2265
id: sohl-dickstein15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2256
lastpage: 2265
published: 2015-06-01 00:00:00 +0000
- title: 'Modeling Order in Neural Word Embeddings at Scale'
abstract: 'Natural Language Processing (NLP) systems commonly leverage bag-of-words co-occurrence techniques to capture semantic and syntactic word relationships. The resulting word-level distributed representations often ignore morphological information, though character-level embeddings have proven valuable to NLP tasks. We propose a new neural language model incorporating both word order and character order in its embedding. The model produces several vector spaces with meaningful substructure, as evidenced by its performance of 85.8% on a recent word-analogy task, exceeding the best published syntactic word-analogy scores by a 58% error margin. Furthermore, the model includes several parallel training methods, most notably allowing a skip-gram network with 160 billion parameters, 14x larger than the previous largest neural network, to be trained overnight on 3 multi-core CPUs.'
volume: 37
URL: http://proceedings.mlr.press/v37/trask15.html
PDF: http://proceedings.mlr.press/v37/trask15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-trask15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Trask
given: Andrew
- family: Gilmore
given: David
- family: Russell
given: Matthew
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2266-2275
id: trask15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2266
lastpage: 2275
published: 2015-06-01 00:00:00 +0000
- title: 'Distributed Inference for Dirichlet Process Mixture Models'
abstract: 'Bayesian nonparametric mixture models based on the Dirichlet process (DP) have been widely used for solving problems like clustering, density estimation and topic modelling. These models make weak assumptions about the underlying process that generated the observed data. Thus, when more data are collected, the complexity of these models can change accordingly. These theoretical properties often lead to superior predictive performance when compared to traditional finite mixture models. However, despite the increasing amount of data available, the application of Bayesian nonparametric mixture models is so far limited to relatively small data sets. In this paper, we propose an efficient distributed inference algorithm for the DP and the HDP mixture model. The proposed method is based on a variant of the slice sampler for DPs. Since this sampler does not involve a pre-determined truncation, the stationary distribution of the sampling algorithm is unbiased. We provide both local thread-level and distributed machine-level parallel implementations and study the performance of this sampler through an extensive set of experiments on image and text data. When compared to existing inference algorithms, the proposed method exhibits state-of-the-art accuracy and strong scalability with up to 512 cores.'
volume: 37
URL: http://proceedings.mlr.press/v37/gea15.html
PDF: http://proceedings.mlr.press/v37/gea15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-gea15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Ge
given: Hong
- family: Chen
given: Yutian
- family: Wan
given: Moquan
- family: Ghahramani
given: Zoubin
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2276-2284
id: gea15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2276
lastpage: 2284
published: 2015-06-01 00:00:00 +0000
- title: 'Compressing Neural Networks with the Hashing Trick'
abstract: 'As deep nets are increasingly used in applications suited for mobile devices, a fundamental dilemma becomes apparent: the trend in deep learning is to grow models to absorb ever-increasing data set sizes; however mobile devices are designed with very little memory and cannot store such large models. We present a novel network architecture, HashedNets, that exploits inherent redundancy in neural networks to achieve drastic reductions in model sizes. HashedNets uses a low-cost hash function to randomly group connection weights into hash buckets, and all connections within the same hash bucket share a single parameter value. These parameters are tuned to adjust to the HashedNets weight sharing architecture with standard backprop during training. Our hashing procedure introduces no additional memory overhead, and we demonstrate on several benchmark data sets that HashedNets shrink the storage requirements of neural networks substantially while mostly preserving generalization performance.'
volume: 37
URL: http://proceedings.mlr.press/v37/chenc15.html
PDF: http://proceedings.mlr.press/v37/chenc15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-chenc15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Chen
given: Wenlin
- family: Wilson
given: James
- family: Tyree
given: Stephen
- family: Weinberger
given: Kilian
- family: Chen
given: Yixin
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2285-2294
id: chenc15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2285
lastpage: 2294
published: 2015-06-01 00:00:00 +0000
- title: 'Intersecting Faces: Non-negative Matrix Factorization With New Guarantees'
abstract: 'Non-negative matrix factorization (NMF) is a natural model of admixture and is widely used in science and engineering. A plethora of algorithms have been developed to tackle NMF, but due to the non-convex nature of the problem, there is little guarantee on how well these methods work. Recently a surge of research has focused on a very restricted class of NMFs, called separable NMF, where provably correct algorithms have been developed. In this paper, we propose the notion of subset-separable NMF, which substantially generalizes the property of separability. We show that subset-separability is a natural necessary condition for the factorization to be unique or to have minimum volume. We develop the Face-Intersect algorithm, which provably and efficiently solves subset-separable NMF under natural conditions, and we prove that our algorithm is robust to small noise. We explore the performance of Face-Intersect on simulations and discuss settings where it empirically outperforms state-of-the-art methods. Our work is a step towards finding provably correct algorithms that solve large classes of NMF problems.'
volume: 37
URL: http://proceedings.mlr.press/v37/geb15.html
PDF: http://proceedings.mlr.press/v37/geb15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-geb15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Ge
given: Rong
- family: Zou
given: James
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2295-2303
id: geb15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2295
lastpage: 2303
published: 2015-06-01 00:00:00 +0000
- title: 'Scaling up Natural Gradient by Sparsely Factorizing the Inverse Fisher Matrix'
abstract: 'Second-order optimization methods, such as natural gradient, are difficult to apply to high-dimensional problems, because they require approximately solving large linear systems. We present FActorized Natural Gradient (FANG), an approximation to natural gradient descent where the Fisher matrix is approximated with a Gaussian graphical model whose precision matrix can be computed efficiently. We analyze the Fisher matrix for a small RBM and derive an extremely sparse graphical model which is a good match to the covariance of the sufficient statistics. Our experiments indicate that FANG allows RBMs to be trained more efficiently compared with stochastic gradient descent. Additionally, our analysis yields insight into the surprisingly good performance of the “centering trick” for training RBMs.'
volume: 37
URL: http://proceedings.mlr.press/v37/grosse15.html
PDF: http://proceedings.mlr.press/v37/grosse15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-grosse15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Grosse
given: Roger
- family: Salakhudinov
given: Ruslan
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2304-2313
id: grosse15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2304
lastpage: 2313
published: 2015-06-01 00:00:00 +0000
- title: 'A Deeper Look at Planning as Learning from Replay'
abstract: 'In reinforcement learning, the notions of experience replay, and of planning as learning from replayed experience, have long been used to find good policies with minimal training data. Replay can be seen either as model-based reinforcement learning, where the store of past experiences serves as the model, or as a way to avoid a conventional model of the environment altogether. In this paper, we look more deeply at how replay blurs the line between model-based and model-free methods. First, we show for the first time an exact equivalence between the sequence of value functions found by a model-based policy-evaluation method and by a model-free method with replay. Second, we present a general replay method that can mimic a spectrum of methods ranging from the explicitly model-free (TD(0)) to the explicitly model-based (linear Dyna). Finally, we use insights gained from these relationships to design a new model-based reinforcement learning algorithm for linear function approximation. This method, which we call forgetful LSTD(λ), improves upon regular LSTD(λ) because it extends more naturally to online control, and improves upon linear Dyna because it is a multi-step method, enabling it to perform well even in non-Markov problems or, equivalently, in problems with significant function approximation.'
volume: 37
URL: http://proceedings.mlr.press/v37/vanseijen15.html
PDF: http://proceedings.mlr.press/v37/vanseijen15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-vanseijen15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Vanseijen
given: Harm
- family: Sutton
given: Rich
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2314-2322
id: vanseijen15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2314
lastpage: 2322
published: 2015-06-01 00:00:00 +0000
- title: 'Optimal and Adaptive Algorithms for Online Boosting'
abstract: 'We study online boosting, the task of converting any weak online learner into a strong online learner. Based on a novel and natural definition of weak online learnability, we develop two online boosting algorithms. The first algorithm is an online version of boost-by-majority. By proving a matching lower bound, we show that this algorithm is essentially optimal in terms of the number of weak learners and the sample complexity needed to achieve a specified accuracy. The second algorithm is adaptive and parameter-free, albeit not optimal.'
volume: 37
URL: http://proceedings.mlr.press/v37/beygelzimer15.html
PDF: http://proceedings.mlr.press/v37/beygelzimer15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-beygelzimer15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Beygelzimer
given: Alina
- family: Kale
given: Satyen
- family: Luo
given: Haipeng
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2323-2331
id: beygelzimer15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2323
lastpage: 2331
published: 2015-06-01 00:00:00 +0000
- title: 'Global Convergence of Stochastic Gradient Descent for Some Non-convex Matrix Problems'
abstract: 'Stochastic gradient descent (SGD) on a low-rank factorization is commonly employed to speed up matrix problems including matrix completion, subspace tracking, and SDP relaxation. In this paper, we exhibit a step size scheme for SGD on a low-rank least-squares problem, and we prove that, under broad sampling conditions, our method converges globally from a random starting point within O(ε^{-1} n log n) steps with constant probability for constant-rank problems. Our modification of SGD relates it to stochastic power iteration. We also show some experiments to illustrate the runtime and convergence of the algorithm.'
volume: 37
URL: http://proceedings.mlr.press/v37/sa15.html
PDF: http://proceedings.mlr.press/v37/sa15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-sa15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Sa
given: Christopher De
- family: Re
given: Christopher
- family: Olukotun
given: Kunle
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2332-2341
id: sa15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2332
lastpage: 2341
published: 2015-06-01 00:00:00 +0000
- title: 'An Empirical Exploration of Recurrent Network Architectures'
abstract: 'The Recurrent Neural Network (RNN) is an extremely powerful sequence model that is often difficult to train. The Long Short-Term Memory (LSTM) is a specific RNN architecture whose design makes it much easier to train. While wildly successful in practice, the LSTM’s architecture appears to be ad-hoc so it is not clear if it is optimal, and the significance of its individual components is unclear. In this work, we aim to determine whether the LSTM architecture is optimal or whether much better architectures exist. We conducted a thorough architecture search where we evaluated over ten thousand different RNN architectures, and identified an architecture that outperforms both the LSTM and the recently-introduced Gated Recurrent Unit (GRU) on some but not all tasks. We found that adding a bias of 1 to the LSTM’s forget gate closes the gap between the LSTM and the GRU.'
volume: 37
URL: http://proceedings.mlr.press/v37/jozefowicz15.html
PDF: http://proceedings.mlr.press/v37/jozefowicz15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-jozefowicz15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Jozefowicz
given: Rafal
- family: Zaremba
given: Wojciech
- family: Sutskever
given: Ilya
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2342-2350
id: jozefowicz15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2342
lastpage: 2350
published: 2015-06-01 00:00:00 +0000
- title: 'Complete Dictionary Recovery Using Nonconvex Optimization'
abstract: 'We consider the problem of recovering a complete (i.e., square and invertible) dictionary A_0 from Y = A_0 X_0 with Y ∈ ℝ^{n×p}. This recovery setting is central to the theoretical understanding of dictionary learning. We give the first efficient algorithm that provably recovers A_0 when X_0 has O(n) nonzeros per column, under a suitable probability model for X_0. Prior results provide recovery guarantees when X_0 has only O(√n) nonzeros per column. Our algorithm is based on nonconvex optimization with a spherical constraint, and hence is naturally phrased in the language of manifold optimization. Our proofs give a geometric characterization of the high-dimensional objective landscape, which shows that with high probability there are no spurious local minima. Experiments with synthetic data corroborate our theory. The full version of this paper is available online at http://arxiv.org/abs/1504.06785.'
volume: 37
URL: http://proceedings.mlr.press/v37/sund15.html
PDF: http://proceedings.mlr.press/v37/sund15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-sund15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Sun
given: Ju
- family: Qu
given: Qing
- family: Wright
given: John
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2351-2360
id: sund15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2351
lastpage: 2360
published: 2015-06-01 00:00:00 +0000
- title: 'Safe Policy Search for Lifelong Reinforcement Learning with Sublinear Regret'
abstract: 'Lifelong reinforcement learning provides a promising framework for developing versatile agents that can accumulate knowledge over a lifetime of experience and rapidly learn new tasks by building upon prior knowledge. However, current lifelong learning methods exhibit non-vanishing regret as the amount of experience increases, and include limitations that can lead to suboptimal or unsafe control policies. To address these issues, we develop a lifelong policy gradient learner that operates in an adversarial setting to learn multiple tasks online while enforcing safety constraints on the learned policies. We demonstrate, for the first time, sublinear regret for lifelong policy search, and validate our algorithm on several benchmark dynamical systems and an application to quadrotor control.'
volume: 37
URL: http://proceedings.mlr.press/v37/ammar15.html
PDF: http://proceedings.mlr.press/v37/ammar15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-ammar15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Ammar
given: Haitham Bou
- family: Tutunov
given: Rasul
- family: Eaton
given: Eric
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2361-2369
id: ammar15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2361
lastpage: 2369
published: 2015-06-01 00:00:00 +0000
- title: 'PASSCoDe: Parallel ASynchronous Stochastic dual Co-ordinate Descent'
abstract: 'Stochastic Dual Coordinate Descent (DCD) is one of the most efficient ways to solve the family of L2-regularized empirical risk minimization problems, including linear SVM, logistic regression, and many others. The vanilla implementation of DCD is quite slow; however, by maintaining primal variables while updating dual variables, the time complexity of DCD can be significantly reduced. Such a strategy forms the core algorithm in the widely-used LIBLINEAR package. In this paper, we parallelize the DCD algorithms in LIBLINEAR. In recent research, several synchronized parallel DCD algorithms have been proposed; however, they fail to achieve good speedup in the shared memory multi-core setting. In this paper, we propose a family of parallel asynchronous stochastic dual coordinate descent algorithms (PASSCoDe). Each thread repeatedly selects a random dual variable and conducts coordinate updates using the primal variables that are stored in the shared memory. We analyze the convergence properties of DCD when different locking/atomic mechanisms are applied. For implementation with atomic operations, we show linear convergence under mild conditions. For implementation without any atomic operations or locking, we present a novel error analysis for PASSCoDe under the multi-core environment, showing that the converged solution is the exact solution for a primal problem with a perturbed regularizer. Experimental results show that our methods are much faster than previous parallel coordinate descent solvers.'
volume: 37
URL: http://proceedings.mlr.press/v37/hsieha15.html
PDF: http://proceedings.mlr.press/v37/hsieha15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-hsieha15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Hsieh
given: Cho-Jui
- family: Yu
given: Hsiang-Fu
- family: Dhillon
given: Inderjit
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2370-2379
id: hsieha15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2370
lastpage: 2379
published: 2015-06-01 00:00:00 +0000
- title: 'High Confidence Policy Improvement'
abstract: 'We present a batch reinforcement learning (RL) algorithm that provides probabilistic guarantees about the quality of each policy that it proposes, and which has no hyper-parameter that requires expert tuning. Specifically, the user may select any performance lower-bound and confidence level and our algorithm will ensure that the probability that it returns a policy with performance below the lower bound is at most the specified confidence level. We then propose an incremental algorithm that executes our policy improvement algorithm repeatedly to generate multiple policy improvements. We show the viability of our approach with a simple 4×4 gridworld and the standard mountain car problem, as well as with a digital marketing application that uses real world data.'
volume: 37
URL: http://proceedings.mlr.press/v37/thomas15.html
PDF: http://proceedings.mlr.press/v37/thomas15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-thomas15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Thomas
given: Philip
- family: Theocharous
given: Georgios
- family: Ghavamzadeh
given: Mohammad
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2380-2388
id: thomas15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2380
lastpage: 2388
published: 2015-06-01 00:00:00 +0000
- title: 'Fixed-point algorithms for learning determinantal point processes'
abstract: 'Determinantal point processes (DPPs) offer an elegant tool for encoding probabilities over subsets of a ground set. Discrete DPPs are parametrized by a positive semidefinite matrix (called the DPP kernel), and estimating this kernel is key to learning DPPs from observed data. We consider the task of learning the DPP kernel, and develop for it a surprisingly simple yet effective new algorithm. Our algorithm offers the following benefits over previous approaches: (a) it is much simpler; (b) it yields equally good and sometimes even better local maxima; and (c) it runs an order of magnitude faster on large problems. We present experimental results on both real and simulated data to illustrate the numerical performance of our technique.'
volume: 37
URL: http://proceedings.mlr.press/v37/mariet15.html
PDF: http://proceedings.mlr.press/v37/mariet15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-mariet15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Mariet
given: Zelda
- family: Sra
given: Suvrit
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2389-2397
id: mariet15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2389
lastpage: 2397
published: 2015-06-01 00:00:00 +0000
- title: 'Consistent Multiclass Algorithms for Complex Performance Measures'
abstract: 'This paper presents new consistent algorithms for multiclass learning with complex performance measures, defined by arbitrary functions of the confusion matrix. This setting includes as a special case all loss-based performance measures, which are simply linear functions of the confusion matrix, but also includes more complex performance measures such as the multiclass G-mean and micro F_1 measures. We give a general framework for designing consistent algorithms for such performance measures by viewing the learning problem as an optimization problem over the set of feasible confusion matrices, and give two specific instantiations based on the Frank-Wolfe method for concave performance measures and on the bisection method for ratio-of-linear performance measures. The resulting algorithms are provably consistent and outperform a multiclass version of the state-of-the-art SVMperf method in experiments; for large multiclass problems, the algorithms are also orders of magnitude faster than SVMperf.'
volume: 37
URL: http://proceedings.mlr.press/v37/narasimhanb15.html
PDF: http://proceedings.mlr.press/v37/narasimhanb15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-narasimhanb15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Narasimhan
given: Harikrishna
- family: Ramaswamy
given: Harish
- family: Saha
given: Aadirupa
- family: Agarwal
given: Shivani
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2398-2407
id: narasimhanb15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2398
lastpage: 2407
published: 2015-06-01 00:00:00 +0000
- title: 'Optimizing Neural Networks with Kronecker-factored Approximate Curvature'
abstract: 'We propose an efficient method for approximating natural gradient descent in neural networks which we call Kronecker-factored Approximate Curvature (K-FAC). K-FAC is based on an efficiently invertible approximation of a neural network’s Fisher information matrix which is neither diagonal nor low-rank, and in some cases is completely non-sparse. It is derived by approximating various large blocks of the Fisher (corresponding to entire layers) as being the Kronecker product of two much smaller matrices. While only several times more expensive to compute than the plain stochastic gradient, the updates produced by K-FAC make much more progress optimizing the objective, which results in an algorithm that can be much faster than stochastic gradient descent with momentum in practice. And unlike some previously proposed approximate natural-gradient/Newton methods which use high-quality non-diagonal curvature matrices (such as Hessian-free optimization), K-FAC works very well in highly stochastic optimization regimes. This is because the cost of storing and inverting K-FAC’s approximation to the curvature matrix does not depend on the amount of data used to estimate it, which is a feature typically associated only with diagonal or low-rank approximations to the curvature matrix.'
volume: 37
URL: http://proceedings.mlr.press/v37/martens15.html
PDF: http://proceedings.mlr.press/v37/martens15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-martens15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Martens
given: James
- family: Grosse
given: Roger
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2408-2417
id: martens15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2408
lastpage: 2417
published: 2015-06-01 00:00:00 +0000
- title: 'A Convex Exemplar-based Approach to MAD-Bayes Dirichlet Process Mixture Models'
abstract: 'MAD-Bayes (MAP-based Asymptotic Derivations) has been recently proposed as a general technique to derive scalable algorithms for Bayesian nonparametric models. However, the combinatorial nature of objective functions derived from MAD-Bayes results in hard optimization problems, for which current practice employs heuristic algorithms analogous to k-means to find a local minimum. In this paper, we consider the exemplar-based version of the MAD-Bayes formulation for DP and hierarchical DP (HDP) mixture models. We show that an exemplar-based MAD-Bayes formulation can be relaxed to a convex structural-regularized program that, under cluster-separation conditions, shares the same optimal solution as its combinatorial counterpart. An algorithm based on the Alternating Direction Method of Multipliers (ADMM) is then proposed to solve such a program. In our experiments on several benchmark data sets, the proposed method finds the optimal solution of the combinatorial problem and significantly improves existing methods in terms of the exemplar-based objective.'
volume: 37
URL: http://proceedings.mlr.press/v37/yen15.html
PDF: http://proceedings.mlr.press/v37/yen15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-yen15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Yen
given: En-Hsu
- family: Lin
given: Xin
- family: Zhong
given: Kai
- family: Ravikumar
given: Pradeep
- family: Dhillon
given: Inderjit
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2418-2426
id: yen15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2418
lastpage: 2426
published: 2015-06-01 00:00:00 +0000
- title: 'Multi-instance multi-label learning in the presence of novel class instances'
abstract: 'Multi-instance multi-label learning (MIML) is a framework for learning in the presence of label ambiguity. In MIML, experts provide labels for groups of instances (bags), instead of directly providing a label for every instance. When labeling efforts are focused on a set of target classes, instances outside this set will not be appropriately modeled. For example, ornithologists label bird audio recordings with a list of species present. Additional sound instances, e.g., a raindrop or a moving vehicle, are not labeled. The challenge is due to the fact that for a given bag, the presence or absence of novel instances is latent. In this paper, this problem is addressed using a discriminative probabilistic model that accounts for novel instances. We propose an exact and efficient implementation of the maximum likelihood approach to determine the model parameters and consequently learn an instance-level classifier for all classes including the novel class. Experiments on both synthetic and real datasets illustrate the effectiveness of the proposed approach.'
volume: 37
URL: http://proceedings.mlr.press/v37/pham15.html
PDF: http://proceedings.mlr.press/v37/pham15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-pham15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Pham
given: Anh
- family: Raich
given: Raviv
- family: Fern
given: Xiaoli
- family: Arriaga
given: Jesús Pérez
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2427-2435
id: pham15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2427
lastpage: 2435
published: 2015-06-01 00:00:00 +0000
- title: 'Entropy-Based Concentration Inequalities for Dependent Variables'
abstract: 'We provide new concentration inequalities for functions of dependent variables. The work extends that of Janson (2004), which proposes concentration inequalities using a combination of the Laplace transform and the idea of fractional graph coloring, as well as many works that derive concentration inequalities using the entropy method (see, e.g., (Boucheron et al., 2003)). We give inequalities for fractionally sub-additive and fractionally self-bounding functions. Along the way, we prove a new Talagrand concentration inequality for fractionally sub-additive functions of dependent variables. The results allow us to envision the derivation of generalization bounds for various applications where dependent variables naturally appear, such as in bipartite ranking.'
volume: 37
URL: http://proceedings.mlr.press/v37/ralaivola15.html
PDF: http://proceedings.mlr.press/v37/ralaivola15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-ralaivola15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Ralaivola
given: Liva
- family: Amini
given: Massih-Reza
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2436-2444
id: ralaivola15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2436
lastpage: 2444
published: 2015-06-01 00:00:00 +0000
- title: 'PU Learning for Matrix Completion'
abstract: 'In this paper, we consider the matrix completion problem when the observations are one-bit measurements of some underlying matrix M, and in particular the observed samples consist only of ones and no zeros. This problem is motivated by modern applications such as recommender systems and social networks where only “likes” or “friendships” are observed. The problem is an instance of PU (positive-unlabeled) learning, i.e. learning from only positive and unlabeled examples that has been studied in the context of binary classification. Under the assumption that M has bounded nuclear norm, we provide recovery guarantees for two different observation models: 1) M parameterizes a distribution that generates a binary matrix, 2) M is thresholded to obtain a binary matrix. For the first case, we propose a “shifted matrix completion” method that recovers M using only a subset of indices corresponding to ones; for the second case, we propose a “biased matrix completion” method that recovers the (thresholded) binary matrix. Both methods yield strong error bounds — if M ∈ ℝ^{n×n}, the error is bounded as O(1−ρ), where 1−ρ denotes the fraction of ones observed. This implies a sample complexity of O(n log n) ones to achieve a small error, when M is dense and n is large. We extend our analysis to the inductive matrix completion problem, where rows and columns of M have associated features. We develop efficient and scalable optimization procedures for both the proposed methods and demonstrate their effectiveness for link prediction (on real-world networks consisting of over 2 million nodes and 90 million links) and semi-supervised clustering tasks.'
volume: 37
URL: http://proceedings.mlr.press/v37/hsiehb15.html
PDF: http://proceedings.mlr.press/v37/hsiehb15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-hsiehb15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Hsieh
given: Cho-Jui
- family: Natarajan
given: Nagarajan
- family: Dhillon
given: Inderjit
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2445-2453
id: hsiehb15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2445
lastpage: 2453
published: 2015-06-01 00:00:00 +0000
- title: 'An Asynchronous Distributed Proximal Gradient Method for Composite Convex Optimization'
abstract: 'We propose a distributed first-order augmented Lagrangian (DFAL) algorithm to minimize the sum of composite convex functions, where each term in the sum is a private cost function belonging to a node, and only nodes connected by an edge can directly communicate with each other. This optimization model abstracts a number of applications in distributed sensing and machine learning. We show that any limit point of DFAL iterates is optimal; and for any ε > 0, an ε-optimal and ε-feasible solution can be computed within O(log(1/ε)) DFAL iterations, which require O(ψ_max^{1.5}/d_min · 1/ε) proximal gradient computations and communications per node in total, where ψ_max denotes the largest eigenvalue of the graph Laplacian, and d_min is the minimum degree of the graph. We also propose an asynchronous version of DFAL by incorporating randomized block coordinate descent methods; and demonstrate the efficiency of DFAL on large scale sparse-group LASSO problems.'
volume: 37
URL: http://proceedings.mlr.press/v37/aybat15.html
PDF: http://proceedings.mlr.press/v37/aybat15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-aybat15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Aybat
given: Necdet
- family: Wang
given: Zi
- family: Iyengar
given: Garud
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2454-2462
id: aybat15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2454
lastpage: 2462
published: 2015-06-01 00:00:00 +0000
- title: 'Sparse Subspace Clustering with Missing Entries'
abstract: 'We consider the problem of clustering incomplete data drawn from a union of subspaces. Classical subspace clustering methods are not applicable to this problem because the data are incomplete, while classical low-rank matrix completion methods may not be applicable because data in multiple subspaces may not be low rank. This paper proposes and evaluates two new approaches for subspace clustering and completion. The first one generalizes the sparse subspace clustering algorithm so that it can obtain a sparse representation of the data using only the observed entries. The second one estimates a suitable kernel matrix by assuming a random model for the missing entries and obtains the sparse representation from this kernel. Experiments on synthetic and real data show the advantages and disadvantages of the proposed methods, which all outperform the natural approach (low-rank matrix completion followed by sparse subspace clustering) when the data matrix is high-rank or the percentage of missing entries is large.'
volume: 37
URL: http://proceedings.mlr.press/v37/yangf15.html
PDF: http://proceedings.mlr.press/v37/yangf15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-yangf15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Yang
given: Congyuan
- family: Robinson
given: Daniel
- family: Vidal
given: Rene
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2463-2472
id: yangf15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2463
lastpage: 2472
published: 2015-06-01 00:00:00 +0000
- title: 'Moderated and Drifting Linear Dynamical Systems'
abstract: 'We consider linear dynamical systems, particularly coupled linear oscillators, where the parameters represent meaningful values in a domain theory and thus learning what affects them contributes to explanation. Rather than allow perturbations of latent states, we assume that temporal variation beyond noise is explained by parameter drift, and variation across coupled systems is a function of moderating variables. This change of focus reduces opportunities for efficient inference, and we propose sampling procedures to learn and fit the models. We test our approach on a real dataset of physiological measures of heterosexual couples engaged in a conversation about a potentially emotional topic, with body mass index (BMI) being considered as a moderator. We evaluate several models on their ability to predict future conversation dynamics (the last 20% of the data for each test couple), with shared parameters being learned using held out data. As proof of concept, we validate the hypothesis that BMI affects the conversation dynamic in the experimentally chosen topic.'
volume: 37
URL: http://proceedings.mlr.press/v37/guan15.html
PDF: http://proceedings.mlr.press/v37/guan15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-guan15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Guan
given: Jinyan
- family: Simek
given: Kyle
- family: Brau
given: Ernesto
- family: Morrison
given: Clayton
- family: Butler
given: Emily
- family: Barnard
given: Kobus
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2473-2482
id: guan15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2473
lastpage: 2482
published: 2015-06-01 00:00:00 +0000
- title: 'Boosted Categorical Restricted Boltzmann Machine for Computational Prediction of Splice Junctions'
abstract: 'Splicing refers to the elimination of non-coding regions in transcribed pre-messenger ribonucleic acid (RNA). Discovering splice sites is an important machine learning task that helps us not only to identify the basic units of genetic heredity but also to understand how different proteins are produced. Existing methods for splicing prediction have produced promising results, but often show limited robustness and accuracy. In this paper, we propose a deep belief network-based methodology for computational splice junction prediction. Our proposal includes a novel method for training restricted Boltzmann machines for class-imbalanced prediction. The proposed method addresses the limitations of conventional contrastive divergence and provides regularization for datasets that have categorical features. We tested our approach using public human genome datasets and obtained significantly improved accuracy and reduced runtime compared to state-of-the-art alternatives. The proposed approach was less sensitive to the length of input sequences and more robust for handling false splicing signals. Furthermore, we could discover non-canonical splicing patterns that were otherwise difficult to recognize using conventional methods. Given the efficiency and robustness of our methodology, we anticipate that it can be extended to the discovery of primary structural patterns of other subtle genomic elements.'
volume: 37
URL: http://proceedings.mlr.press/v37/leeb15.html
PDF: http://proceedings.mlr.press/v37/leeb15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-leeb15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Lee
given: Taehoon
- family: Yoon
given: Sungroh
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2483-2492
id: leeb15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2483
lastpage: 2492
published: 2015-06-01 00:00:00 +0000
- title: 'Privacy for Free: Posterior Sampling and Stochastic Gradient Monte Carlo'
abstract: 'We consider the problem of Bayesian learning on sensitive datasets and present two simple but somewhat surprising results that connect Bayesian learning to “differential privacy”, a cryptographic approach to protect individual-level privacy while permitting database-level utility. Specifically, we show that under standard assumptions, getting one sample from a posterior distribution is differentially private “for free”; and this sample as a statistical estimator is often consistent, near optimal, and computationally tractable. Similarly but separately, we show that a recent line of work that uses stochastic gradients for Hybrid Monte Carlo (HMC) sampling also preserves differential privacy with minor or no modifications of the algorithmic procedure. These observations lead to an “anytime” algorithm for Bayesian learning under privacy constraints. We demonstrate that it performs much better than state-of-the-art differentially private methods on synthetic and real datasets.'
volume: 37
URL: http://proceedings.mlr.press/v37/wangg15.html
PDF: http://proceedings.mlr.press/v37/wangg15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-wangg15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Wang
given: Yu-Xiang
- family: Fienberg
given: Stephen
- family: Smola
given: Alex
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2493-2502
id: wangg15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2493
lastpage: 2502
published: 2015-06-01 00:00:00 +0000
- title: 'A trust-region method for stochastic variational inference with applications to streaming data'
abstract: 'Stochastic variational inference allows for fast posterior inference in complex Bayesian models. However, the algorithm is prone to local optima which can make the quality of the posterior approximation sensitive to the choice of hyperparameters and initialization. We address this problem by replacing the natural gradient step of stochastic variational inference with a trust-region update. We show that this leads to generally better results and reduced sensitivity to hyperparameters. We also describe a new strategy for variational inference on streaming data and show that here our trust-region method is crucial for getting good performance.'
volume: 37
URL: http://proceedings.mlr.press/v37/theis15.html
PDF: http://proceedings.mlr.press/v37/theis15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-theis15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Theis
given: Lucas
- family: Hoffman
given: Matt
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2503-2511
id: theis15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2503
lastpage: 2511
published: 2015-06-01 00:00:00 +0000
- title: 'Inference in a Partially Observed Queuing Model with Applications in Ecology'
abstract: 'We consider the problem of inference in a probabilistic model for transient populations where we wish to learn about arrivals, departures, and population size over all time, but the only available data are periodic counts of the population size at specific observation times. The underlying model arises in queueing theory (as an M/G/∞ queue) and also in ecological models for short-lived animals such as insects. Our work applies to both systems. Previous work in the ecology literature focused on maximum likelihood estimation and made a simplifying independence assumption that prevents inference over unobserved random variables such as arrivals and departures. The contribution of this paper is to formulate a latent variable model and develop a novel Gibbs sampler based on Markov bases to perform inference using the correct, but intractable, likelihood function. We empirically validate the convergence behavior of our sampler and demonstrate the ability of our model to make much finer-grained inferences than the previous approach.'
volume: 37
URL: http://proceedings.mlr.press/v37/winner15.html
PDF: http://proceedings.mlr.press/v37/winner15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-winner15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Winner
given: Kevin
- family: Bernstein
given: Garrett
- family: Sheldon
given: Dan
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2512-2520
id: winner15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2512
lastpage: 2520
published: 2015-06-01 00:00:00 +0000
- title: 'Deterministic Independent Component Analysis'
abstract: 'We study independent component analysis with noisy observations. We present, for the first time in the literature, consistent, polynomial-time algorithms to recover non-Gaussian source signals and the mixing matrix with a reconstruction error that vanishes at a 1/√T rate using T observations and scales only polynomially with the natural parameters of the problem. Our algorithms and analysis also extend to deterministic source signals whose empirical distributions are approximately independent.'
volume: 37
URL: http://proceedings.mlr.press/v37/huangb15.html
PDF: http://proceedings.mlr.press/v37/huangb15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-huangb15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Huang
given: Ruitong
- family: Gyorgy
given: Andras
- family: Szepesvári
given: Csaba
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2521-2530
id: huangb15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2521
lastpage: 2530
published: 2015-06-01 00:00:00 +0000
- title: 'On the Optimality of Multi-Label Classification under Subset Zero-One Loss for Distributions Satisfying the Composition Property'
abstract: 'The benefit of exploiting label dependence in multi-label classification is known to be closely dependent on the type of loss to be minimized. In this paper, we show that the subsets of labels that appear as irreducible factors in the factorization of the conditional distribution of the label set given the input features play a pivotal role for multi-label classification in the context of subset Zero-One loss minimization, as they divide the learning task into simpler independent multi-class problems. We establish theoretical results to characterize and identify these irreducible label factors for any given probability distribution satisfying the Composition property. The analysis lays the foundation for generic multi-label classification and optimal feature subset selection procedures under this subclass of distributions. Our conclusions are supported by carefully designed experiments on synthetic and benchmark data.'
volume: 37
URL: http://proceedings.mlr.press/v37/gasse15.html
PDF: http://proceedings.mlr.press/v37/gasse15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-gasse15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Gasse
given: Maxime
- family: Aussem
given: Alexandre
- family: Elghazel
given: Haytham
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2531-2539
id: gasse15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2531
lastpage: 2539
published: 2015-06-01 00:00:00 +0000
- title: 'Un-regularizing: approximate proximal point and faster stochastic algorithms for empirical risk minimization'
abstract: 'We develop a family of accelerated stochastic algorithms that optimize sums of convex functions. Our algorithms improve upon the fastest running time for empirical risk minimization (ERM), and in particular linear least-squares regression, across a wide range of problem settings. To achieve this, we establish a framework, based on the classical proximal point algorithm, useful for accelerating recent fast stochastic algorithms in a black-box fashion. Empirically, we demonstrate that the resulting algorithms exhibit notions of stability that are advantageous in practice. Both in theory and in practice, the provided algorithms reap the computational benefits of adding a large strongly convex regularization term, without incurring a corresponding bias to the original ERM problem.'
volume: 37
URL: http://proceedings.mlr.press/v37/frostig15.html
PDF: http://proceedings.mlr.press/v37/frostig15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-frostig15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Frostig
given: Roy
- family: Ge
given: Rong
- family: Kakade
given: Sham
- family: Sidford
given: Aaron
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2540-2548
id: frostig15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2540
lastpage: 2548
published: 2015-06-01 00:00:00 +0000
- title: 'A New Generalized Error Path Algorithm for Model Selection'
abstract: 'Model selection with cross validation (CV) is very popular in machine learning. However, CV with grid and other common search strategies cannot guarantee to find the model with minimum CV error, which is often the ultimate goal of model selection. Recently, various solution path algorithms have been proposed for several important learning algorithms including support vector classification, Lasso, and so on. However, they still do not guarantee to find the model with minimum CV error. In this paper, we first show that the solution paths produced by various algorithms have the property of piecewise linearity. Then, we prove that a large class of error (or loss) functions are piecewise constant, linear, or quadratic w.r.t. the regularization parameter, based on the solution path. Finally, we propose a new generalized error path algorithm (GEP), and prove that it will find the model with minimum CV error for the entire range of the regularization parameter. The experimental results on a variety of datasets not only confirm our theoretical findings, but also show that the best model with our GEP has better generalization error on the test data, compared to the grid search, manual search, and random search.'
volume: 37
URL: http://proceedings.mlr.press/v37/gu15.html
PDF: http://proceedings.mlr.press/v37/gu15.pdf
edit: https://github.com/mlresearch/v37/edit/gh-pages/_posts/2015-06-01-gu15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 32nd International Conference on Machine Learning'
publisher: 'PMLR'
author:
- family: Gu
given: Bin
- family: Ling
given: Charles
editor:
- family: Bach
given: Francis
- family: Blei
given: David
address: Lille, France
page: 2549-2558
id: gu15
issued:
date-parts:
- 2015
- 6
- 1
firstpage: 2549
lastpage: 2558
published: 2015-06-01 00:00:00 +0000