- title: 'Nonparametric Bayesian Factor Analysis for Dynamic Count Matrices'
abstract: 'A gamma process dynamic Poisson factor analysis model is proposed to factorize a dynamic count matrix, whose columns are sequentially observed count vectors. The model builds a novel Markov chain that sends the latent gamma random variables at time (t-1) as the shape parameters of those at time t, which are linked to observed or latent counts under the Poisson likelihood. The significant challenge of inferring the gamma shape parameters is fully addressed, using unique data augmentation and marginalization techniques for the negative binomial distribution. The same nonparametric Bayesian model also applies to the factorization of a dynamic binary matrix, via a Bernoulli-Poisson link that connects a binary observation to a latent count, with closed-form conditional posteriors for the latent counts and efficient computation for sparse observations. We apply the model to text and music analysis, with state-of-the-art results.'
volume: 38
URL: http://proceedings.mlr.press/v38/acharya15.html
PDF: http://proceedings.mlr.press/v38/acharya15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-acharya15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Acharya
given: Ayan
- family: Ghosh
given: Joydeep
- family: Zhou
given: Mingyuan
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 1-9
id: acharya15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 1
lastpage: 9
published: 2015-02-21 00:00:00 +0000
- title: 'Parameter Estimation of Generalized Linear Models without Assuming their Link Function'
abstract: 'Canonical generalized linear models (GLM) are completely specified by a finite dimensional vector and a monotonically increasing function called the link function. Standard parameter estimation techniques hold the link function fixed and optimize over the parameter vector. We propose a parameter-recovery facilitating, jointly-convex, regularized loss functional that is optimized globally over the vector as well as the link function, with the best rates possible under a first-order oracle model. This widens the scope of GLMs to cases where the link function is unknown.'
volume: 38
URL: http://proceedings.mlr.press/v38/acharyya15.html
PDF: http://proceedings.mlr.press/v38/acharyya15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-acharyya15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Acharyya
given: Sreangsu
- family: Ghosh
given: Joydeep
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 10-18
id: acharyya15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 10
lastpage: 18
published: 2015-02-21 00:00:00 +0000
- title: 'Spectral Gap Error Bounds for Improving CUR Matrix Decomposition and the Nyström Method'
abstract: 'The CUR matrix decomposition and the related Nyström method build low-rank approximations of data matrices by selecting a small number of representative rows and columns of the data. Here, we introduce novel spectral gap error bounds that judiciously exploit the potentially rapid spectrum decay in the input matrix, a most common occurrence in machine learning and data analysis. Our error bounds are much tighter than existing ones for matrices with rapid spectrum decay, and they justify the use of a constant amount of oversampling relative to the rank parameter k, i.e., when the number of columns/rows is \ell = k + O(1). We demonstrate our analysis on a novel deterministic algorithm, StableCUR, which additionally eliminates a previously unrecognized source of potential instability in CUR decompositions. While our algorithm accepts any method of row and column selection, we implement it with a recent column selection scheme with strong singular value bounds. Empirical results on various classes of real world data matrices demonstrate that our algorithm is as efficient as and often outperforms competing algorithms.'
volume: 38
URL: http://proceedings.mlr.press/v38/anderson15.html
PDF: http://proceedings.mlr.press/v38/anderson15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-anderson15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Anderson
given: David
- family: Du
given: Simon
- family: Mahoney
given: Michael
- family: Melgaard
given: Christopher
- family: Wu
given: Kunming
- family: Gu
given: Ming
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 19-27
id: anderson15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 19
lastpage: 27
published: 2015-02-21 00:00:00 +0000
- title: 'Global Multi-armed Bandits with Hölder Continuity'
abstract: 'Standard Multi-Armed Bandit (MAB) problems assume that the arms are independent. However, in many application scenarios, the information obtained by playing an arm provides information about the remainder of the arms. Hence, in such applications, this informativeness can and should be exploited to enable faster convergence to the optimal solution. In this paper, we formalize a new class of multi-armed bandit methods, Global Multi-armed Bandit (GMAB), in which arms are globally informative through a global parameter, i.e., choosing an arm reveals information about all the arms. We propose a greedy policy for the GMAB which always selects the arm with the highest estimated expected reward, and prove that it achieves bounded parameter-dependent regret. Hence, this policy selects suboptimal arms only finitely many times, and after a finite number of initial time steps, the optimal arm is selected in all of the remaining time steps with probability one. In addition, we also study how the informativeness of the arms about each other’s rewards affects the speed of learning. Specifically, we prove that the parameter-free (worst-case) regret is sublinear in time, and decreases with the informativeness of the arms. We also prove a sublinear in time Bayesian risk bound for the GMAB which reduces to the well-known Bayesian risk bound for linearly parameterized bandits when the arms are fully informative. GMABs have applications ranging from drug dosage control to dynamic pricing.'
volume: 38
URL: http://proceedings.mlr.press/v38/atan15.html
PDF: http://proceedings.mlr.press/v38/atan15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-atan15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Atan
given: Onur
- family: Tekin
given: Cem
- family: Schaar
given: Mihaela
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 28-36
id: atan15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 28
lastpage: 36
published: 2015-02-21 00:00:00 +0000
- title: 'Efficient Sparse Clustering of High-Dimensional Non-spherical Gaussian Mixtures'
abstract: 'We consider the problem of clustering data points in high dimensions, i.e., when the number of data points may be much smaller than the number of dimensions. Specifically, we consider a Gaussian mixture model (GMM) with two non-spherical Gaussian components, where the clusters are distinguished by only a few relevant dimensions. The method we propose is a combination of a recent approach for learning parameters of a Gaussian mixture model and sparse linear discriminant analysis (LDA). In addition to cluster assignments, the method returns an estimate of the set of features relevant for clustering. Our results indicate that the sample complexity of clustering depends on the sparsity of the relevant feature set, while only scaling logarithmically with the ambient dimension. Further, we require much milder assumptions than existing work on clustering in high dimensions. In particular, we do not require spherical clusters nor necessitate mean separation along relevant dimensions.'
volume: 38
URL: http://proceedings.mlr.press/v38/azizyan15.html
PDF: http://proceedings.mlr.press/v38/azizyan15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-azizyan15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Azizyan
given: Martin
- family: Singh
given: Aarti
- family: Wasserman
given: Larry
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 37-45
id: azizyan15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 37
lastpage: 45
published: 2015-02-21 00:00:00 +0000
- title: 'Unifying Local Consistency and MAX SAT Relaxations for Scalable Inference with Rounding Guarantees'
abstract: 'We prove the equivalence of first-order local consistency relaxations and the MAX SAT relaxation of Goemans and Williamson (1994) for a class of MRFs we refer to as logical MRFs. This allows us to combine the advantages of each into a single MAP inference technique: solving the local consistency relaxation with any of a number of highly scalable message-passing algorithms, and then obtaining a high-quality discrete solution via a guaranteed rounding procedure when the relaxation is not tight. Logical MRFs are a general class of models that can incorporate many common dependencies, such as logical implications and mixtures of supermodular and submodular potentials. They can be used for many structured prediction tasks, including natural language processing, computer vision, and computational social science. We show that our new inference technique can improve solution quality by as much as 20% without sacrificing speed on problems with over one million dependencies.'
volume: 38
URL: http://proceedings.mlr.press/v38/bach15.html
PDF: http://proceedings.mlr.press/v38/bach15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-bach15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Bach
given: Stephen
- family: Huang
given: Bert
- family: Getoor
given: Lise
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 46-55
id: bach15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 46
lastpage: 55
published: 2015-02-21 00:00:00 +0000
- title: 'Near-optimal max-affine estimators for convex regression'
abstract: 'This paper considers least squares estimators for regression problems over convex, uniformly bounded, uniformly Lipschitz function classes minimizing the empirical risk over max-affine functions (the maximum of finitely many affine functions). Based on new results on nonlinear nonparametric regression and on the approximation accuracy of max-affine functions, these estimators are proved to achieve the optimal rate of convergence up to logarithmic factors. Preliminary experiments indicate that a simple randomized approximation to the optimal estimator is competitive with state-of-the-art alternatives.'
volume: 38
URL: http://proceedings.mlr.press/v38/balazs15.html
PDF: http://proceedings.mlr.press/v38/balazs15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-balazs15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Balazs
given: Gabor
- family: György
given: András
- family: Szepesvari
given: Csaba
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 56-64
id: balazs15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 56
lastpage: 64
published: 2015-02-21 00:00:00 +0000
- title: 'Convex Multi-Task Learning by Clustering'
abstract: 'We consider the problem of multi-task learning in which tasks belong to hidden clusters. We formulate the learning problem as a novel convex optimization problem in which linear classifiers are combinations of (a small number of) some basis. Our formulation jointly learns both the basis and the linear combination. We propose a scalable optimization algorithm for finding the optimal solution. Our new methods outperform existing state-of-the-art methods on multi-task sentiment classification tasks.'
volume: 38
URL: http://proceedings.mlr.press/v38/barzilai15.html
PDF: http://proceedings.mlr.press/v38/barzilai15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-barzilai15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Barzilai
given: Aviad
- family: Crammer
given: Koby
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 65-73
id: barzilai15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 65
lastpage: 73
published: 2015-02-21 00:00:00 +0000
- title: 'Gaussian Processes for Bayesian hypothesis tests on regression functions'
abstract: 'Gaussian processes have been used in different application domains such as classification and regression. In this paper we show that they can also be employed as a universal tool for developing a large variety of Bayesian statistical hypothesis tests for regression functions. In particular, we will use GPs for testing whether (i) two functions are equal; (ii) a function is monotone (even accounting for seasonality effects); (iii) a function is periodic; (iv) two functions are proportional. By simulation studies, we will show that, besides being more flexible, GP tests are also competitive in terms of performance with state-of-the-art algorithms.'
volume: 38
URL: http://proceedings.mlr.press/v38/benavoli15.html
PDF: http://proceedings.mlr.press/v38/benavoli15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-benavoli15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Benavoli
given: Alessio
- family: Mangili
given: Francesca
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 74-82
id: benavoli15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 74
lastpage: 82
published: 2015-02-21 00:00:00 +0000
- title: 'Sparse Solutions to Nonnegative Linear Systems and Applications'
abstract: 'We give an efficient algorithm for finding sparse approximate solutions to linear systems of equations with nonnegative coefficients. Unlike most known results for sparse recovery, we do not require any assumption on the matrix other than non-negativity. Our algorithm is combinatorial in nature, inspired by techniques for the “set cover” problem, as well as the multiplicative weight update method. We then present a natural application to learning mixture models in the PAC framework. For learning a mixture of k axis-aligned Gaussians in d dimensions, we give an algorithm that outputs a mixture of O(k/ε^3) Gaussians that is ε-close in statistical distance to the true distribution, without any separation assumptions. The time and sample complexity is roughly O(kd/ε^3)^d. This is polynomial when d is constant – precisely the regime in which known methods fail to identify the components efficiently. Given that non-negativity is a natural assumption, we believe that our result may find use in other settings in which we wish to approximately “explain” data using a small number of components from a (large) candidate set.'
volume: 38
URL: http://proceedings.mlr.press/v38/bhaskara15.html
PDF: http://proceedings.mlr.press/v38/bhaskara15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-bhaskara15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Bhaskara
given: Aditya
- family: Suresh
given: Ananda
- family: Zadimoghaddam
given: Morteza
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 83-92
id: bhaskara15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 83
lastpage: 92
published: 2015-02-21 00:00:00 +0000
- title: 'Generalized Linear Models for Aggregated Data'
abstract: 'Databases in domains such as healthcare are routinely released to the public in aggregated form. Unfortunately, naive modeling with aggregated data may significantly diminish the accuracy of inferences at the individual level. This paper addresses the scenario where features are provided at the individual level, but the target variables are only available as histogram aggregates or order statistics. We consider a limiting case of generalized linear modeling when the target variables are only known up to permutation, and explore how this relates to permutation testing; a standard technique for assessing statistical dependency. Based on this relationship, we propose a simple algorithm to estimate the model parameters and individual level inferences via alternating imputation and standard generalized linear model fitting. Our results suggest the effectiveness of the proposed approach when, in the original data, permutation testing accurately ascertains the veracity of the linear relationship. The framework is extended to general histogram data with larger bins - with order statistics such as the median as a limiting case. Our experimental results on simulated data and aggregated healthcare data suggest a diminishing returns property with respect to the granularity of the histogram - when a linear relationship holds in the original data, the targets can be predicted accurately given relatively coarse histograms.'
volume: 38
URL: http://proceedings.mlr.press/v38/bhowmik15.html
PDF: http://proceedings.mlr.press/v38/bhowmik15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-bhowmik15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Bhowmik
given: Avradeep
- family: Ghosh
given: Joydeep
- family: Koyejo
given: Oluwasanmi
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 93-101
id: bhowmik15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 93
lastpage: 101
published: 2015-02-21 00:00:00 +0000
- title: 'Accurate and conservative estimates of MRF log-likelihood using reverse annealing'
abstract: 'Markov random fields (MRFs) are difficult to evaluate as generative models because computing the test log-probabilities requires the intractable partition function. Annealed importance sampling (AIS) is widely used to estimate MRF partition functions, and often yields quite accurate results. However, AIS is prone to overestimate the log-likelihood with little indication that anything is wrong. We present the Reverse AIS Estimator (RAISE), a stochastic lower bound on the log-likelihood of an approximation to the original MRF model. RAISE requires only the same MCMC transition operators as standard AIS. Experimental results indicate that RAISE agrees closely with AIS log-probability estimates for RBMs, DBMs, and DBNs, but typically errs on the side of underestimating, rather than overestimating, the log-likelihood.'
volume: 38
URL: http://proceedings.mlr.press/v38/burda15.html
PDF: http://proceedings.mlr.press/v38/burda15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-burda15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Burda
given: Yuri
- family: Grosse
given: Roger
- family: Salakhutdinov
given: Ruslan
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 102-110
id: burda15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 102
lastpage: 110
published: 2015-02-21 00:00:00 +0000
- title: 'Stochastic Spectral Descent for Restricted Boltzmann Machines'
abstract: 'Restricted Boltzmann Machines (RBMs) are widely used as building blocks for deep learning models. Learning typically proceeds by using stochastic gradient descent, and the gradients are estimated with sampling methods. However, the gradient estimation is a computational bottleneck, so better use of the gradients will speed up the descent algorithm. To this end, we first derive upper bounds on the RBM cost function, then show that descent methods can have natural advantages by operating in the L_∞ and Schatten-∞ norms. We introduce a new method called “Stochastic Spectral Descent” that updates parameters in the normed space. Empirical results show dramatic improvements over stochastic gradient descent, with only a fractional increase in the per-iteration cost.'
volume: 38
URL: http://proceedings.mlr.press/v38/carlson15.html
PDF: http://proceedings.mlr.press/v38/carlson15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-carlson15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Carlson
given: David
- family: Cevher
given: Volkan
- family: Carin
given: Lawrence
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 111-119
id: carlson15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 111
lastpage: 119
published: 2015-02-21 00:00:00 +0000
- title: 'Implementable confidence sets in high dimensional regression'
abstract: 'We consider the setting of linear regression in high dimension. We focus on the problem of constructing adaptive and honest confidence sets for the sparse parameter θ, i.e. we want to construct a confidence set for θ that contains θ with high probability, and that is as small as possible. The l_2 diameter of such a confidence set should depend on the sparsity S of θ: the larger S, the wider the confidence set. However, in practice, S is unknown. This paper focuses on constructing a confidence set for θ which contains θ with high probability, whose diameter is adaptive to the unknown sparsity S, and which is implementable in practice.'
volume: 38
URL: http://proceedings.mlr.press/v38/carpentier15.html
PDF: http://proceedings.mlr.press/v38/carpentier15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-carpentier15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Carpentier
given: Alexandra
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 120-128
id: carpentier15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 120
lastpage: 128
published: 2015-02-21 00:00:00 +0000
- title: 'Online Ranking with Top-1 Feedback'
abstract: 'We consider a setting where a system learns to rank a fixed set of m items. The goal is to produce good item rankings for users with diverse interests who interact online with the system for T rounds. We consider a novel top-1 feedback model: at the end of each round, the relevance score for only the top ranked object is revealed. However, the performance of the system is judged on the entire ranked list. We provide a comprehensive set of results regarding learnability under this challenging setting. For PairwiseLoss and DCG, two popular ranking measures, we prove that the minimax regret is Θ(T^2/3). Moreover, the minimax regret is achievable using an efficient strategy that only spends O(m \log m) time per round. The same efficient strategy achieves O(T^2/3) regret for Precision@k. Surprisingly, we show that for normalized versions of these ranking measures, i.e., AUC, NDCG & MAP, no online ranking algorithm can have sublinear regret.'
volume: 38
URL: http://proceedings.mlr.press/v38/chaudhuri15.html
PDF: http://proceedings.mlr.press/v38/chaudhuri15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-chaudhuri15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Chaudhuri
given: Sougata
- family: Tewari
given: Ambuj
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 129-137
id: chaudhuri15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 129
lastpage: 137
published: 2015-02-21 00:00:00 +0000
- title: 'One-bit Compressed Sensing with the k-Support Norm'
abstract: 'In one-bit compressed sensing (1-bit CS), one attempts to estimate a structured parameter (signal) only using the sign of suitable linear measurements. In this paper, we investigate 1-bit CS problems for sparse signals using the recently proposed k-support norm. We show that the new estimator has a closed-form solution, so no optimization is needed. We establish consistency and recovery guarantees of the estimator for both Gaussian and sub-Gaussian random measurements. For Gaussian measurements, our estimator is comparable to the best known in the literature, along with guarantees on support recovery. For sub-Gaussian measurements, our estimator has an irreducible error which, unlike existing results, can be controlled by scaling the measurement vectors. In both cases, our analysis covers the setting of model misspecification, i.e., when the true sparsity is unknown. Experimental results illustrate several strengths of the new estimator.'
volume: 38
URL: http://proceedings.mlr.press/v38/chen15a.html
PDF: http://proceedings.mlr.press/v38/chen15a.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-chen15a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Chen
given: Sheng
- family: Banerjee
given: Arindam
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 138-146
id: chen15a
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 138
lastpage: 146
published: 2015-02-21 00:00:00 +0000
- title: 'Efficient Second-Order Gradient Boosting for Conditional Random Fields'
abstract: 'Conditional random fields (CRFs) are an important class of models for accurate structured prediction, but effective design of the feature functions is a major challenge when applying CRF models to real world data. Gradient boosting, which is used to automatically induce and select feature functions, is a natural candidate solution to the problem. However, it is non-trivial to derive gradient boosting algorithms for CRFs due to the dense Hessian matrices introduced by variable dependencies. Existing approaches thus use only first-order information when optimizing likelihood, and hence face convergence issues. We incorporate second-order information by deriving a Markov Chain mixing rate bound to quantify the dependencies, and introduce a gradient boosting algorithm that iteratively optimizes an adaptive upper bound of the objective function. The resulting algorithm induces and selects features for CRFs via functional space optimization, with provable convergence guarantees. Experimental results on three real world datasets demonstrate that the mixing rate based upper bound is effective for learning CRFs with non-linear potentials.'
volume: 38
URL: http://proceedings.mlr.press/v38/chen15b.html
PDF: http://proceedings.mlr.press/v38/chen15b.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-chen15b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Chen
given: Tianqi
- family: Singh
given: Sameer
- family: Taskar
given: Ben
- family: Guestrin
given: Carlos
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 147-155
id: chen15b
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 147
lastpage: 155
published: 2015-02-21 00:00:00 +0000
- title: 'Filtered Search for Submodular Maximization with Controllable Approximation Bounds'
abstract: 'Most existing submodular maximization algorithms provide theoretical guarantees with approximation bounds. However, in many cases, users may be interested in an anytime algorithm that can offer a flexible trade-off between computation time and optimality guarantees. In this paper, we propose a filtered search (FS) framework that allows the user to set an arbitrary approximation bound guarantee with a “tunable knob”, from 0 (arbitrarily bad) to 1 (globally optimal). FS naturally handles monotone and non-monotone functions as well as unconstrained problems and problems with cardinality, matroid, and knapsack constraints. Further, it can also be applied to (non-negative) non-submodular functions and still gives controllable approximation bounds based on their submodularity ratio. Finally, FS encompasses the greedy algorithm as a special case. Our framework is based on theory in A* search, but is substantially more efficient because it only requires heuristics that are critically admissible (CA) rather than admissible—a condition that gives more effective pruning and is substantially easier to implement.'
volume: 38
URL: http://proceedings.mlr.press/v38/chen15c.html
PDF: http://proceedings.mlr.press/v38/chen15c.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-chen15c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Chen
given: Wenlin
- family: Chen
given: Yixin
- family: Weinberger
given: Kilian
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 156-164
id: chen15c
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 156
lastpage: 164
published: 2015-02-21 00:00:00 +0000
- title: 'Predictive Inverse Optimal Control for Linear-Quadratic-Gaussian Systems'
abstract: 'Predictive inverse optimal control is a powerful approach for estimating the control policy of an agent from observed control demonstrations. Its usefulness has been established in a number of large-scale sequential decision settings characterized by complete state observability. However, many real decisions are made in situations where the state is not fully known to the agent making decisions. Though extensions of predictive inverse optimal control to partially observable Markov decision processes have been developed, their applicability has been limited by the complexities of inference in those representations. In this work, we extend predictive inverse optimal control to the linear-quadratic-Gaussian control setting. We establish close connections between optimal control laws for this setting and the probabilistic predictions under our approach. We demonstrate the effectiveness and benefit in estimating control policies that are influenced by partial observability on both synthetic and real datasets.'
volume: 38
URL: http://proceedings.mlr.press/v38/chen15d.html
PDF: http://proceedings.mlr.press/v38/chen15d.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-chen15d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Chen
given: Xiangli
- family: Ziebart
given: Brian
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 165-173
id: chen15d
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 165
lastpage: 173
published: 2015-02-21 00:00:00 +0000
- title: 'Exact Bayesian Learning of Ancestor Relations in Bayesian Networks'
abstract: 'Ancestor relations in Bayesian networks (BNs) encode long-range causal relations among random variables. In this paper, we develop dynamic programming (DP) algorithms to compute the exact posterior probabilities of ancestor relations in Bayesian networks. The previous algorithm by Parviainen and Koivisto (2011) evaluates all possible ancestor relations in time O(n 3^n) and space O(3^n). However, their algorithm assumes an order-modular prior over DAGs that does not respect Markov equivalence. The resulting posteriors are biased towards DAGs consistent with more linear orders. To adhere to the uniform prior, we develop a new DP algorithm that computes the exact posteriors of all possible ancestor relations in time O(n^2 5^(n-1)) and space O(3^n). We also discuss the extension of our algorithm to computing the posteriors of s → p → t relations, i.e., a directed path from s to t via p. We apply our algorithm to a biological data set for discovering protein signaling pathways.'
volume: 38
URL: http://proceedings.mlr.press/v38/chen15e.html
PDF: http://proceedings.mlr.press/v38/chen15e.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-chen15e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Chen
given: Yetian
- family: Meng
given: Lingjian
- family: Tian
given: Jin
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 174-182
id: chen15e
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 174
lastpage: 182
published: 2015-02-21 00:00:00 +0000
- title: 'Model Selection for Topic Models via Spectral Decomposition'
abstract: 'Topic models have achieved significant success in analyzing large-scale text corpora. In practical applications, we are always confronted with the challenge of model selection, i.e., how to appropriately set the number of topics. Following the recent advances in topic models via tensor decomposition, we make a first attempt to provide theoretical analysis on model selection in latent Dirichlet allocation. Under mild conditions, we derive upper and lower bounds on the number of topics given a text collection of finite size. Experimental results demonstrate that our bounds are correct and tight. Furthermore, using the Gaussian mixture model as an example, we show that our methodology can be easily generalized to model selection analysis in other latent models.'
volume: 38
URL: http://proceedings.mlr.press/v38/cheng15.html
PDF: http://proceedings.mlr.press/v38/cheng15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-cheng15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Cheng
given: Dehua
- family: He
given: Xinran
- family: Liu
given: Yan
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 183-191
id: cheng15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 183
lastpage: 191
published: 2015-02-21 00:00:00 +0000
- title: 'The Loss Surfaces of Multilayer Networks'
abstract: 'We study the connection between the highly non-convex loss function of a simple model of the fully-connected feed-forward neural network and the Hamiltonian of the spherical spin-glass model under the assumptions of: i) variable independence, ii) redundancy in network parametrization, and iii) uniformity. These assumptions enable us to explain the complexity of the fully decoupled neural network through the prism of the results from random matrix theory. We show that for large-size decoupled networks the lowest critical values of the random loss function form a layered structure and they are located in a well-defined band lower-bounded by the global minimum. The number of local minima outside that band diminishes exponentially with the size of the network. We empirically verify that the mathematical model exhibits similar behavior as the computer simulations, despite the presence of high dependencies in real networks. We conjecture that both simulated annealing and SGD converge to the band of low critical points, and that all critical points found there are local minima of high quality measured by the test error. This emphasizes a major difference between large- and small-size networks where for the latter poor quality local minima have non-zero probability of being recovered. Finally, we prove that recovering the global minimum becomes harder as the network size increases and that it is in practice irrelevant as global minimum often leads to overfitting.'
volume: 38
URL: http://proceedings.mlr.press/v38/choromanska15.html
PDF: http://proceedings.mlr.press/v38/choromanska15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-choromanska15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Choromanska
given: Anna
- family: Henaff
given: Mikael
- family: Mathieu
given: Michael
- family: Ben Arous
given: Gerard
- family: LeCun
given: Yann
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 192-204
id: choromanska15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 192
lastpage: 204
published: 2015-02-21 00:00:00 +0000
- title: 'Averaged Least-Mean-Squares: Bias-Variance Trade-offs and Optimal Sampling Distributions'
abstract: 'We consider the least-squares regression problem and provide a detailed asymptotic analysis of the performance of averaged constant-step-size stochastic gradient descent. In the strongly-convex case, we provide an asymptotic expansion up to explicit exponentially decaying terms. Our analysis leads to new insights into stochastic approximation algorithms: (a) it gives a tighter bound on the allowed step-size; (b) the generalization error may be divided into a variance term which is decaying as O(1/n), independently of the step-size γ, and a bias term that decays as O(1/γ^2 n^2); (c) when allowing non-uniform sampling of examples over a dataset, the choice of a good sampling density depends on the trade-off between bias and variance: when the variance term dominates, optimal sampling densities do not lead to much gain, while when the bias term dominates, we can choose larger step-sizes that lead to significant improvements.'
volume: 38
URL: http://proceedings.mlr.press/v38/defossez15.html
PDF: http://proceedings.mlr.press/v38/defossez15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-defossez15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Defossez
given: Alexandre
- family: Bach
given: Francis
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 205-213
id: defossez15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 205
lastpage: 213
published: 2015-02-21 00:00:00 +0000
- title: 'A Topic Modeling Approach to Ranking'
abstract: 'We propose a topic modeling approach to the prediction of preferences in pairwise comparisons. We develop a new generative model for pairwise comparisons that accounts for multiple shared latent rankings that are prevalent in a population of users. This new model also captures inconsistent user behavior in a natural way. We show how the estimation of latent rankings in the new generative model can be formally reduced to the estimation of topics in a statistically equivalent topic modeling problem. We leverage recent advances in the topic modeling literature to develop an algorithm that can learn shared latent rankings with provable consistency as well as sample and computational complexity guarantees. We demonstrate that the new approach is empirically competitive with the current state-of-the-art approaches in predicting preferences on some semi-synthetic and real world datasets.'
volume: 38
URL: http://proceedings.mlr.press/v38/ding15.html
PDF: http://proceedings.mlr.press/v38/ding15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-ding15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Ding
given: Weicong
- family: Ishwar
given: Prakash
- family: Saligrama
given: Venkatesh
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 214-222
id: ding15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 214
lastpage: 222
published: 2015-02-21 00:00:00 +0000
- title: 'A totally unimodular view of structured sparsity'
abstract: 'This paper describes a simple framework for structured sparse recovery based on convex optimization. We show that many structured sparsity models can be naturally represented by linear matrix inequalities on the support of the unknown parameters, where the constraint matrix has a totally unimodular (TU) structure. For such structured models, tight convex relaxations can be obtained in polynomial time via linear programming. Our modeling framework unifies the prevalent structured sparsity norms in the literature, introduces new interesting ones, and renders their tightness and tractability arguments transparent.'
volume: 38
URL: http://proceedings.mlr.press/v38/elhalabi15.html
PDF: http://proceedings.mlr.press/v38/elhalabi15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-elhalabi15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: El Halabi
given: Marwa
- family: Cevher
given: Volkan
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 223-231
id: elhalabi15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 223
lastpage: 231
published: 2015-02-21 00:00:00 +0000
- title: 'Back to the Past: Source Identification in Diffusion Networks from Partially Observed Cascades'
abstract: 'When a piece of malicious information becomes rampant in an information diffusion network, can we identify the source node that originally introduced the piece into the network and infer the time when it initiated this? Being able to do so is critical for curtailing the spread of malicious information, and reducing the potential losses incurred. This is a very challenging problem since typically only incomplete traces are observed and we need to unroll the incomplete traces into the past in order to pinpoint the source. In this paper, we tackle this problem by developing a two-stage framework, which first learns a continuous-time diffusion network based on historical diffusion traces and then identifies the source of an incomplete diffusion trace by maximizing the likelihood of the trace under the learned model. Experiments on both large synthetic and real-world data show that our framework can effectively go back to the past, and pinpoint the source node and its initiation time significantly more accurately than previous state-of-the-art methods.'
volume: 38
URL: http://proceedings.mlr.press/v38/farajtabar15.html
PDF: http://proceedings.mlr.press/v38/farajtabar15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-farajtabar15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Farajtabar
given: Mehrdad
- family: Gomez Rodriguez
given: Manuel
- family: Zamani
given: Mohammad
- family: Du
given: Nan
- family: Zha
given: Hongyuan
- family: Song
given: Le
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 232-240
id: farajtabar15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 232
lastpage: 240
published: 2015-02-21 00:00:00 +0000
- title: 'Graph Approximation and Clustering on a Budget'
abstract: 'We consider the problem of learning from a similarity matrix (such as spectral clustering and low-dimensional embedding), when computing pairwise similarities is costly, and only a limited number of entries can be observed. We provide a theoretical analysis using standard notions of graph approximation, significantly generalizing previous results, which focused on spectral clustering with two clusters. We also propose a new algorithmic approach based on adaptive sampling, which experimentally matches or improves on previous methods, while being considerably more general and computationally cheaper.'
volume: 38
URL: http://proceedings.mlr.press/v38/fetaya15.html
PDF: http://proceedings.mlr.press/v38/fetaya15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-fetaya15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Fetaya
given: Ethan
- family: Shamir
given: Ohad
- family: Ullman
given: Shimon
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 241-249
id: fetaya15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 241
lastpage: 249
published: 2015-02-21 00:00:00 +0000
- title: 'A Sufficient Statistics Construction of Exponential Family Lévy Measure Densities for Nonparametric Conjugate Models'
abstract: 'Conjugate pairs of distributions over infinite dimensional spaces are prominent in machine learning, particularly due to the widespread adoption of Bayesian nonparametric methodologies for a host of models and applications. Much of the existing literature in the learning community focuses on processes possessing some form of computationally tractable conjugacy as is the case for the beta process and the gamma process (and, via normalization, the Dirichlet process). For these processes, conjugacy is proved via statistical machinery tailored to the particular model. We seek to address the problem of obtaining a general construction of prior distributions over infinite dimensional spaces possessing distributional properties amenable to conjugacy. Our result is achieved by generalizing Hjort’s construction of the beta process via appropriate utilization of sufficient statistics for exponential families.'
volume: 38
URL: http://proceedings.mlr.press/v38/finn15.html
PDF: http://proceedings.mlr.press/v38/finn15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-finn15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Finn
given: Robert
- family: Kulis
given: Brian
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 250-258
id: finn15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 250
lastpage: 258
published: 2015-02-21 00:00:00 +0000
- title: 'Computational Complexity of Linear Large Margin Classification With Ramp Loss'
abstract: 'Minimizing the binary classification error with a linear model leads to an NP-hard problem. In practice, surrogate loss functions are used, in particular loss functions leading to large margin classification such as the hinge loss and the ramp loss. The intuitive large margin concept is theoretically supported by generalization bounds linking the expected classification error to the empirical margin error and the complexity of the considered hypothesis class. This article addresses the fundamental question about the computational complexity of determining whether there is a hypothesis class with a hypothesis such that the upper bound on the generalization error is below a certain value. Results of this type are important for model comparison and selection. This paper takes a first step and proves that minimizing a basic margin-bound is NP-hard when considering linear hypotheses and the rho-margin loss function, which generalizes the ramp loss. This result directly implies the hardness of ramp loss minimization.'
volume: 38
URL: http://proceedings.mlr.press/v38/frejstrupmaibing15.html
PDF: http://proceedings.mlr.press/v38/frejstrupmaibing15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-frejstrupmaibing15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Frejstrup Maibing
given: Søren
- family: Igel
given: Christian
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 259-267
id: frejstrupmaibing15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 259
lastpage: 267
published: 2015-02-21 00:00:00 +0000
- title: 'Learning Deep Sigmoid Belief Networks with Data Augmentation'
abstract: 'Deep directed generative models are developed. The multi-layered model is designed by stacking sigmoid belief networks, with sparsity-encouraging priors placed on the model parameters. Learning and inference of layer-wise model parameters are implemented in a Bayesian setting. By exploring the idea of data augmentation and introducing auxiliary Polya-Gamma variables, simple and efficient Gibbs sampling and mean-field variational Bayes (VB) inference are implemented. To address large-scale datasets, an online version of VB is also developed. Experimental results are presented for three publicly available datasets: MNIST, Caltech 101 Silhouettes and OCR letters.'
volume: 38
URL: http://proceedings.mlr.press/v38/gan15.html
PDF: http://proceedings.mlr.press/v38/gan15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-gan15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Gan
given: Zhe
- family: Henao
given: Ricardo
- family: Carlson
given: David
- family: Carin
given: Lawrence
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 268-276
id: gan15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 268
lastpage: 276
published: 2015-02-21 00:00:00 +0000
- title: 'Efficient Estimation of Mutual Information for Strongly Dependent Variables'
abstract: 'We demonstrate that a popular class of non-parametric mutual information (MI) estimators based on k-nearest-neighbor graphs requires a number of samples that scales exponentially with the true MI. Consequently, accurate estimation of MI between two strongly dependent variables is possible only for prohibitively large sample sizes. This important yet overlooked shortcoming of the existing estimators is due to their implicit reliance on local uniformity of the underlying joint distribution. We introduce a new estimator that is robust to local non-uniformity, works well with limited data, and is able to capture relationship strengths over many orders of magnitude. We demonstrate the superior performance of the proposed estimator on both synthetic and real-world data.'
volume: 38
URL: http://proceedings.mlr.press/v38/gao15.html
PDF: http://proceedings.mlr.press/v38/gao15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-gao15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Gao
given: Shuyang
- family: Ver Steeg
given: Greg
- family: Galstyan
given: Aram
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 277-286
id: gao15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 277
lastpage: 286
published: 2015-02-21 00:00:00 +0000
- title: 'On Anomaly Ranking and Excess-Mass Curves'
abstract: 'Learning how to rank multivariate unlabeled observations depending on their degree of abnormality/novelty is a crucial problem in a wide range of applications. In practice, it generally consists in building a real valued "scoring" function on the feature space so as to quantify to which extent observations should be considered as abnormal. In the 1-d situation, measurements are generally considered as "abnormal" when they are remote from central measures such as the mean or the median. Anomaly detection then relies on tail analysis of the variable of interest. Extensions to the multivariate setting are far from straightforward and it is precisely the main purpose of this paper to introduce a novel and convenient (functional) criterion for measuring the performance of a scoring function regarding the anomaly ranking task, referred to as the Excess-Mass curve (EM-curve). In addition, an adaptive algorithm for building a scoring function based on unlabeled data with a nearly optimal EM is proposed and is analyzed from a statistical perspective.'
volume: 38
URL: http://proceedings.mlr.press/v38/goix15.html
PDF: http://proceedings.mlr.press/v38/goix15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-goix15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Goix
given: Nicolas
- family: Sabourin
given: Anne
- family: Clémençon
given: Stéphan
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 287-295
id: goix15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 287
lastpage: 295
published: 2015-02-21 00:00:00 +0000
- title: 'Modeling Skill Acquisition Over Time with Sequence and Topic Modeling'
abstract: 'Online education provides data from students solving problems at different levels of proficiency over time. Unfortunately, methods that use these data for inferring student knowledge rely on costly domain expertise. We propose three novel data-driven methods that bridge sequence modeling with topic models to infer students’ time varying knowledge. These methods differ in complexity, interpretability, accuracy and human supervision. For example, our most interpretable method has similar classification accuracy to the models created by domain experts, but requires much less effort. On the other hand, the most accurate method is completely data-driven and improves predictions by up to 15% in AUC, an evaluation metric for classifiers.'
volume: 38
URL: http://proceedings.mlr.press/v38/gonzalez-brenes15.html
PDF: http://proceedings.mlr.press/v38/gonzalez-brenes15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-gonzalez-brenes15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: González-Brenes
given: José
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 296-305
id: gonzalez-brenes15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 296
lastpage: 305
published: 2015-02-21 00:00:00 +0000
- title: 'Consistent Collective Matrix Completion under Joint Low Rank Structure'
abstract: 'We address the collective matrix completion problem of jointly recovering a collection of matrices with shared structure from partial (and potentially noisy) observations. To ensure well-posedness of the problem, we impose a joint low rank structure, wherein each component matrix is low rank and the latent space of the low rank factors corresponding to each entity is shared across the entire collection. We first develop a rigorous algebra for representing and manipulating collective-matrix structure, and identify sufficient conditions for consistent estimation of collective matrices. We then propose a tractable convex estimator for solving the collective matrix completion problem, and provide the first non-trivial theoretical guarantees for consistency of collective matrix completion. We show that under reasonable assumptions stated in Sec. 3.1, with high probability, the proposed estimator exactly recovers the true matrices whenever the sample complexity requirements dictated by Theorem 1 are met. The sample complexity requirements derived in the paper are optimal up to logarithmic factors, and significantly improve upon the requirements obtained by trivial extensions of standard matrix completion. Finally, we propose a scalable approximate algorithm to solve the proposed convex program, and corroborate our results through simulated and real life experiments.'
volume: 38
URL: http://proceedings.mlr.press/v38/gunasekar15.html
PDF: http://proceedings.mlr.press/v38/gunasekar15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-gunasekar15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Gunasekar
given: Suriya
- family: Yamada
given: Makoto
- family: Yin
given: Dawei
- family: Chang
given: Yi
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 306-314
id: gunasekar15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 306
lastpage: 314
published: 2015-02-21 00:00:00 +0000
- title: 'The Bayesian Echo Chamber: Modeling Social Influence via Linguistic Accommodation'
abstract: 'We present the Bayesian Echo Chamber, a new Bayesian generative model for social interaction data. By modeling the evolution of people’s language usage over time, this model discovers latent influence relationships between them. Unlike previous work on inferring influence, which has primarily focused on simple temporal dynamics evidenced via turn-taking behavior, our model captures more nuanced influence relationships, evidenced via linguistic accommodation patterns in interaction content. The model, which is based on a discrete analog of the multivariate Hawkes process, permits a fully Bayesian inference algorithm. We validate our model’s ability to discover latent influence patterns using transcripts of arguments heard by the US Supreme Court and the movie "12 Angry Men." We showcase our model’s capabilities by using it to infer latent influence patterns from Federal Open Market Committee meeting transcripts, demonstrating state-of-the-art performance at uncovering social dynamics in group discussions.'
volume: 38
URL: http://proceedings.mlr.press/v38/guo15.html
PDF: http://proceedings.mlr.press/v38/guo15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-guo15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Guo
given: Fangjian
- family: Blundell
given: Charles
- family: Wallach
given: Hanna
- family: Heller
given: Katherine
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 315-323
id: guo15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 315
lastpage: 323
published: 2015-02-21 00:00:00 +0000
- title: 'Preserving Privacy of Continuous High-dimensional Data with Minimax Filters'
abstract: 'Preserving privacy of high-dimensional and continuous data such as images or biometric data is a challenging problem. This paper formulates this problem as a learning game between three parties: 1) data contributors using a filter to sanitize data samples, 2) a cooperative data aggregator learning a target task using the filtered samples, and 3) an adversary learning to identify contributors using the same filtered samples. Minimax filters that achieve the optimal privacy-utility trade-off from broad families of filters and loss/classifiers are defined, and algorithms for learning the filters in batch or distributed settings are presented. Experiments with several real-world tasks including facial expression recognition, speech emotion recognition, and activity recognition from motion, show that the minimax filter can simultaneously achieve similar or better target task accuracy and lower privacy risk, often significantly lower than previous methods.'
volume: 38
URL: http://proceedings.mlr.press/v38/hamm15.html
PDF: http://proceedings.mlr.press/v38/hamm15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-hamm15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Hamm
given: Jihun
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 324-332
id: hamm15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 324
lastpage: 332
published: 2015-02-21 00:00:00 +0000
- title: 'A Consistent Method for Graph Based Anomaly Localization'
abstract: 'The anomaly localization task aims at detecting faulty sensors automatically by monitoring the sensor values. In this paper, we propose an anomaly localization algorithm with a consistency guarantee on its results. Although several algorithms were proposed in the last decade, the consistency of the localization results was not discussed in the literature. To the best of our knowledge, this is the first study that provides theoretical guarantees for the localization results. Our new approach is to formulate the task as solving the sparsest subgraph problem on a difference graph. Since this problem is NP-hard, we then use a convex quadratic programming approximation algorithm, which is guaranteed to be consistent under suitable conditions. In simulations on both synthetic and real-world datasets, we verify that the proposed method achieves higher anomaly localization performance compared to existing methods.'
volume: 38
URL: http://proceedings.mlr.press/v38/hara15.html
PDF: http://proceedings.mlr.press/v38/hara15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-hara15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Hara
given: Satoshi
- family: Morimura
given: Tetsuro
- family: Takahashi
given: Toshihiro
- family: Yanagisawa
given: Hiroki
- family: Suzuki
given: Taiji
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 333-341
id: hara15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 333
lastpage: 341
published: 2015-02-21 00:00:00 +0000
- title: 'Metric recovery from directed unweighted graphs'
abstract: 'We analyze directed, unweighted graphs obtained from x_i∈\mathbb{R}^d by connecting vertex i to j iff |x_i - x_j| < ε(x_i). Examples of such graphs include k-nearest neighbor graphs, where ε(x_i) varies from point to point, and, arguably, many real world graphs such as co-purchasing graphs. We ask whether we can recover the underlying Euclidean metric ε(x_i) and the associated density p(x_i) given only the directed graph and d. We show that consistent recovery is possible up to isometric scaling when the vertex degree is at least ω(n^{2/(d+2)}\log(n)^{d/(d+2)}). Our estimator is based on a careful characterization of a random walk over the directed graph and the associated continuum limit. As an algorithm, it resembles the PageRank centrality metric. We demonstrate empirically that the estimator performs well on simulated examples as well as on real-world co-purchasing graphs even with a small number of points and degree scaling as low as \log(n).'
volume: 38
URL: http://proceedings.mlr.press/v38/hashimoto15.html
PDF: http://proceedings.mlr.press/v38/hashimoto15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-hashimoto15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Hashimoto
given: Tatsunori
- family: Sun
given: Yi
- family: Jaakkola
given: Tommi
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 342-350
id: hashimoto15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 342
lastpage: 350
published: 2015-02-21 00:00:00 +0000
- title: 'Scalable Variational Gaussian Process Classification'
abstract: 'Gaussian process classification is a popular method with a number of appealing properties. We show how to scale the model within a variational inducing point framework, out-performing the state of the art on benchmark datasets. Importantly, the variational formulation can be exploited to allow classification in problems with millions of data points, as we demonstrate in experiments.'
volume: 38
URL: http://proceedings.mlr.press/v38/hensman15.html
PDF: http://proceedings.mlr.press/v38/hensman15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-hensman15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Hensman
given: James
- family: Matthews
given: Alexander
- family: Ghahramani
given: Zoubin
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 351-360
id: hensman15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 351
lastpage: 360
published: 2015-02-21 00:00:00 +0000
- title: 'Stochastic Structured Variational Inference'
abstract: 'Stochastic variational inference makes it possible to approximate posterior distributions induced by large datasets quickly using stochastic optimization. The algorithm relies on the use of fully factorized variational distributions. However, this “mean-field” independence approximation limits the fidelity of the posterior approximation, and introduces local optima. We show how to relax the mean-field approximation to allow arbitrary dependencies between global parameters and local hidden variables, producing better parameter estimates by reducing bias, sensitivity to local optima, and sensitivity to hyperparameters.'
volume: 38
URL: http://proceedings.mlr.press/v38/hoffman15.html
PDF: http://proceedings.mlr.press/v38/hoffman15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-hoffman15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Hoffman
given: Matthew
- family: Blei
given: David
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 361-369
id: hoffman15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 361
lastpage: 369
published: 2015-02-21 00:00:00 +0000
- title: 'Reliable and Scalable Variational Inference for the Hierarchical Dirichlet Process'
abstract: 'We introduce a new variational inference objective for hierarchical Dirichlet process admixture models. Our approach provides novel and scalable algorithms for learning nonparametric topic models of text documents and Gaussian admixture models of image patches. Improving on the point estimates of topic probabilities used in previous work, we define full variational posteriors for all latent variables and optimize parameters via a novel surrogate likelihood bound. We show that this approach has crucial advantages for data-driven learning of the number of topics. Via merge and delete moves that remove redundant or irrelevant topics, we learn compact and interpretable models with less computation. Scaling to millions of documents is possible using stochastic or memoized variational updates.'
volume: 38
URL: http://proceedings.mlr.press/v38/hughes15.html
PDF: http://proceedings.mlr.press/v38/hughes15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-hughes15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Hughes
given: Michael
- family: Kim
given: Dae Il
- family: Sudderth
given: Erik
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 370-378
id: hughes15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 370
lastpage: 378
published: 2015-02-21 00:00:00 +0000
- title: 'Cross-domain recommendation without shared users or items by sharing latent vector distributions'
abstract: 'We propose a cross-domain recommendation method for predicting the ratings of items in different domains, where neither users nor items are shared across domains. The proposed method is based on matrix factorization, which learns a latent vector for each user and each item. Matrix factorization techniques for a single domain fail in the cross-domain recommendation task because the learned latent vectors are not aligned over different domains. The proposed method assumes that latent vectors in different domains are generated from a common Gaussian distribution with a full covariance matrix. By inferring the mean and covariance of the common Gaussian from given cross-domain rating matrices, the latent factors are aligned, which enables us to predict ratings in different domains. Experiments conducted on rating datasets from a wide variety of domains, e.g., movies, books, and electronics, demonstrate that the proposed method achieves higher performance for predicting cross-domain ratings than existing methods.'
volume: 38
URL: http://proceedings.mlr.press/v38/iwata15.html
PDF: http://proceedings.mlr.press/v38/iwata15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-iwata15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Iwata
given: Tomoharu
- family: Takeuchi
  given: Koh
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 379-387
id: iwata15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 379
lastpage: 387
published: 2015-02-21 00:00:00 +0000
- title: 'Submodular Point Processes with Applications to Machine learning'
abstract: 'We introduce a class of discrete point processes that we call the Submodular Point Processes (SPPs). These processes are characterized via a submodular (or supermodular) function, and naturally model notions of information, coverage and diversity, as well as cooperation. Unlike Log-submodular and Log-supermodular distributions (Log-SPPs) such as determinantal point processes (DPPs), SPPs are themselves submodular (or supermodular). In this paper, we analyze the computational complexity of probabilistic inference in SPPs. We show that computing the partition function for SPPs (and Log-SPPs) requires exponential complexity in the worst case, and also provide algorithms which approximate SPPs up to polynomial factors. Moreover, for several subclasses of interesting submodular functions that occur in applications, we show how we can provide efficient closed form expressions for the partition functions, and thereby marginals and conditional distributions. We also show how SPPs are closed under mixtures, thus enabling maximum likelihood based strategies for learning mixtures of submodular functions. Finally, we argue how SPPs complement existing Log-SPP distributions, and are a natural model for several applications.'
volume: 38
URL: http://proceedings.mlr.press/v38/iyer15.html
PDF: http://proceedings.mlr.press/v38/iyer15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-iyer15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Iyer
given: Rishabh
- family: Bilmes
given: Jeffrey
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 388-397
id: iyer15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 388
lastpage: 397
published: 2015-02-21 00:00:00 +0000
- title: 'Online Optimization: Competing with Dynamic Comparators'
abstract: 'Recent literature on online learning has focused on developing adaptive algorithms that take advantage of a regularity of the sequence of observations, yet retain worst-case performance guarantees. A complementary direction is to develop prediction methods that perform well against complex benchmarks. In this paper, we address these two directions together. We present a fully adaptive method that competes with dynamic benchmarks, in which the regret guarantee scales with the regularity of the sequence of cost functions and comparators. Notably, the regret bound adapts to the smaller complexity measure in the problem environment. Finally, we apply our results to drifting zero-sum, two-player games where both players achieve no-regret guarantees against best sequences of actions in hindsight.'
volume: 38
URL: http://proceedings.mlr.press/v38/jadbabaie15.html
PDF: http://proceedings.mlr.press/v38/jadbabaie15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-jadbabaie15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Jadbabaie
given: Ali
- family: Rakhlin
given: Alexander
- family: Shahrampour
given: Shahin
- family: Sridharan
given: Karthik
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 398-406
id: jadbabaie15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 398
lastpage: 406
published: 2015-02-21 00:00:00 +0000
- title: 'Estimating the accuracies of multiple classifiers without labeled data'
abstract: 'In various situations one is given only the predictions of multiple classifiers on a large unlabeled test dataset. This scenario raises the following questions: Without any labeled data and without any a-priori knowledge about the reliability of these different classifiers, is it possible to consistently and computationally efficiently estimate their accuracies? Furthermore, also in a completely unsupervised manner, can one construct a more accurate unsupervised ensemble classifier? In this paper, focusing on the binary case, we present simple, computationally efficient algorithms to solve these questions. Furthermore, under standard classifier independence assumptions, we prove our methods are consistent and study their asymptotic error. Our approach is spectral, based on the fact that the off-diagonal entries of the classifiers’ covariance matrix and 3-d tensor are rank-one. We illustrate the competitive performance of our algorithms via extensive experiments on both artificial and real datasets.'
volume: 38
URL: http://proceedings.mlr.press/v38/jaffe15.html
PDF: http://proceedings.mlr.press/v38/jaffe15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-jaffe15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Jaffe
given: Ariel
- family: Nadler
given: Boaz
- family: Kluger
given: Yuval
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 407-415
id: jaffe15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 407
lastpage: 415
published: 2015-02-21 00:00:00 +0000
- title: 'Sparse Dueling Bandits'
abstract: 'The dueling bandit problem is a variation of the classical multi-armed bandit in which the allowable actions are noisy comparisons between pairs of arms. This paper focuses on a new approach for finding the best arm according to the Borda criterion using noisy comparisons. We prove that in the absence of structural assumptions, the sample complexity of this problem is proportional to the sum of the inverse gaps squared of the Borda scores of each arm. We explore this dependence further and consider structural constraints on the pairwise comparison matrix (a particular form of sparsity natural to this problem) that can significantly reduce the sample complexity. This motivates a new algorithm called Successive Elimination with Comparison Sparsity (SECS) that exploits sparsity to find the Borda winner using fewer samples than standard algorithms. We also evaluate the new algorithm experimentally with synthetic and real data. The results show that the sparsity model and the new algorithm can provide significant improvements over standard approaches.'
volume: 38
URL: http://proceedings.mlr.press/v38/jamieson15.html
PDF: http://proceedings.mlr.press/v38/jamieson15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-jamieson15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Jamieson
given: Kevin
- family: Katariya
given: Sumeet
- family: Deshpande
given: Atul
- family: Nowak
given: Robert
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 416-424
id: jamieson15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 416
lastpage: 424
published: 2015-02-21 00:00:00 +0000
- title: 'Consensus Message Passing for Layered Graphical Models'
abstract: 'Generative models provide a powerful framework for probabilistic reasoning. However, in many domains their use has been hampered by the practical difficulties of inference. This is particularly the case in computer vision, where models of the imaging process tend to be large, loopy and layered. For this reason bottom-up conditional models have traditionally dominated in such domains. We find that widely-used, general-purpose message passing inference algorithms such as Expectation Propagation (EP) and Variational Message Passing (VMP) fail on the simplest of vision models. With these models in mind, we introduce a modification to message passing that learns to exploit their layered structure by passing “consensus” messages that guide inference towards good solutions. Experiments on a variety of problems show that the proposed technique leads to significantly more accurate inference results, not only when compared to standard EP and VMP, but also when compared to competitive bottom-up conditional models.'
volume: 38
URL: http://proceedings.mlr.press/v38/jampani15.html
PDF: http://proceedings.mlr.press/v38/jampani15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-jampani15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Jampani
given: Varun
- family: Eslami
given: S. M. Ali
- family: Tarlow
given: Daniel
- family: Kohli
given: Pushmeet
- family: Winn
given: John
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 425-433
id: jampani15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 425
lastpage: 433
published: 2015-02-21 00:00:00 +0000
- title: 'Robust Cost Sensitive Support Vector Machine'
abstract: 'In this paper we consider robust classifications and show their equivalence to regularized classifications. In general, robust classifications are used to create a classifier robust to data by taking into account the uncertainty of the data. Our result shows that regularized classifications inherit robustness and provides a reason why some regularized classifications tend to be robust against data. Although most robust classification problems assume that all uncertain data lie within an identical bounded set, this paper considers a generalized model where the sizes of the bounded sets differ for each data point. These models can be transformed into regularized classification models where the penalties for each data point are assigned according to their losses. We see that considering such models opens up new applications. As an example, we show that this robust classification technique can be used for Imbalanced Data Learning (IDL). We conducted experiments on real data and compared the method with other IDL algorithms such as Cost Sensitive SVMs. This is a novel usage of the robust classification scheme and makes it a suitable candidate for imbalanced data learning.'
volume: 38
URL: http://proceedings.mlr.press/v38/katsumata15.html
PDF: http://proceedings.mlr.press/v38/katsumata15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-katsumata15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Katsumata
given: Shuichi
- family: Takeda
given: Akiko
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 434-443
id: katsumata15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 434
lastpage: 443
published: 2015-02-21 00:00:00 +0000
- title: 'On Approximate Non-submodular Minimization via Tree-Structured Supermodularity'
abstract: 'We address the problem of minimizing non-submodular functions where the supermodularity is restricted to tree-structured pairwise terms. We are motivated by several real world applications, which require submodularity along with structured supermodularity, and this forms a rich class of expressive models, where the non-submodularity is restricted to a tree. While this problem is NP hard (as we show), we develop several practical algorithms to find approximate and near-optimal solutions for this problem, some of which provide lower and others of which provide upper bounds thereby allowing us to compute a tightness gap. We also show that some of our algorithms can be extended to handle more general forms of supermodularity restricted to arbitrary pairwise terms. We compare our algorithms on synthetic data, and also demonstrate the advantage of the formulation on the real world application of image segmentation, where we incorporate structured supermodularity into higher-order submodular energy minimization.'
volume: 38
URL: http://proceedings.mlr.press/v38/kawahara15.html
PDF: http://proceedings.mlr.press/v38/kawahara15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-kawahara15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Kawahara
given: Yoshinobu
- family: Iyer
given: Rishabh
- family: Bilmes
given: Jeffrey
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 444-452
id: kawahara15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 444
lastpage: 452
published: 2015-02-21 00:00:00 +0000
- title: 'Sparse Submodular Probabilistic PCA'
abstract: 'We propose a novel approach for sparse probabilistic principal component analysis, that combines a low rank representation for the latent factors and loadings with a novel sparse variational inference approach for estimating distributions of latent variables subject to sparse support constraints. Inference and parameter estimation for the resulting model is achieved via expectation maximization with a novel variational inference method for the E-step that induces sparsity. We show that this inference problem can be reduced to discrete optimal support selection. The discrete optimization is submodular, hence, greedy selection is guaranteed to achieve 1-1/e fraction of the optimal. Empirical studies indicate effectiveness of the proposed approach for the recovery of a parsimonious decomposition as compared to established baseline methods. We also evaluate our method against state-of-the-art methods on high dimensional fMRI data, and show that the method performs as well as or better than other methods.'
volume: 38
URL: http://proceedings.mlr.press/v38/khanna15.html
PDF: http://proceedings.mlr.press/v38/khanna15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-khanna15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Khanna
given: Rajiv
- family: Ghosh
given: Joydeep
- family: Poldrack
given: Russell
- family: Koyejo
given: Oluwasanmi
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 453-461
id: khanna15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 453
lastpage: 461
published: 2015-02-21 00:00:00 +0000
- title: 'Latent feature regression for multivariate count data'
abstract: 'We consider the problem of regression on multivariate count data and present a Gibbs sampler for a latent feature regression model suitable for both under- and overdispersed response variables. The model learns count-valued latent features conditional on arbitrary covariates, modeling them as negative binomial variables, and maps them into the dependent count-valued observations using a Dirichlet-multinomial distribution. From another viewpoint, the model can be seen as a generalization of a specific topic model for scenarios where we are interested in generating the actual counts of observations and not just their relative frequencies and co-occurrences. The model is demonstrated on a smart traffic application where the task is to predict public transportation volume for unknown locations based on a characterization of the close-by services and venues.'
volume: 38
URL: http://proceedings.mlr.press/v38/klami15.html
PDF: http://proceedings.mlr.press/v38/klami15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-klami15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Klami
given: Arto
- family: Tripathi
given: Abhishek
- family: Sirola
given: Johannes
- family: Väre
given: Lauri
- family: Roulland
given: Frederic
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 462-470
id: klami15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 462
lastpage: 470
published: 2015-02-21 00:00:00 +0000
- title: 'Dimensionality estimation without distances'
abstract: 'While existing methods for estimating the intrinsic dimension of datasets require distances between data points to be known, we consider a situation where one has considerably less information. Given a sample of points, all we get to see is who the k nearest neighbors of every point are. In other words, we get the adjacency matrix of the directed, unweighted k-nearest neighbor graph on the sample, but do not know any point coordinates or distances between the points. We provide two estimators for this situation, a naive one and a more elaborate one. Both of them can be proved to be statistically consistent. However, further theoretical and experimental evidence shows that the naive estimator performs rather poorly, whereas the elaborate one achieves results comparable to those of estimators based on distance information.'
volume: 38
URL: http://proceedings.mlr.press/v38/kleindessner15.html
PDF: http://proceedings.mlr.press/v38/kleindessner15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-kleindessner15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Kleindessner
given: Matthäus
- family: Luxburg
given: Ulrike
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 471-479
id: kleindessner15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 471
lastpage: 479
published: 2015-02-21 00:00:00 +0000
- title: 'A Bayes consistent 1-NN classifier'
abstract: 'We show that a simple modification of the 1-nearest neighbor classifier yields a strongly Bayes consistent learner. Prior to this work, the only strongly Bayes consistent proximity-based method was the k-nearest neighbor classifier, for k growing appropriately with sample size. We will argue that a margin-regularized 1-NN enjoys considerable statistical and algorithmic advantages over the k-NN classifier. These include user-friendly finite-sample error bounds, as well as time- and memory-efficient learning and test-point evaluation algorithms with a principled speed-accuracy tradeoff. Encouraging empirical results are reported.'
volume: 38
URL: http://proceedings.mlr.press/v38/kontorovich15.html
PDF: http://proceedings.mlr.press/v38/kontorovich15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-kontorovich15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Kontorovich
given: Aryeh
- family: Weiss
given: Roi
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 480-488
id: kontorovich15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 480
lastpage: 488
published: 2015-02-21 00:00:00 +0000
- title: 'DART: Dropouts meet Multiple Additive Regression Trees'
abstract: 'MART, an ensemble model of boosted regression trees, is known to deliver high prediction accuracy for diverse tasks, and is widely used in practice. However, it suffers from an issue which we call over-specialization, wherein trees added at later iterations tend to impact the prediction of only a few instances, and make negligible contributions towards the remaining instances. This negatively affects the performance of the model on unseen data, and also makes the model over-sensitive to the contributions of the few, initially added trees. We show that the commonly used tool to address this issue, that of shrinkage, alleviates the problem only to a certain extent and the fundamental issue of over-specialization still remains. In this work, we explore a different approach to address the problem, that of employing dropouts, a tool that has been recently proposed in the context of learning deep neural networks. We propose a novel way of employing dropouts to tackle the issue of over-specialization in MART, resulting in the DART algorithm. We evaluate DART on ranking, regression and classification tasks, using large scale, publicly available datasets, and show that DART outperforms MART in each of the tasks, with a significant margin.'
volume: 38
URL: http://proceedings.mlr.press/v38/korlakaivinayak15.html
PDF: http://proceedings.mlr.press/v38/korlakaivinayak15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-korlakaivinayak15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Korlakai Vinayak
given: Rashmi
- family: Gilad-Bachrach
given: Ran
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 489-497
id: korlakaivinayak15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 489
lastpage: 497
published: 2015-02-21 00:00:00 +0000
- title: 'On Estimating L_2^2 Divergence'
abstract: 'We give a comprehensive theoretical characterization of a nonparametric estimator for the L_2^2 divergence between two continuous distributions. We first bound the rate of convergence of our estimator, showing that it is \sqrt{n}-consistent provided the densities are sufficiently smooth. In this smooth regime, we then show that our estimator is asymptotically normal, construct asymptotic confidence intervals, and establish a Berry-Esséen style inequality characterizing the rate of convergence to normality. We also show that this estimator is minimax optimal.'
volume: 38
URL: http://proceedings.mlr.press/v38/krishnamurthy15.html
PDF: http://proceedings.mlr.press/v38/krishnamurthy15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-krishnamurthy15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Krishnamurthy
given: Akshay
- family: Kandasamy
given: Kirthevasan
- family: Poczos
given: Barnabas
- family: Wasserman
given: Larry
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 498-506
id: krishnamurthy15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 498
lastpage: 506
published: 2015-02-21 00:00:00 +0000
- title: 'Tensor Factorization via Matrix Factorization'
abstract: 'Tensor factorization arises in many machine learning applications, such as knowledge base modeling and parameter estimation in latent variable models. However, numerical methods for tensor factorization have not reached the level of maturity of matrix factorization methods. In this paper, we propose a new algorithm for CP tensor factorization that uses random projections to reduce the problem to simultaneous matrix diagonalization. Our method is conceptually simple and also applies to non-orthogonal and asymmetric tensors of arbitrary order. We prove that a small number of random projections essentially preserves the spectral information in the tensor, allowing us to remove the dependence on the eigengap that plagued earlier tensor-to-matrix reductions. Experimentally, our method outperforms existing tensor factorization methods on both simulated data and two real datasets.'
volume: 38
URL: http://proceedings.mlr.press/v38/kuleshov15.html
PDF: http://proceedings.mlr.press/v38/kuleshov15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-kuleshov15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Kuleshov
given: Volodymyr
- family: Chaganty
given: Arun
- family: Liang
given: Percy
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 507-516
id: kuleshov15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 507
lastpage: 516
published: 2015-02-21 00:00:00 +0000
- title: 'Low-Rank Spectral Learning with Weighted Loss Functions'
abstract: 'Kulesza et al. recently observed that low-rank spectral learning algorithms, which discard the smallest singular values of a moment matrix during training, can behave in unexpected ways, producing large errors even when the discarded singular values are arbitrarily small. In this paper we prove that when learning predictive state representations those problematic cases disappear if we introduce a particular weighted loss function and learn using sufficiently large sets of statistics; our main result is a bound on the loss of the learned low-rank model in terms of the singular values that are discarded. Practically speaking, this suggests that regardless of the model rank we should use the largest possible sets of statistics, and we show empirically that this is true on both synthetic and real-world domains.'
volume: 38
URL: http://proceedings.mlr.press/v38/kulesza15.html
PDF: http://proceedings.mlr.press/v38/kulesza15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-kulesza15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Kulesza
given: Alex
- family: Jiang
given: Nan
- family: Singh
given: Satinder
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 517-525
id: kulesza15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 517
lastpage: 525
published: 2015-02-21 00:00:00 +0000
- title: 'Symmetric Iterative Proportional Fitting'
abstract: 'Iterative Proportional Fitting (IPF) generates from an input matrix W a sequence of matrices that converges, under certain conditions, to a specific limit matrix W*. This limit is the relative-entropy nearest solution to W among all matrices of prescribed row marginals r and column marginals c. We prove this known fact by a novel strategy that contributes a purely algorithmic intuition. Then we focus on the symmetric setting: W=W’ and r=c. Since IPF inherently generates non-symmetric matrices, we introduce two symmetrized variants of IPF. We prove convergence for both of them. Further, we give a novel characterization for the existence of W* in terms of expansion properties of the undirected weighted graph represented by W. Finally, we show how our results contribute to recent work in machine learning.'
volume: 38
URL: http://proceedings.mlr.press/v38/kurras15.html
PDF: http://proceedings.mlr.press/v38/kurras15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-kurras15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Kurras
given: Sven
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 526-534
id: kurras15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 526
lastpage: 534
published: 2015-02-21 00:00:00 +0000
- title: 'Tight Regret Bounds for Stochastic Combinatorial Semi-Bandits'
abstract: 'A stochastic combinatorial semi-bandit is an online learning problem where at each step a learning agent chooses a subset of ground items subject to constraints, and then observes stochastic weights of these items and receives their sum as a payoff. In this paper, we close the problem of computationally and sample efficient learning in stochastic combinatorial semi-bandits. In particular, we analyze a UCB-like algorithm for solving the problem, which is known to be computationally efficient; and prove O(K L (1 / ∆) \log n) and O(\sqrt{K L n \log n}) upper bounds on its n-step regret, where L is the number of ground items, K is the maximum number of chosen items, and ∆ is the gap between the expected returns of the optimal and best suboptimal solutions. The gap-dependent bound is tight up to a constant factor and the gap-free bound is tight up to a polylogarithmic factor.'
volume: 38
URL: http://proceedings.mlr.press/v38/kveton15.html
PDF: http://proceedings.mlr.press/v38/kveton15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-kveton15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Kveton
given: Branislav
- family: Wen
given: Zheng
- family: Ashkan
given: Azin
- family: Szepesvari
given: Csaba
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 535-543
id: kveton15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 535
lastpage: 543
published: 2015-02-21 00:00:00 +0000
- title: 'Sequential Kernel Herding: Frank-Wolfe Optimization for Particle Filtering'
abstract: 'Recently, the Frank-Wolfe optimization algorithm was suggested as a procedure to obtain adaptive quadrature rules for integrals of functions in a reproducing kernel Hilbert space (RKHS) with a potentially faster rate of convergence than Monte Carlo integration (and “kernel herding” was shown to be a special case of this procedure). In this paper, we propose to replace the random sampling step in a particle filter by Frank-Wolfe optimization. By optimizing the position of the particles, we can obtain better accuracy than random or quasi-Monte Carlo sampling. In applications where the evaluation of the emission probabilities is expensive (such as in robot localization), the additional computational cost to generate the particles through optimization can be justified. Experiments on standard synthetic examples as well as on a robot localization task indicate indeed an improvement of accuracy over random and quasi-Monte Carlo sampling.'
volume: 38
URL: http://proceedings.mlr.press/v38/lacoste-julien15.html
PDF: http://proceedings.mlr.press/v38/lacoste-julien15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-lacoste-julien15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Lacoste-Julien
given: Simon
- family: Lindsten
given: Fredrik
- family: Bach
given: Francis
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 544-552
id: lacoste-julien15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 544
lastpage: 552
published: 2015-02-21 00:00:00 +0000
- title: 'Particle Gibbs for Bayesian Additive Regression Trees'
abstract: 'Additive regression trees are flexible non-parametric models and popular off-the-shelf tools for real-world non-linear regression. In application domains, such as bioinformatics, where there is also demand for probabilistic predictions with measures of uncertainty, the Bayesian additive regression trees (BART) model, introduced by Chipman et al. (2010), is increasingly popular. As data sets have grown in size, however, the standard Metropolis–Hastings algorithms used to perform inference in BART are proving inadequate. In particular, these Markov chains make local changes to the trees and suffer from slow mixing when the data are high-dimensional or the best-fitting trees are more than a few layers deep. We present a novel sampler for BART based on the Particle Gibbs (PG) algorithm (Andrieu et al., 2010) and a top-down particle filtering algorithm for Bayesian decision trees (Lakshminarayanan et al., 2013). Rather than making local changes to individual trees, the PG sampler proposes a complete tree to fit the residual. Experiments show that the PG sampler outperforms existing samplers in many settings.'
volume: 38
URL: http://proceedings.mlr.press/v38/lakshminarayanan15.html
PDF: http://proceedings.mlr.press/v38/lakshminarayanan15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-lakshminarayanan15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Lakshminarayanan
given: Balaji
- family: Roy
given: Daniel
- family: Teh
given: Yee Whye
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 553-561
id: lakshminarayanan15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 553
lastpage: 561
published: 2015-02-21 00:00:00 +0000
- title: 'Deeply-Supervised Nets'
abstract: 'We propose deeply-supervised nets (DSN), a method that simultaneously minimizes classification error and improves the directness and transparency of the hidden layer learning process. We focus our attention on three aspects of traditional convolutional-neural-network-type (CNN-type) architectures: (1) transparency in the effect intermediate layers have on overall classification; (2) discriminativeness and robustness of learned features, especially in early layers; (3) training effectiveness in the face of “vanishing” gradients. To combat these issues, we introduce “companion” objective functions at each hidden layer, in addition to the overall objective function at the output layer (an integrated strategy distinct from layer-wise pre-training). We also analyze our algorithm using techniques extended from stochastic gradient methods. The advantages provided by our method are evident in our experimental results, showing state-of-the-art performance on MNIST, CIFAR-10, CIFAR-100, and SVHN.'
volume: 38
URL: http://proceedings.mlr.press/v38/lee15a.html
PDF: http://proceedings.mlr.press/v38/lee15a.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-lee15a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Lee
given: Chen-Yu
- family: Xie
given: Saining
- family: Gallagher
given: Patrick
- family: Zhang
given: Zhengyou
- family: Tu
given: Zhuowen
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 562-570
id: lee15a
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 562
lastpage: 570
published: 2015-02-21 00:00:00 +0000
- title: 'Preferential Attachment in Graphs with Affinities'
abstract: 'Preferential attachment models for random graphs are successful in capturing many characteristics of real networks such as power law behavior. At the same time they lack flexibility to take vertex to vertex affinities into account, a feature that is commonly used in many link recommendation algorithms. We propose a random graph model based on both node attributes and preferential attachment. This approach overcomes the limitation of existing models on expressing vertex affinity and on reflecting properties of different subgraphs. We analytically prove that our model preserves the power law behavior in the degree distribution as expressed by natural graphs and we show that it satisfies the small world property. Experiments show that our model provides an excellent fit of many natural graph statistics and we provide an algorithm to infer the associated affinity function efficiently.'
volume: 38
URL: http://proceedings.mlr.press/v38/lee15b.html
PDF: http://proceedings.mlr.press/v38/lee15b.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-lee15b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Lee
given: Jay
- family: Zaheer
given: Manzil
- family: Günnemann
given: Stephan
- family: Smola
given: Alex
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 571-580
id: lee15b
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 571
lastpage: 580
published: 2015-02-21 00:00:00 +0000
- title: 'Bayesian Hierarchical Clustering with Exponential Family: Small-Variance Asymptotics and Reducibility'
abstract: 'Bayesian hierarchical clustering (BHC) is an agglomerative clustering method, where a probabilistic model is defined and its marginal likelihoods are evaluated to decide which clusters to merge. While BHC provides a few advantages over traditional distance-based agglomerative clustering algorithms, successive evaluation of marginal likelihoods and careful hyperparameter tuning are cumbersome and limit the scalability. In this paper we relax BHC into a non-probabilistic formulation, exploring small-variance asymptotics in conjugate-exponential models. We develop a novel clustering algorithm, referred to as relaxed BHC (RBHC), from the asymptotic limit of the BHC model that exhibits the scalability of distance-based agglomerative clustering algorithms as well as the flexibility of Bayesian nonparametric models. We also investigate the reducibility of the dissimilarity measure that emerges from the asymptotic limit of the BHC model, allowing us to use scalable algorithms such as the nearest neighbor chain algorithm. Numerical experiments on both synthetic and real-world datasets demonstrate the validity and high performance of our method.'
volume: 38
URL: http://proceedings.mlr.press/v38/lee15c.html
PDF: http://proceedings.mlr.press/v38/lee15c.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-lee15c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Lee
given: Juho
- family: Choi
given: Seungjin
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 581-589
id: lee15c
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 581
lastpage: 589
published: 2015-02-21 00:00:00 +0000
- title: 'Modelling Policies in MDPs in Reproducing Kernel Hilbert Space'
abstract: 'We consider modelling policies for MDPs in (vector-valued) reproducing kernel Hilbert function spaces (RKHS). This enables us to work “non-parametrically” in a rich function class, and provides the ability to learn complex policies. We present a framework for performing gradient-based policy optimization in the RKHS, deriving the functional gradient of the return for our policy, which has a simple form and can be estimated efficiently. The policy representation naturally focuses on the relevant region of state space defined by the policy trajectories, and does not rely on a-priori defined basis points; this can be an advantage in high dimensions where suitable basis points may be difficult to define a-priori. The method is adaptive in the sense that the policy representation will naturally adapt to the complexity of the policy being modelled, which is achieved with standard efficient sparsification tools in an RKHS. We argue that finding a good kernel on states can be easier than remetrizing a high dimensional feature space. We demonstrate the approach on benchmark domains and a simulated quadrocopter navigation task.'
volume: 38
URL: http://proceedings.mlr.press/v38/lever15.html
PDF: http://proceedings.mlr.press/v38/lever15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-lever15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Lever
given: Guy
- family: Stafford
given: Ronnie
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 590-598
id: lever15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 590
lastpage: 598
published: 2015-02-21 00:00:00 +0000
- title: 'Scalable Optimization of Randomized Operational Decisions in Adversarial Classification Settings'
abstract: 'When learning, such as classification, is used in adversarial settings, such as intrusion detection, intelligent adversaries will attempt to evade the resulting policies. The literature on adversarial machine learning aims to develop learning algorithms which are robust to such adversarial evasion, but exhibits two significant limitations: a) failure to account for operational constraints and b) a restriction that decisions are deterministic. To overcome these limitations, we introduce a conceptual separation between learning, used to infer attacker preferences, and operational decisions, which account for adversarial evasion, enforce operational constraints, and naturally admit randomization. Our approach gives rise to an intractably large linear program. To overcome scalability limitations, we introduce a novel method for estimating a compact parity basis representation for the operational decision function. Additionally, we develop an iterative constraint generation approach which embeds the adversary’s best response calculation, to arrive at a scalable algorithm for computing near-optimal randomized operational decisions. Extensive experiments demonstrate the efficacy of our approach.'
volume: 38
URL: http://proceedings.mlr.press/v38/li15a.html
PDF: http://proceedings.mlr.press/v38/li15a.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-li15a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Li
given: Bo
- family: Vorobeychik
given: Yevgeniy
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 599-607
id: li15a
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 599
lastpage: 607
published: 2015-02-21 00:00:00 +0000
- title: 'Toward Minimax Off-policy Value Estimation'
abstract: 'This paper studies the off-policy evaluation problem, where one aims to estimate the value of a target policy based on a sample of observations collected by another policy. We first consider the multi-armed bandit case, establish a finite-time minimax risk lower bound, and analyze the risk of three standard estimators. It is shown that in a large class of settings the so-called regression estimator is minimax optimal up to a constant that depends on the number of actions, while the other two can be arbitrarily worse even in the limit of infinitely many data points, despite their empirical success and popularity. The performance of these estimators is studied in synthetic and real problems, illustrating the nontriviality of this simple task. Finally the results are extended to the problem of off-policy evaluation in contextual bandits and fixed-horizon Markov decision processes.'
volume: 38
URL: http://proceedings.mlr.press/v38/li15b.html
PDF: http://proceedings.mlr.press/v38/li15b.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-li15b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Li
given: Lihong
- family: Munos
given: Remi
- family: Szepesvari
given: Csaba
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 608-616
id: li15b
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 608
lastpage: 616
published: 2015-02-21 00:00:00 +0000
- title: 'Compressed Sensing with Very Sparse Gaussian Random Projections'
abstract: 'We study the use of very sparse random projections (Li et al., KDD 2006; Li, KDD 2007) for compressed sensing (sparse signal recovery) when the nonzero coordinates of signals can be either positive or negative. In our setting, the entries of a Gaussian design matrix are randomly sparsified so that only a very small fraction of entries are nonzero. Our proposed decoding algorithm is simple and efficient in that the major cost is one linear scan of the coordinates. Using our proposed “tie estimator”, we are able to recover a K-sparse signal of length N using 1.551 eK \log (K/δ) measurements (where δ≤0.05 is the confidence) in one scan. The practical performance of our method, however, can be substantially better than this bound. The Gaussian design assumption is not essential although it simplifies the analysis. Prior studies have shown that existing one-scan (or roughly one-scan) recovery algorithms using sparse matrices would require substantially (e.g., one order of magnitude) more measurements than L1 decoding by linear programming, when the nonzero coordinates of signals can be either negative or positive. In this paper, following a well-known experimental setup, we show that, at the same number of measurements, the recovery accuracies of our proposed method are similar to those of standard L1 decoding.'
volume: 38
URL: http://proceedings.mlr.press/v38/li15c.html
PDF: http://proceedings.mlr.press/v38/li15c.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-li15c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Li
given: Ping
- family: Zhang
given: Cun-Hui
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 617-625
id: li15c
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 617
lastpage: 625
published: 2015-02-21 00:00:00 +0000
- title: 'Max-Margin Zero-Shot Learning for Multi-class Classification'
abstract: 'Due to the dramatic expansion of data categories and the lack of labeled instances, zero-shot learning, which transfers knowledge from observed classes to recognize unseen classes, has started drawing a lot of attention from the research community. In this paper, we propose a semi-supervised max-margin learning framework that integrates the semi-supervised classification problem over observed classes and the unsupervised clustering problem over unseen classes together to tackle zero-shot multi-class classification. By further integrating label embedding into this framework, we produce a dual formulation that permits convenient incorporation of auxiliary label semantic knowledge to improve zero-shot learning. We conduct extensive experiments on three standard image data sets to evaluate the proposed approach by comparing to two state-of-the-art methods. Our results demonstrate the efficacy of the proposed framework.'
volume: 38
URL: http://proceedings.mlr.press/v38/li15d.html
PDF: http://proceedings.mlr.press/v38/li15d.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-li15d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Li
given: Xin
- family: Guo
given: Yuhong
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 626-634
id: li15d
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 626
lastpage: 634
published: 2015-02-21 00:00:00 +0000
- title: 'Conditional Restricted Boltzmann Machines for Multi-label Learning with Incomplete Labels'
abstract: 'Standard multi-label learning methods assume fully labeled training data. This assumption however is impractical in many application domains where labels are difficult to collect and missing labels are prevalent. In this paper, we develop a novel conditional restricted Boltzmann machine model to address multi-label learning with incomplete labels. It uses a restricted Boltzmann machine to capture the high-order label dependence relationships in the output space, aiming to enhance the capacity of recovering missing labels and learning high quality multi-label prediction models. Moreover, it also incorporates label co-occurrence information retrieved from auxiliary resources as prior knowledge. We perform model training by maximizing the regularized marginal conditional likelihood of the label vectors given the input features, and develop a Viterbi style EM algorithm to solve the induced optimization problem. The proposed approach is evaluated on four real-world multi-label data sets by comparing to a number of state-of-the-art methods. The experimental results show it outperforms all the other comparison methods across the applied data sets.'
volume: 38
URL: http://proceedings.mlr.press/v38/li15e.html
PDF: http://proceedings.mlr.press/v38/li15e.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-li15e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Li
given: Xin
- family: Zhao
given: Feipeng
- family: Guo
given: Yuhong
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 635-643
id: li15e
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 635
lastpage: 643
published: 2015-02-21 00:00:00 +0000
- title: 'Sparsistency of \ell_1-Regularized M-Estimators'
abstract: 'We consider the model selection consistency or sparsistency of a broad set of \ell_1-regularized M-estimators for linear and non-linear statistical models in a unified fashion. For this purpose, we propose the local structured smoothness condition (LSSC) on the loss function. We provide a general result giving deterministic sufficient conditions for sparsistency in terms of the regularization parameter, ambient dimension, sparsity level, and number of measurements. We show that several important statistical models have M-estimators that indeed satisfy the LSSC, and as a result, the sparsistency guarantees for the corresponding \ell_1-regularized M-estimators can be derived as simple applications of our main theorem.'
volume: 38
URL: http://proceedings.mlr.press/v38/li15f.html
PDF: http://proceedings.mlr.press/v38/li15f.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-li15f.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Li
given: Yen-Huan
- family: Scarlett
given: Jonathan
- family: Ravikumar
given: Pradeep
- family: Cevher
given: Volkan
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 644-652
id: li15f
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 644
lastpage: 652
published: 2015-02-21 00:00:00 +0000
- title: 'Similarity Learning for High-Dimensional Sparse Data'
abstract: 'A good measure of similarity between data points is crucial to many tasks in machine learning. Similarity and metric learning methods learn such measures automatically from data, but they do not scale well with respect to the dimensionality of the data. In this paper, we propose a method that can efficiently learn a similarity measure from high-dimensional sparse data. The core idea is to parameterize the similarity measure as a convex combination of rank-one matrices with specific sparsity structures. The parameters are then optimized with an approximate Frank-Wolfe procedure to maximally satisfy relative similarity constraints on the training data. Our algorithm greedily incorporates one pair of features at a time into the similarity measure, providing an efficient way to control the number of active features and thus reduce overfitting. It enjoys very appealing convergence guarantees and its time and memory complexity depend on the sparsity of the data instead of the dimension of the feature space. Our experiments on real-world high-dimensional datasets demonstrate its potential for classification, dimensionality reduction and data exploration.'
volume: 38
URL: http://proceedings.mlr.press/v38/liu15.html
PDF: http://proceedings.mlr.press/v38/liu15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-liu15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Liu
given: Kuan
- family: Bellet
given: Aurélien
- family: Sha
given: Fei
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 653-662
id: liu15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 653
lastpage: 662
published: 2015-02-21 00:00:00 +0000
- title: 'Tradeoffs for Space, Time, Data and Risk in Unsupervised Learning'
abstract: 'Faced with massive data, is it possible to trade off (statistical) risk, and (computational) space and time? This challenge lies at the heart of large-scale machine learning. Using k-means clustering as a prototypical unsupervised learning problem, we show how we can strategically summarize the data (control space) in order to trade off risk and time when data is generated by a probabilistic model. Our summarization is based on coreset constructions from computational geometry. We also develop an algorithm, TRAM, to navigate the space/time/data/risk tradeoff in practice. In particular, we show that for a fixed risk (or data size), as the data size increases (resp. risk increases) the running time of TRAM decreases. Our extensive experiments on real data sets demonstrate the existence and practical utility of such tradeoffs, not only for k-means but also for Gaussian Mixture Models.'
volume: 38
URL: http://proceedings.mlr.press/v38/lucic15.html
PDF: http://proceedings.mlr.press/v38/lucic15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-lucic15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Lucic
given: Mario
- family: Ohannessian
given: Mesrob
- family: Karbasi
given: Amin
- family: Krause
given: Andreas
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 663-671
id: lucic15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 663
lastpage: 671
published: 2015-02-21 00:00:00 +0000
- title: 'Active Pointillistic Pattern Search'
abstract: 'We introduce the problem of active pointillistic pattern search (APPS), which seeks to discover regions of a domain exhibiting desired behavior with limited observations. Unusually, the patterns we consider are defined by large-scale properties of an underlying function that we can only observe at a limited number of points. Given a description of the desired patterns (in the form of a classifier taking functional inputs), we sequentially decide where to query function values to identify as many regions matching the pattern as possible, with high confidence. For one broad class of models the expected reward of each unobserved point can be computed analytically. We demonstrate the proposed algorithm on three difficult search problems: locating polluted regions in a lake via mobile sensors, forecasting winning electoral districts with minimal polling, and identifying vortices in a fluid flow simulation.'
volume: 38
URL: http://proceedings.mlr.press/v38/ma15.html
PDF: http://proceedings.mlr.press/v38/ma15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-ma15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Ma
given: Yifei
- family: Sutherland
given: Dougal
- family: Garnett
given: Roman
- family: Schneider
given: Jeff
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 672-680
id: ma15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 672
lastpage: 680
published: 2015-02-21 00:00:00 +0000
- title: 'The Security of Latent Dirichlet Allocation'
abstract: 'Latent Dirichlet allocation (LDA) is an increasingly popular tool for data analysis in many domains. If LDA output affects decision making (especially when money is involved), there is an incentive for attackers to compromise it. We ask the question: how can an attacker minimally poison the corpus so that LDA produces topics that the attacker wants the LDA user to see? Answering this question is important to characterize such attacks, and to develop defenses in the future. We give a novel bilevel optimization formulation to identify the optimal poisoning attack. We present an efficient solution (up to local optima) using a descent method and implicit functions. We demonstrate poisoning attacks on LDA with extensive experiments, and discuss possible defenses.'
volume: 38
URL: http://proceedings.mlr.press/v38/mei15.html
PDF: http://proceedings.mlr.press/v38/mei15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-mei15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Mei
given: Shike
- family: Zhu
given: Xiaojin
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 681-689
id: mei15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 681
lastpage: 689
published: 2015-02-21 00:00:00 +0000
- title: 'A Spectral Algorithm for Inference in Hidden semi-Markov Models'
abstract: 'Hidden semi-Markov models (HSMMs) are latent variable models which allow latent state persistence and can be viewed as a generalization of the popular hidden Markov models (HMMs). In this paper, we introduce a novel spectral algorithm to perform inference in HSMMs. Our approach is based on estimating certain sample moments, whose order depends only logarithmically on the maximum length of the hidden state persistence. Moreover, the algorithm requires only a few spectral decompositions and is therefore computationally efficient. Empirical evaluations on synthetic and real data demonstrate the promise of the algorithm.'
volume: 38
URL: http://proceedings.mlr.press/v38/melnyk15.html
PDF: http://proceedings.mlr.press/v38/melnyk15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-melnyk15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Melnyk
given: Igor
- family: Banerjee
given: Arindam
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 690-698
id: melnyk15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 690
lastpage: 698
published: 2015-02-21 00:00:00 +0000
- title: 'Efficient Training of Structured SVMs via Soft Constraints'
abstract: 'Structured output prediction is a powerful framework for jointly predicting interdependent output labels. Learning the parameters of structured predictors is a central task in machine learning applications. However, training the model from data often becomes computationally expensive. Several methods have been proposed to exploit the model structure, or decomposition, in order to obtain efficient training algorithms. In particular, methods based on linear programming relaxation, or dual decomposition, decompose the prediction task into multiple simpler prediction tasks and enforce agreement between overlapping predictions. In this work we observe that relaxing these agreement constraints and replacing them with soft constraints yields a much easier optimization problem. Based on this insight we propose an alternative training objective, analyze its theoretical properties, and derive an algorithm for its optimization. Our method, based on the Frank-Wolfe algorithm, achieves significant speedups over existing state-of-the-art methods without hurting prediction accuracy.'
volume: 38
URL: http://proceedings.mlr.press/v38/meshi15.html
PDF: http://proceedings.mlr.press/v38/meshi15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-meshi15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Meshi
given: Ofer
- family: Srebro
given: Nathan
- family: Hazan
given: Tamir
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 699-707
id: meshi15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 699
lastpage: 707
published: 2015-02-21 00:00:00 +0000
- title: 'Variance Reduction via Antithetic Markov Chains'
abstract: 'We present a Monte Carlo integration method, antithetic Markov chain sampling (AMCS), that incorporates local Markov transitions in an underlying importance sampler. Like sequential Monte Carlo sampling, the proposed method uses a sequence of Markov transitions to adapt the sampling to favour more influential regions of the integrand (modes). However, AMCS differs in the type of transitions that may be used, the number of Markov chains, and the method of chain termination. In particular, from each point sampled from an initial proposal, AMCS collects a sequence of points by simulating two independent, but antithetic, Markov chains, each terminated by a sample-dependent stopping rule. This approach provides greater flexibility for targeting influential areas while eliminating the need to fix the length of the Markov chain a priori. We show that the resulting estimator is unbiased and can reduce variance on peaked multi-modal integrands that challenge existing methods.'
volume: 38
URL: http://proceedings.mlr.press/v38/neufeld15.html
PDF: http://proceedings.mlr.press/v38/neufeld15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-neufeld15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Neufeld
given: James
- family: Schuurmans
given: Dale
- family: Bowling
given: Michael
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 708-716
id: neufeld15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 708
lastpage: 716
published: 2015-02-21 00:00:00 +0000
- title: 'Fast Function to Function Regression'
abstract: 'We analyze the problem of regression when both input covariates and output responses are functions from a nonparametric function class. Function to function regression (FFR) covers a large range of interesting applications including time-series prediction problems, and also more general tasks like studying a mapping between two separate types of distributions. However, previous nonparametric estimators for FFR type problems scale badly computationally with the number of input/output pairs in a data-set. Given the complexity of a mapping between general functions it may be necessary to consider large data-sets in order to achieve a low estimation risk. To address this issue, we develop a novel scalable nonparametric estimator, the Triple-Basis Estimator (3BE), which is capable of operating over datasets with many instances. To the best of our knowledge, the 3BE is the first nonparametric FFR estimator that can scale to massive data-sets. We analyze the 3BE’s risk and derive an upper bound on its rate. Furthermore, we show an improvement of several orders of magnitude in terms of prediction speed and a reduction in error over previous estimators in various real-world data-sets.'
volume: 38
URL: http://proceedings.mlr.press/v38/oliva15.html
PDF: http://proceedings.mlr.press/v38/oliva15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-oliva15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Oliva
given: Junier
- family: Neiswanger
given: William
- family: Poczos
given: Barnabas
- family: Xing
given: Eric
- family: Trac
given: Hy
- family: Ho
given: Shirley
- family: Schneider
given: Jeff
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 717-725
id: oliva15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 717
lastpage: 725
published: 2015-02-21 00:00:00 +0000
- title: 'Reactive bandits with attitude'
abstract: 'We consider a general class of K-armed bandits that adapt to the actions of the player. A single continuous parameter characterizes the “attitude” of the bandit, ranging from stochastic to cooperative or to fully adversarial in nature. The player seeks to maximize the expected return from the adaptive bandit, and the associated optimization problem is related to the free energy of a statistical mechanical system under an external field. When the underlying stochastic distribution is Gaussian, we derive an analytic solution for the long run optimal player strategy for different regimes of the bandit. In the fully adversarial limit, this solution is equivalent to the Nash equilibrium of a two-player, zero-sum semi-infinite game. We show how optimal strategies can be learned from sequential draws and reward observations in these adaptive bandits using Bayesian filtering and Thompson sampling. Results show the qualitative difference in policy pseudo-regret between our proposed strategy and other well-known bandit algorithms.'
volume: 38
URL: http://proceedings.mlr.press/v38/ortega15.html
PDF: http://proceedings.mlr.press/v38/ortega15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-ortega15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Ortega
given: Pedro
- family: Kim
given: Kee-Eung
- family: Lee
given: Daniel
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 726-734
id: ortega15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 726
lastpage: 734
published: 2015-02-21 00:00:00 +0000
- title: 'Feature Selection for Linear SVM with Provable Guarantees'
abstract: 'We give two provably accurate feature-selection techniques for the linear SVM; one algorithm runs in deterministic time and the other in randomized time. Our algorithms can be used in an unsupervised or supervised setting. The supervised approach is based on sampling features from support vectors. We prove that the margin in the feature space is preserved to within ε-relative error of the margin in the full feature space in the worst-case. In the unsupervised setting, we also provide worst-case guarantees of the radius of the minimum enclosing ball, thereby ensuring comparable generalization as in the full feature space and resolving an open problem posed in Dasgupta et al. We present extensive experiments on real-world datasets to support our theory and to demonstrate that our methods are competitive and often better than prior state-of-the-art, for which there are no known provable guarantees.'
volume: 38
URL: http://proceedings.mlr.press/v38/paul15.html
PDF: http://proceedings.mlr.press/v38/paul15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-paul15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Paul
given: Saurabh
- family: Magdon-Ismail
given: Malik
- family: Drineas
given: Petros
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 735-743
id: paul15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 735
lastpage: 743
published: 2015-02-21 00:00:00 +0000
- title: 'On Theoretical Properties of Sum-Product Networks'
abstract: 'Sum-product networks (SPNs) are a promising avenue for probabilistic modeling and have been successfully applied to various tasks. However, some theoretical properties of SPNs are not yet well understood. In this paper we fill some gaps in the theoretical foundation of SPNs. First, we show that the weights of any complete and consistent SPN can be transformed into locally normalized weights without changing the SPN distribution. Second, we show that consistent SPNs cannot model distributions significantly (exponentially) more compactly than decomposable SPNs. As a third contribution, we extend the inference mechanisms known for SPNs with finite states to generalized SPNs with arbitrary input distributions.'
volume: 38
URL: http://proceedings.mlr.press/v38/peharz15.html
PDF: http://proceedings.mlr.press/v38/peharz15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-peharz15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Peharz
given: Robert
- family: Tschiatschek
given: Sebastian
- family: Pernkopf
given: Franz
- family: Domingos
given: Pedro
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 744-752
id: peharz15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 744
lastpage: 752
published: 2015-02-21 00:00:00 +0000
- title: 'Robust sketching for multiple square-root LASSO problems'
abstract: 'Many learning tasks, such as cross-validation, parameter search, or leave-one-out analysis, involve multiple instances of similar problems, each instance sharing a large part of learning data with the others. We introduce a robust framework for solving multiple square-root LASSO problems, based on a sketch of the learning data that uses low-rank approximations. Our approach allows a dramatic reduction in computational effort, in effect reducing the number of observations from m (the number of observations to start with) to k (the number of singular values retained in the low-rank model), while not sacrificing—sometimes even improving—the statistical performance. Theoretical analysis, as well as numerical experiments on both synthetic and real data, illustrate the efficiency of the method in large scale applications.'
volume: 38
URL: http://proceedings.mlr.press/v38/pham15.html
PDF: http://proceedings.mlr.press/v38/pham15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-pham15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Pham
given: Vu
- family: El Ghaoui
given: Laurent
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 753-761
id: pham15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 753
lastpage: 761
published: 2015-02-21 00:00:00 +0000
- title: 'Deep Exponential Families'
abstract: 'We describe deep exponential families (DEFs), a class of latent variable models that are inspired by the hidden structures used in deep neural networks. DEFs capture a hierarchy of dependencies between latent variables, and are easily generalized to many settings through exponential families. We perform inference using recent “black box” variational inference techniques. We then evaluate various DEFs on text and combine multiple DEFs into a model for pairwise recommendation data. In an extensive study, we show that going beyond one layer improves predictions for DEFs. We demonstrate that DEFs find interesting exploratory structure in large data sets, and give better predictive performance than state-of-the-art models.'
volume: 38
URL: http://proceedings.mlr.press/v38/ranganath15.html
PDF: http://proceedings.mlr.press/v38/ranganath15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-ranganath15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Ranganath
given: Rajesh
- family: Tang
given: Linpeng
- family: Charlin
given: Laurent
- family: Blei
given: David
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 762-771
id: ranganath15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 762
lastpage: 771
published: 2015-02-21 00:00:00 +0000
- title: 'On the High Dimensional Power of a Linear-Time Two Sample Test under Mean-shift Alternatives'
abstract: 'Nonparametric two sample testing deals with the question of consistently deciding if two distributions are different, given samples from both, without making any parametric assumptions about the form of the distributions. The current literature is split into two kinds of tests - those which are consistent without any assumptions about how the distributions may differ (\textit{general alternatives}), and those which are designed to specifically test easier alternatives, like a difference in means (\textit{mean-shift alternatives}). The main contribution of this paper is to explicitly characterize the power of a popular nonparametric two sample test, designed for general alternatives, under a mean-shift alternative in the high-dimensional setting. Specifically, we explicitly derive the power of the linear-time Maximum Mean Discrepancy statistic using the Gaussian kernel, where the dimension and sample size can both tend to infinity at any rate, and the two distributions differ in their means. As a corollary, we find that if the signal-to-noise ratio is held constant, then the test’s power goes to one if the number of samples increases faster than the dimension increases. This is the first explicit power derivation for a general nonparametric test in the high-dimensional setting, and the first analysis of how tests designed for general alternatives perform against easier ones.'
volume: 38
URL: http://proceedings.mlr.press/v38/reddi15.html
PDF: http://proceedings.mlr.press/v38/reddi15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-reddi15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Reddi
given: Sashank
- family: Ramdas
given: Aaditya
- family: Poczos
given: Barnabas
- family: Singh
given: Aarti
- family: Wasserman
given: Larry
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 772-780
id: reddi15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 772
lastpage: 780
published: 2015-02-21 00:00:00 +0000
- title: 'A Scalable Algorithm for Structured Kernel Feature Selection'
abstract: 'Kernel methods are powerful tools for nonlinear feature representation. Incorporated with structured LASSO, the kernelized structured LASSO is an effective feature selection approach that can preserve the nonlinear input-output relationships as well as the structured sparseness. But as the data dimension increases, the method can quickly become computationally prohibitive. In this paper we propose a stochastic optimization algorithm that can efficiently address this computational problem on account of the redundant kernel representations of the given data. Experiments on simulation data and PET 3D brain image data show that our method can achieve superior accuracy with less computational cost than existing methods.'
volume: 38
URL: http://proceedings.mlr.press/v38/ren15.html
PDF: http://proceedings.mlr.press/v38/ren15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-ren15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Ren
given: Shaogang
- family: Huang
given: Shuai
- family: Onofrey
given: John
- family: Papademetris
given: Xenios
- family: Qian
given: Xiaoning
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 781-789
id: ren15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 781
lastpage: 789
published: 2015-02-21 00:00:00 +0000
- title: 'Learning Efficient Anomaly Detectors from K-NN Graphs'
abstract: 'We propose a non-parametric anomaly detection algorithm for high dimensional data. We score each datapoint by its average K-NN distance, and rank them accordingly. We then train limited complexity models to imitate these scores based on the max-margin learning-to-rank framework. A test-point is declared as an anomaly at α-false alarm level if the predicted score is in the α-percentile. The resulting anomaly detector is shown to be asymptotically optimal in that for any false alarm rate α, its decision region converges to the α-percentile minimum volume level set of the unknown underlying density. In addition, we test both the statistical performance and computational efficiency of our algorithm on a number of synthetic and real-data experiments. Our results demonstrate the superiority of our algorithm over existing K-NN based anomaly detection algorithms, with significant computational savings.'
volume: 38
URL: http://proceedings.mlr.press/v38/root15.html
PDF: http://proceedings.mlr.press/v38/root15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-root15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Root
given: Jonathan
- family: Qian
given: Jing
- family: Saligrama
given: Venkatesh
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 790-799
id: root15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 790
lastpage: 799
published: 2015-02-21 00:00:00 +0000
- title: 'Gamma Processes, Stick-Breaking, and Variational Inference'
abstract: 'While most Bayesian nonparametric models in machine learning have focused on the Dirichlet process, the beta process, or their variants, the gamma process has recently emerged as a useful nonparametric prior in its own right. Current inference schemes for models involving the gamma process are restricted to MCMC-based methods, which limits their scalability. In this paper, we present a variational inference framework for models involving gamma process priors. Our approach is based on a novel stick-breaking constructive definition of the gamma process. We prove correctness of this stick-breaking process by using the characterization of the gamma process as a completely random measure (CRM), and we explicitly derive the rate measure of our construction using Poisson process machinery. We also derive error bounds on the truncation of the infinite process required for variational inference, similar to the truncation analyses for other nonparametric models based on the Dirichlet and beta processes. Our representation is then used to derive a variational inference algorithm for a particular Bayesian nonparametric latent structure formulation known as the infinite Gamma-Poisson model, where the latent variables are drawn from a gamma process prior with Poisson likelihoods. Finally, we present results for our algorithm on non-negative matrix factorization tasks on document corpora, and show that we compare favorably to both sampling-based techniques and variational approaches based on beta-Bernoulli priors, as well as a direct DP-based construction of the gamma process.'
volume: 38
URL: http://proceedings.mlr.press/v38/roychowdhury15.html
PDF: http://proceedings.mlr.press/v38/roychowdhury15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-roychowdhury15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Roychowdhury
given: Anirban
- family: Kulis
given: Brian
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 800-808
id: roychowdhury15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 800
lastpage: 808
published: 2015-02-21 00:00:00 +0000
- title: 'Direct Density-Derivative Estimation and Its Application in KL-Divergence Approximation'
abstract: 'Estimation of density derivatives is a versatile tool in statistical data analysis. A naive approach is to first estimate the density and then compute its derivative. However, such a two-step approach does not work well because a good density estimator does not necessarily mean a good density-derivative estimator. In this paper, we give a direct method to approximate the density derivative without estimating the density itself. Our proposed estimator allows analytic and computationally efficient approximation of multi-dimensional high-order density derivatives, with all hyper-parameters chosen objectively by cross-validation. We further show that the proposed density-derivative estimator is useful in improving the accuracy of non-parametric KL-divergence estimation via metric learning. The practical superiority of the proposed method is experimentally demonstrated in change detection and feature selection.'
volume: 38
URL: http://proceedings.mlr.press/v38/sasaki15.html
PDF: http://proceedings.mlr.press/v38/sasaki15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-sasaki15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Sasaki
given: Hiroaki
- family: Noh
given: Yung-Kyun
- family: Sugiyama
given: Masashi
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 809-818
id: sasaki15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 809
lastpage: 818
published: 2015-02-21 00:00:00 +0000
- title: 'Non-Uniform Stochastic Average Gradient Method for Training Conditional Random Fields'
abstract: 'We apply stochastic average gradient (SAG) algorithms for training conditional random fields (CRFs). We describe a practical implementation that uses structure in the CRF gradient to reduce the memory requirement of this linearly-convergent stochastic gradient method, propose a non-uniform sampling scheme that substantially improves practical performance, and analyze the rate of convergence of the SAGA variant under non-uniform sampling. Our experimental results reveal that our method significantly outperforms existing methods in terms of the training objective, and performs as well or better than optimally-tuned stochastic gradient methods in terms of test error.'
volume: 38
URL: http://proceedings.mlr.press/v38/schmidt15.html
PDF: http://proceedings.mlr.press/v38/schmidt15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-schmidt15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Schmidt
given: Mark
- family: Babanezhad
given: Reza
- family: Ahmed
given: Mohamed
- family: Defazio
given: Aaron
- family: Clifton
given: Ann
- family: Sarkar
given: Anoop
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 819-828
id: schmidt15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 819
lastpage: 828
published: 2015-02-21 00:00:00 +0000
- title: 'Sensor Selection for Crowdsensing Dynamical Systems'
abstract: 'We model crowdsensing as the selection of sensors with unknown variance to monitor a large linear dynamical system. To achieve low estimation error, we propose a Thompson sampling approach combining submodular optimization and a scalable online variational inference algorithm to maintain the posterior distribution over the variance. We also consider three alternative parameter estimation algorithms. We illustrate the behavior of our sensor selection algorithms on real traffic data from the city of Dublin. Our online algorithm achieves significantly lower estimation error than sensor selection using a fixed variance value for all sensors.'
volume: 38
URL: http://proceedings.mlr.press/v38/schnitzler15.html
PDF: http://proceedings.mlr.press/v38/schnitzler15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-schnitzler15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Schnitzler
given: Francois
- family: Yuan Yu
given: Jia
- family: Mannor
given: Shie
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 829-837
id: schnitzler15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 829
lastpage: 837
published: 2015-02-21 00:00:00 +0000
- title: 'A Rate of Convergence for Mixture Proportion Estimation, with Application to Learning from Noisy Labels'
abstract: 'Mixture proportion estimation (MPE) is a fundamental tool for solving a number of weakly supervised learning problems – supervised learning problems where label information is noisy or missing. Previous work on MPE has established a universally consistent estimator. In this work we establish a rate of convergence for mixture proportion estimation under an appropriate distributional assumption, and argue that this rate of convergence is useful for analyzing weakly supervised learning algorithms that build on MPE. To illustrate this idea, we examine an algorithm for classification in the presence of noisy labels based on surrogate risk minimization, and show that the rate of convergence for MPE enables proof of the algorithm’s consistency. Finally, we provide a practical implementation of mixture proportion estimation and demonstrate its efficacy in classification with noisy labels.'
volume: 38
URL: http://proceedings.mlr.press/v38/scott15.html
PDF: http://proceedings.mlr.press/v38/scott15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-scott15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Scott
given: Clayton
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 838-846
id: scott15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 838
lastpage: 846
published: 2015-02-21 00:00:00 +0000
- title: 'Inference of Cause and Effect with Unsupervised Inverse Regression'
abstract: 'We address the problem of causal discovery in the two-variable case given a sample from their joint distribution. The proposed method is based on a known assumption that, if X -> Y (X causes Y), the marginal distribution of the cause, P(X), contains no information about the conditional distribution P(Y|X). Consequently, estimating P(Y|X) from P(X) should not be possible. However, estimating P(X|Y) based on P(Y) may be possible. This paper employs this asymmetry to propose CURE, a causal discovery method which decides upon the causal direction by comparing the accuracy of the estimations of P(Y|X) and P(X|Y). To this end, we propose a method for estimating a conditional from samples of the corresponding marginal, which we call unsupervised inverse GP regression. We evaluate CURE on synthetic and real data. On the latter, our method outperforms existing causal inference methods.'
volume: 38
URL: http://proceedings.mlr.press/v38/sgouritsa15.html
PDF: http://proceedings.mlr.press/v38/sgouritsa15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-sgouritsa15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Sgouritsa
given: Eleni
- family: Janzing
given: Dominik
- family: Hennig
given: Philipp
- family: Schölkopf
given: Bernhard
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 847-855
id: sgouritsa15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 847
lastpage: 855
published: 2015-02-21 00:00:00 +0000
- title: 'Estimation from Pairwise Comparisons: Sharp Minimax Bounds with Topology Dependence'
abstract: 'Consider the problem of identifying the underlying qualities of a set of items based on measuring noisy comparisons between pairs of items. The Bradley-Terry-Luce (BTL) and Thurstone models are the most widely used parametric models for such pairwise comparison data. Working within a standard minimax framework, this paper provides sharp upper and lower bounds on the optimal error in estimating the underlying qualities under the BTL and the Thurstone models. These bounds are topology-aware, meaning that they change qualitatively depending on the comparison graph induced by the subset of pairs being compared. Thus, in settings where the subset of pairs may be chosen, our results provide some principled guidelines for making this choice. Finally, we compare these error rates to those under cardinal measurement models and show that the error rates in the ordinal and cardinal settings have identical scalings apart from constant pre-factors. We use this result to investigate the relative merits of cardinal and ordinal measurement schemes.'
volume: 38
URL: http://proceedings.mlr.press/v38/shah15.html
PDF: http://proceedings.mlr.press/v38/shah15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-shah15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Shah
given: Nihar
- family: Balakrishnan
given: Sivaraman
- family: Bradley
given: Joseph
- family: Parekh
given: Abhay
- family: Ramchandran
given: Kannan
- family: Wainwright
given: Martin
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 856-865
id: shah15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 856
lastpage: 865
published: 2015-02-21 00:00:00 +0000
- title: 'Exploiting Symmetries to Construct Efficient MCMC Algorithms With an Application to SLAM'
abstract: 'The Metropolis-Hastings (MH) algorithm is a flexible method to generate samples from a target distribution, a key problem in probabilistic inference. In this paper we propose a variation of the MH algorithm based on group moves, where the next state is obtained by first choosing a random transformation of the state space and then applying this transformation to the current state. This adds much-needed flexibility to the "textbook" MH algorithm where all measures involved must be given in terms of densities with respect to a common reference measure. Under mild conditions, our main result extends the acceptance probability formula of the textbook algorithm to MH algorithms with group moves. We work out how the new algorithms can be used to exploit a problem’s natural symmetries and apply the technique to the simultaneous localization and mapping (SLAM) problem, obtaining the first fully rigorous justification of a previous MCMC-based SLAM method. New experimental results comparing our method to existing state-of-the-art specialized methods on a standard range-only SLAM benchmark problem validate the strength of the approach.'
volume: 38
URL: http://proceedings.mlr.press/v38/shariff15.html
PDF: http://proceedings.mlr.press/v38/shariff15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-shariff15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Shariff
given: Roshan
- family: György
given: András
- family: Szepesvari
given: Csaba
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 866-874
id: shariff15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 866
lastpage: 874
published: 2015-02-21 00:00:00 +0000
- title: 'Learning Where to Sample in Structured Prediction'
abstract: 'In structured prediction, most inference algorithms allocate a homogeneous amount of computation to all parts of the output, which can be wasteful when different parts vary widely in terms of difficulty. In this paper, we propose a heterogeneous approach that dynamically allocates computation to the different parts. Given a pre-trained model, we tune its inference algorithm (a sampler) to increase test-time throughput. The inference algorithm is parametrized by a meta-model and trained via reinforcement learning, where actions correspond to sampling candidate parts of the output, and rewards are log-likelihood improvements. The meta-model is based on a set of domain-general meta-features capturing the progress of the sampler. We test our approach on five datasets and show that it attains the same accuracy as Gibbs sampling but is 2 to 5 times faster.'
volume: 38
URL: http://proceedings.mlr.press/v38/shi15.html
PDF: http://proceedings.mlr.press/v38/shi15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-shi15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Shi
given: Tianlin
- family: Steinhardt
given: Jacob
- family: Liang
given: Percy
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 875-884
id: shi15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 875
lastpage: 884
published: 2015-02-21 00:00:00 +0000
- title: 'State Space Methods for Efficient Inference in Student-t Process Regression'
abstract: 'The added flexibility of Student-t processes (TPs) over Gaussian processes (GPs) robustifies inference in outlier-contaminated noisy data. The uncertainties are better accounted for than in GP regression, because the predictive covariances explicitly depend on the training observations. For an entangled noise model, the canonical-form TP regression problem can be solved analytically, but the naive TP and GP solutions share the same cubic computational cost in the number of training observations. We show how a large class of temporal TP regression models can be reformulated as state space models, and how a forward filtering and backward smoothing recursion can be derived for solving the inference analytically in linear time complexity. This is a novel finding that generalizes the previously known connection between Gaussian process regression and Kalman filtering to more general elliptical processes and non-Gaussian Bayesian filtering. We derive this connection, demonstrate the benefits of the approach with examples, and finally apply the method to empirical data.'
volume: 38
URL: http://proceedings.mlr.press/v38/solin15.html
PDF: http://proceedings.mlr.press/v38/solin15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-solin15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Solin
given: Arno
- family: Särkkä
given: Simo
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 885-893
id: solin15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 885
lastpage: 893
published: 2015-02-21 00:00:00 +0000
- title: 'Learning from Data with Heterogeneous Noise using SGD'
abstract: 'We consider learning from data of variable quality that may be obtained from different heterogeneous sources. Addressing learning from heterogeneous data in its full generality is a challenging problem. In this paper, we adopt instead a model in which data is observed through heterogeneous noise, where the noise level reflects the quality of the data source. We study how to use stochastic gradient algorithms to learn in this model. Our study is motivated by two concrete examples where this problem arises naturally: learning with local differential privacy based on data from multiple sources with different privacy requirements, and learning from data with labels of variable quality. The main contribution of this paper is to identify how heterogeneous noise impacts performance. We show that given two datasets with heterogeneous noise, the order in which to use them in standard SGD depends on the learning rate. We propose a method for changing the learning rate as a function of the heterogeneity, and prove new regret bounds for our method in two cases of interest. Finally, we evaluate the performance of our algorithm on real data.'
volume: 38
URL: http://proceedings.mlr.press/v38/song15.html
PDF: http://proceedings.mlr.press/v38/song15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-song15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Song
given: Shuang
- family: Chaudhuri
given: Kamalika
- family: Sarwate
given: Anand
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 894-902
id: song15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 894
lastpage: 902
published: 2015-02-21 00:00:00 +0000
- title: 'Data modeling with the elliptical gamma distribution'
abstract: 'We study mixture modeling using the elliptical gamma (EG) distribution, a non-Gaussian distribution that allows heavy and light tail and peak behaviors. We first consider maximum likelihood parameter estimation, a task that turns out to be very challenging: we must handle positive definiteness constraints, and more crucially, we must handle possibly nonconcave log-likelihoods, which makes maximization hard. We overcome these difficulties by developing algorithms based on fixed-point theory; our methods respect the psd constraint, while also efficiently solving the (possibly) nonconcave maximization to global optimality. Subsequently, we focus on mixture modeling using EG distributions: we present a closed-form expression of the KL-divergence between two EG distributions, which we then combine with our ML estimation methods to obtain an efficient split-and-merge expectation maximization algorithm. We illustrate the use of our model and algorithms on a dataset of natural image patches.'
volume: 38
URL: http://proceedings.mlr.press/v38/sra15.html
PDF: http://proceedings.mlr.press/v38/sra15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-sra15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Sra
given: Suvrit
- family: Hosseini
given: Reshad
- family: Theis
given: Lucas
- family: Bethge
given: Matthias
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 903-911
id: sra15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 903
lastpage: 911
published: 2015-02-21 00:00:00 +0000
- title: 'WASP: Scalable Bayes via barycenters of subset posteriors'
abstract: 'The promise of Bayesian methods for big data sets has not fully been realized due to the lack of scalable computational algorithms. For massive data, it is necessary to store and process subsets on different machines in a distributed manner. We propose a simple, general, and highly efficient approach, which first runs a posterior sampling algorithm in parallel on different machines for subsets of a large data set. To combine these subset posteriors, we calculate the Wasserstein barycenter via a highly efficient linear program. The resulting estimate for the Wasserstein posterior (WASP) has an atomic form, facilitating straightforward estimation of posterior summaries of functionals of interest. The WASP approach allows posterior sampling algorithms for smaller data sets to be trivially scaled to huge data. We provide theoretical justification in terms of posterior consistency and algorithm efficiency. Examples are provided in complex settings including Gaussian process regression and nonparametric Bayes mixture models.'
volume: 38
URL: http://proceedings.mlr.press/v38/srivastava15.html
PDF: http://proceedings.mlr.press/v38/srivastava15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-srivastava15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Srivastava
given: Sanvesh
- family: Cevher
given: Volkan
- family: Dinh
given: Quoc
- family: Dunson
given: David
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 912-920
id: srivastava15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 912
lastpage: 920
published: 2015-02-21 00:00:00 +0000
- title: 'Calibration of conditional composite likelihood for Bayesian inference on Gibbs random fields'
abstract: 'Gibbs random fields play an important role in statistics; however, the resulting likelihood is typically unavailable due to an intractable normalizing constant. Composite likelihoods offer a principled means to construct useful approximations. This paper provides a means to calibrate the posterior distribution resulting from using a composite likelihood and illustrates its performance in several examples.'
volume: 38
URL: http://proceedings.mlr.press/v38/stoehr15.html
PDF: http://proceedings.mlr.press/v38/stoehr15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-stoehr15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Stoehr
given: Julien
- family: Friel
given: Nial
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 921-929
id: stoehr15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 921
lastpage: 929
published: 2015-02-21 00:00:00 +0000
- title: 'A Dirichlet Process Mixture Model for Spherical Data'
abstract: 'Directional data, naturally represented as points on the unit sphere, appear in many applications. However, unlike the case of Euclidean data, flexible mixture models on the sphere that can capture correlations, handle an unknown number of components and extend readily to high-dimensional data have yet to be suggested. For this purpose we propose a Dirichlet process mixture model of Gaussian distributions in distinct tangent spaces (DP-TGMM) to the sphere. Importantly, the formulation of the proposed model allows the extension of recent advances in efficient inference for Bayesian nonparametric models to the spherical domain. Experiments on synthetic data as well as real-world 3D surface normal and 20-dimensional semantic word vector data confirm the expressiveness and applicability of the DP-TGMM.'
volume: 38
URL: http://proceedings.mlr.press/v38/straub15.html
PDF: http://proceedings.mlr.press/v38/straub15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-straub15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Straub
given: Julian
- family: Chang
given: Jason
- family: Freifeld
given: Oren
- family: Fisher III
given: John
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 930-938
id: straub15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 930
lastpage: 938
published: 2015-02-21 00:00:00 +0000
- title: 'Inferring Block Structure of Graphical Models in Exponential Families'
abstract: 'Learning the structure of a graphical model is a fundamental problem and it is used extensively to infer the relationship between random variables. In many real world applications, we usually have some prior knowledge about the underlying graph structure, such as degree distribution and block structure. In this paper, we propose a novel generative model for describing the block structure in general exponential families, and optimize it by an Expectation-Maximization (EM) algorithm with variational Bayes. Experimental results show that our method performs well on both synthetic and real data. Further, our method can predict overlapped block structure of a graphical model in general exponential families.'
volume: 38
URL: http://proceedings.mlr.press/v38/sun15.html
PDF: http://proceedings.mlr.press/v38/sun15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-sun15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Sun
given: Siqi
- family: Wang
given: Hai
- family: Xu
given: Jinbo
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 939-947
id: sun15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 939
lastpage: 947
published: 2015-02-21 00:00:00 +0000
- title: 'Two-stage sampled learning theory on distributions'
abstract: 'We focus on the distribution regression problem: regressing to a real-valued response from a probability distribution. Although there exist a large number of similarity measures between distributions, very little is known about their generalization performance in specific learning tasks. Learning problems formulated on distributions have an inherent two-stage sampled difficulty: in practice only samples from sampled distributions are observable, and one has to build an estimate on similarities computed between sets of points. To the best of our knowledge, the only existing method with consistency guarantees for distribution regression requires kernel density estimation as an intermediate step (which suffers from slow convergence issues in high dimensions), and the domain of the distributions to be compact Euclidean. In this paper, we provide theoretical guarantees for a remarkably simple algorithmic alternative to solve the distribution regression problem: embed the distributions to a reproducing kernel Hilbert space, and learn a ridge regressor from the embeddings to the outputs. Our main contribution is to prove the consistency of this technique in the two-stage sampled setting under mild conditions (on separable, topological domains endowed with kernels). As a special case, we answer a 15-year-old open question: we establish the consistency of the classical set kernel [Haussler, 1999; Gaertner et al., 2002] in regression, and cover more recent kernels on distributions, including those due to [Christmann and Steinwart, 2010].'
volume: 38
URL: http://proceedings.mlr.press/v38/szabo15.html
PDF: http://proceedings.mlr.press/v38/szabo15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-szabo15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Szabo
given: Zoltan
- family: Gretton
given: Arthur
- family: Poczos
given: Barnabas
- family: Sriperumbudur
given: Bharath
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 948-957
id: szabo15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 948
lastpage: 957
published: 2015-02-21 00:00:00 +0000
- title: 'Predicting Preference Reversals via Gaussian Process Uncertainty Aversion'
abstract: 'Modeling of a product or service’s attractiveness as a function of its own attributes (e.g., price and quality) is one of the foundations in econometric forecasts, which have been provided with an assumption that each human rationally has a consistent preference order among his choice decisions. Yet the preference orders by real humans become irrationally reversed, when the choice set of available options is manipulated. In order to accurately predict choice decisions involving preference reversals, which existing econometric methods have failed to incorporate, the authors introduce a new cognitive choice model whose parameters are efficiently fitted with a global convex optimization algorithm. The proposed model captures each human as a Bayesian decision maker facing a mental conflict between objective evaluation samples and a subjective prior, where the underlying objective evaluation function is rationally independent from contexts while the subjective prior is irrationally determined by each choice set. As the key idea to analytically handle the irrationality and to yield the convex optimization, the Bayesian decision mechanism is implemented as a closed-form Gaussian process regression using similarities among the available options in each context. By explaining the irrational decisions as a consequence of averting uncertainty, the proposed model outperformed the existing econometric models in predicting the irrational choice decisions recorded in real-world datasets.'
volume: 38
URL: http://proceedings.mlr.press/v38/takahashi15.html
PDF: http://proceedings.mlr.press/v38/takahashi15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-takahashi15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Takahashi
given: Rikiya
- family: Morimura
given: Tetsuro
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 958-967
id: takahashi15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 958
lastpage: 967
published: 2015-02-21 00:00:00 +0000
- title: 'Streaming Variational Inference for Bayesian Nonparametric Mixture Models'
abstract: 'In theory, Bayesian nonparametric (BNP) models are well suited to streaming data scenarios due to their ability to adapt model complexity based on the amount of data observed. Unfortunately, such benefits have not been fully realized in practice; existing inference algorithms either are not applicable to streaming applications or are not extensible to nonparametric models. For the special case of Dirichlet processes, streaming inference has been considered. However, there is growing interest in more flexible BNP models, in particular building on the class of normalized random measures (NRMs). We work within this general framework and present a streaming variational inference algorithm for NRM mixture models based on assumed density filtering. Extensions to expectation propagation algorithms are possible in the batch data setting. We demonstrate the efficacy of the algorithm on clustering documents in large, streaming text corpora.'
volume: 38
URL: http://proceedings.mlr.press/v38/tank15.html
PDF: http://proceedings.mlr.press/v38/tank15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-tank15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Tank
given: Alex
- family: Foti
given: Nicholas
- family: Fox
given: Emily
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 968-976
id: tank15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 968
lastpage: 976
published: 2015-02-21 00:00:00 +0000
- title: 'Missing at Random in Graphical Models'
abstract: 'The notion of missing at random (MAR) plays a central role in the theory underlying current methods for handling missing data. However the standard definition of MAR is difficult to interpret in practice. In this paper, we assume the missing data model is represented as a directed acyclic graph that not only encodes the dependencies among the variables but also explicitly portrays the causal mechanisms responsible for the missingness process. We introduce an intuitively appealing notion of MAR in such graphical models, and establish its relation with the standard MAR and a few versions of MAR used in the literature. We address the question of whether MAR is testable, given that data are corrupted by missingness, by proposing a general method for identifying testable implications imposed by the graphical structure on the observed data.'
volume: 38
URL: http://proceedings.mlr.press/v38/tian15.html
PDF: http://proceedings.mlr.press/v38/tian15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-tian15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Tian
given: Jin
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 977-985
id: tian15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 977
lastpage: 985
published: 2015-02-21 00:00:00 +0000
- title: 'Particle Gibbs with Ancestor Sampling for Probabilistic Programs'
abstract: 'Particle Markov chain Monte Carlo techniques rank among current state-of-the-art methods for probabilistic program inference. A drawback of these techniques is that they rely on importance resampling, which results in degenerate particle trajectories and a low effective sample size for variables sampled early in a program. We here develop a formalism to adapt ancestor resampling, a technique that mitigates particle degeneracy, to the probabilistic programming setting. We present empirical results that demonstrate nontrivial performance gains.'
volume: 38
URL: http://proceedings.mlr.press/v38/vandemeent15.html
PDF: http://proceedings.mlr.press/v38/vandemeent15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-vandemeent15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Meent
given: Jan-Willem
- family: Yang
given: Hongseok
- family: Mansinghka
given: Vikash
- family: Wood
given: Frank
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 986-994
id: vandemeent15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 986
lastpage: 994
published: 2015-02-21 00:00:00 +0000
- title: 'Learning of Non-Parametric Control Policies with High-Dimensional State Features'
abstract: 'Learning complex control policies from high-dimensional sensory input is a challenge for reinforcement learning algorithms. Kernel methods that approximate value functions or transition models can address this problem. Yet, many current approaches rely on unstable greedy maximization. In this paper, we develop a policy search algorithm that integrates robust policy updates and kernel embeddings. Our method can learn non-parametric control policies for infinite horizon continuous MDPs with high-dimensional sensory representations. We show that our method outperforms related approaches, and that our algorithm can learn an underpowered swing-up task directly from high-dimensional image data.'
volume: 38
URL: http://proceedings.mlr.press/v38/vanhoof15.html
PDF: http://proceedings.mlr.press/v38/vanhoof15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-vanhoof15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Van Hoof
given: Herke
- family: Peters
given: Jan
- family: Neumann
given: Gerhard
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 995-1003
id: vanhoof15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 995
lastpage: 1003
published: 2015-02-21 00:00:00 +0000
- title: 'Maximally Informative Hierarchical Representations of High-Dimensional Data'
abstract: 'We consider a set of probabilistic functions of some input variables as a representation of the inputs. We present bounds on how informative a representation is about input data. We extend these bounds to hierarchical representations so that we can quantify the contribution of each layer towards capturing the information in the original data. The special form of these bounds leads to a simple, bottom-up optimization procedure to construct hierarchical representations that are also maximally informative about the data. This optimization has linear computational complexity and constant sample complexity in the number of variables. These results establish a new approach to unsupervised learning of deep representations that is both principled and practical. We demonstrate the usefulness of the approach on both synthetic and real-world data.'
volume: 38
URL: http://proceedings.mlr.press/v38/versteeg15.html
PDF: http://proceedings.mlr.press/v38/versteeg15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-versteeg15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Ver Steeg
given: Greg
- family: Galstyan
given: Aram
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 1004-1012
id: versteeg15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 1004
lastpage: 1012
published: 2015-02-21 00:00:00 +0000
- title: 'Falling Rule Lists'
abstract: 'Falling rule lists are classification models consisting of an ordered list of if-then rules, where (i) the order of rules determines which example should be classified by each rule, and (ii) the estimated probability of success decreases monotonically down the list. These kinds of rule lists are inspired by healthcare applications where patients would be stratified into risk sets and the highest at-risk patients should be considered first. We provide a Bayesian framework for learning falling rule lists that does not rely on traditional greedy decision tree learning methods.'
volume: 38
URL: http://proceedings.mlr.press/v38/wang15a.html
PDF: http://proceedings.mlr.press/v38/wang15a.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-wang15a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Wang
given: Fulton
- family: Rudin
given: Cynthia
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 1013-1022
id: wang15a
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 1013
lastpage: 1022
published: 2015-02-21 00:00:00 +0000
- title: 'Multi-Manifold Modeling in Non-Euclidean spaces'
abstract: 'This paper advocates a novel framework for segmenting a dataset on a Riemannian manifold M into clusters lying around low-dimensional submanifolds of M. Important examples of M, for which the proposed algorithm is computationally efficient, include the sphere, the set of positive definite matrices, and the Grassmannian. The proposed algorithm constructs a data-affinity matrix by thoroughly exploiting the intrinsic geometry and then applies spectral clustering. Local geometry is encoded by sparse coding and directional information of local tangent spaces and geodesics, which is important in resolving intersecting clusters and establishing the theoretical guarantees for a simplified variant of the algorithm. To avoid complication, these guarantees assume that the underlying submanifolds are geodesic. Extensive validation on synthetic and real data demonstrates the resiliency of the proposed method against deviations from the theoretical (geodesic) model as well as its superior performance over state-of-the-art techniques.'
volume: 38
URL: http://proceedings.mlr.press/v38/wang15b.html
PDF: http://proceedings.mlr.press/v38/wang15b.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-wang15b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Wang
given: Xu
- family: Slavakis
given: Konstantinos
- family: Lerman
given: Gilad
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 1023-1032
id: wang15b
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 1023
lastpage: 1032
published: 2015-02-21 00:00:00 +0000
- title: 'Column Subset Selection with Missing Data via Active Sampling'
abstract: 'Column subset selection of massive data matrices has found numerous applications in real-world data systems. In this paper, we propose and analyze two sampling based algorithms for column subset selection without access to the complete input matrix. To our knowledge, these are the first algorithms for column subset selection with missing data that are provably correct. The proposed methods work for row/column coherent matrices by employing the idea of adaptive sampling. Furthermore, when the input matrix has a noisy low-rank structure, one algorithm enjoys a relative error bound.'
volume: 38
URL: http://proceedings.mlr.press/v38/wang15c.html
PDF: http://proceedings.mlr.press/v38/wang15c.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-wang15c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Wang
given: Yining
- family: Singh
given: Aarti
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 1033-1041
id: wang15c
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 1033
lastpage: 1041
published: 2015-02-21 00:00:00 +0000
- title: 'Trend Filtering on Graphs'
abstract: 'We introduce a family of adaptive estimators on graphs, based on penalizing the \ell_1 norm of discrete graph differences. This generalizes the idea of trend filtering (Kim et al., 2009, Tibshirani, 2014) used for univariate nonparametric regression, to graphs. Analogous to the univariate case, graph trend filtering exhibits a level of local adaptivity unmatched by the usual \ell_2-based graph smoothers. It is also defined by a convex minimization problem that is readily solved (e.g., by fast ADMM or Newton algorithms). We demonstrate the merits of graph trend filtering through examples and theory.'
volume: 38
URL: http://proceedings.mlr.press/v38/wang15d.html
PDF: http://proceedings.mlr.press/v38/wang15d.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-wang15d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Wang
given: Yu-Xiang
- family: Sharpnack
given: James
- family: Smola
given: Alex
- family: Tibshirani
given: Ryan
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 1042-1050
id: wang15d
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 1042
lastpage: 1050
published: 2015-02-21 00:00:00 +0000
- title: 'A Greedy Homotopy Method for Regression with Nonconvex Constraints'
abstract: 'The goal of this paper is to estimate sparse linear regression models, where for a given partition \mathcal{G} of input variables, the selected variables are chosen from a diverse set of groups in \mathcal{G}. We consider a novel class of nonconvex constraint functions, and develop RepLasso, a greedy homotopy method that exploits geometrical properties of the constraint functions to build a sequence of suitably adapted convex surrogate problems. We prove that in some situations RepLasso recovers the global minima path of the nonconvex problem. Moreover, even if it does not recover the global minima, we prove that it will often do no worse than the Lasso in terms of (signed) support recovery, while in practice outperforming it. We show empirically that the strategy can also be used to improve over various other Lasso-style algorithms. Finally, a GWAS of ankylosing spondylitis highlights our method’s practical utility.'
volume: 38
URL: http://proceedings.mlr.press/v38/wauthier15.html
PDF: http://proceedings.mlr.press/v38/wauthier15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-wauthier15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Wauthier
given: Fabian
- family: Donnelly
given: Peter
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 1051-1060
id: wauthier15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 1051
lastpage: 1060
published: 2015-02-21 00:00:00 +0000
- title: 'Revisiting the Limits of MAP Inference by MWSS on Perfect Graphs'
abstract: 'A recent, promising approach to identifying a configuration of a discrete graphical model with highest probability (termed MAP inference) is to reduce the problem to finding a maximum weight stable set (MWSS) in a derived weighted graph, which, if perfect, allows a solution to be found in polynomial time. Weller and Jebara (2013) investigated the class of binary pairwise models where this method may be applied. However, their analysis made a seemingly innocuous assumption which simplifies analysis but led to only a subset of possible reparameterizations being considered. Here we introduce novel techniques and consider all cases, demonstrating that this greatly expands the set of tractable models. We provide a simple, exact characterization of the new, enlarged set and show how such models may be efficiently identified, thus settling the power of the approach on this class.'
volume: 38
URL: http://proceedings.mlr.press/v38/weller15.html
PDF: http://proceedings.mlr.press/v38/weller15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-weller15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Weller
given: Adrian
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 1061-1069
id: weller15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 1061
lastpage: 1069
published: 2015-02-21 00:00:00 +0000
- title: 'Understanding and Evaluating Sparse Linear Discriminant Analysis'
abstract: 'Linear discriminant analysis (LDA) represents a simple yet powerful technique for partitioning a p-dimensional feature vector into one of K classes based on a linear projection learned from N labeled observations. However, it is well-established that in the high-dimensional setting (p > N) the underlying projection estimator degenerates. Moreover, any linear discriminant function involving a large number of features may be difficult to interpret. To ameliorate these issues, two general categories of sparse LDA modifications have been proposed, both to reduce the number of active features and to stabilize the resulting projections. The first, based on optimal scoring, is more straightforward to implement and analyze but has been heavily criticized for its ambiguous connection with the original LDA formulation. In contrast, a second strategy applies sparse penalty functions directly to the original LDA objective but requires additional heuristic trade-off parameters, has unknown global and local minima properties, and requires a greedy sequential optimization procedure. In all cases the choice of sparse regularizer can be important, but no rigorous guidelines have been provided regarding which penalty might be preferable. Against this backdrop, we winnow down the broad space of candidate sparse LDA algorithms and promote a specific selection based on optimal scoring coupled with a particular, complementary sparse regularizer. This overall process ultimately progresses our understanding of sparse LDA in general, while leading to targeted modifications of existing algorithms that produce superior results in practice on three high-dimensional gene data sets.'
volume: 38
URL: http://proceedings.mlr.press/v38/wu15.html
PDF: http://proceedings.mlr.press/v38/wu15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-wu15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Wu
given: Yi
- family: Wipf
given: David
- family: Yun
given: Jeong-Min
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 1070-1078
id: wu15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 1070
lastpage: 1078
published: 2015-02-21 00:00:00 +0000
- title: 'Stochastic Block Transition Models for Dynamic Networks'
abstract: 'There has been great interest in recent years on statistical models for dynamic networks. In this paper, I propose a stochastic block transition model (SBTM) for dynamic networks that is inspired by the well-known stochastic block model (SBM) for static networks and previous dynamic extensions of the SBM. Unlike most existing dynamic network models, it does not make a hidden Markov assumption on the edge-level dynamics, allowing the presence or absence of edges to directly influence future edge probabilities while retaining the interpretability of the SBM. I derive an approximate inference procedure for the SBTM and demonstrate that it is significantly better at reproducing durations of edges in real social network data.'
volume: 38
URL: http://proceedings.mlr.press/v38/xu15.html
PDF: http://proceedings.mlr.press/v38/xu15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-xu15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Xu
given: Kevin
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 1079-1087
id: xu15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 1079
lastpage: 1087
published: 2015-02-21 00:00:00 +0000
- title: 'Majorization-Minimization for Manifold Embedding'
abstract: 'Nonlinear dimensionality reduction by manifold embedding has become a popular and powerful approach both for visualization and as preprocessing for predictive tasks, but more efficient optimization algorithms are still crucially needed. Majorization-Minimization (MM) is a promising approach that monotonically decreases the cost function, but it remains unknown how to tightly majorize the manifold embedding objective functions such that the resulting MM algorithms are efficient and robust. We propose a new MM procedure that yields fast MM algorithms for a wide variety of manifold embedding problems. In our majorization step, two parts of the cost function are respectively upper bounded by quadratic and Lipschitz surrogates, and the resulting upper bound can be minimized in closed form. For cost functions amenable to such QL-majorization, the MM yields monotonic improvement and is efficient: in experiments the newly developed MM algorithms outperform five state-of-the-art optimization approaches in manifold embedding tasks.'
volume: 38
URL: http://proceedings.mlr.press/v38/yang15a.html
PDF: http://proceedings.mlr.press/v38/yang15a.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-yang15a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Yang
given: Zhirong
- family: Peltonen
given: Jaakko
- family: Kaski
given: Samuel
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 1088-1097
id: yang15a
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 1088
lastpage: 1097
published: 2015-02-21 00:00:00 +0000
- title: 'A la Carte – Learning Fast Kernels'
abstract: 'Kernel methods have great promise for learning rich statistical representations of large modern datasets. However, compared to neural networks, kernel methods have been perceived as lacking in scalability and flexibility. We introduce a family of fast, flexible, general purpose, and lightly parametrized kernel learning methods, derived from Fastfood basis function expansions. We provide mechanisms to learn the properties of groups of spectral frequencies in these expansions, which require only O(m log d) time and O(m) memory, for m basis functions and d input dimensions. We show that the proposed methods can learn a wide class of kernels, outperforming the alternatives in accuracy, speed, and memory consumption.'
volume: 38
URL: http://proceedings.mlr.press/v38/yang15b.html
PDF: http://proceedings.mlr.press/v38/yang15b.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-yang15b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Yang
given: Zichao
- family: Wilson
given: Andrew
- family: Smola
given: Alex
- family: Song
given: Le
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 1098-1106
id: yang15b
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 1098
lastpage: 1106
published: 2015-02-21 00:00:00 +0000
- title: 'Minimizing Nonconvex Non-Separable Functions'
abstract: 'Regularization has played a key role in deriving sensible estimators in high dimensional statistical inference. A substantial body of recent work has argued for nonconvex regularizers in favor of their superior theoretical properties and excellent practical performance. In a different but analogous vein, nonconvex loss functions are promoted because of their robustness against "outliers". However, these nonconvex formulations are computationally more challenging, especially in the presence of nonsmoothness and non-separability. To address this issue, we propose a new proximal gradient meta-algorithm by rigorously extending the proximal average to the nonconvex setting. We formally prove its nice convergence properties, and illustrate its effectiveness on two applications: multi-task graph-guided fused lasso and robust support vector machines. Experiments demonstrate that our method compares favorably against other alternatives.'
volume: 38
URL: http://proceedings.mlr.press/v38/yu15.html
PDF: http://proceedings.mlr.press/v38/yu15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-yu15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Yu
given: Yaoliang
- family: Zheng
given: Xun
- family: Marchetti-Bowick
given: Micol
- family: Xing
given: Eric
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 1107-1115
id: yu15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 1107
lastpage: 1115
published: 2015-02-21 00:00:00 +0000
- title: 'A Simple Homotopy Algorithm for Compressive Sensing'
abstract: 'In this paper, we consider the problem of recovering the s largest elements of an arbitrary vector from noisy measurements. Inspired by previous work, we develop a homotopy algorithm which solves the \ell_1-regularized least square problem for a sequence of decreasing values of the regularization parameter. Compared to the previous method, our algorithm is more efficient in the sense that it only updates the solution once for each intermediate problem, and more practical in the sense that it has a simple stopping criterion obtained by checking the sparsity of the intermediate solution. Theoretical analysis reveals that our method enjoys a linear convergence rate in reducing the recovery error. Furthermore, our guarantee for recovering the top s elements of the target vector is tighter than previous results, and that for recovering the target vector itself matches the state of the art in compressive sensing.'
volume: 38
URL: http://proceedings.mlr.press/v38/zhang15.html
PDF: http://proceedings.mlr.press/v38/zhang15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-zhang15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Zhang
given: Lijun
- family: Yang
given: Tianbao
- family: Jin
given: Rong
- family: Zhou
given: Zhi-Hua
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 1116-1124
id: zhang15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 1116
lastpage: 1124
published: 2015-02-21 00:00:00 +0000
- title: 'Scalable Nonparametric Multiway Data Analysis'
abstract: 'Multiway data analysis deals with multiway arrays, i.e., tensors, and the goal is twofold: predicting missing entries by modeling the interactions between array elements and discovering hidden patterns, such as clusters or communities in each mode. Despite the success of existing tensor factorization approaches, they are either unable to capture nonlinear interactions, or computationally expensive to handle massive data. In addition, most of the existing methods lack a principled way to discover latent clusters, which is important for better understanding of the data. To address these issues, we propose a scalable nonparametric tensor decomposition model. It employs a Dirichlet process mixture (DPM) prior to model the latent clusters; it uses local Gaussian processes (GPs) to capture nonlinear relationships and to improve scalability. An efficient online variational Bayes Expectation-Maximization algorithm is proposed to learn the model. Experiments on both synthetic and real-world data show that the proposed model is able to discover latent clusters with higher prediction accuracy than competitive methods. Furthermore, the proposed model obtains significantly better predictive performance than the state-of-the-art large scale tensor decomposition algorithm, GigaTensor, on two large datasets with billions of entries.'
volume: 38
URL: http://proceedings.mlr.press/v38/zhe15.html
PDF: http://proceedings.mlr.press/v38/zhe15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-zhe15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Zhe
given: Shandian
- family: Xu
given: Zenglin
- family: Chu
given: Xinqi
- family: Qi
given: Yuan
- family: Park
given: Youngja
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 1125-1134
id: zhe15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 1125
lastpage: 1134
published: 2015-02-21 00:00:00 +0000
- title: 'Infinite Edge Partition Models for Overlapping Community Detection and Link Prediction'
abstract: 'A hierarchical gamma process infinite edge partition model is proposed to factorize the binary adjacency matrix of an unweighted undirected relational network under a Bernoulli-Poisson link. The model describes both homophily and stochastic equivalence, and is scalable to big sparse networks by focusing its computation on pairs of linked nodes. It can not only discover overlapping communities and inter-community interactions, but also predict missing edges. A simplified version omitting inter-community interactions is also provided and we reveal its interesting connections to existing models. The number of communities is automatically inferred in a nonparametric Bayesian manner, and efficient inference via Gibbs sampling is derived using novel data augmentation techniques. Experimental results on four real networks demonstrate the models’ scalability and state-of-the-art performance.'
volume: 38
URL: http://proceedings.mlr.press/v38/zhou15a.html
PDF: http://proceedings.mlr.press/v38/zhou15a.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-zhou15a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Zhou
given: Mingyuan
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 1135-1143
id: zhou15a
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 1135
lastpage: 1143
published: 2015-02-21 00:00:00 +0000
- title: 'Power-Law Graph Cuts'
abstract: 'Algorithms based on spectral graph cut objectives such as normalized cuts, ratio cuts and ratio association have become popular in recent years because they are widely applicable and simple to implement via standard eigenvector computations. Despite strong performance for a number of clustering tasks, spectral graph cut algorithms still suffer from several limitations: first, they require the number of clusters to be known in advance, but this information is often unknown a priori; second, they tend to produce clusters with uniform sizes. In some cases, the true clusters exhibit a known size distribution; in image segmentation, for instance, human-segmented images tend to yield segment sizes that follow a power-law distribution. In this paper, we propose a general framework of power-law graph cut algorithms that produce clusters whose sizes are power-law distributed, and that also do not fix the number of clusters upfront. To achieve our goals, we treat the Pitman-Yor exchangeable partition probability function (EPPF) as a regularizer to graph cut objectives. Because the resulting objectives cannot be solved by relaxing via eigenvectors, we derive a simple iterative algorithm to locally optimize the objectives. Moreover, we show that our proposed algorithm can be viewed as performing MAP inference on a particular Pitman-Yor mixture model. Our experiments on various data sets show the effectiveness of our algorithms.'
volume: 38
URL: http://proceedings.mlr.press/v38/zhou15b.html
PDF: http://proceedings.mlr.press/v38/zhou15b.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-zhou15b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Zhou
given: Xiangyang
- family: Zhang
given: Jiaxin
- family: Kulis
given: Brian
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 1144-1152
id: zhou15b
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 1144
lastpage: 1152
published: 2015-02-21 00:00:00 +0000
- title: 'The Log-Shift Penalty for Adaptive Estimation of Multiple Gaussian Graphical Models'
abstract: 'Sparse Gaussian graphical models characterize sparse dependence relationships between random variables in a network. To estimate multiple related Gaussian graphical models on the same set of variables, we formulate a hierarchical model, which leads to an optimization problem with a nonconvex log-shift penalty function. We show that under mild conditions the optimization problem is convex despite the inclusion of a nonconvex penalty, and derive an efficient optimization algorithm. Experiments on both synthetic and real data show that the proposed method is able to achieve good selection and estimation performance simultaneously, because the nonconvexity of the log-shift penalty allows for weak signals to be thresholded to zero without excessive shrinkage on the strong signals.'
volume: 38
URL: http://proceedings.mlr.press/v38/zhu15.html
PDF: http://proceedings.mlr.press/v38/zhu15.pdf
edit: https://github.com/mlresearch/v38/edit/gh-pages/_posts/2015-02-21-zhu15.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics'
publisher: 'PMLR'
author:
- family: Zhu
given: Yuancheng
- family: Foygel Barber
given: Rina
editor:
- family: Lebanon
given: Guy
- family: Vishwanathan
given: S. V. N.
address: San Diego, California, USA
page: 1153-1161
id: zhu15
issued:
date-parts:
- 2015
- 2
- 21
firstpage: 1153
lastpage: 1161
published: 2015-02-21 00:00:00 +0000