- title: 'Minimax Gaussian Classification & Clustering'
  abstract: 'We present minimax bounds for classification and clustering error in the setting where covariates are drawn from a mixture of two isotropic Gaussian distributions. Here, we define clustering error in a  discriminative fashion, demonstrating fundamental connections between classification (supervised) and clustering (unsupervised). For both classification and clustering, our lower bounds show that without enough samples, the best any classifier or clustering rule can do is close to random guessing. For classification, as part of our upper bound analysis,  we show that Fisher''s linear discriminant achieves a fast minimax rate $\Theta(1/n)$ with enough samples $n$. For clustering, as part of our upper bound analysis, we show that a clustering rule constructed using principal component analysis achieves the minimax rate with enough samples. We also provide lower and upper bounds for the high-dimensional sparse  setting where the dimensionality of the covariates $p$ is potentially larger than the number of samples $n$, but where the difference between the Gaussian means is sparse.
'
  volume: 54
  URL: https://proceedings.mlr.press/v54/li17a.html
  PDF: http://proceedings.mlr.press/v54/li17a/li17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-li17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Tianyang
    family: Li
  - given: Xinyang
    family: Yi
  - given: Constantine
    family: Carmanis
  - given: Pradeep
    family: Ravikumar
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1-9
  id: li17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1
  lastpage: 9
  published: 2017-04-10 00:00:00 +0000
- title: 'Conditions beyond treewidth for tightness of higher-order LP relaxations'
  abstract: 'Linear programming (LP) relaxations are a popular method to attempt to find a most likely configuration of a discrete graphical model. If a solution to the relaxed problem is obtained at an integral vertex then the solution is guaranteed to be exact and we say that the relaxation is tight. We consider binary pairwise models and introduce new methods which allow us to demonstrate refined conditions for tightness of LP relaxations in the Sherali-Adams hierarchy. Our results include showing that for higher order LP relaxations, treewidth is not precisely the right way to characterize tightness. This work is primarily theoretical, with insights that can improve efficiency in practice.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/rowland17a.html
  PDF: http://proceedings.mlr.press/v54/rowland17a/rowland17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-rowland17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Mark
    family: Rowland
  - given: Aldo
    family: Pacchiano
  - given: Adrian
    family: Weller
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 10-18
  id: rowland17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 10
  lastpage: 18
  published: 2017-04-10 00:00:00 +0000
- title: 'Large-Scale Data-Dependent Kernel Approximation'
  abstract: 'Learning a computationally efficient kernel from data is an important machine learning problem. The majority of kernels in the literature do not leverage the geometry of the data, and those that do are computationally infeasible for contemporary datasets. Recent advances in approximation techniques have expanded the applicability of the kernel methodology to scale linearly with the data size. Data-dependent kernels, which could leverage this computational advantage, have however not yet seen the benefit. Here we derive an approximate large-scale learning procedure for data-dependent kernels that is efficient and performs well in practice. We provide a Lemma that can be used to derive the asymptotic convergence of the approximation in the limit of infinite random features, and, under certain conditions, an estimate of the convergence speed. We empirically prove that our construction represents a valid, yet efficient approximation of the data-dependent kernel. For large-scale datasets of millions of datapoints, where the proposed method is now applicable for the first time, we notice a significant performance boost over both baselines consisting of data independent kernels and of kernel approximations, at comparable computational cost.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/ionescu17a.html
  PDF: http://proceedings.mlr.press/v54/ionescu17a/ionescu17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-ionescu17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Catalin
    family: Ionescu
  - given: Alin
    family: Popa
  - given: Cristian
    family: Sminchisescu
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 19-27
  id: ionescu17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 19
  lastpage: 27
  published: 2017-04-10 00:00:00 +0000
- title: 'Clustering from Multiple Uncertain Experts'
  abstract: 'Utilizing expert input often improves clustering performance.   However in a knowledge discovery problem, ground truth is unknown even to an expert.   Thus, instead of one expert, we solicit the opinion from multiple experts. The key question motivating this work is: which experts should be assigned higher weights when  there is disagreement on whether to put a pair of samples in the same group? To model the uncertainty in constraints from different experts, we build a probabilistic model for  pairwise constraints through jointly modeling each expert’s accuracy and the mapping from features to latent cluster assignments.  After learning our probabilistic discriminative clustering model and accuracies of different experts, 1) samples that were not annotated by any expert can be clustered using the discriminative clustering model; and  2) experts with higher accuracies are automatically assigned higher weights in determining the  latent cluster assignments. Experimental results on UCI benchmark datasets and a real-world disease subtyping dataset demonstrate that our proposed approach outperforms competing alternatives, including semi-crowdsourced clustering,  semi-supervised clustering with constraints from majority voting,   and consensus clustering.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/chang17a.html
  PDF: http://proceedings.mlr.press/v54/chang17a/chang17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-chang17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yale
    family: Chang
  - given: Junxiang
    family: Chen
  - given: Michael
    family: Cho
  - given: Peter
    family: Castaldi
  - given: Ed
    family: Silverman
  - given: Jennifer
    family: Dy
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 28-36
  id: chang17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 28
  lastpage: 36
  published: 2017-04-10 00:00:00 +0000
- title: 'Online Nonnegative Matrix Factorization with General Divergences'
  abstract: 'We develop a unified and systematic framework for performing online nonnegative matrix factorization under a wide variety of important divergences. The online nature of our algorithms makes them particularly amenable to large-scale data.  We prove that the sequence of learned dictionaries converges almost surely to the set of critical points of the expected loss function. Experimental results demonstrate the computational efficiency and outstanding performances of our algorithms on several real-life applications, including topic modeling, document clustering and foreground-background separation.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/zhao17a.html
  PDF: http://proceedings.mlr.press/v54/zhao17a/zhao17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-zhao17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Renbo
    family: Zhao
  - given: Vincent
    family: Tan
  - given: Huan
    family: Xu
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 37-45
  id: zhao17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 37
  lastpage: 45
  published: 2017-04-10 00:00:00 +0000
- title: 'ASAGA: Asynchronous Parallel SAGA'
  abstract: 'We describe ASAGA, an asynchronous parallel version of the incremental gradient algorithm SAGA that enjoys fast linear convergence rates. Through a novel perspective, we revisit and clarify a subtle but important technical issue present in a large fraction of the recent convergence rate proofs for asynchronous parallel optimization algorithms, and propose a simplification of the recently introduced “perturbed iterate” framework that resolves it. We thereby prove that ASAGA can obtain a theoretical linear speedup on multi-core systems even without sparsity assumptions. We present results of an implementation on a 40-core architecture illustrating the practical speedup as well as the hardware overhead.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/leblond17a.html
  PDF: http://proceedings.mlr.press/v54/leblond17a/leblond17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-leblond17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Rémi
    family: Leblond
  - given: Fabian
    family: Pedregosa
  - given: Simon
    family: Lacoste-Julien
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 46-54
  id: leblond17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 46
  lastpage: 54
  published: 2017-04-10 00:00:00 +0000
- title: 'Lower Bounds on Active Learning for Graphical Model Selection'
  abstract: 'We consider the problem of estimating the underlying graph associated with a Markov random field, with the added twist that the decoding algorithm can iteratively choose which subsets of nodes to sample based on the previous samples, resulting in an active learning setting.  Considering both Ising and Gaussian models, we provide algorithm-independent lower bounds for high-probability recovery within the class of degree-bounded graphs.  Our main results are minimax lower bounds for the active setting that match the best known lower bounds for the passive setting, which in turn are known to be tight in several cases of interest.     Our analysis is based on Fano’s inequality, along with novel mutual information bounds for the active learning setting, and the application of restricted graph ensembles.  While we consider ensembles that are similar or identical to those used in the passive setting, we require different analysis techniques, with a key challenge being bounding a mutual information quantity associated with observed subsets of nodes, as opposed to full observations.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/scarlett17a.html
  PDF: http://proceedings.mlr.press/v54/scarlett17a/scarlett17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-scarlett17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jonathan
    family: Scarlett
  - given: Volkan
    family: Cevher
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 55-64
  id: scarlett17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 55
  lastpage: 64
  published: 2017-04-10 00:00:00 +0000
- title: 'Non-square matrix sensing without spurious local minima via the Burer-Monteiro approach'
  abstract: 'We consider the non-square matrix sensing problem, under restricted isometry property (RIP) assumptions.  We focus on the non-convex formulation, where any rank-r matrix $X ∈R^m x n$ is represented as $UV^T$, where $U ∈R^m x r$ and $V ∈R^n x r$. In this paper, we complement recent findings on the non-convex geometry of the analogous PSD setting [5], and show that matrix factorization does not introduce any spurious local minima, under RIP. '
  volume: 54
  URL: https://proceedings.mlr.press/v54/park17a.html
  PDF: http://proceedings.mlr.press/v54/park17a/park17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-park17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Dohyung
    family: Park
  - given: Anastasios
    family: Kyrillidis
  - given: Constantine
    family: Carmanis
  - given: Sujay
    family: Sanghavi
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 65-74
  id: park17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 65
  lastpage: 74
  published: 2017-04-10 00:00:00 +0000
- title: 'Sparse Accelerated Exponential Weights'
  abstract: 'We consider the stochastic optimization problem where a convex function is minimized observing recursively the gradients.  We introduce SAEW, a new procedure that accelerates exponential weights procedures with the slow rate $1/\sqrtT$ to procedures achieving the fast rate $1/T$. Under the strong convexity of the risk, we achieve the optimal rate of convergence for approximating sparse parameters in $R^d$. The acceleration is achieved by using successive  averaging steps in an online fashion. The procedure also produces sparse estimators thanks to additional hard threshold steps. '
  volume: 54
  URL: https://proceedings.mlr.press/v54/gaillard17a.html
  PDF: http://proceedings.mlr.press/v54/gaillard17a/gaillard17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-gaillard17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Pierre
    family: Gaillard
  - given: Olivier
    family: Wintenberger
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 75-82
  id: gaillard17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 75
  lastpage: 82
  published: 2017-04-10 00:00:00 +0000
- title: 'On the Learnability of Fully-Connected Neural Networks'
  abstract: 'Despite the empirical success of deep neural networks, there is limited theoretical understanding on the learnability of these models using a polynomial-time algorithm. In this paper, we characterize the learnability of fully-connected neural networks via both positive and negative results. We focus on $\ell_1$-regularized networks, where the $\ell_1$-norm of the incoming weights of every neuron is assumed to be bounded by a constant $B > 0$. Our first result shows that such networks are properly learnable in $\text{poly}(n,d,\exp(1/ε^2))$ time, where $n$ and $d$ are the sample size and the input dimension, and $ε> 0$ is the gap to optimality. The bound is achieved by repeatedly sampling over a low-dimensional manifold so as to ensure approximate optimality, but avoids the $\exp(d)$ cost of exhaustively searching over the parameter space. We also establish a hardness result showing that the exponential dependence on $1/ε$ is unavoidable unless $\bf RP = \bf NP$.  Our second result shows that the exponential dependence on $1/ε$ can be avoided by exploiting the underlying structure of the data distribution. In particular, if the positive and negative examples can be separated with margin $γ> 0$ by an unknown neural network, then the network can be learned in $\text{poly}(n,d,1/ε)$ time. The bound is achieved by an ensemble method which uses the first algorithm as a weak learner. We further show that the separability assumption can be weakened to tolerate noisy labels. Finally, we show that the exponential dependence on $1/γ$ is unimprovable under a certain cryptographic assumption.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/zhang17a.html
  PDF: http://proceedings.mlr.press/v54/zhang17a/zhang17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-zhang17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yuchen
    family: Zhang
  - given: Jason
    family: Lee
  - given: Martin
    family: Wainwright
  - given: Michael I.
    family: Jordan
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 83-91
  id: zhang17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 83
  lastpage: 91
  published: 2017-04-10 00:00:00 +0000
- title: 'An Information-Theoretic Route from Generalization in Expectation to Generalization in Probability'
  abstract: 'One fundamental goal in any learning algorithm is to mitigate its risk for overfitting. Mathematically, this requires that the learning algorithm enjoys a small generalization risk, which is defined either in expectation or in probability. Both types of generalization are commonly used in the literature. For instance, generalization in expectation has been used to analyze algorithms, such as ridge regression and SGD, whereas generalization in probability is used in the VC theory, among others. Recently, a third notion of generalization has been studied, called uniform generalization, which requires that the generalization risk vanishes uniformly in expectation across all bounded parametric losses. It has been shown that uniform generalization is, in fact, equivalent to an information-theoretic stability constraint, and that it recovers classical results in learning theory. It is achievable under various settings, such as sample compression schemes, finite hypothesis spaces, finite domains, and differential privacy. However, the relationship between uniform generalization and concentration remained unknown. In this paper, we answer this question by proving that, while a generalization in expectation does not imply a generalization in probability, a uniform generalization in expectation does imply concentration. We establish a chain rule for the uniform generalization risk of the composition of hypotheses and use it to derive a large deviation bound. Finally, we prove that the bound is tight.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/alabdulmohsin17a.html
  PDF: http://proceedings.mlr.press/v54/alabdulmohsin17a/alabdulmohsin17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-alabdulmohsin17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ibrahim
    family: Alabdulmohsin
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 92-100
  id: alabdulmohsin17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 92
  lastpage: 100
  published: 2017-04-10 00:00:00 +0000
- title: 'Nearly Instance Optimal Sample Complexity Bounds for Top-k Arm Selection'
  abstract: 'In the Best-k-Arm problem, we are given n stochastic bandit arms, each associated with an unknown reward distribution. We are required to identify the k arms with the largest means by taking as few samples as possible. In this paper, we make progress towards a complete characterization of the instance-wise sample complexity bounds for the Best-k-Arm problem. On the lower bound side, we obtain a novel complexity term to measure the sample complexity that every Best-k-Arm instance requires. This is derived by an interesting and nontrivial reduction from the Best-1-Arm problem. We also provide an elimination-based algorithm that matches the instance-wise lower bound within doubly-logarithmic factors. The sample complexity of our algorithm strictly dominates the state-of-the-art for Best-k-Arm (module constant factors).'
  volume: 54
  URL: https://proceedings.mlr.press/v54/chen17a.html
  PDF: http://proceedings.mlr.press/v54/chen17a/chen17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-chen17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Lijie
    family: Chen
  - given: Jian
    family: Li
  - given: Mingda
    family: Qiao
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 101-110
  id: chen17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 101
  lastpage: 110
  published: 2017-04-10 00:00:00 +0000
- title: 'Guaranteed Non-convex Optimization: Submodular Maximization over Continuous Domains'
  abstract: 'Submodular continuous functions are a category of (generally) non-convex/non-concave functions with a wide spectrum of applications. We characterize these functions and demonstrate that they can be maximized efficiently with approximation guarantees. Specifically, i) We introduce  the weak DR property that gives a unified characterization of submodularity for all  set, integer-lattice and continuous functions; ii) for maximizing monotone DR-submodular continuous functions under general down-closed convex constraints, we propose a Frank-Wolfe variant with (1-1/e) approximation guarantee, and sub-linear convergence rate; iii) for maximizing general non-monotone submodular continuous functions subject to box constraints, we propose a DoubleGreedy algorithm with 1/3 approximation guarantee. Submodular continuous functions naturally find applications in various real-world settings, including influence and revenue maximization with continuous assignments, sensor energy management, facility location, etc.  Experimental results show that the proposed algorithms efficiently generate superior solutions  compared to baseline algorithms.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/bian17a.html
  PDF: http://proceedings.mlr.press/v54/bian17a/bian17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-bian17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Andrew An
    family: Bian
  - given: Baharan
    family: Mirzasoleiman
  - given: Joachim
    family: Buhmann
  - given: Andreas
    family: Krause
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 111-120
  id: bian17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 111
  lastpage: 120
  published: 2017-04-10 00:00:00 +0000
- title: 'Tensor-Dictionary Learning with Deep Kruskal-Factor Analysis'
  abstract: 'A multi-way factor analysis model is introduced for tensor-variate data of any order. Each data item is represented as a (sparse) sum of Kruskal decompositions, a Kruskal- factor analysis (KFA). KFA is nonparametric and can infer both the tensor-rank of each dictionary atom and the number of dictionary atoms. The model is adapted for online learning, which allows dictionary learning on large data sets. After KFA is introduced, the model is extended to a deep convolutional tensor-factor analysis, supervised by a Bayesian SVM. The experiments section demonstrates the improvement of KFA over vectorized approaches (e.g., BPFA), tensor decompositions, and convolutional neural networks (CNN) in multi-way denoising, blind inpainting, and image classification. The improvement in PSNR for the inpainting results over other methods exceeds 1dB in several cases and we achieve state of the art results on Caltech101 image classification. '
  volume: 54
  URL: https://proceedings.mlr.press/v54/stevens17a.html
  PDF: http://proceedings.mlr.press/v54/stevens17a/stevens17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-stevens17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Andrew
    family: Stevens
  - given: Yunchen
    family: Pu
  - given: Yannan
    family: Sun
  - given: Gregory
    family: Spell
  - given: Lawrence
    family: Carin
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 121-129
  id: stevens17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 121
  lastpage: 129
  published: 2017-04-10 00:00:00 +0000
- title: 'Consistent and Efficient Nonparametric Different-Feature Selection'
  abstract: 'Two-sample feature selection is a ubiquitous problem in both scientific and engineering studies. We propose a feature selection method to find features that describe a difference in two probability distributions. The proposed method is nonparametric and does not assume any specific parametric models on data distributions. We show that the proposed method is computationally efficient and does not require any extra computation for model selection. Moreover, we prove that the proposed method provides a consistent estimator of features under mild conditions. Our experimental results show that the proposed method outperforms the current method with regard to both accuracy and computation time.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/hara17a.html
  PDF: http://proceedings.mlr.press/v54/hara17a/hara17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-hara17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Satoshi
    family: Hara
  - given: Takayuki
    family: Katsuki
  - given: Hiroki
    family: Yanagisawa
  - given: Takafumi
    family: Ono
  - given: Ryo
    family: Okamoto
  - given: Shigeki
    family: Takeuchi
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 130-138
  id: hara17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 130
  lastpage: 138
  published: 2017-04-10 00:00:00 +0000
- title: 'Annular Augmentation Sampling'
  abstract: 'The exponentially large sample space of general binary probabilistic models  renders intractable standard operations such as exact marginalization, inference, and normalization.  Typically, researchers deal with these distributions via deterministic approximations, the class of belief propagation methods being a prominent example.  Comparatively, Markov Chain Monte Carlo methods have been significantly less used in this domain.  In this work, we introduce an auxiliary variable MCMC scheme that samples from an annular augmented space, translating to a great circle path around the hypercube of the binary sample space.  This annular augmentation sampler explores the sample space more effectively than coordinate-wise samplers and has no tunable parameters, leading to substantial performance gains in estimating quantities of interest in large binary models.  We extend the method to incorporate into the sampler any existing mean-field approximation (such as from belief propagation), leading to further performance improvements.  Empirically, we consider a range of large Ising models and an application to risk factors for heart disease.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/fagan17a.html
  PDF: http://proceedings.mlr.press/v54/fagan17a/fagan17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-fagan17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Francois
    family: Fagan
  - given: Jalaj
    family: Bhandari
  - given: John
    family: Cunningham
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 139-147
  id: fagan17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 139
  lastpage: 147
  published: 2017-04-10 00:00:00 +0000
- title: 'Less than a Single Pass: Stochastically Controlled Stochastic Gradient'
  abstract: 'We develop and analyze a procedure for gradient-based optimization that we refer to as stochastically controlled stochastic gradient (SCSG). As a member of the SVRG family of algorithms, SCSG makes use of gradient estimates at two scales. Unlike most existing algorithms in this family, both the computation cost and the communication cost of SCSG do not necessarily scale linearly with the sample size n; indeed, these costs are independent of n when the target accuracy is small. An experimental evaluation of SCSG on the MNIST dataset shows that it can yield accurate results on this dataset on a single commodity machine with a memory footprint of only 2.6MB and only eight disk accesses.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/lei17a.html
  PDF: http://proceedings.mlr.press/v54/lei17a/lei17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-lei17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Lihua
    family: Lei
  - given: Michael
    family: Jordan
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 148-156
  id: lei17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 148
  lastpage: 156
  published: 2017-04-10 00:00:00 +0000
- title: 'Learning Time Series Detection Models from Temporally Imprecise Labels'
  abstract: 'In this paper, we consider a new low-quality label learning problem: learning time series detection models from temporally imprecise labels. In this problem, the data consist of a set of input time series, and supervision is provided by a sequence of noisy time stamps corresponding to the occurrence of positive class events. Such temporally imprecise labels commonly occur in areas like mobile health research where human annotators are tasked with labeling the occurrence of very short duration events. We propose a general learning framework for this problem that can accommodate different base classifiers and noise models. We present results on real mobile health data showing that the proposed framework significantly outperforms a number of alternatives including assuming that the label time stamps are noise-free, transforming the problem into the multiple instance learning framework, and learning on labels that were manually re-aligned.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/adams17a.html
  PDF: http://proceedings.mlr.press/v54/adams17a/adams17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-adams17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Roy
    family: Adams
  - given: Ben
    family: Marlin
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 157-165
  id: adams17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 157
  lastpage: 165
  published: 2017-04-10 00:00:00 +0000
- title: 'Learning Cost-Effective and Interpretable Treatment Regimes'
  abstract: 'Decision makers, such as doctors and judges, make crucial decisions such as recommending treatments to patients, and granting bails to defendants on a daily basis. Such decisions typically involve weighting the potential benefits of taking an action against the costs involved.  In this work, we aim to automate this task of learning cost-effective, interpretable and actionable treatment regimes. We formulate this as a problem of learning a decision list – a sequence of if-then-else rules – which maps characteristics of subjects (eg., diagnostic test results of patients) to treatments. We propose a novel objective to construct a decision list which maximizes outcomes for the population, and minimizes overall costs. Since we do not observe the outcomes corresponding to counterfactual scenarios, we use techniques from causal inference literature to infer them. We model the problem of learning the decision list as a Markov Decision Process (MDP) and employ a variant of the Upper Confidence Bound for Trees (UCT) strategy which leverages customized checks for pruning the search space effectively.  Experimental results on real world observational data capturing judicial bail decisions and treatment recommendations for asthma patients demonstrate the effectiveness of our approach.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/lakkaraju17a.html
  PDF: http://proceedings.mlr.press/v54/lakkaraju17a/lakkaraju17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-lakkaraju17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Himabindu
    family: Lakkaraju
  - given: Cynthia
    family: Rudin
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 166-175
  id: lakkaraju17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 166
  lastpage: 175
  published: 2017-04-10 00:00:00 +0000
- title: 'Linear Thompson Sampling Revisited'
  abstract: 'We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic linear bandit setting. While we obtain a regret bound of order $O(d^3/2\sqrtT)$ as in previous results, the proof sheds new light on the functioning of the TS. We leverage on the structure of the problem to show how the regret is related to the sensitivity (i.e., the gradient) of the objective function and how selecting optimal arms associated to \textitoptimistic parameters does control it. Thus we show that TS can be seen as a generic randomized algorithm where the sampling distribution is designed to have a fixed probability of being optimistic, at the cost of an additional $\sqrtd$ regret factor compared to a UCB-like approach. Furthermore, we show that our proof can be readily applied to regularized linear optimization and generalized linear model problems.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/abeille17a.html
  PDF: http://proceedings.mlr.press/v54/abeille17a/abeille17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-abeille17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Marc
    family: Abeille
  - given: Alessandro
    family: Lazaric
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 176-184
  id: abeille17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 176
  lastpage: 184
  published: 2017-04-10 00:00:00 +0000
- title: 'A Sub-Quadratic Exact Medoid Algorithm'
  abstract: 'We present a new algorithm, ‘trimed’ for obtaining the medoid of a set, that is the element of the set which minimises the mean distance to all other elements. The algorithm is shown to have, under certain assumptions, expected run time $O(N^(3/2))$ in $R^d$ where N is the set size, making it the first sub-quadratic exact medoid algorithm for $d > 1$. Experiments show that it performs very well on spatial network data, frequently requiring two orders of magnitude fewer distance calculations than state-of-the-art approximate algorithms. As an application, we show how trimed can be used as a component in an accelerated K-medoids algorithm, and then how it can be relaxed to obtain further computational gains with only a minor loss in cluster quality.  '
  volume: 54
  URL: https://proceedings.mlr.press/v54/newling17a.html
  PDF: http://proceedings.mlr.press/v54/newling17a/newling17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-newling17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: James
    family: Newling
  - given: Francois
    family: Fleuret
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 185-193
  id: newling17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 185
  lastpage: 193
  published: 2017-04-10 00:00:00 +0000
- title: 'Minimax Density Estimation for Growing Dimension'
  abstract: 'This paper presents minimax rates for density estimation when the data dimension $d$ is allowed to grow with the number of observations $n$ rather than remaining fixed as in previous analyses. We prove a non-asymptotic lower bound which gives the worst-case rate over standard classes of smooth densities, and we show that kernel density estimators achieve this rate. We also give oracle choices for the bandwidth and derive the fastest rate $d$ can grow with $n$ to maintain estimation consistency.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/mcdonald17a.html
  PDF: http://proceedings.mlr.press/v54/mcdonald17a/mcdonald17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-mcdonald17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Daniel
    family: McDonald
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 194-203
  id: mcdonald17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 194
  lastpage: 203
  published: 2017-04-10 00:00:00 +0000
- title: 'Estimating Density Ridges by Direct Estimation of Density-Derivative-Ratios'
  abstract: 'Estimation of \emphdensity ridges has been gathering a great deal of attention since it enables us to reveal lower-dimensional structures hidden in data. Recently, \emphsubspace constrained mean shift (SCMS) was proposed as a practical algorithm for density ridge estimation. A key technical ingredient in SCMS is to accurately estimate the ratios of the density derivatives to the density. SCMS takes a three-step approach for this purpose — first estimating the data density, then computing its derivatives, and finally taking their ratios. However, this three-step approach can be unreliable because a good density estimator does not necessarily mean a good density derivative estimator and division by an estimated density could significantly magnify the estimation error. To overcome these problems, we propose a novel method that directly estimates the ratios without going through density estimation and division. Our proposed estimator has an analytic-form solution and it can be computed efficiently. We further establish a non-parametric convergence bound for the proposed ratio estimator. Finally, based on this direct ratio estimator, we develop a practical algorithm for density ridge estimation and experimentally demonstrate its usefulness on a variety of datasets.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/sasaki17a.html
  PDF: http://proceedings.mlr.press/v54/sasaki17a/sasaki17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-sasaki17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Hiroaki
    family: Sasaki
  - given: Takafumi
    family: Kanamori
  - given: Masashi
    family: Sugiyama
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 204-212
  id: sasaki17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 204
  lastpage: 212
  published: 2017-04-10 00:00:00 +0000
- title: 'Learning Theory for Conditional Risk Minimization'
  abstract: 'In this work we study the learnability of stochastic processes with respect to the conditional risk, i.e. the existence of a learning algorithm that improves its next-step performance with the amount of observed data.  We introduce a notion of pairwise discrepancy between conditional distributions at different times steps and show how certain properties of these discrepancies can be used to construct a successful learning algorithm. Our main results are two theorems that establish criteria for learnability for many classes of stochastic processes, including all special cases studied previously in the literature. '
  volume: 54
  URL: https://proceedings.mlr.press/v54/zimin17a.html
  PDF: http://proceedings.mlr.press/v54/zimin17a/zimin17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-zimin17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Alexander
    family: Zimin
  - given: Christoph
    family: Lampert
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 213-222
  id: zimin17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 213
  lastpage: 222
  published: 2017-04-10 00:00:00 +0000
- title: 'Near-optimal Bayesian Active Learning with Correlated and Noisy Tests'
  abstract: '  We consider the Bayesian active learning and experimental design problem, where the goal is to learn the value of some unknown target variable through a sequence of informative, noisy tests. In contrast to prior work, we focus on the challenging, yet practically relevant setting where test outcomes can be conditionally dependent given the hidden target variable. Under such assumptions, common heuristics, such as greedily performing tests that maximize the reduction in uncertainty of the target, often perform poorly.  We propose ECED, a novel, efficient active learning algorithm, and prove strong theoretical guarantees that hold with correlated, noisy tests. Rather than directly optimizing the prediction error, at each step, ECED picks the test that maximizes the gain in a surrogate objective, which takes into account the dependencies between tests. Our analysis relies on an information-theoretic auxiliary function to track the progress of ECED, and utilizes adaptive submodularity to attain the approximation bound. We demonstrate strong empirical performance of ECED on two problem instances, including a Bayesian experimental design task intended to distinguish among economic theories of how people make risky decisions, and an active preference learning task via pairwise comparisons.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/chen17b.html
  PDF: http://proceedings.mlr.press/v54/chen17b/chen17b.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-chen17b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yuxin
    family: Chen
  - given: Hamed
    family: Hassani
  - given: Andreas
    family: Krause
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 223-231
  id: chen17b
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 223
  lastpage: 231
  published: 2017-04-10 00:00:00 +0000
- title: 'Learning Nash Equilibrium for General-Sum Markov Games from Batch Data'
  abstract: 'This paper addresses the problem of learning a Nash equilibrium in $γ$-discounted multiplayer general-sum Markov Games (MGs) in a batch setting. As the number of players increases in MG, the agents may either collaborate or team apart to increase their final rewards. One solution to address this problem is to look for a Nash equilibrium. Although, several techniques were found for the subcase of two-player zero-sum MGs, those techniques fail to find a Nash equilibrium in general-sum Markov Games.  In this paper, we introduce a new definition of $ε$-Nash equilibrium in MGs which grasps the strategy’s quality for multiplayer games. We prove that minimizing the norm of two Bellman-like residuals implies to learn such an $ε$-Nash equilibrium. Then, we show that minimizing an empirical estimate of the $L_p$ norm of these Bellman-like residuals allows learning for general-sum games within the batch setting. Finally, we introduce a neural network architecture that successfully learns a Nash equilibrium in generic multiplayer general-sum turn-based MGs.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/perolat17a.html
  PDF: http://proceedings.mlr.press/v54/perolat17a/perolat17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-perolat17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Julien
    family: Perolat
  - given: Florian
    family: Strub
  - given: Bilal
    family: Piot
  - given: Olivier
    family: Pietquin
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 232-241
  id: perolat17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 232
  lastpage: 241
  published: 2017-04-10 00:00:00 +0000
- title: 'Distance Covariance Analysis'
  abstract: 'We propose a dimensionality reduction method to identify linear projections that capture interactions between two or more sets of variables.   The method, distance covariance analysis (DCA), can detect both linear and nonlinear relationships, and can take dependent variables into account.  On previous testbeds and a new testbed that systematically assesses the ability to detect both linear and nonlinear interactions, DCA performs better than or comparable to existing methods, while being one of the fastest methods. To showcase the versatility of DCA, we also applied it to three different neurophysiological datasets.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/cowley17a.html
  PDF: http://proceedings.mlr.press/v54/cowley17a/cowley17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-cowley17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Benjamin
    family: Cowley
  - given: Joao
    family: Semedo
  - given: Amin
    family: Zandvakili
  - given: Matthew
    family: Smith
  - given: Adam
    family: Kohn
  - given: Byron
    family: Yu
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 242-251
  id: cowley17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 242
  lastpage: 251
  published: 2017-04-10 00:00:00 +0000
- title: 'Phase Retrieval Meets Statistical Learning Theory: A Flexible Convex Relaxation'
  abstract: 'We propose a flexible convex relaxation for the phase retrieval problem that operates in the natural domain of the signal. Therefore, we avoid the prohibitive computational cost associated with “lifting” and semidefinite programming (SDP) in methods such as PhaseLift and compete with recently developed non-convex techniques for phase retrieval. We relax the quadratic equations for phaseless measurements to inequality constraints each of which representing a symmetric “slab”. Through a simple convex program, our proposed estimator finds an extreme point of the intersection of these slabs that is best aligned with a given anchor vector. We characterize geometric conditions that certify success of the proposed estimator. Furthermore, using classic results in statistical learning theory, we show that for random measurements the geometric certificates hold with high probability at an optimal sample complexity. Phase transition of our estimator is evaluated through simulations. Our numerical experiments also suggest that the proposed method can solve phase retrieval problems with coded diffraction measurements as well.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/bahmani17a.html
  PDF: http://proceedings.mlr.press/v54/bahmani17a/bahmani17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-bahmani17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Sohail
    family: Bahmani
  - given: Justin
    family: Romberg
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 252-260
  id: bahmani17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 252
  lastpage: 260
  published: 2017-04-10 00:00:00 +0000
- title: 'Regret Bounds for Lifelong Learning'
  abstract: 'We consider the problem of transfer learning in an online setting. Different tasks are presented sequentially  and processed by a within-task algorithm. We propose a lifelong learning strategy which refines the underlying  data representation used by the within-task algorithm, thereby transferring information from one task to the next.  We show that when the within-task algorithm comes with some regret bound, our strategy inherits this good property.  Our bounds are in expectation for a general loss function, and uniform for a convex loss. We discuss applications to dictionary learning and finite set of predictors. In the latter case, we improve previous $O(1/\sqrt{m})$ bounds  to $O(1/m)$, where $m$ is the per task sample size.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/alquier17a.html
  PDF: http://proceedings.mlr.press/v54/alquier17a/alquier17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-alquier17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Pierre
    family: Alquier
  - given: The Tien
    family: Mai
  - given: Massimiliano
    family: Pontil
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 261-269
  id: alquier17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 261
  lastpage: 269
  published: 2017-04-10 00:00:00 +0000
- title: 'Poisson intensity estimation with reproducing kernels'
  abstract: 'Despite the fundamental nature of the Poisson process in the theory and application of stochastic processes, and its attractive generalizations (e.g. Cox process), few tractable nonparametric modeling approaches exist, especially in high dimensional settings. In this paper we develop a new, computationally tractable Reproducing Kernel Hilbert Space (RKHS) formulation for the inhomogeneous Poisson process. We model the square root of the intensity as an RKHS function. The modeling challenge is that the usual representer theorem arguments no longer apply due to the form of the inhomogeneous Poisson process likelihood.  However, we prove that the representer theorem does hold in an appropriately transformed RKHS, guaranteeing that the optimization of the penalized likelihood can be cast as a tractable finite-dimensional problem.  The resulting approach is simple to implement, and readily scales to high dimensions and large-scale datasets. '
  volume: 54
  URL: https://proceedings.mlr.press/v54/flaxman17a.html
  PDF: http://proceedings.mlr.press/v54/flaxman17a/flaxman17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-flaxman17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Seth
    family: Flaxman
  - given: Yee Whye
    family: Teh
  - given: Dino
    family: Sejdinovic
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 270-279
  id: flaxman17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 270
  lastpage: 279
  published: 2017-04-10 00:00:00 +0000
- title: 'Generalized Pseudolikelihood Methods for Inverse Covariance Estimation'
  abstract: 'We introduce PseudoNet, a new pseudolikelihood-based estimator of the inverse covariance matrix, that has a number of useful statistical and computational properties.  We show, through detailed experiments with synthetic and also real-world finance as well as wind power data, that PseudoNet outperforms related methods in terms of estimation error and support recovery, making it well-suited for use in a downstream application, where obtaining low estimation error can be important.  We also show, under regularity conditions, that PseudoNet is consistent.  Our proof assumes the existence of accurate estimates of the diagonal entries of the underlying inverse covariance matrix; we additionally provide a two-step method to obtain these estimates, even in a high-dimensional setting, going beyond the proofs for related methods.  Unlike other pseudolikelihood-based methods, we also show that PseudoNet does not saturate, i.e., in high dimensions, there is no hard limit on the number of nonzero entries in the PseudoNet estimate.  We present a fast algorithm as well as screening rules that make computing the PseudoNet estimate over a range of tuning parameters tractable.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/ali17a.html
  PDF: http://proceedings.mlr.press/v54/ali17a/ali17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-ali17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Alnur
    family: Ali
  - given: Kshitij
    family: Khare
  - given: Sang-Yun
    family: Oh
  - given: Bala
    family: Rajaratnam
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 280-288
  id: ali17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 280
  lastpage: 288
  published: 2017-04-10 00:00:00 +0000
- title: 'Removing Phase Transitions from Gibbs Measures'
  abstract: 'Gibbs measures are a fundamental class of distributions for the analysis of high dimensional data. Phase transitions, which are also known as degeneracy in the network science literature, are an emergent property of these models that well describe many physical systems. However, the reach of the Gibbs measure is now far outside the realm of physical systems, and in many of these domains multiphase behavior is a nuisance. This nuisance often makes distribution fitting impossible due to failure of the MCMC sampler, and even when an MLE fit is possible, if the solution is near a phase transition point, the plausibility of the fit can be highly questionable. We introduce a modification to the Gibbs distribution that reduces the effects of phase transitions, and with properly chosen hyper-parameters, provably removes all multiphase behavior. We show that this new distribution is just as easy to fit via MCMCMLE as the Gibbs measure, and provide examples in the Ising model from statistical physics and ERGMs from network science.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/fellows17a.html
  PDF: http://proceedings.mlr.press/v54/fellows17a/fellows17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-fellows17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ian
    family: Fellows
  - given: Mark
    family: Handcock
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 289-297
  id: fellows17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 289
  lastpage: 297
  published: 2017-04-10 00:00:00 +0000
- title: 'Performance Bounds for Graphical Record Linkage'
  abstract: 'Record linkage involves merging records in large, noisy databases to remove duplicate entities.  It has become an important area because of its widespread occurrence in bibliometrics, public health, official statistics production, political science, and beyond. Traditional linkage methods  directly linking records to one another are computationally infeasible as the number of records grows. As a result, it is increasingly common for researchers to treat linkage as a clustering task, in which each latent entity is associated with one or more noisy database records. We critically assess performance bounds  using the  Kullback-Leibler (KL) divergence under a Bayesian record linkage framework, making connections to Kolchin partition models. We provide an upper bound using the KL divergence and a lower bound on the minimum probability of misclassifying a latent entity. We give insights for when our bounds hold using simulated data and provide practical user guidance. '
  volume: 54
  URL: https://proceedings.mlr.press/v54/steorts17a.html
  PDF: http://proceedings.mlr.press/v54/steorts17a/steorts17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-steorts17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Rebecca C.
    family: Steorts
  - given: Mattew
    family: Barnes
  - given: Willie
    family: Neiswanger
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 298-306
  id: steorts17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 298
  lastpage: 306
  published: 2017-04-10 00:00:00 +0000
- title: 'Regret Bounds for Transfer Learning in Bayesian Optimisation'
  abstract: 'This paper studies the regret bound of two transfer learning algorithms in Bayesian optimisation. The first algorithm models any difference between the source and target functions as a noise process. The second algorithm proposes a new way to model the difference between the source and target as a Gaussian process which is then used to adapt the source data. We show that in both cases the regret bounds are tighter than in the no transfer case. We also experimentally compare the performance of these algorithms relative to no transfer learning and demonstrate benefits of transfer learning.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/shilton17a.html
  PDF: http://proceedings.mlr.press/v54/shilton17a/shilton17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-shilton17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Alistair
    family: Shilton
  - given: Sunil
    family: Gupta
  - given: Santu
    family: Rana
  - given: Svetha
    family: Venkatesh
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 307-315
  id: shilton17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 307
  lastpage: 315
  published: 2017-04-10 00:00:00 +0000
- title: 'Scaling Submodular Maximization via Pruned Submodularity Graphs'
  abstract: 'We propose a new random pruning method (called “submodular sparsification (SS)”) to reduce the cost of submodular maximization. The pruning is applied via a “submodularity graph” over the $n$ ground elements, where each directed edge is associated with a pairwise dependency defined by the submodular function. In each step, SS prunes a $1-1/\sqrt{c}$ (for $c>1$) fraction of the nodes using weights on edges computed based on only a small number ($O(\log n)$) of randomly sampled nodes. The algorithm requires $\log_\sqrt{c}n$ steps with a small and highly parallelizable per-step computation. An accuracy-speed tradeoff parameter $c$, set as $c = 8$, leads to a fast shrink rate $\sqrt{2}/4$ and small iteration complexity $\log_{2\sqrt{2}}n$. Analysis shows that w.h.p., the greedy algorithm on the pruned set of size $O(\log^2 n)$ can achieve a guarantee similar to that of processing the original dataset. In news and video summarization tasks, SS is able to substantially reduce both computational costs and memory usage, while maintaining (or even slightly exceeding) the quality of the original (and much more costly) greedy algorithm.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/zhou17a.html
  PDF: http://proceedings.mlr.press/v54/zhou17a/zhou17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-zhou17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Tianyi
    family: Zhou
  - given: Hua
    family: Ouyang
  - given: Jeff
    family: Bilmes
  - given: Yi
    family: Chang
  - given: Carlos
    family: Guestrin
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 316-324
  id: zhou17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 316
  lastpage: 324
  published: 2017-04-10 00:00:00 +0000
- title: 'Localized Lasso for High-Dimensional Regression'
  abstract: 'We introduce the localized Lasso, which learns models that both are interpretable and have a high predictive power in problems with high dimensionality d and small sample size n.  More specifically, we consider a function defined by local sparse models, one at each data point. We introduce sample-wise network regularization to borrow strength across the models, and sample-wise exclusive group sparsity (a.k.a., l12 norm) to introduce diversity into the choice of feature sets in the local models. The local models are interpretable in terms of similarity of their sparsity patterns. The cost function is convex, and thus has a globally optimal solution. Moreover, we propose a simple yet efficient iterative least-squares based optimization procedure for the localized Lasso, which does not need a tuning parameter, and is guaranteed to converge to a globally optimal solution. The solution is empirically shown to outperform alternatives for both simulated and genomic personalized/precision medicine data.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/yamada17a.html
  PDF: http://proceedings.mlr.press/v54/yamada17a/yamada17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-yamada17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Makoto
    family: Yamada
  - given: Takeuchi
    family: Koh
  - given: Tomoharu
    family: Iwata
  - given: John
    family: Shawe-Taylor
  - given: Samuel
    family: Kaski
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 325-333
  id: yamada17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 325
  lastpage: 333
  published: 2017-04-10 00:00:00 +0000
- title: 'Encrypted Accelerated Least Squares Regression'
  abstract: 'Information that is stored in an encrypted format is, by definition, usually not amenable to statistical analysis or machine learning methods. In this paper we present detailed analysis of coordinate and accelerated gradient descent algorithms which are capable of fitting least squares and penalised ridge regression models, using data encrypted under a fully homomorphic encryption scheme.  Gradient descent is shown to dominate in terms of encrypted computational speed, and theoretical results are proven to give parameter bounds which ensure correctness of decryption.  The characteristics of encrypted computation are empirically shown to favour a non-standard acceleration technique.  This demonstrates the possibility of approximating conventional statistical regression methods using encrypted data without compromising privacy.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/esperanca17a.html
  PDF: http://proceedings.mlr.press/v54/esperanca17a/esperanca17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-esperanca17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Pedro
    family: Esperanca
  - given: Louis
    family: Aslett
  - given: Chris
    family: Holmes
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 334-343
  id: esperanca17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 334
  lastpage: 343
  published: 2017-04-10 00:00:00 +0000
- title: 'Random Consensus Robust PCA'
  abstract: 'This paper presents R2PCA, a random consensus method for robust principal component analysis. R2PCA takes RANSAC’s principle of using as little data as possible one step further. It iteratively selects small subsets of the data to identify pieces of the principal components, to then stitch them together. We show that if the principal components are in general position and the errors are sufficiently sparse, R2PCA will exactly recover the principal components with probability 1, in lieu of assumptions on coherence or the distribution of the sparse errors, and even under adversarial settings. R2PCA enjoys many advantages: it works well under noise, its computational complexity scales linearly in the ambient dimension, it is easily parallelizable, and due to its low sample complexity, it can be used in settings where data is so large it cannot even be stored in memory. We complement our theoretical findings with synthetic and real data experiments showing that r2pca outperforms state-of-the-art methods in a broad range of settings.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/pimentel-alarcon17a.html
  PDF: http://proceedings.mlr.press/v54/pimentel-alarcon17a/pimentel-alarcon17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-pimentel-alarcon17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Daniel
    family: Pimentel-Alarcon
  - given: Robert
    family: Nowak
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 344-352
  id: pimentel-alarcon17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 344
  lastpage: 352
  published: 2017-04-10 00:00:00 +0000
- title: 'Gray-box Inference for Structured Gaussian Process Models'
  abstract: 'We develop an automated variational inference method for  Bayesian structured prediction problems with Gaussian process (GP) priors and linear-chain likelihoods.  Our approach does not  need to know the details of the structured likelihood model and can scale up to a large number of observations.  Furthermore, we show that the required expected likelihood term and its gradients in the variational objective (ELBO) can be estimated efficiently by using expectations over very low-dimensional Gaussian distributions. Optimization of the ELBO is fully parallelizable over sequences and amenable to stochastic optimization, which we use along with control variate techniques to make our framework useful in practice.  Results on a set of natural language processing tasks show that our method can be as good as (and sometimes better than, in particular with respect to expected log-likelihood)  hard-coded approaches including SVM-struct and CRF, and overcomes the scalability limitations of previous inference algorithms based on sampling.  Overall, this is a fundamental step to developing automated inference methods for Bayesian structured prediction.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/galliani17a.html
  PDF: http://proceedings.mlr.press/v54/galliani17a/galliani17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-galliani17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Pietro
    family: Galliani
  - given: Amir
    family: Dezfouli
  - given: Edwin
    family: Bonilla
  - given: Novi
    family: Quadrianto
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 353-361
  id: galliani17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 353
  lastpage: 361
  published: 2017-04-10 00:00:00 +0000
- title: 'Frank-Wolfe Algorithms for Saddle Point Problems'
  abstract: 'We extend the Frank-Wolfe (FW) optimization algorithm to solve constrained smooth convex-concave saddle point (SP) problems. Remarkably, the method only requires access to linear minimization oracles. Leveraging recent advances in FW optimization, we provide the first proof of convergence of a FW-type saddle point solver over polytopes, thereby partially answering a 30 year-old conjecture. We also survey other convergence results and highlight gaps in the theoretical underpinnings of FW-style algorithms. Motivating applications without known efficient alternatives are explored through structured prediction with combinatorial penalties as well as games over matching polytopes involving an exponential number of constraints.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/gidel17a.html
  PDF: http://proceedings.mlr.press/v54/gidel17a/gidel17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-gidel17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Gauthier
    family: Gidel
  - given: Tony
    family: Jebara
  - given: Simon
    family: Lacoste-Julien
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 362-371
  id: gidel17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 362
  lastpage: 371
  published: 2017-04-10 00:00:00 +0000
- title: 'A Framework for Optimal Matching for Causal Inference'
  abstract: 'We propose a novel framework for matching estimators for causal effect from observational data that is based on minimizing the dual norm of estimation error when expressed as an operator. We show that many popular matching estimators can be expressed as optimal in this framework, including nearest-neighbor matching, coarsened exact matching, and mean-matched sampling. This reveals their motivation and aptness as structural priors formulated by embedding the effect in a particular functional space. This also gives rise to a range of new, kernel-based matching estimators that arise when one embeds the effect in a reproducing kernel Hilbert space. Depending on the case, these estimators can be found using either quadratic optimization or integer optimization. We show that estimators based on universal kernels are universally consistent without model specification. In empirical results using both synthetic and real data, the new, kernel-based estimators outperform all standard causal estimators in estimation error.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/kallus17a.html
  PDF: http://proceedings.mlr.press/v54/kallus17a/kallus17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-kallus17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Nathan
    family: Kallus
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 372-381
  id: kallus17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 372
  lastpage: 381
  published: 2017-04-10 00:00:00 +0000
- title: 'Quantifying the accuracy of approximate diffusions and Markov chains'
  abstract: 'Markov chains and diffusion processes are indispensable tools in machine learning and statistics that are used for inference, sampling, and modeling. With the growth of large-scale datasets, the computational cost associated with simulating these stochastic processes can be considerable, and many algorithms have been proposed to approximate the underlying Markov chain or diffusion. A fundamental question is how the computational savings trade off against the statistical error incurred due to approximations. This paper develops general results that address this question. We bound the Wasserstein distance between the equilibrium distributions of two diffusions as a function of their mixing rates and the deviation in their drifts. We show that this error bound is tight in simple Gaussian settings. Our general result on continuous diffusions can be discretized to provide insights into the computational–statistical trade-off of Markov chains. As an illustration, we apply our framework to derive finite-sample error bounds of approximate unadjusted Langevin dynamics. We characterize computation-constrained settings where, by using fast-to-compute approximate gradients in the Langevin dynamics, we obtain more accurate samples compared to using the exact gradients. Finally, as an additional application of our approach, we quantify the accuracy of approximate zig-zag sampling. Our theoretical analyses are supported by simulation experiments. '
  volume: 54
  URL: https://proceedings.mlr.press/v54/huggins17a.html
  PDF: http://proceedings.mlr.press/v54/huggins17a/huggins17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-huggins17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jonathan
    family: Huggins
  - given: James
    family: Zou
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 382-391
  id: huggins17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 382
  lastpage: 391
  published: 2017-04-10 00:00:00 +0000
- title: 'Stochastic Rank-1 Bandits'
  abstract: 'We propose stochastic rank-1 bandits, a class of online learning problems where at each step a learning agent chooses a pair of row and column arms, and receives the product of their values as a reward. The main challenge of the problem is that the individual values of the row and column are unobserved. We assume that these values are stochastic and drawn independently. We propose a computationally-efficient algorithm for solving our problem, which we call Rank1Elim. We derive a O((K + L) (1 / Delta) log n) upper bound on its n-step regret, where K is the number of rows, L is the number of columns, and Delta is the minimum of the row and column gaps; under the assumption that the mean row and column rewards are bounded away from zero. To the best of our knowledge, we present the first bandit algorithm that finds the maximum entry of a rank-1 matrix whose regret is linear in K + L, 1 / Delta, and log n. We also derive a nearly matching lower bound. Finally, we evaluate Rank1Elim empirically on multiple problems. We observe that it leverages the structure of our problems and can learn near-optimal solutions even if our modeling assumptions are mildly violated.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/katariya17a.html
  PDF: http://proceedings.mlr.press/v54/katariya17a/katariya17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-katariya17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Sumeet
    family: Katariya
  - given: Branislav
    family: Kveton
  - given: Csaba
    family: Szepesvari
  - given: Claire
    family: Vernade
  - given: Zheng
    family: Wen
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 392-401
  id: katariya17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 392
  lastpage: 401
  published: 2017-04-10 00:00:00 +0000
- title: 'On the Troll-Trust Model for Edge Sign Prediction in Social Networks'
  abstract: 'In the problem of edge sign prediction, we are given a directed graph (representing a social network), and our task is to predict the binary labels of the edges (i.e., the positive or negative nature of the social relationships). Many successful heuristics for this problem are based on the troll-trust features, estimating at each node the fraction of outgoing and incoming positive/negative edges. We show that these heuristics can be understood, and rigorously analyzed, as approximators to the Bayes optimal classifier for a simple probabilistic model of the edge labels. We then show that the maximum likelihood estimator for this model approximately corresponds to the predictions of a Label Propagation algorithm run on a transformed version of the original social graph. Extensive experiments on a number of real-world datasets show that this algorithm is competitive against state-of-the-art classifiers in terms of both accuracy and scalability. Finally, we show that troll-trust features can also be used to derive online learning algorithms which have theoretical guarantees even when edges are adversarially labeled.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/falher17a.html
  PDF: http://proceedings.mlr.press/v54/falher17a/falher17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-falher17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Géraud Le
    family: Falher
  - given: Nicolo
    family: Cesa-Bianchi
  - given: Claudio
    family: Gentile
  - given: Fabio
    family: Vitale
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 402-411
  id: falher17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 402
  lastpage: 411
  published: 2017-04-10 00:00:00 +0000
- title: 'Online Optimization of Smoothed Piecewise Constant Functions'
  abstract: 'We study online optimization of smoothed piecewise constant functions over the domain [0, 1). This is motivated by the problem of adaptively picking parameters of learning algorithms as in the recently introduced framework by Gupta and Roughgarden (2016). Majority of the machine learning literature has focused on Lipschitz-continuous functions or functions with bounded gradients. This is with good reason-any learning algorithm suffers linear regret even against piecewise constant functions that are chosen adversarially, arguably the simplest of non-Lipschitz continuous functions. The smoothed setting we consider is inspired by the seminal work of Spielman and Teng (2004) and the recent work of Gupta and Roughgarden (2016)-in this setting, the sequence of functions may be chosen by an adversary, however, with some uncertainty in the location of discontinuities. We give algorithms that achieve sublinear regret in the full information and bandit settings.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/cohen-addad17a.html
  PDF: http://proceedings.mlr.press/v54/cohen-addad17a/cohen-addad17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-cohen-addad17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Vincent
    family: Cohen-Addad
  - given: Varun
    family: Kanade
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 412-420
  id: cohen-addad17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 412
  lastpage: 420
  published: 2017-04-10 00:00:00 +0000
- title: 'Combinatorial Topic Models using Small-Variance Asymptotics'
  abstract: 'Modern topic models typically have a probabilistic formulation, and derive their inference algorithms based on Latent Dirichlet Allocation (LDA) and its variants. In contrast, we approach topic modeling via combinatorial optimization, and take a small-variance limit of LDA to derive a new objective function. We minimize this objective by using ideas from combinatorial optimization, obtaining a new, fast, and high-quality topic modeling algorithm.  In particular, we show that our results are not only significantly better than traditional SVA algorithms, but also truly competitive with popular LDA-based approaches; we also discuss the (dis)similarities between our approach and its probabilistic counterparts.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/jiang17a.html
  PDF: http://proceedings.mlr.press/v54/jiang17a/jiang17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-jiang17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ke
    family: Jiang
  - given: Suvrit
    family: Sra
  - given: Brian
    family: Kulis
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 421-429
  id: jiang17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 421
  lastpage: 429
  published: 2017-04-10 00:00:00 +0000
- title: 'ConvNets with Smooth Adaptive Activation Functions for Regression'
  abstract: 'Within Neural Networks (NN), the parameters of Adaptive Activation Functions (AAF) control the shapes of activation functions. These parameters are trained along with other parameters in the NN. AAFs have improved performance of Convolutional Neural Networks (CNN) in multiple classification tasks. In this paper, we propose and apply AAFs on CNNs for regression tasks. We argue that applying AAFs in the regression (second-to-last) layer of a NN can significantly decrease the bias of the regression NN. However, using existing AAFs may lead to overfitting. To address this problem, we propose a Smooth Adaptive Activation Function (SAAF) with a piecewise polynomial form which can approximate any continuous function to arbitrary degree of error, while having a bounded Lipschitz constant for given bounded model parameters. As a result, NNs with SAAF can avoid overfitting by simply regularizing model parameters. We empirically evaluated CNNs with SAAFs and achieved state-of-the-art results on age and pose estimation datasets.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/hou17a.html
  PDF: http://proceedings.mlr.press/v54/hou17a/hou17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-hou17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Le
    family: Hou
  - given: Dimitris
    family: Samaras
  - given: Tahsin
    family: Kurc
  - given: Yi
    family: Gao
  - given: Joel
    family: Saltz
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 430-439
  id: hou17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 430
  lastpage: 439
  published: 2017-04-10 00:00:00 +0000
- title: 'Rapid Mixing Swendsen-Wang Sampler for Stochastic Partitioned Attractive Models'
  abstract: 'The Gibbs sampler is the most popular Markov chain used for learning and inference problems in Graphical Models (GM). These tasks are computationally intractable in general, and the Gibbs sampler often suffers from slow  mixing. In this paper, we study the Swendsen-Wang  dynamics which is a more sophisticated Markov chain designed to overcome bottlenecks that impede Gibbs sampler. We prove O(log n) mixing time for attractive binary pairwise GMs (i.e., ferromagnetic Ising models) on stochastic partitioned graphs  having n vertices, under some mild conditions including low temperature regions where the Gibbs sampler provably mixes exponentially slow. Our experiments also confirm that the Swendsen-Wang sampler  significantly outperforms the Gibbs sampler for learning parameters of attractive GMs.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/park17b.html
  PDF: http://proceedings.mlr.press/v54/park17b/park17b.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-park17b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Sejun
    family: Park
  - given: Yunhun
    family: Jang
  - given: Andreas
    family: Galanis
  - given: Jinwoo
    family: Shin
  - given: Daniel
    family: Stefankovic
  - given: Eric
    family: Vigoda
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 440-449
  id: park17b
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 440
  lastpage: 449
  published: 2017-04-10 00:00:00 +0000
- title: 'Efficient Rank Aggregation via Lehmer Codes'
  abstract: 'We propose a novel rank aggregation method based on converting permutations into their  corresponding Lehmer codes or other subdiagonal images. Lehmer codes, also known as inversion vectors, are vector representations of permutations in which each coordinate can take values not restricted by the values of other coordinates. This transformation allows for decoupling of the coordinates and for performing aggregation via simple scalar median or mode computations. We present simulation results illustrating the performance of this completely parallelizable approach and analytically prove that both the mode and median aggregation procedure recover the correct centroid aggregate with small sample complexity when the permutations are drawn according to the well-known Mallows models. The proposed Lehmer code approach may also be used on partial rankings, with similar performance guarantees.  '
  volume: 54
  URL: https://proceedings.mlr.press/v54/li17b.html
  PDF: http://proceedings.mlr.press/v54/li17b/li17b.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-li17b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Pan
    family: Li
  - given: Arya
    family: Mazumdar
  - given: Olgica
    family: Milenkovic
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 450-459
  id: li17b
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 450
  lastpage: 459
  published: 2017-04-10 00:00:00 +0000
- title: 'Nonlinear ICA of Temporally Dependent Stationary Sources'
  abstract: 'We develop a nonlinear generalization of independent component analysis (ICA) or blind source separation, based on temporal dependencies (e.g. autocorrelations). We introduce a nonlinear generative model where the independent sources are assumed to be temporally dependent, non-Gaussian, and stationary, and we observe arbitrarily nonlinear mixtures of them. We develop a method for estimating the model (i.e. separating the sources) based on logistic regression in a neural network which learns to discriminate between a short temporal window of the data vs. a temporal window of temporally permuted data. We prove that the method estimates the sources for general smooth mixing nonlinearities, assuming the sources have sufficiently strong temporal dependencies, and these dependencies are in a certain way different from dependencies found in Gaussian processes. For Gaussian (and similar) sources, the method estimates the nonlinear part of the mixing. We thus provide the first rigorous and general proof of identifiability of nonlinear ICA for temporally dependent sources, together with a practical method for its estimation.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/hyvarinen17a.html
  PDF: http://proceedings.mlr.press/v54/hyvarinen17a/hyvarinen17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-hyvarinen17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Aapo
    family: Hyvarinen
  - given: Hiroshi
    family: Morioka
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 460-469
  id: hyvarinen17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 460
  lastpage: 469
  published: 2017-04-10 00:00:00 +0000
- title: 'Stochastic Difference of Convex Algorithm and its Application to Training Deep Boltzmann Machines'
  abstract: 'Difference of convex functions (DC) programming is an important approach to nonconvex optimization problems because these structures can be encountered in several fields. Effective optimization methods, called DC algorithms, have been developed in deterministic optimization literature. In machine learning, a lot of important learning problems such as the Boltzmann machines (BMs) can be formulated as DC programming. However, there is no DC-like algorithm guaranteed by convergence rate analysis for stochastic problems that are more suitable settings for machine learning tasks.  In this paper, we propose a stochastic variant of DC algorithm and give computational complexities to converge to a stationary point under several situations. Moreover, we show our method includes expectation-maximization (EM) and Monte Carlo EM (MCEM) algorithm as special cases on training BMs. In other words, we extend EM/MCEM algorithm to more effective methods from DC viewpoint with theoretical convergence guarantees. Experimental results indicate that our method performs well for training binary restricted Boltzmann machines and deep Boltzmann machines without pre-training.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/nitanda17a.html
  PDF: http://proceedings.mlr.press/v54/nitanda17a/nitanda17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-nitanda17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Atsushi
    family: Nitanda
  - given: Taiji
    family: Suzuki
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 470-478
  id: nitanda17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 470
  lastpage: 478
  published: 2017-04-10 00:00:00 +0000
- title: 'Global Convergence of Non-Convex Gradient Descent for Computing Matrix Squareroot'
  abstract: 'While there has been a significant amount of work studying gradient descent techniques for non-convex optimization problems over the last few years, all existing results establish either local convergence with good rates or global convergence with highly suboptimal rates, for many problems of interest. In this paper, we take the first step in getting the best of both worlds – establishing global convergence and obtaining a good rate of convergence for the problem of computing squareroot of a positive semidefinite (PSD) matrix, which is a widely studied problem in numerical linear algebra with applications in machine learning and statistics among others.  Given a PSD matrix M and a PSD starting point $U_0$, we show that gradient descent with appropriately chosen step-size finds an epsilon-accurate squareroot of M in $O(α\log(||M-U_0||_F^2 / ε))$ iterations, where $α=  (\max{||U_0||^2, ||M||} / \min{\sigma_min^2(U_0), \sigma_min(M)} )^3/2$. Our result is the first to establish global convergence for this problem and that it is robust to errors in each iteration.  A key contribution of our work is the general proof technique which we believe should further excite research in understanding deterministic and stochastic variants of simple non-convex gradient descent algorithms with good global convergence rates for other problems in machine learning and numerical linear algebra. '
  volume: 54
  URL: https://proceedings.mlr.press/v54/jain17a.html
  PDF: http://proceedings.mlr.press/v54/jain17a/jain17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-jain17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Prateek
    family: Jain
  - given: Chi
    family: Jin
  - given: Sham
    family: Kakade
  - given: Praneeth
    family: Netrapalli
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 479-488
  id: jain17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 479
  lastpage: 488
  published: 2017-04-10 00:00:00 +0000
- title: 'Reparameterization Gradients through Acceptance-Rejection Sampling Algorithms'
  abstract: 'Variational inference using the reparameterization trick has enabled large-scale approximate Bayesian inference in complex probabilistic models, leveraging stochastic optimization to sidestep intractable expectations. The reparameterization trick is applicable when we can simulate a random variable by applying a differentiable deterministic function on an auxiliary random variable whose distribution is fixed. For many distributions of interest (such as the gamma or Dirichlet), simulation of random variables relies on acceptance-rejection sampling. The discontinuity introduced by the accept-reject step means that standard reparameterization tricks are not applicable. We propose a new method that lets us leverage reparameterization gradients even when variables are outputs of a acceptance-rejection sampling algorithm. Our approach enables reparameterization on a larger class of variational distributions. In several studies of real and synthetic data, we show that the variance of the estimator of the gradient is significantly lower than other state-of-the-art methods. This leads to faster convergence of stochastic gradient variational inference.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/naesseth17a.html
  PDF: http://proceedings.mlr.press/v54/naesseth17a/naesseth17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-naesseth17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Christian
    family: Naesseth
  - given: Francisco
    family: Ruiz
  - given: Scott
    family: Linderman
  - given: David
    family: Blei
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 489-498
  id: naesseth17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 489
  lastpage: 498
  published: 2017-04-10 00:00:00 +0000
- title: 'Asymptotically exact inference in differentiable generative models'
  abstract: 'Many generative models can be expressed as a differentiable function of random inputs drawn from some simple probability density. This framework includes both deep generative architectures such as Variational Autoencoders and a large class of procedurally defined simulator models. We present a method for performing efficient MCMC inference in such models when conditioning on observations of the model output. For some models this offers an asymptotically exact inference method where Approximate Bayesian Computation might otherwise be employed. We use the intuition that inference corresponds to integrating a density across the manifold corresponding to the set of inputs consistent with the observed outputs. This motivates the use of a constrained variant of Hamiltonian Monte Carlo which leverages the smooth geometry of the manifold to coherently move between inputs exactly consistent with observations. We validate the method by performing inference tasks in a diverse set of models.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/graham17a.html
  PDF: http://proceedings.mlr.press/v54/graham17a/graham17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-graham17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Matthew
    family: Graham
  - given: Amos
    family: Storkey
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 499-508
  id: graham17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 499
  lastpage: 508
  published: 2017-04-10 00:00:00 +0000
- title: 'Decentralized Collaborative Learning of Personalized Models over Networks'
  abstract: 'We consider a set of learning agents in a collaborative peer-to-peer network, where each agent learns a personalized model according to its own learning objective. The question addressed in this paper is: how can agents improve upon their locally trained model by communicating with other agents that have similar objectives? We introduce and analyze two asynchronous gossip algorithms running in a fully decentralized manner. Our first approach, inspired from label propagation, aims to smooth pre-trained local models over the network while accounting for the confidence that each agent has in its initial model. In our second approach, agents jointly learn and propagate their model by making iterative updates based on both their local dataset and the behavior of their neighbors. To optimize this challenging objective, our decentralized algorithm is based on ADMM.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/vanhaesebrouck17a.html
  PDF: http://proceedings.mlr.press/v54/vanhaesebrouck17a/vanhaesebrouck17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-vanhaesebrouck17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Paul
    family: Vanhaesebrouck
  - given: Aurélien
    family: Bellet
  - given: Marc
    family: Tommasi
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 509-517
  id: vanhaesebrouck17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 509
  lastpage: 517
  published: 2017-04-10 00:00:00 +0000
- title: 'Contextual Bandits with Latent Confounders: An NMF Approach'
  abstract: 'Motivated by online recommendation and advertising systems, we consider a causal model for stochastic contextual bandits with a latent low-dimensional confounder. In our model, there are $L$ observed contexts and $K$ arms of the bandit. The observed context influences the reward obtained through a latent confounder variable with cardinality $m$ ($m ≪L,K$). The arm choice and the latent confounder causally determines the reward while the observed context is correlated with the confounder. Under this model, the $L \times K$ mean reward matrix $\mathbfU$ (for each context in $[L]$ and each arm in $[K]$) factorizes into non-negative factors $\mathbfA$ ($L \times m$) and  $\mathbfW$ ($m \times K$). This insight enables us to propose an $ε$-greedy NMF-Bandit algorithm that designs a sequence of interventions (selecting specific arms), that achieves a balance between learning this low-dimensional structure and selecting the best arm to minimize regret. Our algorithm achieves a regret of  $\mathcalO\left(L\mathrmpoly(m, \log K) \log T \right)$ at time $T$, as compared to $\mathcalO(LK\log T)$ for conventional contextual bandits, assuming a constant gap between the best arm and the rest for each context. These guarantees are obtained under mild sufficiency conditions on the factors that are weaker versions of the well-known Statistical RIP condition.  We further propose a class of generative models that satisfy our sufficient conditions, and derive a lower bound of $\mathcalO\left(Km\log T\right)$.  These are the first regret guarantees for online matrix completion with bandit feedback, when the rank is greater than one.  We further compare the performance of our algorithm with the state of the art, on synthetic and real world data-sets.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/sen17a.html
  PDF: http://proceedings.mlr.press/v54/sen17a/sen17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-sen17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Rajat
    family: Sen
  - given: Karthikeyan
    family: Shanmugam
  - given: Murat
    family: Kocaoglu
  - given: Alex
    family: Dimakis
  - given: Sanjay
    family: Shakkottai
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 518-527
  id: sen17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 518
  lastpage: 527
  published: 2017-04-10 00:00:00 +0000
- title: 'Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets'
  abstract: 'Bayesian optimization has become a successful tool for hyperparameter optimization of machine learning algorithms, such as support vector machines or deep neural networks. Despite its success, for large datasets, training and validating a single configuration often takes hours, days, or even weeks, which limits the achievable performance. To accelerate hyperparameter optimization, we propose a generative model for the validation error as a function of training set size, which is learned during the optimization process and allows exploration of preliminary configurations on small subsets, by extrapolating to the full dataset.  We construct a Bayesian optimization procedure, dubbed FABOLAS, which models loss and training time as a function of dataset size and automatically trades off high information gain about the global optimum against computational cost. Experiments optimizing support vector machines and deep neural networks show that FABOLAS often finds high-quality solutions 10 to 100 times faster than other state-of-the-art Bayesian optimization methods or the recently proposed bandit strategy Hyperband.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/klein17a.html
  PDF: http://proceedings.mlr.press/v54/klein17a/klein17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-klein17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Aaron
    family: Klein
  - given: Stefan
    family: Falkner
  - given: Simon
    family: Bartels
  - given: Philipp
    family: Hennig
  - given: Frank
    family: Hutter
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 528-536
  id: klein17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 528
  lastpage: 536
  published: 2017-04-10 00:00:00 +0000
- title: 'Least-Squares Log-Density Gradient Clustering for Riemannian Manifolds'
  abstract: 'Mean shift is a mode-seeking clustering algorithm that has been successfully used in a wide range of applications such as image segmentation and object tracking. To further improve the clustering performance, mean shift has been extended to various directions, including generalization to handle data on Riemannian manifolds and extension to directly estimating the density gradient without density estimation. In this paper, we combine these ideas and propose a novel mode-seeking algorithm for Riemannian manifolds with direct density-gradient estimation. Although the idea of combining the two extensions is rather straightforward, directly estimating the density gradient on Riemannian manifolds is mathematically challenging. We will provide a mathematically sound algorithm and demonstrate its usefulness through experiments.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/ashizawa17a.html
  PDF: http://proceedings.mlr.press/v54/ashizawa17a/ashizawa17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-ashizawa17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Mina
    family: Ashizawa
  - given: Hiroaki
    family: Sasaki
  - given: Tomoya
    family: Sakai
  - given: Masashi
    family: Sugiyama
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 537-546
  id: ashizawa17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 537
  lastpage: 546
  published: 2017-04-10 00:00:00 +0000
- title: 'Fast column generation for atomic norm regularization'
  abstract: 'We consider optimization problems that consist in minimizing a quadratic function under an atomic norm regularization or constraint. In the line of work on conditional gradient algorithms, we show that the fully corrective Frank-Wolfe (FCFW) algorithm — which is most naturally reformulated as a column generation algorithm in the regularized case — can be made particularly efficient for difficult problems in this family by solving the simplicial or conical subproblems produced by FCFW using a special instance of a classical  active set algorithm for quadratic programming that generalizes the min-norm point algorithm.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/vinyes17a.html
  PDF: http://proceedings.mlr.press/v54/vinyes17a/vinyes17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-vinyes17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Marina
    family: Vinyes
  - given: Guillaume
    family: Obozinski
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 547-556
  id: vinyes17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 547
  lastpage: 556
  published: 2017-04-10 00:00:00 +0000
- title: 'Bayesian Hybrid Matrix Factorisation for Data Integration'
  abstract: 'We introduce a novel Bayesian hybrid matrix factorisation model (HMF) for data integration, based on combining multiple matrix factorisation methods, that can be used for in- and out-of-matrix prediction of missing values. The model is very general and can be used to integrate many datasets across different entity types, including repeated experiments, similarity matrices, and very sparse datasets. We apply our method on two biological applications, and extensively compare it to state-of-the-art machine learning and matrix factorisation models. For in-matrix predictions on drug sensitivity datasets we obtain consistently better performances than existing methods. This is especially the case when we increase the sparsity of the datasets. Furthermore, we perform out-of-matrix predictions on methylation and gene expression datasets, and obtain the best results on two of the three datasets, especially when the predictivity of datasets is high.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/brouwer17a.html
  PDF: http://proceedings.mlr.press/v54/brouwer17a/brouwer17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-brouwer17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Thomas
    family: Brouwer
  - given: Pietro
    family: Lio
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 557-566
  id: brouwer17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 557
  lastpage: 566
  published: 2017-04-10 00:00:00 +0000
- title: 'Co-Occurring Directions Sketching for Approximate Matrix Multiply'
  abstract: 'We introduce co-occurring directions sketching, a deterministic algorithm for approximate matrix product (AMM), in the streaming model. We show that co-occurring directions achieves a better error bound for AMM than other randomized and deterministic approaches for AMM. Co-occurring directions gives a (1 + epsilon) - approximation of the optimal low rank approximation of a matrix product. Empirically our algorithm outperforms competing methods for AMM, for a small sketch size. We validate empirically our theoretical findings and algorithms.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/mroueh17a.html
  PDF: http://proceedings.mlr.press/v54/mroueh17a/mroueh17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-mroueh17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Youssef
    family: Mroueh
  - given: Etienne
    family: Marcheret
  - given: Vaibahava
    family: Goel
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 567-575
  id: mroueh17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 567
  lastpage: 575
  published: 2017-04-10 00:00:00 +0000
- title: 'Exploration-Exploitation in MDPs with Options'
  abstract: 'While a large body of empirical results show that temporally-extended actions and options may significantly affect the learning performance of an agent, the theoretical understanding of how and when options can be beneficial in online reinforcement learning is relatively limited. In this paper, we derive an upper and lower bound on the regret of a variant of UCRL using options. While we first analyze the algorithm in the general case of semi-Markov decision processes (SMDPs), we show how these results can be translated to the specific case of MDPs with options and we illustrate simple scenarios in which the regret of learning with options can be provably much smaller than the regret suffered when learning with primitive actions. '
  volume: 54
  URL: https://proceedings.mlr.press/v54/fruit17a.html
  PDF: http://proceedings.mlr.press/v54/fruit17a/fruit17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-fruit17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ronan
    family: Fruit
  - given: Alessandro
    family: Lazaric
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 576-584
  id: fruit17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 576
  lastpage: 584
  published: 2017-04-10 00:00:00 +0000
- title: 'Local Perturb-and-MAP for Structured Prediction'
  abstract: 'Conditional random fields (CRFs) provide a powerful tool for structured prediction, but cast significant challenges in both the learning and inference steps. Approximation techniques are widely used in both steps, which should be considered jointly to guarantee good performance (a.k.a. “inferning"). Perturb-and-MAP models provide a promising alternative to CRFs, but require global combinatorial optimization and hence they are usable only on specific models. In this work, we present a new Local Perturb-and-MAP (locPMAP) framework that replaces the global optimization with a local optimization by exploiting our observed connection between locPMAP and the pseudolikelihood of the original CRF model. We test our approach on three different vision tasks and show that our method achieves consistently improved performance over other approximate inference techniques optimized to a pseudolikelihood objective.  Additionally, we demonstrate that we can integrate our method in the fully convolutional network framework to increase our model’s complexity. Finally, our observed connection between locPMAP and the pseudolikelihood leads to a novel perspective for understanding and using pseudolikelihood. '
  volume: 54
  URL: https://proceedings.mlr.press/v54/bertasius17a.html
  PDF: http://proceedings.mlr.press/v54/bertasius17a/bertasius17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-bertasius17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Gedas
    family: Bertasius
  - given: Qiang
    family: Liu
  - given: Lorenzo
    family: Torresani
  - given: Jianbo
    family: Shi
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 585-594
  id: bertasius17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 585
  lastpage: 594
  published: 2017-04-10 00:00:00 +0000
- title: 'Gradient Boosting on Stochastic Data Streams'
  abstract: 'Boosting is a popular ensemble algorithm that generates more powerful learners by linearly combining base models from a simpler hypothesis  class. In this work, we investigate the problem of adapting batch gradient boosting  for minimizing convex loss functions to online setting where the loss at each iteration is i.i.d sampled from an unknown distribution.  To generalize from batch to online, we first introduce the definition of online weak learning edge with which for strongly convex and  smooth loss functions, we present an algorithm, Streaming Gradient Boosting (SGB) with exponential shrinkage guarantees in the number of weak learners. We further present an adaptation of SGB to optimize non-smooth  loss functions, for which we derive a $O(\ln(N)/N)$ convergence rate. We also show that our analysis can extend to adversarial online  learning setting under a stronger assumption  that the online weak learning edge will hold in adversarial setting. We finally demonstrate  experimental results showing that in practice our algorithms can achieve competitive  results as classic gradient boosting while using less computation.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/hu17a.html
  PDF: http://proceedings.mlr.press/v54/hu17a/hu17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-hu17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Hanzhang
    family: Hu
  - given: Wen
    family: Sun
  - given: Arun
    family: Venkatraman
  - given: Martial
    family: Hebert
  - given: Andrew
    family: Bagnell
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 595-603
  id: hu17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 595
  lastpage: 603
  published: 2017-04-10 00:00:00 +0000
- title: 'Online Learning and Blackwell Approachability with Partial Monitoring: Optimal Convergence Rates'
  abstract: 'Blackwell approachability is an online learning setup generalizing the classical problem of regret minimization by allowing for instance multi-criteria optimization, global (online) optimization of a convex loss, or online linear optimization under some cumulative constraint. We consider partial monitoring where the decision maker does not necessarily observe the outcomes of his decision (unlike the traditional regret/bandit literature). Instead, he receives a random signal correlated to the decision–outcome pair, or only to the outcome.  We construct, for the first time, approachability algorithms with convergence rate of order $O(T^-1/2)$ when the signal is independent of the decision and of order $O(T^-1/3)$ in the case of general signals. Those rates are optimal in the sense that they cannot be improved without further assumption on the structure of the objectives and/or the signals.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/kwon17a.html
  PDF: http://proceedings.mlr.press/v54/kwon17a/kwon17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-kwon17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Joon
    family: Kwon
  - given: Vianney
    family: Perchet
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 604-613
  id: kwon17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 604
  lastpage: 613
  published: 2017-04-10 00:00:00 +0000
- title: 'Tensor Decompositions via Two-Mode Higher-Order SVD (HOSVD)'
  abstract: 'Tensor decompositions have rich applications in statistics and machine learning, and developing efficient, accurate algorithms for the problem has received much attention recently. Here, we present a new method built on Kruskal’s uniqueness theorem to decompose symmetric, nearly orthogonally decomposable tensors. Unlike the classical higher-order singular value decomposition which unfolds a tensor along a single mode, we consider unfoldings along two modes and use rank-1 constraints to characterize the underlying components. This tensor decomposition method provably handles a greater level of noise compared to previous methods and achieves a high estimation accuracy. Numerical results demonstrate that our algorithm is robust to various noise distributions and that it performs especially favorably as the order increases.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/wang17a.html
  PDF: http://proceedings.mlr.press/v54/wang17a/wang17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-wang17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Miaoyan
    family: Wang
  - given: Yun
    family: Song
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 614-622
  id: wang17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 614
  lastpage: 622
  published: 2017-04-10 00:00:00 +0000
- title: 'Beta calibration: a well-founded and easily implemented improvement on logistic calibration for binary classifiers'
  abstract: 'For optimal decision making under variable class distributions and misclassification costs a classifier needs to produce well-calibrated estimates of the posterior probability. Isotonic calibration is a powerful non-parametric method that is however prone to overfitting on smaller datasets; hence a parametric method based on the logistic curve is commonly used. While logistic calibration is designed for normally distributed per-class scores, we demonstrate experimentally that many classifiers including Naive Bayes and Adaboost suffer from a particular distortion where these score distributions are heavily skewed. In such cases logistic calibration can easily yield probability estimates that are worse than the original scores. Moreover, the logistic curve family does not include the identity function, and hence logistic calibration can easily uncalibrate a perfectly calibrated classifier.  In this paper we solve all these problems with a richer class of calibration maps based on the beta distribution. We derive the method from first principles and show that fitting it is as easy as fitting a logistic curve. Extensive experiments show that beta calibration is superior to logistic calibration for Naive Bayes and Adaboost.  '
  volume: 54
  URL: https://proceedings.mlr.press/v54/kull17a.html
  PDF: http://proceedings.mlr.press/v54/kull17a/kull17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-kull17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Meelis
    family: Kull
  - given: Telmo Silva
    family: Filho
  - given: Peter
    family: Flach
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 623-631
  id: kull17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 623
  lastpage: 631
  published: 2017-04-10 00:00:00 +0000
- title: 'Detecting Dependencies in Sparse, Multivariate Databases Using Probabilistic Programming and Non-parametric Bayes'
  abstract: 'Datasets with hundreds of variables and many missing values are commonplace. In this setting, it is both statistically and computationally challenging to detect true predictive relationships between variables and also to suppress false positives. This paper proposes an approach that combines probabilistic programming, information theory, and non-parametric Bayes. It shows how to use Bayesian non-parametric modeling to (i) build an ensemble of joint probability models for all the variables; (ii) efficiently detect marginal independencies; and (iii) estimate the conditional mutual information between arbitrary subsets of variables, subject to a broad class of constraints. Users can access these capabilities using BayesDB, a probabilistic programming platform for probabilistic data analysis, by writing queries in a simple, SQL-like language. This paper demonstrates empirically that the method can (i) detect context-specific (in)dependencies on challenging synthetic problems and (ii) yield improved sensitivity and specificity over baselines from statistics and machine learning, on a real-world database of over 300 sparsely observed indicators of macroeconomic development and public health.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/saad17a.html
  PDF: http://proceedings.mlr.press/v54/saad17a/saad17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-saad17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Feras
    family: Saad
  - given: Vikash
    family: Mansinghka
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 632-641
  id: saad17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 632
  lastpage: 641
  published: 2017-04-10 00:00:00 +0000
- title: 'High-dimensional Time Series Clustering via Cross-Predictability'
  abstract: 'The key to time series clustering is how to characterize the similarity between any two time series.  In this paper, we explore a new similarity metric called “cross-predictability”: the degree to which a future value in each time series is predicted by past values of the others.  However, it is challenging to estimate such cross-predictability among time series in the high-dimensional regime, where the number of time series is much larger than the length of each time series. We address this challenge with a sparsity assumption: only time series in the same cluster have significant cross-predictability with each other.  We demonstrate that this approach is computationally attractive, and  provide a theoretical proof that the proposed algorithm will identify the correct clustering structure with high probability under certain conditions. To the best of our knowledge, this is the first practical high-dimensional time series clustering algorithm with a provable guarantee.  We evaluate with experiments on both synthetic data and real-world data, and results indicate that our method can achieve more than 80% clustering accuracy on real-world data, which is 20% higher than the state-of-art baselines.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/hong17a.html
  PDF: http://proceedings.mlr.press/v54/hong17a/hong17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-hong17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Dezhi
    family: Hong
  - given: Quanquan
    family: Gu
  - given: Kamin
    family: Whitehouse
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 642-651
  id: hong17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 642
  lastpage: 651
  published: 2017-04-10 00:00:00 +0000
- title: 'Minimax Approach to Variable Fidelity Data Interpolation'
  abstract: 'Engineering problems often involve data sources of variable fidelity with different costs of obtaining an observation.  In particular, one can use both a cheap low fidelity function (e.g. a computational experiment with a CFD code) and an expensive high fidelity function (e.g. a wind tunnel experiment) to generate a data sample in order to construct a regression model of a high fidelity function. The key question in this setting is how the sizes of the high and low fidelity data samples should be selected in order to stay within a given computational budget and maximize accuracy of the regression model prior to committing resources on data acquisition.   In this paper we obtain minimax interpolation errors for single and variable fidelity scenarios for a multivariate Gaussian process regression. Evaluation of the minimax errors allows us to identify cases when the variable fidelity data provides better interpolation accuracy than the exclusively high fidelity data for the same computational budget. These results allow us to calculate the optimal shares of variable fidelity data samples under the given computational budget constraint. Real and synthetic data experiments suggest that using the obtained optimal shares often outperforms natural heuristics in terms of the regression accuracy. '
  volume: 54
  URL: https://proceedings.mlr.press/v54/zaytsev17a.html
  PDF: http://proceedings.mlr.press/v54/zaytsev17a/zaytsev17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-zaytsev17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Alexey
    family: Zaytsev
  - given: Evgeny
    family: Burnaev
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 652-661
  id: zaytsev17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 652
  lastpage: 661
  published: 2017-04-10 00:00:00 +0000
- title: 'Data Driven Resource Allocation for Distributed Learning'
  abstract: 'In distributed machine learning, data is dispatched to multiple machines for processing. Motivated by the fact that similar data points often belong to the same or similar classes, and more generally, classification rules of high accuracy tend to be “locally simple but globally complex” (Vapnik and Bottou, 1993), we propose data dependent dispatching that takes advantage of such structure. We present an in-depth analysis of this model, providing new algorithms with provable worst-case guarantees, analysis proving existing scalable heuristics perform well in natural non worst-case conditions, and techniques for extending a dispatching rule from a small sample to the entire distribution. We overcome novel technical challenges to satisfy important conditions for accurate distributed learning, including fault tolerance and balancedness. We empirically compare our approach with baselines based on random partitioning, balanced partition trees, and locality sensitive hashing, showing that we achieve significantly higher accuracy on both synthetic and real world image and advertising datasets. We also demonstrate that our technique strongly scales with the available computing power.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/dick17a.html
  PDF: http://proceedings.mlr.press/v54/dick17a/dick17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-dick17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Travis
    family: Dick
  - given: Mu
    family: Li
  - given: Venkata Krishna
    family: Pillutla
  - given: Colin
    family: White
  - given: Nina
    family: Balcan
  - given: Alex
    family: Smola
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 662-671
  id: dick17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 662
  lastpage: 671
  published: 2017-04-10 00:00:00 +0000
- title: 'Learning Nonparametric Forest Graphical Models with Prior Information'
  abstract: 'We present a framework for incorporating prior information into nonparametric estimation of graphical models. To avoid distributional assumptions, we restrict the graph to be a forest and build on the work of forest density estimation (FDE). We reformulate the FDE approach from a Bayesian perspective, and introduce prior distributions on the graphs. As two concrete examples, we apply this framework to estimating scale-free graphs and learning multiple graphs with similar structures. The resulting algorithms are equivalent to finding a maximum spanning tree of a weighted graph with a penalty term on the  connectivity pattern of the graph. We solve the optimization problem via a minorize-maximization procedure with Kruskal’s algorithm. Simulations show that the proposed methods outperform competing parametric methods, and are robust to the true data distribution. They also lead to improvement in predictive power and interpretability in two real data sets.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/zhu17a.html
  PDF: http://proceedings.mlr.press/v54/zhu17a/zhu17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-zhu17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yuancheng
    family: Zhu
  - given: Zhe
    family: Liu
  - given: Siqi
    family: Sun
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 672-680
  id: zhu17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 672
  lastpage: 680
  published: 2017-04-10 00:00:00 +0000
- title: 'Sparse Randomized Partition Trees for Nearest Neighbor Search'
  abstract: 'Randomized partition trees have recently been shown to be very effective in solving nearest neighbor search problem. In spite of enjoying strong theoretical guarantee, it suffers from high space complexity, since each internal node of the tree needs to store a $d$ dimensional projection direction leading to a $O(nd)$ space complexity for a dataset of size $n$. Inspired by the fast Johnson-Lindenstrauss transform, in this paper, we propose a sparse version of randomized partition tree where each internal node needs to store only a few non-zero entries, as opposed to all $d$ entries,  leading to significant space savings without sacrificing  much in terms of nearest neighbor search accuracy. As a by product of this,  query time of our proposed method is slightly better than that of  its non-sparse counterpart for large dataset size.  Our theoretical results indicate that our proposed method enjoys the same theoretical guarantee as that of the original non-sparse RP-tree. Experimental evaluations on four real world dataset strongly  suggest that nearest neighbor search performance of our proposed sparse RP-tree is very similar to that of its non-sparse counterpart in terms of accuracy and number of retrieved points.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/sinha17a.html
  PDF: http://proceedings.mlr.press/v54/sinha17a/sinha17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-sinha17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Kaushik
    family: Sinha
  - given: Omid
    family: Keivani
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 681-689
  id: sinha17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 681
  lastpage: 689
  published: 2017-04-10 00:00:00 +0000
- title: 'Horde of Bandits using Gaussian Markov Random Fields'
  abstract: 'The gang of bandits (GOB) model [7] is a recent contextual bandits framework that shares information between a set of bandit problems, related by a known (possibly noisy) graph. This model is useful in problems like recommender systems where the large number of users makes it vital to transfer information between users. Despite its effectiveness, the existing GOB model can only be applied to small problems due to its quadratic time-dependence on the number of nodes. Existing solutions to combat the scalability issue require an often-unrealistic  clustering assumption. By exploiting a connection to Gaussian Markov random fields (GMRFs), we show that the GOB model can be made to scale to much larger graphs without additional assumptions. In addition, we propose a Thompson sampling algorithm which uses the recent GMRF sampling-by-perturbation technique, allowing it to scale to even larger problems (leading to a “horde” of bandits). We give regret bounds and experimental results for GOB with Thompson sampling and epoch-greedy algorithms, indicating that these methods are as good as or significantly better than ignoring the graph or adopting a clustering-based approach. Finally, when an existing graph is not available, we propose a heuristic for learning it on the fly and show promising results. '
  volume: 54
  URL: https://proceedings.mlr.press/v54/vaswani17a.html
  PDF: http://proceedings.mlr.press/v54/vaswani17a/vaswani17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-vaswani17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Sharan
    family: Vaswani
  - given: Mark
    family: Schmidt
  - given: Laks
    family: Lakshmanan
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 690-699
  id: vaswani17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 690
  lastpage: 699
  published: 2017-04-10 00:00:00 +0000
- title: 'Random projection design for scalable implicit smoothing of randomly observed stochastic processes'
  abstract: 'Sampling at random timestamps, long range dependencies, and scale hamper standard meth- ods for multivariate time series analysis. In this paper we present a novel estimator for cross-covariance of randomly observed time series which unravels the dynamics of an unobserved stochastic process. We analyze the statistical properties of our estimator without needing the assumption that observation timestamps are independent from the process of interest and show that our solution is not hindered by the issues affecting standard estimators for cross-covariance. We implement and evaluate our statistically sound and scalable approach in the distributed setting using Apache Spark and demonstrate its ability to unravel causal dynamics on both simulations and high-frequency financial trading data.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/belletti17a.html
  PDF: http://proceedings.mlr.press/v54/belletti17a/belletti17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-belletti17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Francois
    family: Belletti
  - given: Evan
    family: Sparks
  - given: Alexandre
    family: Bayen
  - given: Joseph
    family: Gonzalez
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 700-708
  id: belletti17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 700
  lastpage: 708
  published: 2017-04-10 00:00:00 +0000
- title: 'Trading off Rewards and Errors in Multi-Armed Bandits'
  abstract: 'In multi-armed bandits, the most common objective is the maximization of the cumulative reward. Alternative settings include active exploration, where a learner tries to gain accurate estimates of the rewards of all arms. While these objectives are contrasting, in many scenarios it is desirable to trade off rewards and errors. For instance, in educational games the designer wants to gather generalizable knowledge about the behavior of the students and teaching strategies (small estimation errors) but, at the same time, the system needs to avoid giving a bad experience to the players, who may leave the system permanently (large reward). In this paper, we formalize this tradeoff and introduce the ForcingBalance algorithm whose performance is provably close to the best possible tradeoff strategy. Finally, we demonstrate on real-world educational data that ForcingBalance returns useful information about the arms without compromising the overall reward.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/erraqabi17a.html
  PDF: http://proceedings.mlr.press/v54/erraqabi17a/erraqabi17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-erraqabi17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Akram
    family: Erraqabi
  - given: Alessandro
    family: Lazaric
  - given: Michal
    family: Valko
  - given: Emma
    family: Brunskill
  - given: Yun-En
    family: Liu
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 709-717
  id: erraqabi17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 709
  lastpage: 717
  published: 2017-04-10 00:00:00 +0000
- title: 'Adaptive ADMM with Spectral Penalty Parameter Selection'
  abstract: 'The alternating direction method of multipliers (ADMM) is a versatile tool for solving a wide range of constrained optimization problems. However, its performance is highly sensitive to a penalty parameter,  making ADMM often unreliable and hard to automate for a non-expert user.  We tackle this weakness of ADMM by proposing a method that adaptively tunes the penalty parameter to achieve fast convergence. The resulting adaptive ADMM (AADMM)  algorithm, inspired by the  successful Barzilai-Borwein spectral method for gradient descent, yields fast convergence and relative insensitivity to the initial stepsize  and problem scaling.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/xu17a.html
  PDF: http://proceedings.mlr.press/v54/xu17a/xu17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-xu17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Zheng
    family: Xu
  - given: Mario
    family: Figueiredo
  - given: Tom
    family: Goldstein
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 718-727
  id: xu17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 718
  lastpage: 727
  published: 2017-04-10 00:00:00 +0000
- title: 'The End of Optimism? An Asymptotic Analysis of Finite-Armed Linear Bandits'
  abstract: 'Stochastic linear bandits are a natural and simple generalisation of finite-armed bandits with numerous practical applications. Current approaches focus on generalising existing techniques for finite-armed bandits, notably the otimism principle and Thompson sampling. Prior analysis has mostly focussed on the worst-case setting. We analyse the asymptotic regret and show matching upper and lower bounds on what is achievable. Surprisingly, our results show that no algorithm based on optimism or Thompson sampling will ever achieve the optimal rate. In fact, they can be arbitrarily far from optimal, even in very simple cases. This is a disturbing result because these techniques are standard tools that are widely used for sequential optimisation, for example,  generalised linear bandits and reinforcement learning. '
  volume: 54
  URL: https://proceedings.mlr.press/v54/lattimore17a.html
  PDF: http://proceedings.mlr.press/v54/lattimore17a/lattimore17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-lattimore17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Tor
    family: Lattimore
  - given: Csaba
    family: Szepesvari
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 728-737
  id: lattimore17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 728
  lastpage: 737
  published: 2017-04-10 00:00:00 +0000
- title: 'Dynamic Collaborative Filtering With Compound Poisson Factorization'
  abstract: 'Model-based collaborative filtering (CF) analyzes user–item interactions to infer latent factors that represent user preferences and item characteristics in order to predict future interactions. Most CF approaches assume that these latent factors are static; however, in most CF data, user preferences and item perceptions drift over time. Here, we propose a new conjugate and numerically stable dynamic matrix factorization (DCPF) based on hierarchical Poisson factorization that models the smoothly drifting latent factors using gamma-Markov chains. We propose a conjugate gamma chain construction that is numerically stable within our compound-Poisson framework. We then derive a scalable stochastic variational inference approach to estimate the parameters of our model. We apply our model to time-stamped ratings data sets from Netflix, Yelp, and Last.fm. We empirically demonstrate that DCPF achieves a higher predictive accuracy than state-of-the-art static and dynamic factorization algorithms. '
  volume: 54
  URL: https://proceedings.mlr.press/v54/jerfel17a.html
  PDF: http://proceedings.mlr.press/v54/jerfel17a/jerfel17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-jerfel17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ghassen
    family: Jerfel
  - given: Mehmet
    family: Basbug
  - given: Barbara
    family: Engelhardt
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 738-747
  id: jerfel17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 738
  lastpage: 747
  published: 2017-04-10 00:00:00 +0000
- title: 'Rank Aggregation and Prediction with Item Features'
  abstract: 'We study the problem of rank aggregation with features, where both pairwise comparisons and item features are available to help the rank aggregation task.  Observing that traditional rank aggregation methods disregard features, while models adapted from learning-to-rank task are sensitive to feature noise, we propose a general model to learn a total ranking by balancing between comparisons and feature information jointly.  As a result, our proposed model takes advantage of item features and is also robust to noise. More importantly, we study the effectiveness of item features in our model and show that given sufficiently informative features, the sample complexity of our model can be asymptotically lower than models based only on comparisons for deriving an $ε$-accurate ranking. The results theoretically justify that our model can achieve efficient learning by leveraging item feature information.  In addition, we show that the proposed model can also be extended to two other related problems—online rank aggregation and rank prediction of new items.  Finally, experiments show that our model is more effective and robust compared to existing methods on both synthetic and real datasets. '
  volume: 54
  URL: https://proceedings.mlr.press/v54/chiang17a.html
  PDF: http://proceedings.mlr.press/v54/chiang17a/chiang17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-chiang17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Kai-Yang
    family: Chiang
  - given: Cho-Jui
    family: Hsieh
  - given: Inderjit
    family: Dhillon
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 748-756
  id: chiang17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 748
  lastpage: 756
  published: 2017-04-10 00:00:00 +0000
- title: 'Robust and Efficient Computation of Eigenvectors in a Generalized Spectral Method for Constrained Clustering'
  abstract: 'FAST-GE is a generalized spectral method for constrained clustering [Cucuringu et al., AISTATS 2016]. It incorporates the must-link and cannot-link constraints into two Laplacian matrices and then minimizes a Rayleigh quotient via solving a generalized eigenproblem, and is considered to be simple and scalable. However, there are two unsolved issues. Theoretically, since both Laplacian matrices are positive semi-definite and the corresponding pencil is singular, it is not proven whether the minimum of the Rayleigh quotient exists and is equivalent to an eigenproblem. Computationally, the locally optimal block preconditioned conjugate gradient (LOBPCG) method is not designed for solving the eigenproblem of a singular pencil. In fact, to the best of our knowledge, there is no existing eigensolver that is immediately applicable. In this paper, we provide solutions to these two critical issues. We prove a generalization of Courant-Fischer variational principle for the Laplacian singular pencil. We propose a regularization for the pencil so that LOBPCG is applicable. We demonstrate the robustness and efficiency of proposed solutions for constrained image segmentation. The proposed theoretical and computational solutions can be applied to eigenproblems of positive semi-definite pencils arising in other machine learning algorithms, such as generalized linear discriminant analysis in dimension reduction and multisurface classification via eigenvectors. '
  volume: 54
  URL: https://proceedings.mlr.press/v54/jiang17b.html
  PDF: http://proceedings.mlr.press/v54/jiang17b/jiang17b.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-jiang17b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Chengming
    family: Jiang
  - given: Huiqing
    family: Xie
  - given: Zhaojun
    family: Bai
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 757-766
  id: jiang17b
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 757
  lastpage: 766
  published: 2017-04-10 00:00:00 +0000
- title: 'Information-theoretic limits of Bayesian network structure learning'
  abstract: 'In this paper, we study the information-theoretic limits of learning the structure of Bayesian networks (BNs), on discrete as well as continuous random variables, from a finite number of samples. We show that the minimum number of samples required by any procedure to recover the correct structure grows as $Ω(m)$ and $Ω(k \log m + (k^2)/m)$ for non-sparse and sparse BNs respectively, where m is the number of variables and k is the maximum number of parents per node. We provide a simple recipe, based on an extension of the Fano’s inequality, to obtain information-theoretic limits of structure recovery for any exponential family BN. We instantiate our result for specific conditional distributions in the exponential family to characterize the fundamental limits of learning various commonly used BNs, such as conditional probability table based networks, Gaussian BNs, noisy-OR networks, and logistic regression networks. En route to obtaining our main results, we obtain tight bounds on the number of sparse and non-sparse essential-DAGs. Finally, as a byproduct, we recover the information-theoretic limits of sparse variable selection for logistic regression.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/ghoshal17a.html
  PDF: http://proceedings.mlr.press/v54/ghoshal17a/ghoshal17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-ghoshal17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Asish
    family: Ghoshal
  - given: Jean
    family: Honorio
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 767-775
  id: ghoshal17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 767
  lastpage: 775
  published: 2017-04-10 00:00:00 +0000
- title: 'Markov Chain Truncation for Doubly-Intractable Inference'
  abstract: 'Computing partition functions, the normalizing constants of probability distributions, is often hard. Variants of importance sampling give unbiased estimates of a normalizer Z, however, unbiased estimates of the reciprocal 1/Z are harder to obtain. Unbiased estimates of 1/Z allow Markov chain Monte Carlo sampling of “doubly-intractable” distributions, such as the parameter posterior for Markov Random Fields or Exponential Random Graphs. We demonstrate how to construct unbiased estimates for 1/Z given access to black-box importance sampling estimators for Z. We adapt recent work on random series truncation and Markov chain coupling, producing estimators with lower variance and a higher percentage of positive estimates than before. Our debiasing algorithms are simple to implement, and have some theoretical and empirical advantages over existing methods.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/wei17a.html
  PDF: http://proceedings.mlr.press/v54/wei17a/wei17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-wei17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Colin
    family: Wei
  - given: Iain
    family: Murray
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 776-784
  id: wei17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 776
  lastpage: 784
  published: 2017-04-10 00:00:00 +0000
- title: 'Regression Uncertainty on the Grassmannian'
  abstract: 'Trends in longitudinal or cross-sectional studies over time are often captured through regression models. In their simplest manifestation, these regression models are formulated in $R^n$. However, in the context of imaging studies, the objects of interest which are to be regressed are frequently best modeled as elements of a Riemannian manifold. Regression on such spaces can be accomplished through geodesic regression. This paper develops an approach to compute confidence intervals for geodesic regression models. The approach is general, but illustrated and specifically developed for the Grassmann manifold, which allows us, e.g., to regress shapes or linear dynamical systems. Extensions to other manifolds can be obtained in a similar manner. We demonstrate our approach for regression with 2D/3D shapes using synthetic and real data.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/hong17b.html
  PDF: http://proceedings.mlr.press/v54/hong17b/hong17b.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-hong17b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yi
    family: Hong
  - given: Xiao
    family: Yang
  - given: Roland
    family: Kwitt
  - given: Martin
    family: Styner
  - given: Marc
    family: Niethammer
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 785-793
  id: hong17b
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 785
  lastpage: 793
  published: 2017-04-10 00:00:00 +0000
- title: 'Attributing Hacks'
  abstract: 'In this paper, we describe an algorithm for estimating the provenance of hacks on websites. That is, given properties of sites and the temporal occurrence of attacks, we are able to attribute individual attacks to joint causes and vulnerabilities, as well as estimating the evolution of these vulnerabilities over time. Specifically, we use hazard regression with a time-varying additive hazard function parameterized in a generalized linear form. The activation coefficients on each feature are continuous-time functions over time. We formulate the problem of learning these functions as a constrained variational maximum likelihood estimation problem with total variation penalty and show that the optimal solution is a 0th order spline (a piecewise constant function) with a finite number of adaptively chosen knots. This allows the inference problem to be solved efficiently and at scale by solving a finite dimensional optimization problem. Extensive experiments on real data sets show that our method significantly outperforms Cox’s proportional hazard model. We also conduct case studies and verify that the fitted functions are indeed recovering vulnerable features.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/liu17a.html
  PDF: http://proceedings.mlr.press/v54/liu17a/liu17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-liu17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ziqi
    family: Liu
  - given: Alex
    family: Smola
  - given: Kyle
    family: Soska
  - given: Yu-Xiang
    family: Wang
  - given: Qinghua
    family: Zheng
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 794-802
  id: liu17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 794
  lastpage: 802
  published: 2017-04-10 00:00:00 +0000
- title: 'Unsupervised Sequential Sensor Acquisition'
  abstract: 'In many security and healthcare systems a sequence of sensors/tests are used for detection and diagnosis. Each test outputs a prediction of the latent state, and carries with it inherent costs. Our objective is to learn strategies for selecting tests to optimize accuracy and costs. Unfortunately it is often impossible to acquire in-situ ground truth annotations and we are left with the problem of unsupervised sensor selection (USS). We pose USS as a version of stochastic partial monitoring problem with an unusual reward structure (even noisy annotations are unavailable). Unsurprisingly no learner can achieve sublinear regret without further assumptions. To this end we propose the notion of weak-dominance. This is a condition on the joint probability distribution of test outputs and latent state and says that whenever a test is accurate on an example, a later test in the sequence is likely to be accurate as well.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/hanawal17a.html
  PDF: http://proceedings.mlr.press/v54/hanawal17a/hanawal17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-hanawal17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Manjesh
    family: Hanawal
  - given: Csaba
    family: Szepesvari
  - given: Venkatesh
    family: Saligrama
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 803-811
  id: hanawal17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 803
  lastpage: 811
  published: 2017-04-10 00:00:00 +0000
- title: 'A Stochastic Nonconvex Splitting Method for Symmetric Nonnegative Matrix Factorization'
  abstract: 'Symmetric nonnegative matrix factorization (SymNMF) plays an important role in applications of many data analytics problems such as community detection, document clustering and image segmentation. In this paper, we consider a stochastic SymNMF problem in which the observation matrix is generated in a random and sequential manner. We propose a stochastic nonconvex splitting method, which not only guarantees convergence to the set of stationary points of the problem (in the mean-square sense), but further achieves a sublinear convergence rate. Numerical results show that for clustering problems over both synthetic and real world datasets, the proposed algorithm converges quickly to the set of stationary points.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/lu17a.html
  PDF: http://proceedings.mlr.press/v54/lu17a/lu17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-lu17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Songtao
    family: Lu
  - given: Mingyi
    family: Hong
  - given: Zhengdao
    family: Wang
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 812-821
  id: lu17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 812
  lastpage: 821
  published: 2017-04-10 00:00:00 +0000
- title: 'Hierarchically-partitioned Gaussian Process Approximation'
  abstract: 'The Gaussian process (GP) is a simple yet powerful probabilistic framework for various machine learning tasks.  However, exact algorithms for learning and prediction are prohibitive to be applied to large datasets due to inherent computational complexity. To overcome this main limitation, various techniques have been proposed, and in particular, local GP algorithms that scales “truly linearly”  with respect to the dataset size. In this paper, we introduce a hierarchical model based on local GP for large-scale datasets, which stacks inducing points over inducing points in layers. By using different kernels in each layer, the overall model becomes multi-scale and is able to capture both long- and short-range dependencies.  We demonstrate the effectiveness of our model by speed-accuracy performance on challenging real-world datasets.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/lee17a.html
  PDF: http://proceedings.mlr.press/v54/lee17a/lee17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-lee17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Byung-Jun
    family: Lee
  - given: Jongmin
    family: Lee
  - given: Kee-Eung
    family: Kim
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 822-831
  id: lee17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 822
  lastpage: 831
  published: 2017-04-10 00:00:00 +0000
- title: 'Scalable Learning of Non-Decomposable Objectives'
  abstract: 'Modern retrieval systems are often driven by an underlying machine learning model. The goal of such systems is to identify and possibly rank the few most relevant items for a given query or context. Thus, such systems are typically evaluated using a ranking-based performance metric such as the area under the precision-recall curve, the F score, precision at fixed recall, etc. Obviously, it is desirable to train such systems to optimize the metric of interest.  In practice, due to the scalability limitations of existing approaches for optimizing such objectives, large-scale retrieval systems are instead trained to maximize classification accuracy, in the hope that performance as measured via the true objective will also be favorable. In this work we present a unified framework that, using straightforward building block bounds, allows for highly scalable optimization of a wide range of ranking-based objectives. We demonstrate the advantage of our approach on several real-life retrieval problems that are significantly larger than those considered in the literature, while achieving substantial improvement in performance over the accuracy-objective baseline. '
  volume: 54
  URL: https://proceedings.mlr.press/v54/eban17a.html
  PDF: http://proceedings.mlr.press/v54/eban17a/eban17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-eban17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Elad
    family: Eban
  - given: Mariano
    family: Schain
  - given: Alan
    family: Mackey
  - given: Ariel
    family: Gordon
  - given: Ryan
    family: Rifkin
  - given: Gal
    family: Elidan
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 832-840
  id: eban17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 832
  lastpage: 840
  published: 2017-04-10 00:00:00 +0000
- title: 'CPSG-MCMC: Clustering-Based Preprocessing method for Stochastic Gradient MCMC'
  abstract: 'In recent years, stochastic gradient Markov Chain Monte Carlo (SG-MCMC) methods have been raised to process large-scale dataset by iterative learning from small minibatches.  However, the high variance caused by naive subsampling usually slows down the convergence to the desired posterior distribution. In this paper, we propose an effective subsampling strategy to reduce the variance based on a failed attempt to do importance sampling. In particular, before sampling,  we partition the dataset with k-means clustering algorithm in a preprocessing step and use the fixed clustering throughout the entire MCMC simulation.  Then during simulation, we approximate the gradient of log-posterior via summing the estimated gradient of each cluster.  The resulting procedure is surprisingly simple without enhancing the complexity of the original algorithm during sampling procedure. We apply our Clustering-based Preprocessing strategy on stochastic gradient Langevin dynamics, stochastic gradient Hamilton Monte Carlo and stochastic gradient Riemann Langevin dynamics. Empirically, we provide thorough numerical results to back up the effectiveness and efficiency of our approach.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/fu17a.html
  PDF: http://proceedings.mlr.press/v54/fu17a/fu17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-fu17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Tianfan
    family: Fu
  - given: Zhihua
    family: Zhang
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 841-850
  id: fu17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 841
  lastpage: 850
  published: 2017-04-10 00:00:00 +0000
- title: 'Comparison-Based Nearest Neighbor Search'
  abstract: 'We consider machine learning in a comparison-based setting where we are given a set of points in a metric space, but we have no access to the actual distances between the points. Instead, we can only ask an oracle whether the distance between two points i and j is smaller than the distance between the points i and k. We are concerned with data structures and algorithms to find nearest neighbors based on such comparisons. We focus on a simple yet effective algorithm that recursively splits the space by first selecting two random pivot points and then assigning all other points to the closer of the two (comparison tree). We prove that if the metric space satisfies certain expansion conditions, then with high probability the height of the comparison tree is logarithmic in the number of points, leading to efficient search performance. We also provide an upper bound for the failure  probability to return the true nearest neighbor. Experiments show that the comparison tree is competitive with algorithms that have access to the actual distance values, and needs less triplet comparisons than other competitors.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/haghiri17a.html
  PDF: http://proceedings.mlr.press/v54/haghiri17a/haghiri17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-haghiri17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Siavash
    family: Haghiri
  - given: Debarghya
    family: Ghoshdastidar
  - given: Ulrike von
    family: Luxburg
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 851-859
  id: haghiri17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 851
  lastpage: 859
  published: 2017-04-10 00:00:00 +0000
- title: 'A Unified Optimization View on Generalized Matching Pursuit and Frank-Wolfe'
  abstract: 'Two of the most fundamental prototypes of greedy optimization are the matching pursuit and Frank-Wolfe algorithms. In this paper we take a unified view on both classes of methods, leading to the first explicit convergence rates of matching pursuit methods in an optimization sense, for general sets of atoms. We derive sublinear (1/t) convergence for both classes on general smooth objectives, and linear convergence on strongly convex objectives, as well as a clear correspondence of algorithm variants. Our presented algorithms and rates are affine invariant, and do not need any incoherence or sparsity assumptions.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/locatello17a.html
  PDF: http://proceedings.mlr.press/v54/locatello17a/locatello17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-locatello17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Francesco
    family: Locatello
  - given: Rajiv
    family: Khanna
  - given: Michael
    family: Tschannen
  - given: Martin
    family: Jaggi
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 860-868
  id: locatello17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 860
  lastpage: 868
  published: 2017-04-10 00:00:00 +0000
- title: 'Faster Coordinate Descent via Adaptive Importance Sampling'
  abstract: 'Coordinate descent methods employ random partial updates of decision variables in order to solve huge-scale convex optimization problems. In this work, we introduce new adaptive rules for the random selection of their updates. By adaptive, we mean that our selection rules are based on the dual residual or the primal-dual gap estimates and can change at each iteration. We theoretically characterize the performance of our selection rules and demonstrate improvements over the state-of-the-art, and extend our theory and algorithms to general convex objectives. Numerical evidence with hinge-loss support vector machines and Lasso confirm that the practice follows the theory. '
  volume: 54
  URL: https://proceedings.mlr.press/v54/perekrestenko17a.html
  PDF: http://proceedings.mlr.press/v54/perekrestenko17a/perekrestenko17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-perekrestenko17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Dmytro
    family: Perekrestenko
  - given: Volkan
    family: Cevher
  - given: Martin
    family: Jaggi
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 869-877
  id: perekrestenko17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 869
  lastpage: 877
  published: 2017-04-10 00:00:00 +0000
- title: 'Conjugate-Computation Variational Inference : Converting Variational Inference in Non-Conjugate Models to Inferences in Conjugate Models'
  abstract: 'Variational inference is computationally challenging in models that contain both conjugate and non-conjugate terms. Methods specifically designed for conjugate models, even though computationally efficient, find it difficult to deal with non-conjugate terms. On the other hand, stochastic-gradient methods can handle the non-conjugate terms but they usually ignore the conjugate structure of the model which might result in slow convergence. In this paper, we propose a new algorithm called Conjugate-computation Variational Inference (CVI) which brings the best of the two worlds together – it uses conjugate computations for the conjugate terms and employs stochastic gradients for the rest. We derive this algorithm by using a stochastic mirror-descent method in the mean-parameter space, and then expressing each gradient step as a variational inference in a conjugate model. We demonstrate our algorithm’s applicability to a large class of models and establish its convergence. Our experimental results show that our method converges much faster than the methods that ignore the conjugate structure of the model. '
  volume: 54
  URL: https://proceedings.mlr.press/v54/khan17a.html
  PDF: http://proceedings.mlr.press/v54/khan17a/khan17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-khan17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Mohammad
    family: Khan
  - given: Wu
    family: Lin
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 878-887
  id: khan17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 878
  lastpage: 887
  published: 2017-04-10 00:00:00 +0000
- title: 'Hit-and-Run for Sampling and Planning in Non-Convex Spaces'
  abstract: 'We propose the Hit-and-Run algorithm for planning and sampling problems in non- convex spaces. For sampling, we show the first analysis of the Hit-and-Run algorithm in non-convex spaces and show that it mixes fast as long as certain smoothness conditions are satisfied. In particular, our analysis reveals an intriguing connection between fast mixing and the existence of smooth measure-preserving mappings from a convex space to the non-convex space. For planning, we show advantages of Hit-and- Run compared to state-of-the-art planning methods such as Rapidly-Exploring Random Trees.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/abbasi-yadkori17a.html
  PDF: http://proceedings.mlr.press/v54/abbasi-yadkori17a/abbasi-yadkori17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-abbasi-yadkori17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yasin
    family: Abbasi-Yadkori
  - given: Peter
    family: Bartlett
  - given: Victor
    family: Gabillon
  - given: Alan
    family: Malek
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 888-895
  id: abbasi-yadkori17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 888
  lastpage: 895
  published: 2017-04-10 00:00:00 +0000
- title: 'DP-EM: Differentially Private Expectation Maximization'
  abstract: 'The iterative nature of the expectation maximization (EM) algorithm presents a challenge for privacy-preserving estimation, as each iteration increases the amount of noise needed. We propose a practical private EM algorithm that overcomes this challenge using two innovations: (1) a novel moment perturbation formulation for differentially private EM (DP-EM), and (2) the use of two recently developed composition methods to bound the privacy “cost” of multiple EM iterations: the moments accountant (MA) and zero-mean concentrated differential privacy (zCDP).  Both MA and zCDP bound the moment generating function of the privacy loss random variable and achieve a refined tail bound, which effectively decrease the amount of additive noise. We present empirical results showing the benefits of our approach, as well as similar performance between these two composition methods in the DP-EM setting for Gaussian mixture models. Our approach can be readily extended to many iterative learning algorithms,  opening up various exciting future directions. '
  volume: 54
  URL: https://proceedings.mlr.press/v54/park17c.html
  PDF: http://proceedings.mlr.press/v54/park17c/park17c.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-park17c.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Mijung
    family: Park
  - given: James
    family: Foulds
  - given: Kamalika
    family: Choudhary
  - given: Max
    family: Welling
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 896-904
  id: park17c
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 896
  lastpage: 904
  published: 2017-04-10 00:00:00 +0000
- title: 'On the Hyperprior Choice for the Global Shrinkage Parameter in the Horseshoe Prior'
  abstract: 'The horseshoe prior has proven to be a noteworthy alternative for sparse Bayesian estimation, but as shown in this paper, the results can be sensitive to the prior choice for the global shrinkage hyperparameter. We argue that the previous default choices are dubious due to their tendency to favor solutions with more unshrunk coefficients than we typically expect a priori. This can lead to bad results if this parameter is not strongly identified by data. We derive the relationship between the global parameter and the effective number of nonzeros in the coefficient vector, and show an easy and intuitive way of setting up the prior for the global parameter based on our prior beliefs about the number of nonzero coefficients in the model. The results on real world data show that one can benefit greatly – in terms of improved parameter estimates, prediction accuracy, and reduced computation time – from transforming even a crude guess for the number of nonzero coefficients into the prior for the global parameter using our framework.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/piironen17a.html
  PDF: http://proceedings.mlr.press/v54/piironen17a/piironen17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-piironen17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Juho
    family: Piironen
  - given: Aki
    family: Vehtari
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 905-913
  id: piironen17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 905
  lastpage: 913
  published: 2017-04-10 00:00:00 +0000
- title: 'Bayesian Learning and Inference in Recurrent Switching Linear Dynamical Systems'
  abstract: 'Many natural systems, such as neurons firing in the brain or basketball teams traversing a court, give rise to time series data with complex, nonlinear dynamics.  We can gain insight into these systems by decomposing the data into segments that are each explained by simpler dynamic units. Building on switching linear dynamical systems (SLDS), we develop a model class and Bayesian inference algorithms that not only discover these dynamical units but also, by learning how transition probabilities depend on observations or continuous latent states, explain their switching behavior.  Our key innovation is to design these recurrent SLDS models to enable recent Pólya-gamma auxiliary variable techniques and thus make approximate Bayesian learning and inference in these models easy, fast, and scalable.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/linderman17a.html
  PDF: http://proceedings.mlr.press/v54/linderman17a/linderman17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-linderman17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Scott
    family: Linderman
  - given: Matthew
    family: Johnson
  - given: Andrew
    family: Miller
  - given: Ryan
    family: Adams
  - given: David
    family: Blei
  - given: Liam
    family: Paninski
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 914-922
  id: linderman17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 914
  lastpage: 922
  published: 2017-04-10 00:00:00 +0000
- title: 'Efficient Algorithm for Sparse Tensor-variate Gaussian Graphical Models via Gradient Descent'
  abstract: 'We study the sparse tensor-variate Gaussian graphical model (STGGM), where each way of the tensor follows a multivariate normal distribution whose precision matrix has sparse structures. In order to estimate the precision matrices, we propose a sparsity constrained maximum likelihood estimator. However, due to the complex structure of the tensor-variate GGMs, the likelihood based estimator is non-convex, which poses great challenges for both computation and theoretical analysis. In order to address these challenges, we propose an efficient alternating gradient descent algorithm to solve this estimator, and prove that, under certain conditions on the initial estimator, our algorithm is guaranteed to linearly converge to the unknown precision matrices up to the optimal statistical error. Experiments on both synthetic data and real world brain imaging data corroborate our theory.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/xu17b.html
  PDF: http://proceedings.mlr.press/v54/xu17b/xu17b.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-xu17b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Pan
    family: Xu
  - given: Tingting
    family: Zhang
  - given: Quanquan
    family: Gu
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 923-932
  id: xu17b
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 923
  lastpage: 932
  published: 2017-04-10 00:00:00 +0000
- title: 'Minimax-optimal semi-supervised regression on unknown manifolds'
  abstract: 'We consider semi-supervised regression when the predictor variables are drawn from an unknown manifold. A simple two step approach to this problem is to: (i) estimate the manifold geodesic distance between any pair of points using both the labeled and unlabeled instances; and (ii) apply a k nearest neighbor regressor based on these distance estimates. We prove that given sufficiently many unlabeled points, this simple method of geodesic kNN regression achieves the optimal finite-sample minimax bound on the mean squared error, as if the manifold were known. Furthermore, we show how this approach can be efficiently implemented, requiring only O(k N log N) operations to estimate the regression function at all N labeled and unlabeled points. We illustrate this approach on two datasets with a manifold structure: indoor localization using WiFi fingerprints and facial pose estimation. In both cases, geodesic kNN is more accurate and much faster than the popular Laplacian eigenvector regressor.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/moscovich17a.html
  PDF: http://proceedings.mlr.press/v54/moscovich17a/moscovich17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-moscovich17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Amit
    family: Moscovich
  - given: Ariel
    family: Jaffe
  - given: Nadler
    family: Boaz
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 933-942
  id: moscovich17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 933
  lastpage: 942
  published: 2017-04-10 00:00:00 +0000
- title: 'Improved Strongly Adaptive Online Learning using Coin Betting'
  abstract: 'This paper describes a new parameter-free online learning algorithm for changing environments.  In comparing against algorithms with the same time complexity as ours, we obtain a strongly adaptive regret bound that is a factor of at least $\sqrt\log(T)$ better, where $T$ is the time horizon.  Empirical results show that our algorithm outperforms state-of-the-art methods in learning with expert advice and metric learning scenarios.   '
  volume: 54
  URL: https://proceedings.mlr.press/v54/jun17a.html
  PDF: http://proceedings.mlr.press/v54/jun17a/jun17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-jun17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Kwang-Sung
    family: Jun
  - given: Francesco
    family: Orabona
  - given: Stephen
    family: Wright
  - given: Rebecca
    family: Willett
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 943-951
  id: jun17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 943
  lastpage: 951
  published: 2017-04-10 00:00:00 +0000
- title: 'Black-box Importance Sampling'
  abstract: 'Importance sampling is widely used in machine learning and statistics, but its power is limited by the restriction of using simple proposals for which the importance weights can be tractably calculated. We address this problem by studying black-box importance sampling methods that calculate importance weights for samples generated from any unknown proposal or black-box mechanism. Our method allows us to use better and richer proposals to solve difficult problems, and (somewhat counter-intuitively) also has the additional benefit of improving the estimation accuracy beyond typical importance sampling. Both theoretical and empirical analyses are provided. '
  volume: 54
  URL: https://proceedings.mlr.press/v54/liu17b.html
  PDF: http://proceedings.mlr.press/v54/liu17b/liu17b.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-liu17b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Qiang
    family: Liu
  - given: Jason
    family: Lee
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 952-961
  id: liu17b
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 952
  lastpage: 961
  published: 2017-04-10 00:00:00 +0000
- title: 'Fairness Constraints: Mechanisms for Fair Classification'
  abstract: 'Algorithmic decision making systems are ubiquitous across a wide variety of online as well as offline services. These systems rely on complex learning methods and vast amounts of data to optimize the service functionality, satisfaction of the end user and profitability. However, there is a growing concern that these automated decisions can lead, even in the absence of intent, to a lack of fairness, i.e., their outcomes can disproportionately hurt (or, benefit) particular groups of people sharing one or more sensitive attributes (e.g., race, sex). In this paper, we introduce a flexible mechanism to design fair classifiers by leveraging a novel intuitive measure of decision boundary (un)fairness. We instantiate this mechanism with two well-known classifiers, logistic regression and support vector machines, and show on real-world data that our mechanism allows for a fine-grained control on the degree of fairness, often at a small cost in terms of accuracy.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/zafar17a.html
  PDF: http://proceedings.mlr.press/v54/zafar17a/zafar17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-zafar17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Muhammad Bilal
    family: Zafar
  - given: Isabel
    family: Valera
  - given: Manuel Gomez
    family: Rogriguez
  - given: Krishna P.
    family: Gummadi
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 962-970
  id: zafar17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 962
  lastpage: 970
  published: 2017-04-10 00:00:00 +0000
- title: 'Frequency Domain Predictive Modelling with Aggregated Data'
  abstract: 'Existing work in spatio-temporal data analysis invariably assumes data available as individual measurements with localised estimates. However, for many applications like econometrics, financial forecasting and climate science, data is often obtained as aggregates. Data aggregation presents severe mathematical challenges to learning and inference, and application of standard techniques is susceptible to ecological fallacy. In this manuscript we investigate the problem of predictive linear modelling in the scenario where data is aggregated in a non-uniform manner across targets and features. We introduce a novel formulation of the problem in the frequency domain, and develop algorithmic techniques that exploit the duality properties of Fourier analysis to bypass the inherent structural challenges of this setting. We provide theoretical guarantees for generalisation error for our estimation procedure and extend our analysis to capture approximation effects arising from aliasing. Finally, we perform empirical evaluation to demonstrate the efficacy of our algorithmic aproach in predictive modelling on synthetic data, and on three real datasets from agricultural studies, ecological surveys and climate science. approximation effects arising from aliasing. Finally, we perform empirical evaluation to demonstrate the efficacy of our algorithmic aproach in predictive modelling on synthetic data, and on three real datasets from agricultural studies, ecological surveys and climate science.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/bhowmik17a.html
  PDF: http://proceedings.mlr.press/v54/bhowmik17a/bhowmik17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-bhowmik17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Avradeep
    family: Bhowmik
  - given: Joydeep
    family: Ghosh
  - given: Oluwasanmi
    family: Koyejo
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 971-980
  id: bhowmik17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 971
  lastpage: 980
  published: 2017-04-10 00:00:00 +0000
- title: 'A Unified Computational and Statistical Framework for Nonconvex Low-rank Matrix Estimation'
  abstract: 'We propose a unified framework for estimating low-rank matrices through nonconvex optimization based on gradient descent algorithm. Our framework is quite general and can be applied to both noisy and noiseless observations. In the general case with noisy observations, we show that our algorithm is guaranteed to linearly converge to the unknown low-rank matrix up to a minimax optimal statistical error, provided an appropriate initial estimator. While in the generic noiseless setting, our algorithm converges to the unknown low-rank matrix at a linear rate and enables exact recovery with optimal sample complexity. In addition, we develop a new initialization algorithm to provide the desired initial estimator, which outperforms existing initialization algorithms for nonconvex low-rank matrix estimation. We illustrate the superiority of our framework through three examples: matrix regression, matrix completion, and one-bit matrix completion. We also corroborate our theory through extensive experiments on synthetic data.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/wang17b.html
  PDF: http://proceedings.mlr.press/v54/wang17b/wang17b.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-wang17b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Lingxiao
    family: Wang
  - given: Xiao
    family: Zhang
  - given: Quanquan
    family: Gu
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 981-990
  id: wang17b
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 981
  lastpage: 990
  published: 2017-04-10 00:00:00 +0000
- title: 'A New Class of Private Chi-Square Hypothesis Tests'
  abstract: 'In this paper, we develop new test statistics for hypothesis testing over differentially private data. These statistics are designed specifically so that their asymptotic distributions, after accounting for privacy noise, match the asymptotics of the non-private chi-square tests for testing if the multinomial data parameters lie in lower dimensional manifolds (examples include goodness-of-fit and independence testing). Empirically, these new test statistics outperform prior work, which focused on noisy versions of existing statistics.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/rogers17a.html
  PDF: http://proceedings.mlr.press/v54/rogers17a/rogers17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-rogers17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ryan
    family: Rogers
  - given: Daniel
    family: Kifer
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 991-1000
  id: rogers17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 991
  lastpage: 1000
  published: 2017-04-10 00:00:00 +0000
- title: 'A Learning Theory of Ranking Aggregation'
  abstract: 'Originally formulated in Social Choice theory, Ranking Aggregation, also referred to as Consensus Ranking, has motivated the development of numerous statistical models since the middle of the 20th century. Recently, the analysis of ranking/preference data has been the subject of a renewed interest in machine-learning, boosted by modern applications such as meta-search engines, giving rise to the design of various scalable algorithmic approaches for approximately computing ranking medians, viewed as solutions of a discrete (generally NP-hard) minimization problem. This paper develops a statistical learning theory for ranking aggregation in a general probabilistic setting (avoiding any rigid ranking model assumptions), assessing the generalization ability of empirical ranking medians. Universal rate bounds are established and the situations where convergence occurs at an exponential rate are fully characterized. Minimax lower bounds are also proved, showing that the rate bounds we obtain are optimal.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/korba17a.html
  PDF: http://proceedings.mlr.press/v54/korba17a/korba17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-korba17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Anna
    family: Korba
  - given: Stéphan
    family: Clemencon
  - given: Eric
    family: Sibony
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1001-1010
  id: korba17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1001
  lastpage: 1010
  published: 2017-04-10 00:00:00 +0000
- title: 'Anomaly Detection in Extreme Regions via Empirical MV-sets on the Sphere'
  abstract: 'Extreme regions in the feature space are of particular concern for anomaly detection: anomalies are likely to be located in the tails, whereas data scarcity in such regions makes it difficult to distinguish between large normal instances and anomalies. This paper presents an unsupervised algorithm for anomaly detection in extreme regions. We propose a Minimum Volume set (MV-set) approach relying on multivariate extreme value theory. This framework includes a canonical pre-processing step, which addresses the issue of output sensitivity to standardization choices. The resulting data representation on the sphere highlights the dependence structure of the extremal observations. Anomaly detection is then cast as a MV-set estimation problem on the sphere, where volume is measured by the spherical measure and mass refers to the angular measure. An anomaly then corresponds to an unusual observation given that one of its variables is large. A preliminary rate bound analysis is carried out for the learning method we introduce and its computational advantages are discussed and illustrated by numerical experiments.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/thomas17a.html
  PDF: http://proceedings.mlr.press/v54/thomas17a/thomas17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-thomas17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Albert
    family: Thomas
  - given: Stéphan
    family: Clemencon
  - given: Alexandre
    family: Gramfort
  - given: Anne
    family: Sabourin
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1011-1019
  id: thomas17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1011
  lastpage: 1019
  published: 2017-04-10 00:00:00 +0000
- title: 'Structured adaptive and random spinners for fast machine learning computations'
  abstract: 'We consider an efficient computational framework for speeding up several machine learning algorithms with almost no loss of accuracy. The proposed framework relies on projections via structured matrices that we call Structured Spinners, which are formed as products of three structured matrix-blocks that incorporate rotations. The approach is highly generic, i.e. i) structured matrices under consideration can either be fully-randomized or learned, ii) our structured family contains as special cases all previously considered structured schemes, iii) the setting extends to the non-linear case where the projections are followed by non-linear functions, and iv) the method finds numerous applications including kernel approximations via random feature maps, dimensionality reduction algorithms,new fast cross-polytope LSH techniques, deep learning, convex optimization algorithms via Newton sketches, quantization with random projection trees, and more. The proposed framework comes with theoretical guarantees characterizing the capacity of the structured model in reference to its unstructured counterpart and is based on a general theoretical principle that we describe in the paper. As a consequence of our theoretical analysis, we provide the first theoretical guarantees for one of the most efficient existing LSH algorithms based on the HD 3 HD 2 HD 1 structured matrix [Andoni et al., 2015]. The exhaustive experimental evaluation confirms the accuracy and efficiency of structured spinners for a variety of different applications.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/bojarski17a.html
  PDF: http://proceedings.mlr.press/v54/bojarski17a/bojarski17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-bojarski17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Mariusz
    family: Bojarski
  - given: Anna
    family: Choromanska
  - given: Krzysztof
    family: Choromanski
  - given: Francois
    family: Fagan
  - given: Cedric
    family: Gouy-Pailler
  - given: Anne
    family: Morvan
  - given: Nouri
    family: Sakr
  - given: Tamas
    family: Sarlos
  - given: Jamal
    family: Atif
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1020-1029
  id: bojarski17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1020
  lastpage: 1029
  published: 2017-04-10 00:00:00 +0000
- title: 'Complementary Sum Sampling for Likelihood Approximation in Large Scale Classification'
  abstract: 'We consider training probabilistic classifiers in the case that the number of classes is too large to perform exact normalisation over all classes.  We show that the source of high variance in standard sampling approximations is due to simply not including the correct class of the datapoint into the approximation. To account for this we explicitly sum over a subset of classes and sample the remaining. We show that this simple approach is competitive with recently introduced non likelihood-based approximations. '
  volume: 54
  URL: https://proceedings.mlr.press/v54/botev17a.html
  PDF: http://proceedings.mlr.press/v54/botev17a/botev17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-botev17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Aleksandar
    family: Botev
  - given: Bowen
    family: Zheng
  - given: David
    family: Barber
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1030-1038
  id: botev17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1030
  lastpage: 1038
  published: 2017-04-10 00:00:00 +0000
- title: 'Learning Optimal Interventions'
  abstract: 'Our goal is to identify beneficial interventions from observational data.  We consider interventions that are narrowly focused (impacting few covariates) and may be tailored to each individual or globally enacted over a population.  For applications where harmful intervention is drastically worse than proposing no change, we propose a conservative definition of the optimal intervention.  Assuming the underlying relationship remains invariant under intervention, we develop  efficient algorithms to identify the optimal intervention policy from limited data and provide theoretical guarantees for our approach in a Gaussian Process setting.  Although our methods assume covariates can be precisely adjusted, they remain capable of improving outcomes in misspecified settings with unintentional downstream effects.  Empirically, our approach identifies good interventions in two practical applications: gene perturbation and writing improvement. '
  volume: 54
  URL: https://proceedings.mlr.press/v54/mueller17a.html
  PDF: http://proceedings.mlr.press/v54/mueller17a/mueller17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-mueller17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jonas
    family: Mueller
  - given: David
    family: Reshef
  - given: George
    family: Du
  - given: Tommi
    family: Jaakkola
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1039-1047
  id: mueller17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1039
  lastpage: 1047
  published: 2017-04-10 00:00:00 +0000
- title: 'A Lower Bound on the Partition Function of Attractive Graphical Models in the Continuous Case'
  abstract: 'Computing the partition function of an arbitrary graphical model is generally intractable. As a result, approximate inference techniques such as loopy belief propagation and expectation propagation are used to compute an approximation to the true partition function.  However, due to general issues of intractability in the continuous case, our understanding of these approximations is relatively limited.  In particular, a number of theoretical results known for these approximations in the discrete case are missing in the continuous case.  In this work, we use graph covers to extend several such results from the discrete case to the continuous case.  Specifically, we provide a graph cover based upper bound for continuous graphical models, and we use this characterization (along with a continuous analog of a discrete correlation-type inequality) to show that the Bethe partition function also provides a lower bound on the true partition function of attractive graphical models in the continuous case.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/ruozzi17a.html
  PDF: http://proceedings.mlr.press/v54/ruozzi17a/ruozzi17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-ruozzi17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Nicholas
    family: Ruozzi
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1048-1056
  id: ruozzi17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1048
  lastpage: 1056
  published: 2017-04-10 00:00:00 +0000
- title: 'Scalable Variational Inference for Super Resolution Microscopy'
  abstract: 'Super-resolution microscopy methods have become essential tools in biology, opening up a variety of new questions that were previously inaccessible with standard light microscopy methods. In this paper we develop new Bayesian image processing methods that extend the reach of super-resolution microscopy even further. Our method couples variational inference techniques with a data summarization based on Laplace approximation to ensure computational scalability. Our formulation makes it straightforward to incorporate prior information about the underlying sample to further improve accuracy. The proposed method obtains dramatic resolution improvements over previous methods while retaining computational tractability.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/sun17a.html
  PDF: http://proceedings.mlr.press/v54/sun17a/sun17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-sun17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ruoxi
    family: Sun
  - given: Evan
    family: Archer
  - given: Liam
    family: Paninski
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1057-1065
  id: sun17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1057
  lastpage: 1065
  published: 2017-04-10 00:00:00 +0000
- title: 'Linear Convergence of Stochastic Frank Wolfe Variants'
  abstract: 'In this paper, we show that the Away-step Stochastic Frank-Wolfe (ASFW) and Pairwise Stochastic Frank-Wolfe (PSFW) algorithms converge linearly in expectation. We also show that if an algorithm convergences linearly in expectation then it converges linearly almost surely. In order to prove these results, we develop a novel proof technique based on concepts of empirical processes and concentration inequalities. As far as we know, this technique has not been used previously to derive the convergence rates of stochastic optimization algorithms. In large- scale numerical experiments, ASFW and PSFW perform as well as or better than their stochastic competitors in actual CPU time.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/goldfarb17a.html
  PDF: http://proceedings.mlr.press/v54/goldfarb17a/goldfarb17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-goldfarb17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Donald
    family: Goldfarb
  - given: Garud
    family: Iyengar
  - given: Chaoxu
    family: Zhou
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1066-1074
  id: goldfarb17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1066
  lastpage: 1074
  published: 2017-04-10 00:00:00 +0000
- title: 'Sequential Graph Matching with Sequential Monte Carlo'
  abstract: 'We develop a novel probabilistic model for graph matchings and develop practical inference methods for supervised and unsupervised learning of the parameters of this model. The framework we develop admits joint inference on the parameters and the matching. Furthermore, our framework generalizes naturally to $K$-partite hypergraph matchings or set packing problems. The sequential formulation of the graph matching process naturally leads to sequential Monte Carlo algorithms which can be combined with various parameter inference methods. We apply our method to image matching problems, document ranking, and our own novel quadripartite matching problem arising from the field of computational forestry.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/jun17b.html
  PDF: http://proceedings.mlr.press/v54/jun17b/jun17b.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-jun17b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Seong-Hwan
    family: Jun
  - given: Samuel W.K.
    family: Wong
  - given: James
    family: Zidek
  - given: Alexandre
    family: Bouchard-Cote
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1075-1084
  id: jun17b
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1075
  lastpage: 1084
  published: 2017-04-10 00:00:00 +0000
- title: 'Fast rates with high probability in exp-concave statistical learning'
  abstract: 'We present an algorithm for the statistical learning setting with a bounded exp-concave loss in d dimensions that obtains excess risk $O(d \log(1/δ)/n)$ with probability $1 - δ$. The core technique is to boost the confidence of recent in-expectation O(d/n) excess risk bounds for empirical risk minimization (ERM), without sacrificing the rate, by leveraging a Bernstein condition which holds due to exp-concavity.  We also show that a regret bound for any online learner in this setting translates to a high probability excess risk bound for the corresponding online-to-batch conversion of the online learner.  Lastly, we present high probability bounds for the exp-concave model selection aggregation problem that are quantile-adaptive in a certain sense. One bound obtains a nearly optimal rate without requiring the loss to be Lipschitz continuous, and another requires Lipschitz continuity but obtains the optimal rate.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/mehta17a.html
  PDF: http://proceedings.mlr.press/v54/mehta17a/mehta17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-mehta17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Nishant
    family: Mehta
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1085-1093
  id: mehta17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1085
  lastpage: 1093
  published: 2017-04-10 00:00:00 +0000
- title: 'Generalization Error of Invariant Classifiers'
  abstract: 'This paper studies the generalization error of invariant classifiers. In particular, we consider the common scenario where  the classification task is invariant to certain transformations of the input, and that the classifier is constructed (or learned) to be invariant to these transformations. Our approach relies on factoring the input space into a product of a base space and a set of transformations. We show that whereas the generalization error of a non-invariant classifier is proportional to the complexity of the input space, the generalization error of an invariant classifier is proportional to the complexity of the base space. We also derive a set of sufficient conditions on the geometry of the base space and the set of transformations that ensure that the complexity of the base space is much smaller than the complexity of the input space. Our analysis applies to general classifiers such as convolutional neural networks. We demonstrate the implications of the developed theory for such classifiers with experiments on the MNIST and CIFAR-10 datasets.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/sokolic17a.html
  PDF: http://proceedings.mlr.press/v54/sokolic17a/sokolic17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-sokolic17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jure
    family: Sokolic
  - given: Raja
    family: Giryes
  - given: Guillermo
    family: Sapiro
  - given: Miguel
    family: Rodrigues
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1094-1103
  id: sokolic17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1094
  lastpage: 1103
  published: 2017-04-10 00:00:00 +0000
- title: 'Learning with Feature Feedback: from Theory to Practice'
  abstract: 'In supervised learning, a human annotator only needs to assign each data point (document, image, etc.) its correct label. But in many situations, the human can also provide richer feedback at essentially no extra cost. In this paper, we examine a particular type of feature feedback that has been used, with some success, in information retrieval and in computer vision. We formalize two models of feature feedback, give learning algorithms for them, and quantify their usefulness in the learning process. Our experiments also show the efficacy of these methods.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/poulis17a.html
  PDF: http://proceedings.mlr.press/v54/poulis17a/poulis17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-poulis17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Stefanos
    family: Poulis
  - given: Sanjoy
    family: Dasgupta
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1104-1113
  id: poulis17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1104
  lastpage: 1113
  published: 2017-04-10 00:00:00 +0000
- title: 'Optimistic Planning for the Stochastic Knapsack Problem'
  abstract: 'The stochastic knapsack problem is a stochastic resource allocation problem that arises frequently and yet is exceptionally hard to solve. We derive and study an optimistic planning algorithm specifically designed for the stochastic knapsack problem. Unlike other optimistic planning algorithms for MDPs, our algorithm, OpStoK, avoids the use of discounting and is adaptive to the amount of resources available. We achieve this behavior by means of a concentration inequality that simultaneously applies to capacity and reward estimates. Crucially, we are able to guarantee that the aforementioned confidence regions hold collectively over all time steps by an application of Doob’s inequality. We demonstrate that the method returns an $ε$-optimal solution to the stochastic knapsack problem with high probability. To the best of our knowledge, our algorithm is the first which provides such guarantees for the stochastic knapsack problem. Furthermore, our algorithm is an anytime algorithm and will return a good solution even if stopped prematurely. This is particularly important given the difficulty of the problem. We also provide theoretical conditions to guarantee OpStoK does not expand all policies and demonstrate favorable performance in a simple experimental setting.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/pike-burke17a.html
  PDF: http://proceedings.mlr.press/v54/pike-burke17a/pike-burke17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-pike-burke17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ciara
    family: Pike-Burke
  - given: Steffen
    family: Grunewalder
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1114-1122
  id: pike-burke17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1114
  lastpage: 1122
  published: 2017-04-10 00:00:00 +0000
- title: 'Identifying Groups of Strongly Correlated Variables through Smoothed Ordered Weighted $L_1$-norms'
  abstract: 'The failure of LASSO   to identify groups of correlated predictors in linear  regression  has sparked significant research interest. Recently,   various norms were proposed, which can be  best described as instances of ordered weighted $\ell_1$ norms (OWL), as an alternative to $\ell_1$ regularization used in LASSO. OWL can identify groups of correlated variables but  it forces the model to be constant within a group.  This artifact induces unnecessary bias in the model estimation.  In this paper we take a submodular perspective and show that  OWL can be posed as the Lovász extension of a suitably defined submodular function. The submodular perspective not only explains the group-wise constant behavior of OWL, but also suggests alternatives. The main contribution of this paper is smoothed OWL (SOWL), a new family of norms, which  not only identifies the groups but also allows the model to be flexible inside a group.   We establish several algorithmic and theoretical properties of SOWL including group identification and model consistency. We also provide algorithmic tools to compute the SOWL norm and its proximal operator, whose computational complexity  $O(d\log d)$ is significantly better than that of general purpose solvers in $O(d^2\log d)$. In our experiments, SOWL compares favorably with respect to OWL in the regimes of interest.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/sankaran17a.html
  PDF: http://proceedings.mlr.press/v54/sankaran17a/sankaran17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-sankaran17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Raman
    family: Sankaran
  - given: Francis
    family: Bach
  - given: Chiranjib
    family: Bhattacharya
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1123-1131
  id: sankaran17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1123
  lastpage: 1131
  published: 2017-04-10 00:00:00 +0000
- title: 'Tracking Objects with Higher Order Interactions via Delayed Column Generation'
  abstract: 'We study the problem of multi-target tracking and data association in video. We formulate this in terms of selecting a subset of high-quality tracks subject to the constraint that no pair of selected tracks is associated with a common detection (of an object). This objective is equivalent to the classic NP-hard problem of finding a maximum-weight set packing (MWSP) where tracks correspond to sets and is made further difficult since the number of candidate tracks grows exponentially in the number of detections.  We present a relaxation of this combinatorial problem that uses a column generation formulation where the pricing problem is solved via dynamic programming to efficiently explore the space of tracks.  We employ row generation to tighten the bound in such a way as to preserve efficient inference in the pricing problem.  We show the practical utility of this algorithm for pedestrian and particle tracking.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/wang17c.html
  PDF: http://proceedings.mlr.press/v54/wang17c/wang17c.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-wang17c.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Shaofei
    family: Wang
  - given: Steffen
    family: Wolf
  - given: Charless
    family: Fowlkes
  - given: Julian
    family: Yarkony
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1132-1140
  id: wang17c
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1132
  lastpage: 1140
  published: 2017-04-10 00:00:00 +0000
- title: 'Belief Propagation in Conditional RBMs for Structured Prediction'
  abstract: 'Restricted Boltzmann machines (RBMs) and conditional RBMs (CRBMs) are popular models for a wide range of applications.  In previous work, learning on such models has been dominated by contrastive divergence (CD) and its variants.  Belief propagation (BP) algorithms are believed to be slow for structured prediction on conditional RBMs (e.g., Mnih et al. [2011]), and not as good as CD when applied in learning (e.g., Larochelle et al. [2012]).  In this work, we present a matrix-based implementation of belief propagation algorithms on CRBMs, which is easily scalable to tens of thousands of visible and hidden units.  We demonstrate that, in both maximum likelihood and max-margin learning, training conditional RBMs with BP as the inference routine can provide significantly better results than current state-of-the-art CD methods on structured prediction problems.  We also include practical guidelines on training CRBMs with BP, and some insights on the interaction of learning and inference algorithms for CRBMs.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/ping17a.html
  PDF: http://proceedings.mlr.press/v54/ping17a/ping17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-ping17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Wei
    family: Ping
  - given: Alex
    family: Ihler
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1141-1149
  id: ping17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1141
  lastpage: 1149
  published: 2017-04-10 00:00:00 +0000
- title: 'Sketching Meets Random Projection in the Dual: A Provable Recovery Algorithm for Big and High-dimensional Data'
  abstract: 'Sketching techniques scale up machine learning algorithms by reducing the sample size or dimensionality of massive data sets, without sacrificing their statistical properties. In this paper, we study sketching from an optimization point of view. We first show that the iterative Hessian sketch is an optimization process with \emphpreconditioning and develop an \emphaccelerated version using this insight together with conjugate gradient descent. Next, we establish a primal-dual connection between the Hessian sketch and dual random projection, which allows us to develop an \emphaccelerated iterative dual random projection method by applying the preconditioned conjugate gradient descent on the dual problem. Finally, we tackle the problems of large sample size and high-dimensionality in massive data sets by developing the \emphprimal-dual sketch.  The primal-dual sketch iteratively sketches the primal and dual formulations and requires only a logarithmic number of calls to solvers of small sub-problems to recover the optimum of the original problem up to \empharbitrary precision. Extensive experiments on synthetic and real data sets complement our theoretical results.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/wang17d.html
  PDF: http://proceedings.mlr.press/v54/wang17d/wang17d.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-wang17d.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jialei
    family: Wang
  - given: Jason
    family: Lee
  - given: Mehrdad
    family: Mahdavi
  - given: Mladen
    family: Kolar
  - given: Nati
    family: Srebro
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1150-1158
  id: wang17d
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1150
  lastpage: 1158
  published: 2017-04-10 00:00:00 +0000
- title: 'Finite-sum Composition Optimization via Variance Reduced Gradient Descent'
  abstract: 'The stochastic composition optimization proposed recently by Wang et al. [2014] minimizes the objective with the composite expectation form: $\min_x (\mathbbE_iF_i ∘\mathbbE_j G_j)(x).$ It summarizes many important applications in machine learning, statistics, and finance. In this paper, we consider the finite-sum scenario for composition optimization: $\min_x f (x) := \frac1n \sum_i = 1^n F_i \left(   \frac1m \sum_j = 1^m G_j (x) \right)$. In this paper, two algorithms are proposed to solve this problem by combining the stochastic compositional gradient descent (SCGD) and the stochastic variance reduced gradient (SVRG) technique. A constant linear convergence rate is proved for strongly convex optimization, which substantially improves the sublinear rate $O(K^-0.8)$ of the best known algorithm. '
  volume: 54
  URL: https://proceedings.mlr.press/v54/lian17a.html
  PDF: http://proceedings.mlr.press/v54/lian17a/lian17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-lian17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Xiangru
    family: Lian
  - given: Mengdi
    family: Wang
  - given: Ji
    family: Liu
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1159-1167
  id: lian17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1159
  lastpage: 1167
  published: 2017-04-10 00:00:00 +0000
- title: 'A Fast and Scalable Joint Estimator for Learning Multiple Related Sparse Gaussian Graphical Models'
  abstract: 'Estimating multiple sparse Gaussian Graphical Models (sGGMs) jointly for many related tasks (large $K$) under a high-dimensional (large $p$) situation is an important task. Most previous studies for the joint estimation of multiple sGGMs rely on penalized log-likelihood estimators that involve expensive and difficult non-smooth optimizations. We propose a novel approach, FASJEM for \underlinefast and \underlinescalable \underlinejoint structure-\underlineestimation of \underlinemultiple sGGMs at a large scale. As the first study of joint sGGM using the M-estimator framework, our work has three major contributions: (1) We solve FASJEM through an entry-wise manner which is parallelizable. (2) We choose a proximal algorithm to optimize FASJEM. This improves the computational efficiency from $O(Kp^3)$ to $O(Kp^2)$ and reduces the memory requirement from $O(Kp^2)$ to $O(K)$. (3) We theoretically prove that FASJEM achieves a consistent estimation with a convergence rate of $O(\log(Kp)/n_tot)$. On several synthetic and four real-world datasets, FASJEM shows significant improvements over baselines on accuracy, computational complexity and memory costs.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/wang17e.html
  PDF: http://proceedings.mlr.press/v54/wang17e/wang17e.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-wang17e.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Beilun
    family: Wang
  - given: Ji
    family: Gao
  - given: Yanjun
    family: Qi
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1168-1177
  id: wang17e
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1168
  lastpage: 1177
  published: 2017-04-10 00:00:00 +0000
- title: 'Communication-efficient Distributed Sparse Linear Discriminant Analysis'
  abstract: 'We propose a communication-efficient distributed estimation method for sparse linear discriminant analysis (LDA) in the high dimensional regime. Our method distributes the data of size N into m machines, and estimates a local sparse LDA estimator on each machine using the data subset of size N/m. After the distributed estimation, our method aggregates the debiased local estimators from m machines, and sparsifies the aggregated estimator. We show that the aggregated estimator attains the same statistical rate as the centralized estimation method, as long as the number of machines m is chosen appropriately. Moreover, we prove that our method can attain the model selection consistency under a milder condition than the centralized method. Experiments on both synthetic and real datasets corroborate our theory.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/tian17a.html
  PDF: http://proceedings.mlr.press/v54/tian17a/tian17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-tian17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Lu
    family: Tian
  - given: Quanquan
    family: Gu
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1178-1187
  id: tian17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1178
  lastpage: 1187
  published: 2017-04-10 00:00:00 +0000
- title: 'Sketchy Decisions: Convex Low-Rank Matrix Optimization with Optimal Storage'
  abstract: 'This paper concerns a fundamental class of convex matrix optimization problems. It presents the first algorithm that uses optimal storage and provably computes a low-rank approximation of a solution. In particular, when all solutions have low rank, the algorithm converges to a solution. This algorithm, SketchyCGM, modifies a standard convex optimization scheme, the conditional gradient method, to store only a small randomized sketch of the matrix variable. After the optimization terminates, the algorithm extracts a low-rank approximation of the solution from the sketch. In contrast to nonconvex heuristics, the guarantees for SketchyCGM do not rely on statistical models for the problem data. Numerical work demonstrates the benefits of SketchyCGM over heuristics.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/yurtsever17a.html
  PDF: http://proceedings.mlr.press/v54/yurtsever17a/yurtsever17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-yurtsever17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Alp
    family: Yurtsever
  - given: Madeleine
    family: Udell
  - given: Joel
    family: Tropp
  - given: Volkan
    family: Cevher
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1188-1196
  id: yurtsever17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1188
  lastpage: 1196
  published: 2017-04-10 00:00:00 +0000
- title: 'Modal-set estimation with an application to clustering'
  abstract: 'We present a procedure that can estimate – with statistical consistency guarantees – any local-maxima of a density, under benign distributional conditions. The procedure estimates all such local maxima, or modal-sets, of any bounded shape or dimension, including usual point-modes. In practice, modal-sets can arise as dense low-dimensional structures in noisy data, and more generally serve to better model the rich variety of locally dense structures in data.   The procedure is then shown to be competitive on clustering applications, and moreover is quite stable to a wide range of settings of its tuning parameter. '
  volume: 54
  URL: https://proceedings.mlr.press/v54/jiang17c.html
  PDF: http://proceedings.mlr.press/v54/jiang17c/jiang17c.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-jiang17c.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Heinrich
    family: Jiang
  - given: Samory
    family: Kpotufe
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1197-1206
  id: jiang17c
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1197
  lastpage: 1206
  published: 2017-04-10 00:00:00 +0000
- title: 'Compressed Least Squares Regression revisited'
  abstract: 'We revisit compressed least squares (CLS) regression as originally analyzed in Maillard and Munos (2009) and later on in Kaban (2014) with some refinements. Given a set of high-dimensional inputs, CLS applies a random projection and then performs least squares regression based on the projected inputs of lower dimension. This approach can be beneficial with regard to both computation (yielding a smaller least squares problem) and statistical performance (reducing the estimation error). We will argue below that the outcome of previous analysis of the procedure is not meaningful in typical situations, yielding a bound on the prediction error that is inferior to ordinary least squares while requiring the dimension of the projected data to be of the same order as the original dimension. As a fix, we subsequently present a modified analysis with meaningful implications that much better reflects empirical results with simulated and real data.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/slawski17a.html
  PDF: http://proceedings.mlr.press/v54/slawski17a/slawski17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-slawski17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Martin
    family: Slawski
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1207-1215
  id: slawski17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1207
  lastpage: 1215
  published: 2017-04-10 00:00:00 +0000
- title: 'Diverse Neural Network Learns True Target Functions'
  abstract: 'Neural networks are a powerful class of functions that can be trained with simple gradient descent to achieve state-of-the-art performance on a variety of applications. Despite their practical success, there is a paucity of results that provide theoretical guarantees on why they are so effective.  Lying in the center of the problem is the difficulty of analyzing the non-convex loss function with potentially numerous local minima and saddle points. Can neural networks corresponding to the stationary points of the loss function learn the true target function? If yes, what are the key factors contributing to such nice optimization properties?   In this paper, we answer these questions by analyzing one-hidden-layer neural networks with ReLU activation, and show that despite the non-convexity, neural networks with diverse units have no spurious local minima. We bypass the non-convexity issue by directly analyzing the first order optimality condition, and show that the loss can be made arbitrarily small if the minimum singular value of the “extended feature matrix” is large enough. We make novel use of techniques from kernel methods and geometric discrepancy, and identify a new relation linking the smallest singular value to the spectrum of a kernel function associated with the activation function and to the diversity of the units. Our results also suggest a novel regularization function to promote unit diversity for potentially better generalization ability.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/xie17a.html
  PDF: http://proceedings.mlr.press/v54/xie17a/xie17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-xie17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Bo
    family: Xie
  - given: Yingyu
    family: Liang
  - given: Le
    family: Song
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1216-1224
  id: xie17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1216
  lastpage: 1224
  published: 2017-04-10 00:00:00 +0000
- title: 'Local Group Invariant Representations via Orbit Embeddings'
  abstract: 'Invariance to nuisance transformations is one of the desirable properties of effective representations. We consider transformations that form a group and propose an approach based on kernel methods to derive local group invariant representations. Locality is achieved by defining a suitable probability distribution over the group which in turn induces distributions in the input feature space. We learn a decision function over these distributions by appealing to the powerful framework of kernel methods and generate local invariant random feature maps via kernel approximations. We show uniform convergence bounds for kernel approximation and provide generalization bounds for learning with these features. We evaluate our method on three real datasets, including Rotated MNIST and CIFAR-10, and observe that it outperforms competing kernel based approaches. The proposed method also outperforms deep CNN on Rotated MNIST and performs comparably to the recently proposed group-equivariant CNN.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/raj17a.html
  PDF: http://proceedings.mlr.press/v54/raj17a/raj17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-raj17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Anant
    family: Raj
  - given: Abhishek
    family: Kumar
  - given: Youssef
    family: Mroueh
  - given: Tom
    family: Fletcher
  - given: Bernhard
    family: Schoelkopf
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1225-1235
  id: raj17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1225
  lastpage: 1235
  published: 2017-04-10 00:00:00 +0000
- title: 'Relativistic Monte Carlo '
  abstract: 'Hamiltonian Monte Carlo (HMC) is a popular Markov chain Monte Carlo (MCMC) algorithm that generates proposals for a Metropolis-Hastings algorithm by simulating the dynamics of a Hamiltonian system. However, HMC is sensitive to large time discretizations and performs poorly if there is a mismatch between the spatial geometry of the target distribution and the scales of the momentum distribution. In particular the mass matrix of HMC is hard to tune well. In order to alleviate these problems we propose relativistic Hamiltonian Monte Carlo, a version of HMC based on relativistic dynamics that introduces a maximum velocity on particles. We also derive stochastic gradient versions of the algorithm and show that the resulting algorithms bear interesting relationships to gradient clipping, RMSprop, Adagrad and Adam, popular optimisation methods in deep learning. Based on this, we develop relativistic stochastic gradient descent by taking the zero-temperature limit of relativistic stochastic gradient Hamiltonian Monte Carlo. In experiments we show that the relativistic algorithms perform better than classical Newtonian variants and Adam.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/lu17b.html
  PDF: http://proceedings.mlr.press/v54/lu17b/lu17b.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-lu17b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Xiaoyu
    family: Lu
  - given: Valerio
    family: Perrone
  - given: Leonard
    family: Hasenclever
  - given: Yee Whye
    family: Teh
  - given: Sebastian
    family: Vollmer
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1236-1245
  id: lu17b
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1236
  lastpage: 1245
  published: 2017-04-10 00:00:00 +0000
- title: 'Thompson Sampling for Linear-Quadratic Control Problems'
  abstract: 'We consider the exploration-exploitation tradeoff in linear quadratic (LQ) control problems, where the state dynamics is linear and the cost function is quadratic in states and controls. We analyze the regret of Thompson sampling (TS) (a.k.a. posterior-sampling for reinforcement learning) in the frequentist setting, i.e., when the parameters characterizing the LQ dynamics are fixed. Despite the empirical and theoretical success in a wide range of problems from multi-armed bandit to linear bandit, we show that when studying the frequentist regret TS in control problems, we need to trade-off the frequency of sampling optimistic parameters and the frequency of switches in the control policy. This results in an overall regret of $O(T^2/3)$, which is significantly worse than the regret $O(\sqrtT)$ achieved by the optimism-in-face-of-uncertainty algorithm in LQ control problems.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/abeille17b.html
  PDF: http://proceedings.mlr.press/v54/abeille17b/abeille17b.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-abeille17b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Marc
    family: Abeille
  - given: Alessandro
    family: Lazaric
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1246-1254
  id: abeille17b
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1246
  lastpage: 1254
  published: 2017-04-10 00:00:00 +0000
- title: 'Fast Classification with Binary Prototypes'
  abstract: 'In this work, we propose a new technique for \emphfast k-nearest neighbor (k-NN) classification in which the original database is represented via a small set of learned binary prototypes. The training phase simultaneously learns a hash function which maps the data points to binary codes, and a set of representative binary prototypes. In the prediction phase, we first hash the query into a binary code and then do the k-NN classification using the binary prototypes as the database. Our approach speeds up k-NN classification in two aspects. First, we compress the database into a smaller set of prototypes such that k-NN search only goes through a smaller set rather than the whole dataset. Second, we reduce the original space to a compact binary embedding, where the Hamming distance between two binary codes is very efficient to compute. We propose a formulation to learn the hash function and prototypes such that the classification error is minimized. We also provide a novel theoretical analysis of the proposed technique in terms of Bayes error consistency. Empirically, our method is much faster than the state-of-the-art k-NN compression methods with comparable accuracy.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/zhong17a.html
  PDF: http://proceedings.mlr.press/v54/zhong17a/zhong17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-zhong17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Kai
    family: Zhong
  - given: Ruiqi
    family: Guo
  - given: Sanjiv
    family: Kumar
  - given: Bowei
    family: Yan
  - given: David
    family: Simcha
  - given: Inderjit
    family: Dhillon
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1255-1263
  id: zhong17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1255
  lastpage: 1263
  published: 2017-04-10 00:00:00 +0000
- title: 'Prediction Performance After Learning in Gaussian Process Regression'
  abstract: 'This paper considers the quantification of the prediction performance in Gaussian process regression. The standard approach is to base the prediction error bars on the theoretical predictive variance, which is a lower bound on the mean square-error (MSE). This approach, however, does not take into account that the statistical model is learned from the data. We show that this omission leads to a systematic underestimation of the prediction errors. Starting from a generalization of the Cramér-Rao bound, we derive a more accurate MSE bound which provides a measure of uncertainty for prediction of Gaussian processes. The improved bound is easily computed and we illustrate it using synthetic and real data examples.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/wagberg17a.html
  PDF: http://proceedings.mlr.press/v54/wagberg17a/wagberg17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-wagberg17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Johan
    family: Wagberg
  - given: Dave
    family: Zachariah
  - given: Thomas
    family: Schon
  - given: Petre
    family: Stoica
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1264-1272
  id: wagberg17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1264
  lastpage: 1272
  published: 2017-04-10 00:00:00 +0000
- title: 'Communication-Efficient Learning of Deep Networks from Decentralized Data'
  abstract: 'Modern mobile devices have access to a wealth of data suitable for learning models, which in turn can greatly improve the user experience on the device. For example, language models can improve speech recognition and text entry, and image models can automatically select good photos. However, this rich data is often privacy sensitive, large in quantity, or both, which may preclude logging to the data center and training there using conventional approaches.  We advocate an alternative that leaves the training data distributed on the mobile devices, and learns a shared model by aggregating locally-computed updates. We term this decentralized approach Federated Learning.  We present a practical method for the federated learning of deep networks based on iterative model averaging, and conduct an extensive empirical evaluation, considering five different model architectures and four datasets. These experiments demonstrate the approach is robust to the unbalanced and non-IID data distributions that are a defining characteristic of this setting. Communication costs are the principal constraint, and we show a reduction in required communication rounds by 10-100x as compared to synchronized stochastic gradient descent. '
  volume: 54
  URL: https://proceedings.mlr.press/v54/mcmahan17a.html
  PDF: http://proceedings.mlr.press/v54/mcmahan17a/mcmahan17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-mcmahan17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Brendan
    family: McMahan
  - given: Eider
    family: Moore
  - given: Daniel
    family: Ramage
  - given: Seth
    family: Hampson
  - given: Blaise Aguera y
    family: Arcas
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1273-1282
  id: mcmahan17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1273
  lastpage: 1282
  published: 2017-04-10 00:00:00 +0000
- title: 'Learning Structured Weight Uncertainty in Bayesian Neural Networks'
  abstract: 'Deep neural networks (DNNs) are increasingly popular in modern machine learning. Bayesian learning affords the opportunity to quantify posterior uncertainty on DNN model parameters. Most existing work adopts independent Gaussian priors on the model weights, ignoring possible structural information. In this paper, we consider the matrix variate Gaussian (MVG) distribution to model structured correlations within the weights of a DNN. To make posterior inference feasible, a reparametrization is proposed for the MVG prior, simplifying the complex MVG-based model to an equivalent yet simpler model with independent Gaussian priors on the transformed weights. Consequently, we develop a scalable Bayesian online inference algorithm by adopting the recently proposed probabilistic backpropagation framework. Experiments on several synthetic and real datasets indicate the superiority of our model, achieving competitive performance in terms of model likelihood and predictive root mean square error. Importantly, it also yields faster convergence speed compared to related Bayesian DNN models.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/sun17b.html
  PDF: http://proceedings.mlr.press/v54/sun17b/sun17b.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-sun17b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Shengyang
    family: Sun
  - given: Changyou
    family: Chen
  - given: Lawrence
    family: Carin
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1283-1292
  id: sun17b
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1283
  lastpage: 1292
  published: 2017-04-10 00:00:00 +0000
- title: 'Signal-based Bayesian Seismic Monitoring'
  abstract: 'Detecting weak seismic events from noisy sensors is a difficult perceptual task. We formulate this task as Bayesian inference and propose a generative model of seismic events and signals across a network of spatially distributed stations. Our system, SIGVISA, is the first to directly model seismic waveforms, allowing it to incorporate a rich representation of the physics underlying the signal generation process. We use Gaussian processes over wavelet parameters to predict detailed waveform fluctuations based on historical events, while degrading smoothly to simple parametric envelopes in regions with no historical seismicity. Evaluating on data from the western US, we recover three times as many events as previous work, and reduce mean location errors by a factor of four while greatly increasing sensitivity to low-magnitude events.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/moore17a.html
  PDF: http://proceedings.mlr.press/v54/moore17a/moore17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-moore17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: David
    family: Moore
  - given: Stuart
    family: Russell
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1293-1301
  id: moore17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1293
  lastpage: 1301
  published: 2017-04-10 00:00:00 +0000
- title: 'Learning the Network Structure of Heterogeneous Data via Pairwise Exponential Markov Random Fields'
  abstract: 'Markov random fields (MRFs) are a useful tool for modeling relationships present in large and high-dimensional data. Often, this data comes from various sources and can have diverse distributions, for example a combination of numerical, binary, and categorical variables. Here, we define the pairwise exponential Markov random field (PE-MRF), an approach capable of modeling exponential family distributions in heterogeneous domains. We develop a scalable method of learning the graphical structure across the variables by solving a regularized approximated maximum likelihood problem. Specifically, we first derive a tractable upper bound on the log-partition function. We then use this upper bound to derive the group graphical lasso, a generalization of the classic graphical lasso problem to heterogeneous domains. To solve this problem, we develop a fast algorithm based on the alternating direction method of multipliers (ADMM). We also prove that our estimator is sparsistent, with guaranteed recovery of the true underlying graphical structure, and that it has a polynomially faster runtime than the current state-of-the-art method for learning such distributions. Experiments on synthetic and real-world examples demonstrate that our approach is both efficient and accurate at uncovering the structure of heterogeneous data.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/park17d.html
  PDF: http://proceedings.mlr.press/v54/park17d/park17d.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-park17d.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Youngsuk
    family: Park
  - given: David
    family: Hallac
  - given: Stephen
    family: Boyd
  - given: Jure
    family: Leskovec
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1302-1310
  id: park17d
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1302
  lastpage: 1310
  published: 2017-04-10 00:00:00 +0000
- title: 'Discovering and Exploiting Additive Structure for Bayesian Optimization'
  abstract: 'Bayesian optimization has proven invaluable for black-box optimization of expensive functions. Its main limitation is its exponential complexity with respect to the dimensionality of the search space using typical kernels. Luckily, many objective functions can be decomposed into additive subproblems, which can be optimized independently. We investigate how to automatically discover such (typically unknown) additive structure while simultaneously exploiting it through Bayesian optimization. We propose an efficient algorithm based on Metropolis-Hastings sampling and demonstrate its efficacy empirically on synthetic and real-world data sets. Throughout all our experiments we reliably discover hidden additive structure whenever it exists and exploit it to yield significantly faster convergence.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/gardner17a.html
  PDF: http://proceedings.mlr.press/v54/gardner17a/gardner17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-gardner17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jacob
    family: Gardner
  - given: Chuan
    family: Guo
  - given: Kilian
    family: Weinberger
  - given: Roman
    family: Garnett
  - given: Roger
    family: Grosse
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1311-1319
  id: gardner17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1311
  lastpage: 1319
  published: 2017-04-10 00:00:00 +0000
- title: 'Lipschitz Density-Ratios, Structured Data, and Data-driven Tuning'
  abstract: 'Density-ratio estimation (i.e. estimating $f = f_Q/f_P$ for two unknown distributions Q and P) has proved useful in many Machine Learning tasks, e.g., risk-calibration in transfer-learning, two-sample tests, and also useful in common techniques such importance sampling and bias correction. While there are many important analyses of this estimation problem, the present paper derives convergence rates in other practical settings that are less understood, namely, extensions of traditional Lipschitz smoothness conditions, and common high-dimensional settings with structured data (e.g. manifold data, sparse data).   Various interesting facts, which hold in earlier settings, are shown to extend to these settings. Namely, (1) optimal rates depend only on the smoothness of the ratio f, and not on the densities $f_Q$, $f_P$, supporting the belief that plugging in estimates for $f_Q$, $f_P$ is suboptimal; (2) optimal rates depend only on the intrinsic dimension of data, i.e. this problem – unlike density estimation – escapes the curse of dimension.   We further show that near-optimal rates are attainable by estimators tuned from data alone, i.e. with no prior distributional information. This last fact is of special interest in unsupervised settings such as this one, where only oracle rates seem to be known, i.e., rates which assume critical distributional information usually unavailable in practice. '
  volume: 54
  URL: https://proceedings.mlr.press/v54/kpotufe17a.html
  PDF: http://proceedings.mlr.press/v54/kpotufe17a/kpotufe17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-kpotufe17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Samory
    family: Kpotufe
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1320-1328
  id: kpotufe17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1320
  lastpage: 1328
  published: 2017-04-10 00:00:00 +0000
- title: 'Spatial Decompositions for Large Scale SVMs'
  abstract: 'Although support vector machines (SVMs) are theoretically well understood, their underlying optimization problem becomes very expensive if, for example, hundreds of thousands of samples and a non-linear kernel are considered. Several approaches have been proposed in the past to address this serious limitation. In this work we investigate a decomposition strategy that learns on small, spatially defined data chunks. Our contributions are two fold: On the theoretical side we establish an oracle inequality for the overall learning method using the hinge loss, and show that the resulting rates match those known for SVMs solving the complete optimization problem with Gaussian kernels. On the practical result we compare our approach to learning SVMs on small, randomly chosen chunks. Here it turns out that for comparable training times our approach is significantly faster during testing and also reduces the test error in most cases significantly. Furthermore, we show that our approach easily scales up to 10 million training samples: including hyper-parameter selection using cross validation, the entire training only takes a few hours on a single machine. Finally, we report an experiment on 32 million training samples. All experiments used liquidSVM (Steinwart and Thomann, 2017)'
  volume: 54
  URL: https://proceedings.mlr.press/v54/thomann17a.html
  PDF: http://proceedings.mlr.press/v54/thomann17a/thomann17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-thomann17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Philipp
    family: Thomann
  - given: Ingrid
    family: Blaschzyk
  - given: Mona
    family: Meister
  - given: Ingo
    family: Steinwart
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1329-1337
  id: thomann17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1329
  lastpage: 1337
  published: 2017-04-10 00:00:00 +0000
- title: 'Inference Compilation and Universal Probabilistic Programming'
  abstract: 'We introduce a method for using  deep neural networks to amortize the cost of inference in models from the family induced by universal probabilistic programming languages, establishing a framework that combines the strengths of probabilistic programming and deep learning methods.  We call what we do “compilation of inference” because our method transforms a denotational specification of an inference problem in the form of a probabilistic program written in a universal programming language into a trained neural network denoted in a neural network specification language.  When at test time this neural network is fed observational data and executed, it performs approximate inference in the original model specified by the probabilistic program.  Our training objective and learning procedure are designed to allow the trained neural network to be used as a proposal distribution in a sequential importance sampling inference engine.  We illustrate our method on mixture models and Captcha solving and show significant speedups in the efficiency of inference.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/le17a.html
  PDF: http://proceedings.mlr.press/v54/le17a/le17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-le17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Tuan Anh
    family: Le
  - given: Atilim Gunes
    family: Baydin
  - given: Frank
    family: Wood
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1338-1348
  id: le17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1338
  lastpage: 1348
  published: 2017-04-10 00:00:00 +0000
- title: 'Active Positive Semidefinite Matrix Completion: Algorithms, Theory and Applications'
  abstract: 'In this paper we provide simple, computationally efficient, active algorithms for completion of symmetric positive semidefinite matrices. Our proposed algorithms are based on adaptive Nyström sampling, and are allowed to actively query any element in the matrix, and obtain a possibly noisy estimate of the queried element. We establish sample complexity guarantees on the recovery of the matrix in the max-norm and in the process establish new theoretical results, potentially of independent interest, on adaptive Nyström sampling. We demonstrate the efficacy of our algorithms on problems in multi-armed bandits and kernel dimensionality reduction.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/bhargava17a.html
  PDF: http://proceedings.mlr.press/v54/bhargava17a/bhargava17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-bhargava17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Aniruddha
    family: Bhargava
  - given: Ravi
    family: Ganti
  - given: Rob
    family: Nowak
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1349-1357
  id: bhargava17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1349
  lastpage: 1357
  published: 2017-04-10 00:00:00 +0000
- title: 'Information Projection and Approximate Inference for Structured Sparse Variables'
  abstract: 'Approximate inference via information projection has been recently introduced as a general-purpose technique  for efficient probabilistic inference given sparse variables. This manuscript goes beyond classical sparsity by proposing efficient algorithms for approximate inference via information projection that are applicable to any structure on the set of variables that admits enumeration using matroid or knapsack constraints.  Further, leveraging recent advances in submodular optimization, we provide an efficient greedy algorithm with strong optimization-theoretic guarantees. The class of probabilistic models that can be expressed in this way is quite broad and, as we show, includes group sparse regression, group sparse principal components analysis and sparse collective matrix factorization, among others. Empirical results on simulated data and high dimensional neuroimaging data  highlight the superior performance of the information projection approach as compared to established baselines for a range of probabilistic models.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/khanna17a.html
  PDF: http://proceedings.mlr.press/v54/khanna17a/khanna17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-khanna17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Rajiv
    family: Khanna
  - given: Joydeep
    family: Ghosh
  - given: Rusell
    family: Poldrack
  - given: Oluwasanmi
    family: Koyejo
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1358-1366
  id: khanna17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1358
  lastpage: 1366
  published: 2017-04-10 00:00:00 +0000
- title: 'On the Interpretability of Conditional Probability Estimates in the Agnostic Setting'
  abstract: 'We study the interpretability of conditional probability estimates for binary classification under the agnostic setting or scenario. Under the agnostic setting, conditional probability estimates do not necessarily reflect the true conditional probabilities. Instead, they have a certain calibration property: among all data points that the classifier has predicted $P(Y = 1|X) = p$, $p$ portion of them actually have label $Y = 1$. For cost-sensitive decision problems, this calibration property provides adequate support for us to use Bayes Decision Theory. In this paper, we define a novel measure for the calibration property together with its empirical counterpart, and prove an uniform convergence result between them. This new measure enables us to formally justify the calibration property of conditional probability estimations, and provides new insights on the problem of estimating and calibrating conditional probabilities. '
  volume: 54
  URL: https://proceedings.mlr.press/v54/gao17a.html
  PDF: http://proceedings.mlr.press/v54/gao17a/gao17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-gao17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yihan
    family: Gao
  - given: Aditya
    family: Parameswaran
  - given: Jian
    family: Peng
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1367-1374
  id: gao17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1367
  lastpage: 1374
  published: 2017-04-10 00:00:00 +0000
- title: 'Linking Micro Event History to Macro Prediction in Point Process Models'
  abstract: 'User behaviors in social networks are microscopic with fine grained temporal information. Predicting a macroscopic quantity based on users’ collective behaviors is an important problem. However, existing works are mainly problem-specific models for the microscopic behaviors and typically design approximation or heuristic algorithms to compute the macroscopic quantity. In this paper, we propose a unifying framework with a jump stochastic differential equation model that systematically links the microscopic event data and macroscopic inference, and the theory to approximate its probability distribution. We showed that our method can better predict the user behaviors in real-world applications.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/wang17f.html
  PDF: http://proceedings.mlr.press/v54/wang17f/wang17f.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-wang17f.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Yichen
    family: Wang
  - given: Xiaojing
    family: Ye
  - given: Haomin
    family: Zhou
  - given: Hongyuan
    family: Zha
  - given: Le
    family: Song
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1375-1384
  id: wang17f
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1375
  lastpage: 1384
  published: 2017-04-10 00:00:00 +0000
- title: 'Initialization and Coordinate Optimization for Multi-way Matching'
  abstract: 'We consider the problem of consistently matching multiple sets of elements to each other, which is a common task in fields such as computer vision. To solve the underlying NP-hard objective, existing methods often relax or approximate it, but end up with unsatisfying empirical performance due to a misaligned objective. We propose a coordinate update algorithm that directly optimizes the target objective. By using pairwise alignment information to build an undirected graph and initializing the permutation matrices along the edges of its Maximum Spanning Tree, our algorithm successfully avoids bad local optima. Theoretically, with high probability our algorithm guarantees an optimal solution under reasonable noise assumptions. Empirically, our algorithm consistently and significantly outperforms existing methods on several benchmark tasks on real datasets.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/tang17a.html
  PDF: http://proceedings.mlr.press/v54/tang17a/tang17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-tang17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Da
    family: Tang
  - given: Tony
    family: Jebara
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1385-1393
  id: tang17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1385
  lastpage: 1393
  published: 2017-04-10 00:00:00 +0000
- title: 'Optimal Recovery of Tensor Slices'
  abstract: 'We consider the problem of large scale matrix recovery given side information in the form of additional matrices of conforming dimension. This is a parsimonious model that captures a number of interesting problems including context and location aware recommendations, personalized ‘tag’ learning, demand learning with side information, etc. Viewing the matrix we seek to recover and the side information we have as slices of a tensor, we consider the problem of Slice Recovery, which is to recover specific slices of a tensor from noisy observations of the tensor. We provide an efficient algorithm to recover slices of structurally ’simple’ tensors given noisy observations of the tensor’s entries; our definition of simplicity subsumes low-rank tensors for a variety of definitions of tensor rank. Our algorithm is practical for large datasets and provides a significant performance improvement over state of the art incumbent approaches to tensor recovery. We establish theoretical recovery guarantees that under reasonable assumptions are minimax optimal for slice recovery. These guarantees also imply the first minimax optimal guarantees for recovering tensors of low Tucker rank and general noise. Experiments on data from a music streaming service demonstrate the performance and scalability of our algorithm.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/farias17a.html
  PDF: http://proceedings.mlr.press/v54/farias17a/farias17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-farias17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Vivek
    family: Farias
  - given: Andrew
    family: Li
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1394-1402
  id: farias17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1394
  lastpage: 1402
  published: 2017-04-10 00:00:00 +0000
- title: 'Efficient Online Multiclass Prediction on Graphs via Surrogate Losses'
  abstract: '    We develop computationally efficient algorithms for online multi-class prediction. Our construction is based on carefully-chosen data-dependent surrogate loss functions, and the new methods enjoy strong mistake bound guarantees.           To illustrate the technique, we study the combinatorial problem of node classification and develop a prediction strategy that is linear-time per round. In contrast, the offline benchmark is NP-hard to compute in general. We demonstrate the empirical performance of the method on several datasets.  '
  volume: 54
  URL: https://proceedings.mlr.press/v54/rakhlin17a.html
  PDF: http://proceedings.mlr.press/v54/rakhlin17a/rakhlin17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-rakhlin17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Alexander
    family: Rakhlin
  - given: Karthik
    family: Sridharan
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1403-1411
  id: rakhlin17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1403
  lastpage: 1411
  published: 2017-04-10 00:00:00 +0000
- title: 'Distribution of Gaussian Process Arc Lengths'
  abstract: '   We present the first treatment of the arc length of the GP with more than a single output dimension. GPs are commonly used for tasks such as trajectory modelling, where path length is a crucial quantity of interest. Previously, only paths in one dimension have been considered, with no theoretical consideration of higher dimensional problems.  We fill the gap in the existing literature by deriving the moments of the arc length for a stationary GP with multiple output dimensions. A new method is used to derive the mean of a one-dimensional GP over a finite interval, by considering the distribution of the arc length integrand. This technique is used to derive an approximate distribution over the arc length of a vector valued GP in $\mathbbR^n$ by moment matching the distribution. Numerical simulations confirm our theoretical derivations.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/bewsher17a.html
  PDF: http://proceedings.mlr.press/v54/bewsher17a/bewsher17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-bewsher17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Justin
    family: Bewsher
  - given: Alessandra
    family: Tosi
  - given: Michael
    family: Osborne
  - given: Stephen
    family: Roberts
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1412-1420
  id: bewsher17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1412
  lastpage: 1420
  published: 2017-04-10 00:00:00 +0000
- title: 'Distributed Adaptive Sampling for Kernel Matrix Approximation'
  abstract: 'Most kernel-based methods, such as kernel regression, kernel PCA, ICA, or $k$-means clustering, do not scale to large datasets, because constructing and storing the kernel matrix $K_n$ requires at least $O(n^2)$ time and space for $n$ samples. Recent works (Alaoui 2014, Musco 2016) show that sampling points with replacement according to their ridge leverage scores (RLS) generates small dictionaries of relevant points with strong spectral approximation guarantees for $K_n$. The drawback of RLS-based methods is that computing exact RLS requires constructing and storing the whole kernel matrix. In this paper, we introduce SQUEAK, a new algorithm for kernel approximation based on RLS sampling that \emphsequentially processes the dataset, storing a dictionary which creates accurate kernel matrix approximations with a number of points that only depends on the effective dimension $d_eff(γ)$ of the dataset. Moreover since all the RLS estimations are efficiently performed using only the small dictionary, SQUEAK never constructs the whole matrix $K_n$, runs in linear time $\widetildeO(nd_eff(γ)^3)$ w.r.t. $n$, and requires only a single pass over the dataset. We also propose a parallel and distributed version of SQUEAK achieving similar accuracy in as little as $\widetildeO(\log(n)d_eff(γ)^3)$ time.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/calandriello17a.html
  PDF: http://proceedings.mlr.press/v54/calandriello17a/calandriello17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-calandriello17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Daniele
    family: Calandriello
  - given: Alessandro
    family: Lazaric
  - given: Michal
    family: Valko
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1421-1429
  id: calandriello17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1421
  lastpage: 1429
  published: 2017-04-10 00:00:00 +0000
- title: 'Binary and Multi-Bit Coding for Stable Random Projections'
  abstract: 'The recent work [17] developed a 1-bit compressed sensing (CS) algorithm based on $α$-stable random projections. Although the work in [17] showed that the method is a strong competitor of other existing 1-bit algorithms, the procedure requires knowing $K$, the sparsity. Note that $K$ is the $l_0$ norm of the signal. Other existing  1-bit CS algorithms require  the $l_2$ norm of the signal.  In this paper, we develop an estimation procedure for the $l_α$ norm of the signal, where $0<α\leq2$ from binary or multi-bit measurements. We demonstrate that using a  simple closed-form estimator with merely 1-bit information does not result in a significant loss of accuracy if the parameter is chosen appropriately.  Theoretical tail bounds are  also provided. Using 2 or more bits  per measurement reduces the variance and  importantly, stabilizes the estimate so that  the  variance is not too sensitive to chosen parameters.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/li17c.html
  PDF: http://proceedings.mlr.press/v54/li17c/li17c.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-li17c.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ping
    family: Li
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1430-1438
  id: li17c
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1430
  lastpage: 1438
  published: 2017-04-10 00:00:00 +0000
- title: 'Spectral Methods for Correlated Topic Models'
  abstract: 'In this paper we propose guaranteed spectral methods for learning a broad range of topic models, which generalize the popular Latent Dirichlet Allocation (LDA). We overcome the limitation of LDA to incorporate arbitrary topic correlations, by assuming that the hidden topic proportions are drawn from a flexible class of Normalized Infinitely Divisible (NID) distributions. NID distributions are generated by normalizing a family of independent Infinitely Divisible (ID) random variables. The Dirichlet distribution is a special case obtained by normalizing a set of Gamma random variables. We prove that this flexible topic model class can be learnt via spectral methods using only moments up to the third order, with (low order) polynomial sample and computational complexity. The proof is based on a key new technique derived here that allows us to diagonalize the moments of the NID distribution through an efficient procedure that requires evaluating only univariate integrals, despite the fact that we are handling high dimensional multivariate moments. In order to assess the performance of our proposed Latent NID topic model, we use two real datasets of articles collected from New York Times and Pubmed. Our experiments yield improved perplexity on both datasets compared with the baseline.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/arabshahi17a.html
  PDF: http://proceedings.mlr.press/v54/arabshahi17a/arabshahi17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-arabshahi17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Forough
    family: Arabshahi
  - given: Anima
    family: Anandkumar
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1439-1447
  id: arabshahi17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1439
  lastpage: 1447
  published: 2017-04-10 00:00:00 +0000
- title: 'Label Filters for Large Scale Multilabel Classification'
  abstract: 'When assigning labels to a test instance, most multilabel and multiclass classifiers systematically evaluate every single label to decide whether it is relevant or not. This linear scan over labels becomes prohibitive when the number of labels is very large. To alleviate this problem we propose a two step approach where computationally efficient label filters pre-select a small set of candidate labels before the base multiclass or multilabel classifier is applied. The label filters select candidate labels by projecting a test instance on a filtering line, and retaining only the labels that have training instances in the vicinity of this projection.  The filter parameters are learned directly from data by solving a constraint optimization problem, and are independent of the base multilabel classifier. The proposed label filters can be used in conjunction with any multiclass or multilabel classifier that requires a linear scan over the labels, and speed up prediction by orders of magnitude without significant impact on performance.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/niculescu-mizil17a.html
  PDF: http://proceedings.mlr.press/v54/niculescu-mizil17a/niculescu-mizil17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-niculescu-mizil17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Alexandru
    family: Niculescu-Mizil
  - given: Ehsan
    family: Abbasnejad
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1448-1457
  id: niculescu-mizil17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1448
  lastpage: 1457
  published: 2017-04-10 00:00:00 +0000
- title: 'Learning from Conditional Distributions via Dual Embeddings'
  abstract: 'Many machine learning tasks, such as learning with invariance and policy evaluation in reinforcement learning, can be characterized as problems of learning from conditional distributions. In such problems, each sample x itself is associated with a conditional distribution $p(z|x)$ represented by samples $z_i_i=1^M$, and the goal is to learn a function f that links these conditional distributions to target values y. These problems become very challenging when we only have limited samples or in the extreme case only one sample from each conditional distribution. Commonly used approaches either assume that z is independent of x, or require an overwhelmingly large set of samples from each conditional distribution.   To address these challenges, we propose a novel approach which employs a new min-max reformulation of the learning from conditional distribution problem.  With such new reformulation, we only need to deal with the joint distribution p(z,x). We also design an efficient learning algorithm, Embedding-SGD, and establish theoretical sample complexity for such problems. Finally, our numerical experiments, on both synthetic and real-world datasets, show that the proposed approach can significantly improve over existing algorithms.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/dai17a.html
  PDF: http://proceedings.mlr.press/v54/dai17a/dai17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-dai17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Bo
    family: Dai
  - given: Niao
    family: He
  - given: Yunpeng
    family: Pan
  - given: Byron
    family: Boots
  - given: Le
    family: Song
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1458-1467
  id: dai17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1458
  lastpage: 1467
  published: 2017-04-10 00:00:00 +0000
- title: 'Sequential Multiple Hypothesis Testing with Type I Error Control'
  abstract: 'This work studies multiple hypothesis testing in the setting when we obtain data sequentially and may choose when to stop sampling. We summarize the notion of a sequential p-value (one that can be continually updated and still maintain a type I error guarantee) and provide several examples from the literature. This tool allows us to convert step-up or step-down multiple hypothesis testing procedures in the fixed-horizon setting (which includes Benjamini-Hochberg, Holm, and Bonferroni)  into sequential versions that allow the statistician to reject a hypothesis as soon as the sequential p-value reaches a threshold. We show that if the original procedure has a type I error guarantee in a certain family (including FDR and FWER), then the sequential conversion inherits an analogous guarantee. The conversion also allows for allocating samples in a data-dependent way, and we provide simulated experiments demonstrating an increased number of rejections when compared to the fixed-horizon setting.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/malek17a.html
  PDF: http://proceedings.mlr.press/v54/malek17a/malek17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-malek17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Alan
    family: Malek
  - given: Sumeet
    family: Katariya
  - given: Yinlam
    family: Chow
  - given: Mohammad
    family: Ghavamzadeh
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1468-1476
  id: malek17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1468
  lastpage: 1476
  published: 2017-04-10 00:00:00 +0000
- title: 'A Maximum Matching Algorithm for Basis Selection in Spectral Learning'
  abstract: 'We present a solution to scale spectral algorithms for learning sequence functions.  We are interested in the case where these functions are sparse (that is, for most sequences they return 0). Spectral algorithms reduce the learning problem to the task of computing an SVD decomposition over a special type of matrix called the Hankel matrix. This matrix is designed to capture the relevant statistics of the training sequences. What is crucial is that to capture long range dependencies we must consider very large Hankel matrices. Thus the computation of the SVD becomes a critical bottleneck. Our solution finds a subset of rows and columns of the Hankel that realizes a compact and informative Hankel submatrix. The novelty lies in the way that this subset is selected: we exploit a maximal bipartite matching combinatorial algorithm to look for a sub-block with full structural rank, and show how computation of this sub-block can be further improved by exploiting the specific structure of Hankel matrices.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/quattoni17a.html
  PDF: http://proceedings.mlr.press/v54/quattoni17a/quattoni17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-quattoni17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ariadna
    family: Quattoni
  - given: Xavier
    family: Carreras
  - given: Matthias
    family: Gallé
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1477-1485
  id: quattoni17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1477
  lastpage: 1485
  published: 2017-04-10 00:00:00 +0000
- title: 'Value-Aware Loss Function for Model-based Reinforcement Learning'
  abstract: 'We consider the problem of estimating the transition probability kernel to be used by a model-based reinforcement learning (RL) algorithm. We argue that estimating a generative model that minimizes a probabilistic loss, such as the log-loss, is an overkill because it does not take into account the underlying structure of decision problem and the RL algorithm that intends to solve it. We introduce a loss function that takes the structure of the value function into account. We provide a finite-sample upper bound for the loss function showing the dependence of the error on model approximation error, number of samples, and the complexity of the model space. We also empirically compare the method with the maximum likelihood estimator on a simple problem.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/farahmand17a.html
  PDF: http://proceedings.mlr.press/v54/farahmand17a/farahmand17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-farahmand17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Amir-Massoud
    family: Farahmand
  - given: Andre
    family: Barreto
  - given: Daniel
    family: Nikovski
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1486-1494
  id: farahmand17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1486
  lastpage: 1494
  published: 2017-04-10 00:00:00 +0000
- title: 'Convergence Rate of Stochastic k-means'
  abstract: 'We analyze online (Bottou & Bengio, 1994) and mini-batch (Sculley, 2010) k-means variants. Both scale up the widely used Lloyd’s algorithm via stochastic approximation, and have become popular for large-scale clustering and unsupervised feature learning. We show, for the first time, that they have global convergence towards “local optima” at rate $O(1/t)$ under general conditions. In addition, we show that if the dataset is clusterable, stochastic k-means with suitable initialization converges to an optimal k-means solution at rate $O(1/t)$ with high probability. The k-means objective is non-convex and non-differentiable; we exploit ideas from non-convex gradient-based optimization by providing a novel characterization of the trajectory of the k-means algorithm on its solution space, and circumvent its non-differentiability via geometric insights about the k-means update.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/tang17b.html
  PDF: http://proceedings.mlr.press/v54/tang17b/tang17b.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-tang17b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Cheng
    family: Tang
  - given: Claire
    family: Monteleoni
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1495-1503
  id: tang17b
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1495
  lastpage: 1503
  published: 2017-04-10 00:00:00 +0000
- title: 'Automated Inference with Adaptive Batches'
  abstract: 'Classical stochastic gradient methods for optimization rely on noisy gradient approximations that become progressively less accurate as iterates approach a solution. The large noise and small signal in the resulting gradients makes it difficult to use them for adaptive stepsize selection and automatic stopping. We propose alternative “big batch” SGD schemes that adaptively grow the batch size over time to maintain a nearly constant signal-to-noise ratio in the gradient approximation. The resulting methods have similar convergence rates to classical SGD, and do not require convexity of the objective. The high fidelity gradients enable automated learning rate selection and do not require stepsize decay. Big batch methods are thus easily automated and can run with little or no oversight.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/de17a.html
  PDF: http://proceedings.mlr.press/v54/de17a/de17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-de17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Soham
    family: De
  - given: Abhay
    family: Yadav
  - given: David
    family: Jacobs
  - given: Tom
    family: Goldstein
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1504-1513
  id: de17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1504
  lastpage: 1513
  published: 2017-04-10 00:00:00 +0000
- title: 'Scalable Convex Multiple Sequence Alignment via Entropy-Regularized Dual Decomposition'
  abstract: 'Multiple Sequence Alignment (MSA) is one of the fundamental tasks in biological sequence analysis that underlies applications such as phylogenetic trees, profiles, and structure prediction. The task, however, is NP-hard, and the current practice resorts to heuristic and local-search methods. Recently, a convex optimization approach for MSA was proposed based on the concept of atomic norm, which demonstrates significant improvement over existing methods in the quality of alignments. However, the convex program is challenging to solve due to the constraint given by the intersection of two atomic-norm balls, for which the existing algorithm can only handle sequences of length up to 50, with an iteration complexity subject to constants of unknown relation to the natural parameters of MSA. In this work, we propose an accelerated dual decomposition algorithm that exploits entropy regularization to induce closed-form solutions for each atomic-norm-constrained subproblem, giving a single-loop algorithm of iteration complexity linear to the problem size (total length of all sequences). The proposed algorithm gives significantly better alignments than existing methods on sequences of length up to hundreds, where the existing convex programming method fails to converge in one day.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/zhang17b.html
  PDF: http://proceedings.mlr.press/v54/zhang17b/zhang17b.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-zhang17b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Jiong
    family: Zhang
  - given: Ian En-Hsu
    family: Yen
  - given: Pradeep
    family: Ravikumar
  - given: Inderjit
    family: Dhillon
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1514-1522
  id: zhang17b
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1514
  lastpage: 1522
  published: 2017-04-10 00:00:00 +0000
- title: 'Robust Causal Estimation in the Large-Sample Limit without Strict Faithfulness'
  abstract: 'Causal effect estimation from observational data is an important and much studied research topic. The instrumental variable (IV) and local causal discovery (LCD) patterns are canonical examples of settings where a closed-form expression exists for the causal effect of one variable on another, given the presence of a third variable. Both rely on faithfulness to infer that the latter only influences the target effect via the cause variable. In reality, it is likely that this assumption only holds approximately and that there will be at least some form of weak interaction. This brings about the paradoxical situation that, in the large-sample limit, no predictions are made, as detecting the weak edge invalidates the setting.  We introduce an alternative approach by replacing strict faithfulness with a prior that reflects the existence of many ’weak’ (irrelevant) and ’strong’ interactions. We obtain a posterior distribution over the target causal effect estimator which shows that, in many cases, we can still make good estimates. We demonstrate the approach in an application on a simple linear-Gaussian setting, using the MultiNest sampling algorithm, and compare it with established techniques to show our method is robust even when strict faithfulness is violated.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/bucur17a.html
  PDF: http://proceedings.mlr.press/v54/bucur17a/bucur17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-bucur17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ioan Gabriel
    family: Bucur
  - given: Tom
    family: Claassen
  - given: Tom
    family: Heskes
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1523-1531
  id: bucur17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1523
  lastpage: 1531
  published: 2017-04-10 00:00:00 +0000
- title: 'Learning Graphical Games from Behavioral Data: Sufficient and Necessary Conditions'
  abstract: 'In this paper we obtain sufficient and necessary conditions on the number of samples required for exact recovery of the pure-strategy Nash equilibria (PSNE) set of a graphical game from noisy observations of joint actions. We consider sparse linear influence games — a parametric class of graphical games with linear payoffs, and represented by directed graphs of n nodes (players) and in-degree of at most k. We show that one can efficiently recover the PSNE set of a linear influence game with $O(k^2 \log n)$ samples, under very general observation models. On the other hand, we show that $Ω(k \log n)$ samples are necessary for any procedure to recover the PSNE set from observations of joint actions.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/ghoshal17b.html
  PDF: http://proceedings.mlr.press/v54/ghoshal17b/ghoshal17b.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-ghoshal17b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Asish
    family: Ghoshal
  - given: Jean
    family: Honorio
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1532-1540
  id: ghoshal17b
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1532
  lastpage: 1540
  published: 2017-04-10 00:00:00 +0000
- title: 'Non-Count Symmetries in Boolean & Multi-Valued Prob. Graphical Models'
  abstract: 'Lifted inference algorithms commonly exploit symmetries in a probabilistic graphical model (PGM) for efficient inference.  However, existing algorithms for Boolean-valued domains can identify only those pairs of states as symmetric, in which the number of ones and zeros match exactly (count symmetries). Moreover,  algorithms for lifted inference in multi-valued domains also compute a multi-valued extension of count symmetries only. These algorithms miss many symmetries in a domain.  In this paper, we present first algorithms to compute non-count symmetries in both Boolean-valued and multi-valued domains. Our methods can also find symmetries between multi-valued variables that have different domain cardinalities.  The key insight in the algorithms is that they change the unit of symmetry computation from a variable to a variable-value (VV) pair.  Our experiments find that exploiting these symmetries in MCMC  can obtain substantial computational gains over existing algorithms.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/anand17a.html
  PDF: http://proceedings.mlr.press/v54/anand17a/anand17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-anand17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Ankit
    family: Anand
  - given: Ritesh
    family: Noothigattu
  - given: Parag
    family: Singla
  - given: 
    family: Mausam
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1541-1549
  id: anand17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1541
  lastpage: 1549
  published: 2017-04-10 00:00:00 +0000
- title: 'Greedy Direction Method of Multiplier for MAP Inference of Large Output Domain'
  abstract: 'Maximum-a-Posteriori (MAP) inference lies at the heart of Graphical Models and Structured Prediction. Despite the intractability of exact MAP inference, approximated methods based on LP relaxations have exhibited superior performance across a wide range of applications. Yet for problems involving large output domains (i.e., the state space for each variable is large), standard LP relaxations can easily give rise to a large number of variables and constraints which are beyond the limit of existing optimization algorithms. In this paper, we introduce an effective MAP inference method for problems with large output domains. The method builds upon alternating minimization of an Augmented Lagrangian that exploits the sparsity of messages through greedy optimization techniques. A key feature of our greedy approach is to introduce variables in an on-demand manner with a pre-built data structure over local factors. This results in a single-loop algorithm of sublinear cost per iteration and O(log(1/epsilon))-type iteration complexity to achieve epsilon sub-optimality. In addition, we introduce a variant of GDMM for binary MAP inference problems with a large number of factors. Empirically, the proposed algorithms demonstrate orders of magnitude speedup over state-of-the-art MAP inference techniques on MAP inference problems including Segmentation, Alignment, Protein Folding, Graph Matching, and Pairwise-Interacted Multilabel Prediction.'
  volume: 54
  URL: https://proceedings.mlr.press/v54/huang17a.html
  PDF: http://proceedings.mlr.press/v54/huang17a/huang17a.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-huang17a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Xiangru
    family: Huang
  - given: Ian En-Hsu
    family: Yen
  - given: Ruohan
    family: Zhang
  - given: Qixing
    family: Huang
  - given: Pradeep
    family: Ravikumar
  - given: Inderjit
    family: Dhillon
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1550-1559
  id: huang17a
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1550
  lastpage: 1559
  published: 2017-04-10 00:00:00 +0000
- title: 'Scalable Greedy Feature Selection via Weak Submodularity'
  abstract: 'Greedy algorithms are widely used for problems in machine learning such as feature selection and set function optimization.  Unfortunately, for large datasets, the running time of even greedy algorithms can be quite high.  This is because for each greedy step we need to refit a model or calculate a function using the previously selected choices and the new candidate.   Two algorithms that are faster approximations to the greedy forward selection were introduced recently [Mirzasoleiman et al., 2013, 2015]. They achieve better performance by exploiting stochastic evaluation and distributed computation respectively. Both algorithms have provable performance guarantees for submodular functions.  In this paper we show that divergent from previously held opinion, submodularity is not required to obtain approximation guarantees for these two algorithms. Specifically, we show that a generalized concept of weak submodularity suffices to give multiplicative approximation guarantees. Our result extends the applicability of these algorithms to a larger class of functions. Furthermore, we show that a bounded submodularity ratio can be used to provide data dependent bounds that can sometimes be tighter also for submodular functions. We empirically validate our work by showing superior performance of fast greedy approximations versus several established baselines on artificial and real datasets. '
  volume: 54
  URL: https://proceedings.mlr.press/v54/khanna17b.html
  PDF: http://proceedings.mlr.press/v54/khanna17b/khanna17b.pdf
  edit: https://github.com/mlresearch//v54/edit/gh-pages/_posts/2017-04-10-khanna17b.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the 20th International Conference on Artificial Intelligence and Statistics'
  publisher: 'PMLR'
  author: 
  - given: Rajiv
    family: Khanna
  - given: Ethan
    family: Elenberg
  - given: Alex
    family: Dimakis
  - given: Sahand
    family: Negahban
  - given: Joydeep
    family: Ghosh
  editor: 
  - given: Aarti
    family: Singh
  - given: Jerry
    family: Zhu
  page: 1560-1568
  id: khanna17b
  issued:
    date-parts: 
      - 2017
      - 4
      - 10
  firstpage: 1560
  lastpage: 1568
  published: 2017-04-10 00:00:00 +0000