- title: 'Improved Regret Bounds for Thompson Sampling in Linear Quadratic Control Problems'
abstract: 'Thompson sampling (TS) is an effective approach to trade off exploration and exploitation in reinforcement learning. Despite its empirical success and recent advances, its theoretical analysis is often limited to the Bayesian setting, finite state-action spaces, or finite-horizon problems. In this paper, we study an instance of TS in the challenging setting of the infinite-horizon linear quadratic (LQ) control, which models problems with continuous state-action variables, linear dynamics, and quadratic cost. In particular, we analyze the regret in the frequentist sense (i.e., for a fixed unknown environment) in one-dimensional systems. We derive the first $O(\sqrt{T})$ frequentist regret bound for this problem, thus significantly improving the $O(T^{2/3})$ bound of Abeille & Lazaric (2017) and matching the frequentist performance derived by Abbasi-Yadkori & Szepesvári (2011) for an optimistic approach and the Bayesian result of Ouyang et al. (2017). We obtain this result by developing a novel bound on the regret due to policy switches, which holds for LQ systems of any dimensionality and allows updating the parameters and the policy at each step, thus overcoming previous limitations due to lazy updates. Finally, we report numerical simulations supporting the conjecture that our result extends to multi-dimensional systems.'
volume: 80
URL: https://proceedings.mlr.press/v80/abeille18a.html
PDF: http://proceedings.mlr.press/v80/abeille18a/abeille18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-abeille18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Marc
family: Abeille
- given: Alessandro
family: Lazaric
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1-9
id: abeille18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1
lastpage: 9
published: 2018-07-03 00:00:00 +0000
- title: 'State Abstractions for Lifelong Reinforcement Learning'
abstract: 'In lifelong reinforcement learning, agents must effectively transfer knowledge across tasks while simultaneously addressing exploration, credit assignment, and generalization. State abstraction can help overcome these hurdles by compressing the representation used by an agent, thereby reducing the computational and statistical burdens of learning. To this end, we here develop theory to compute and use state abstractions in lifelong reinforcement learning. We introduce two new classes of abstractions: (1) transitive state abstractions, whose optimal form can be computed efficiently, and (2) PAC state abstractions, which are guaranteed to hold with respect to a distribution of tasks. We show that the joint family of transitive PAC abstractions can be acquired efficiently, preserve near-optimal behavior, and experimentally reduce sample complexity in simple domains, thereby yielding a family of desirable abstractions for use in lifelong reinforcement learning. Along with these positive results, we show that there are pathological cases where state abstractions can negatively impact performance.'
volume: 80
URL: https://proceedings.mlr.press/v80/abel18a.html
PDF: http://proceedings.mlr.press/v80/abel18a/abel18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-abel18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: David
family: Abel
- given: Dilip
family: Arumugam
- given: Lucas
family: Lehnert
- given: Michael
family: Littman
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 10-19
id: abel18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 10
lastpage: 19
published: 2018-07-03 00:00:00 +0000
- title: 'Policy and Value Transfer in Lifelong Reinforcement Learning'
abstract: 'We consider the problem of how best to use prior experience to bootstrap lifelong learning, where an agent faces a series of task instances drawn from some task distribution. First, we identify the initial policy that optimizes expected performance over the distribution of tasks for increasingly complex classes of policy and task distributions. We empirically demonstrate the relative performance of each policy class’ optimal element in a variety of simple task distributions. We then consider value-function initialization methods that preserve PAC guarantees while simultaneously minimizing the learning required in two learning algorithms, yielding MaxQInit, a practical new method for value-function-based transfer. We show that MaxQInit performs well in simple lifelong RL experiments.'
volume: 80
URL: https://proceedings.mlr.press/v80/abel18b.html
PDF: http://proceedings.mlr.press/v80/abel18b/abel18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-abel18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: David
family: Abel
- given: Yuu
family: Jinnai
- given: Sophie Yue
family: Guo
- given: George
family: Konidaris
- given: Michael
family: Littman
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 20-29
id: abel18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 20
lastpage: 29
published: 2018-07-03 00:00:00 +0000
- title: 'INSPECTRE: Privately Estimating the Unseen'
abstract: 'We develop differentially private methods for estimating various distributional properties. Given a sample from a discrete distribution p, some functional f, and accuracy and privacy parameters alpha and epsilon, the goal is to estimate f(p) up to accuracy alpha, while maintaining epsilon-differential privacy of the sample. We prove almost-tight bounds on the sample size required for this problem for several functionals of interest, including support size, support coverage, and entropy. We show that the cost of privacy is negligible in a variety of settings, both theoretically and experimentally. Our methods are based on a sensitivity analysis of several state-of-the-art methods for estimating these properties with sublinear sample complexities.'
volume: 80
URL: https://proceedings.mlr.press/v80/acharya18a.html
PDF: http://proceedings.mlr.press/v80/acharya18a/acharya18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-acharya18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jayadev
family: Acharya
- given: Gautam
family: Kamath
- given: Ziteng
family: Sun
- given: Huanyu
family: Zhang
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 30-39
id: acharya18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 30
lastpage: 39
published: 2018-07-03 00:00:00 +0000
- title: 'Learning Representations and Generative Models for 3D Point Clouds'
abstract: 'Three-dimensional geometric data offer an excellent domain for studying representation learning and generative modeling. In this paper, we look at geometric data represented as point clouds. We introduce a deep AutoEncoder (AE) network with state-of-the-art reconstruction quality and generalization ability. The learned representations outperform existing methods on 3D recognition tasks and enable shape editing via simple algebraic manipulations, such as semantic part editing, shape analogies and shape interpolation, as well as shape completion. We perform a thorough study of different generative models including GANs operating on the raw point clouds, significantly improved GANs trained in the fixed latent space of our AEs, and Gaussian Mixture Models (GMMs). To quantitatively evaluate generative models we introduce measures of sample fidelity and diversity based on matchings between sets of point clouds. Interestingly, our evaluation of generalization, fidelity and diversity reveals that GMMs trained in the latent space of our AEs yield the best results overall.'
volume: 80
URL: https://proceedings.mlr.press/v80/achlioptas18a.html
PDF: http://proceedings.mlr.press/v80/achlioptas18a/achlioptas18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-achlioptas18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Panos
family: Achlioptas
- given: Olga
family: Diamanti
- given: Ioannis
family: Mitliagkas
- given: Leonidas
family: Guibas
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 40-49
id: achlioptas18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 40
lastpage: 49
published: 2018-07-03 00:00:00 +0000
- title: 'Discovering Interpretable Representations for Both Deep Generative and Discriminative Models'
abstract: 'Interpretability of representations in both deep generative and discriminative models is highly desirable. Current methods jointly optimize an objective combining accuracy and interpretability. However, this may reduce accuracy, and is not applicable to already trained models. We propose two interpretability frameworks. First, we provide an interpretable lens for an existing model. We use a generative model which takes as input the representation in an existing (generative or discriminative) model, weakly supervised by limited side information. Applying a flexible and invertible transformation to the input leads to an interpretable representation with no loss in accuracy. We extend the approach using an active learning strategy to choose the most useful side information to obtain, allowing a human to guide what "interpretable" means. Our second framework relies on joint optimization for a representation which is both maximally informative about the side information and maximally compressive about the non-interpretable data factors. This leads to a novel perspective on the relationship between compression and regularization. We also propose a new interpretability evaluation metric based on our framework. Empirically, we achieve state-of-the-art results on three datasets using the two proposed algorithms.'
volume: 80
URL: https://proceedings.mlr.press/v80/adel18a.html
PDF: http://proceedings.mlr.press/v80/adel18a/adel18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-adel18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Tameem
family: Adel
- given: Zoubin
family: Ghahramani
- given: Adrian
family: Weller
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 50-59
id: adel18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 50
lastpage: 59
published: 2018-07-03 00:00:00 +0000
- title: 'A Reductions Approach to Fair Classification'
abstract: 'We present a systematic approach for achieving fairness in a binary classification setting. While we focus on two well-known quantitative definitions of fairness, our approach encompasses many other previously studied definitions as special cases. The key idea is to reduce fair classification to a sequence of cost-sensitive classification problems, whose solutions yield a randomized classifier with the lowest (empirical) error subject to the desired constraints. We introduce two reductions that work for any representation of the cost-sensitive classifier and compare favorably to prior baselines on a variety of data sets, while overcoming several of their disadvantages.'
volume: 80
URL: https://proceedings.mlr.press/v80/agarwal18a.html
PDF: http://proceedings.mlr.press/v80/agarwal18a/agarwal18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-agarwal18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Alekh
family: Agarwal
- given: Alina
family: Beygelzimer
- given: Miroslav
family: Dudik
- given: John
family: Langford
- given: Hanna
family: Wallach
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 60-69
id: agarwal18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 60
lastpage: 69
published: 2018-07-03 00:00:00 +0000
- title: 'Accelerated Spectral Ranking'
abstract: 'The problem of rank aggregation from pairwise and multiway comparisons has a wide range of implications, ranging from recommendation systems to sports rankings to social choice. Some of the most popular algorithms for this problem come from the class of spectral ranking algorithms; these include the rank centrality (RC) algorithm for pairwise comparisons, which returns consistent estimates under the Bradley-Terry-Luce (BTL) model for pairwise comparisons (Negahban et al., 2017), and its generalization, the Luce spectral ranking (LSR) algorithm, which returns consistent estimates under the more general multinomial logit (MNL) model for multiway comparisons (Maystre & Grossglauser, 2015). In this paper, we design a provably faster spectral ranking algorithm, which we call accelerated spectral ranking (ASR), that is also consistent under the MNL/BTL models. Our accelerated algorithm is achieved by designing a random walk that has a faster mixing time than the random walks associated with previous algorithms. In addition to a faster algorithm, our results yield improved sample complexity bounds for recovery of the MNL/BTL parameters: to the best of our knowledge, we give the first general sample complexity bounds for recovering the parameters of the MNL model from multiway comparisons under any (connected) comparison graph (and improve significantly over previous bounds for the BTL model for pairwise comparisons). We also give a message-passing interpretation of our algorithm, which suggests a decentralized distributed implementation. Our experiments on several real-world and synthetic datasets confirm that our new ASR algorithm is indeed orders of magnitude faster than existing algorithms.'
volume: 80
URL: https://proceedings.mlr.press/v80/agarwal18b.html
PDF: http://proceedings.mlr.press/v80/agarwal18b/agarwal18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-agarwal18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Arpit
family: Agarwal
- given: Prathamesh
family: Patil
- given: Shivani
family: Agarwal
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 70-79
id: agarwal18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 70
lastpage: 79
published: 2018-07-03 00:00:00 +0000
- title: 'MISSION: Ultra Large-Scale Feature Selection using Count-Sketches'
abstract: 'Feature selection is an important challenge in machine learning. It plays a crucial role in the explainability of machine-driven decisions that are rapidly permeating throughout modern society. Unfortunately, the explosion in the size and dimensionality of real-world datasets poses a severe challenge to standard feature selection algorithms. Today, it is not uncommon for datasets to have billions of dimensions. At such scale, even storing the feature vector is impossible, causing most existing feature selection methods to fail. Workarounds like feature hashing, a standard approach to large-scale machine learning, help with the computational feasibility, but at the cost of losing the interpretability of features. In this paper, we present MISSION, a novel framework for ultra large-scale feature selection that performs stochastic gradient descent while maintaining an efficient representation of the features in memory using a Count-Sketch data structure. MISSION retains the simplicity of feature hashing without sacrificing the interpretability of the features while using only O(log^2(p)) working memory. We demonstrate that MISSION accurately and efficiently performs feature selection on real-world, large-scale datasets with billions of dimensions.'
volume: 80
URL: https://proceedings.mlr.press/v80/aghazadeh18a.html
PDF: http://proceedings.mlr.press/v80/aghazadeh18a/aghazadeh18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-aghazadeh18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Amirali
family: Aghazadeh
- given: Ryan
family: Spring
- given: Daniel
family: Lejeune
- given: Gautam
family: Dasarathy
- given: Anshumali
family: Shrivastava
- given:
family: Baraniuk
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 80-88
id: aghazadeh18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 80
lastpage: 88
published: 2018-07-03 00:00:00 +0000
- title: 'Minimal I-MAP MCMC for Scalable Structure Discovery in Causal DAG Models'
abstract: 'Learning a Bayesian network (BN) from data can be useful for decision-making or discovering causal relationships. However, traditional methods often fail in modern applications, which exhibit a larger number of observed variables than data points. The resulting uncertainty about the underlying network as well as the desire to incorporate prior information recommend a Bayesian approach to learning the BN, but the highly combinatorial structure of BNs poses a striking challenge for inference. The current state-of-the-art methods such as order MCMC are faster than previous methods but prevent the use of many natural structural priors and still have running time exponential in the maximum indegree of the true directed acyclic graph (DAG) of the BN. We here propose an alternative posterior approximation based on the observation that, if we incorporate empirical conditional independence tests, we can focus on a high-probability DAG associated with each order of the vertices. We show that our method allows the desired flexibility in prior specification, removes timing dependence on the maximum indegree, and yields provably good posterior approximations; in addition, we show that it achieves superior accuracy, scalability, and sampler mixing on several datasets.'
volume: 80
URL: https://proceedings.mlr.press/v80/agrawal18a.html
PDF: http://proceedings.mlr.press/v80/agrawal18a/agrawal18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-agrawal18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Raj
family: Agrawal
- given: Caroline
family: Uhler
- given: Tamara
family: Broderick
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 89-98
id: agrawal18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 89
lastpage: 98
published: 2018-07-03 00:00:00 +0000
- title: 'Proportional Allocation: Simple, Distributed, and Diverse Matching with High Entropy'
abstract: 'Inspired by many applications of bipartite matching in online advertising and machine learning, we study a simple and natural iterative proportional allocation algorithm: Maintain a priority score $\priority_a$ for each node $a\in \mathds{A}$ on one side of the bipartition, initialized as $\priority_a=1$. Iteratively allocate the nodes $i\in \impressions$ on the other side to eligible nodes in $\mathds{A}$ in proportion to their priority scores. After each round, for each node $a\in \mathds{A}$, decrease or increase the score $\priority_a$ based on whether it is over- or under-allocated. Our first result is that this simple, distributed algorithm converges to a $(1-\epsilon)$-approximate fractional $b$-matching solution in $O({\log n\over \epsilon^2} )$ rounds. We also extend the proportional allocation algorithm and convergence results to the maximum weighted matching problem, and show that the algorithm can be naturally tuned to produce a maximum matching with *high entropy*. High entropy, in turn, implies additional desirable properties of this matching, e.g., it satisfies certain diversity and fairness (aka anonymity) properties that are desirable in a variety of applications in online advertising and machine learning.'
volume: 80
URL: https://proceedings.mlr.press/v80/agrawal18b.html
PDF: http://proceedings.mlr.press/v80/agrawal18b/agrawal18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-agrawal18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Shipra
family: Agrawal
- given: Morteza
family: Zadimoghaddam
- given: Vahab
family: Mirrokni
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 99-108
id: agrawal18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 99
lastpage: 108
published: 2018-07-03 00:00:00 +0000
- title: 'Bucket Renormalization for Approximate Inference'
abstract: 'Probabilistic graphical models are a key tool in machine learning applications. Computing the partition function, i.e., normalizing constant, is a fundamental task of statistical inference but is generally computationally intractable, leading to extensive study of approximation methods. Iterative variational methods are a popular and successful family of approaches. However, even state-of-the-art variational methods can return poor results or fail to converge on difficult instances. In this paper, we instead consider computing the partition function via sequential summation over variables. We develop robust approximate algorithms by combining ideas from mini-bucket elimination with tensor network and renormalization group methods from statistical physics. The resulting “convergence-free” methods show good empirical performance on both synthetic and real-world benchmark models, even for difficult instances.'
volume: 80
URL: https://proceedings.mlr.press/v80/ahn18a.html
PDF: http://proceedings.mlr.press/v80/ahn18a/ahn18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-ahn18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Sungsoo
family: Ahn
- given: Michael
family: Chertkov
- given: Adrian
family: Weller
- given: Jinwoo
family: Shin
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 109-118
id: ahn18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 109
lastpage: 118
published: 2018-07-03 00:00:00 +0000
- title: 'oi-VAE: Output Interpretable VAEs for Nonlinear Group Factor Analysis'
abstract: 'Deep generative models have recently yielded encouraging results in producing subjectively realistic samples of complex data. Far less attention has been paid to making these generative models interpretable. In many scenarios, ranging from scientific applications to finance, the observed variables have a natural grouping. It is often of interest to understand systems of interaction amongst these groups, and latent factor models (LFMs) are an attractive approach. However, traditional LFMs are limited by assuming a linear correlation structure. We present an output interpretable VAE (oi-VAE) for grouped data that models complex, nonlinear latent-to-observed relationships. We combine a structured VAE comprised of group-specific generators with a sparsity-inducing prior. We demonstrate that oi-VAE yields meaningful notions of interpretability in the analysis of motion capture and MEG data. We further show that in these situations, the regularization inherent to oi-VAE can actually lead to improved generalization and learned generative processes.'
volume: 80
URL: https://proceedings.mlr.press/v80/ainsworth18a.html
PDF: http://proceedings.mlr.press/v80/ainsworth18a/ainsworth18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-ainsworth18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Samuel K.
family: Ainsworth
- given: Nicholas J.
family: Foti
- given: Adrian K. C.
family: Lee
- given: Emily B.
family: Fox
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 119-128
id: ainsworth18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 119
lastpage: 128
published: 2018-07-03 00:00:00 +0000
- title: 'Limits of Estimating Heterogeneous Treatment Effects: Guidelines for Practical Algorithm Design'
abstract: 'Estimating heterogeneous treatment effects from observational data is a central problem in many domains. Because counterfactual data is inaccessible, the problem differs fundamentally from supervised learning, and entails a more complex set of modeling choices. Despite a variety of recently proposed algorithmic solutions, a principled guideline for building estimators of treatment effects using machine learning algorithms is still lacking. In this paper, we provide such a guideline by characterizing the fundamental limits of estimating heterogeneous treatment effects, and establishing conditions under which these limits can be achieved. Our analysis reveals that the relative importance of the different aspects of observational data varies with the sample size. For instance, we show that selection bias matters only in small-sample regimes, whereas with a large sample size, the way an algorithm models the control and treated outcomes is what bottlenecks its performance. Guided by our analysis, we build a practical algorithm for estimating treatment effects using non-stationary Gaussian processes with doubly-robust hyperparameters. Using a standard semi-synthetic simulation setup, we show that our algorithm outperforms the state-of-the-art, and that the behavior of existing algorithms conforms with our analysis.'
volume: 80
URL: https://proceedings.mlr.press/v80/alaa18a.html
PDF: http://proceedings.mlr.press/v80/alaa18a/alaa18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-alaa18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ahmed
family: Alaa
- given: Mihaela
family: Schaar
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 129-138
id: alaa18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 129
lastpage: 138
published: 2018-07-03 00:00:00 +0000
- title: 'AutoPrognosis: Automated Clinical Prognostic Modeling via Bayesian Optimization with Structured Kernel Learning'
abstract: 'Clinical prognostic models derived from large-scale healthcare data can inform critical diagnostic and therapeutic decisions. To enable off-the-shelf usage of machine learning (ML) in prognostic research, we developed AUTOPROGNOSIS: a system for automating the design of predictive modeling pipelines tailored for clinical prognosis. AUTOPROGNOSIS optimizes ensembles of pipeline configurations efficiently using a novel batched Bayesian optimization (BO) algorithm that learns a low-dimensional decomposition of the pipelines’ high-dimensional hyperparameter space in concurrence with the BO procedure. This is achieved by modeling the pipelines’ performances as a black-box function with a Gaussian process prior, and modeling the “similarities” between the pipelines’ baseline algorithms via a sparse additive kernel with a Dirichlet prior. Meta-learning is used to warm-start BO with external data from “similar” patient cohorts by calibrating the priors using an algorithm that mimics the empirical Bayes method. The system automatically explains its predictions by presenting the clinicians with logical association rules that link patients’ features to predicted risk strata. We demonstrate the utility of AUTOPROGNOSIS using 10 major patient cohorts representing various aspects of cardiovascular patient care.'
volume: 80
URL: https://proceedings.mlr.press/v80/alaa18b.html
PDF: http://proceedings.mlr.press/v80/alaa18b/alaa18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-alaa18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ahmed
family: Alaa
- given: Mihaela
family: Schaar
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 139-148
id: alaa18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 139
lastpage: 148
published: 2018-07-03 00:00:00 +0000
- title: 'Information Theoretic Guarantees for Empirical Risk Minimization with Applications to Model Selection and Large-Scale Optimization'
abstract: 'In this paper, we derive bounds on the mutual information of the empirical risk minimization (ERM) procedure for both 0-1 and strongly-convex loss classes. We prove that under the Axiom of Choice, the existence of an ERM learning rule with a vanishing mutual information is equivalent to the assertion that the loss class has a finite VC dimension, thus bridging information theory with statistical learning theory. Similarly, an asymptotic bound on the mutual information is established for strongly-convex loss classes in terms of the number of model parameters. The latter result rests on a central limit theorem (CLT) that we derive in this paper. In addition, we use our results to analyze the excess risk in stochastic convex optimization and unify previous works. Finally, we present two important applications. First, we show that the ERM of strongly-convex loss classes can be trivially scaled to big data using a naive parallelization algorithm with provable guarantees. Second, we propose a simple information criterion for model selection and demonstrate experimentally that it outperforms the popular Akaike’s information criterion (AIC) and Schwarz’s Bayesian information criterion (BIC).'
volume: 80
URL: https://proceedings.mlr.press/v80/alabdulmohsin18a.html
PDF: http://proceedings.mlr.press/v80/alabdulmohsin18a/alabdulmohsin18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-alabdulmohsin18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ibrahim
family: Alabdulmohsin
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 149-158
id: alabdulmohsin18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 149
lastpage: 158
published: 2018-07-03 00:00:00 +0000
- title: 'Fixing a Broken ELBO'
abstract: 'Recent work in unsupervised representation learning has focused on learning deep directed latent-variable models. Fitting these models by maximizing the marginal likelihood or evidence is typically intractable, thus a common approximation is to maximize the evidence lower bound (ELBO) instead. However, maximum likelihood training (whether exact or approximate) does not necessarily result in a good latent representation, as we demonstrate both theoretically and empirically. In particular, we derive variational lower and upper bounds on the mutual information between the input and the latent variable, and use these bounds to derive a rate-distortion curve that characterizes the tradeoff between compression and reconstruction accuracy. Using this framework, we demonstrate that there is a family of models with identical ELBO, but different quantitative and qualitative characteristics. Our framework also suggests a simple new method to ensure that latent variable models with powerful stochastic decoders do not ignore their latent code.'
volume: 80
URL: https://proceedings.mlr.press/v80/alemi18a.html
PDF: http://proceedings.mlr.press/v80/alemi18a/alemi18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-alemi18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Alexander
family: Alemi
- given: Ben
family: Poole
- given: Ian
family: Fischer
- given: Joshua
family: Dillon
- given: Rif A.
family: Saurous
- given: Kevin
family: Murphy
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 159-168
id: alemi18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 159
lastpage: 168
published: 2018-07-03 00:00:00 +0000
- title: 'Differentially Private Identity and Equivalence Testing of Discrete Distributions'
abstract: 'We study the fundamental problems of identity and equivalence testing over a discrete population from random samples. Our goal is to develop efficient testers while guaranteeing differential privacy to the individuals of the population. We provide sample-efficient differentially private testers for these problems. Our theoretical results significantly improve over the best known algorithms for identity testing, and are the first results for private equivalence testing. The conceptual message of our work is that there exist private hypothesis testers that are nearly as sample-efficient as their non-private counterparts. We perform an experimental evaluation of our algorithms on synthetic data. Our experiments illustrate that our private testers achieve small type I and type II errors with sample size *sublinear* in the domain size of the underlying distributions.'
volume: 80
URL: https://proceedings.mlr.press/v80/aliakbarpour18a.html
PDF: http://proceedings.mlr.press/v80/aliakbarpour18a/aliakbarpour18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-aliakbarpour18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Maryam
family: Aliakbarpour
- given: Ilias
family: Diakonikolas
- given: Ronitt
family: Rubinfeld
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 169-178
id: aliakbarpour18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 169
lastpage: 178
published: 2018-07-03 00:00:00 +0000
- title: 'Katyusha X: Simple Momentum Method for Stochastic Sum-of-Nonconvex Optimization'
abstract: 'The problem of minimizing sum-of-nonconvex functions (i.e., convex functions that are averages of non-convex ones) is becoming increasingly important in machine learning, and is the core machinery for PCA, SVD, regularized Newton’s method, accelerated non-convex optimization, and more. We show how to provably obtain an accelerated stochastic algorithm for minimizing sum-of-nonconvex functions, by adding one additional line to the well-known SVRG method. This line corresponds to momentum, and shows how to directly apply momentum to the finite-sum stochastic minimization of sum-of-nonconvex functions. As a side result, our method enjoys linear parallel speed-up using mini-batch.'
volume: 80
URL: https://proceedings.mlr.press/v80/allen-zhu18a.html
PDF: http://proceedings.mlr.press/v80/allen-zhu18a/allen-zhu18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-allen-zhu18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Zeyuan
family: Allen-Zhu
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 179-185
id: allen-zhu18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 179
lastpage: 185
published: 2018-07-03 00:00:00 +0000
- title: 'Make the Minority Great Again: First-Order Regret Bound for Contextual Bandits'
abstract: 'Regret bounds in online learning compare the player’s performance to $L^*$, the optimal performance in hindsight with a fixed strategy. Typically such bounds scale with the square root of the time horizon $T$. The more refined concept of first-order regret bound replaces this with a scaling $\sqrt{L^*}$, which may be much smaller than $\sqrt{T}$. It is well known that minor variants of standard algorithms satisfy first-order regret bounds in the full information and multi-armed bandit settings. In a COLT 2017 open problem, Agarwal, Krishnamurthy, Langford, Luo, and Schapire raised the issue that existing techniques do not seem sufficient to obtain first-order regret bounds for the contextual bandit problem. In the present paper, we resolve this open problem by presenting a new strategy based on augmenting the policy space.'
volume: 80
URL: https://proceedings.mlr.press/v80/allen-zhu18b.html
PDF: http://proceedings.mlr.press/v80/allen-zhu18b/allen-zhu18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-allen-zhu18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Zeyuan
family: Allen-Zhu
- given: Sebastien
family: Bubeck
- given: Yuanzhi
family: Li
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 186-194
id: allen-zhu18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 186
lastpage: 194
published: 2018-07-03 00:00:00 +0000
- title: 'Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data'
abstract: 'Learning inter-domain mappings from unpaired data can improve performance in structured prediction tasks, such as image segmentation, by reducing the need for paired data. CycleGAN was recently proposed for this problem, but critically assumes the underlying inter-domain mapping is approximately deterministic and one-to-one. This assumption renders the model ineffective for tasks requiring flexible, many-to-many mappings. We propose a new model, called Augmented CycleGAN, which learns many-to-many mappings between domains. We examine Augmented CycleGAN qualitatively and quantitatively on several image datasets.'
volume: 80
URL: https://proceedings.mlr.press/v80/almahairi18a.html
PDF: http://proceedings.mlr.press/v80/almahairi18a/almahairi18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-almahairi18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Amjad
family: Almahairi
- given: Sai
family: Rajeshwar
- given: Alessandro
family: Sordoni
- given: Philip
family: Bachman
- given: Aaron
family: Courville
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 195-204
id: almahairi18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 195
lastpage: 204
published: 2018-07-03 00:00:00 +0000
- title: 'Meta-Learning by Adjusting Priors Based on Extended PAC-Bayes Theory'
abstract: 'In meta-learning an agent extracts knowledge from observed tasks, aiming to facilitate learning of novel future tasks. Under the assumption that future tasks are ‘related’ to previous tasks, accumulated knowledge should be learned in such a way that it captures the common structure across learned tasks, while allowing the learner sufficient flexibility to adapt to novel aspects of a new task. We present a framework for meta-learning that is based on generalization error bounds, allowing us to extend various PAC-Bayes bounds to meta-learning. Learning takes place through the construction of a distribution over hypotheses based on the observed tasks, and its utilization for learning a new task. Thus, prior knowledge is incorporated through setting an experience-dependent prior for novel tasks. We develop a gradient-based algorithm, and implement it for deep neural networks, based on minimizing an objective function derived from the bounds, and demonstrate its effectiveness numerically. In addition to establishing the improved performance available through meta-learning, we demonstrate the intuitive way by which prior information is manifested at different levels of the network.'
volume: 80
URL: https://proceedings.mlr.press/v80/amit18a.html
PDF: http://proceedings.mlr.press/v80/amit18a/amit18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-amit18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ron
family: Amit
- given: Ron
family: Meir
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 205-214
id: amit18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 205
lastpage: 214
published: 2018-07-03 00:00:00 +0000
- title: 'MAGAN: Aligning Biological Manifolds'
abstract: 'It is increasingly common in many types of natural and physical systems (especially biological systems) to have different types of measurements performed on the same underlying system. In such settings, it is important to align the manifolds arising from each measurement in order to integrate such data and gain an improved picture of the system; we tackle this problem using generative adversarial networks (GANs). Recent attempts to use GANs to find correspondences between sets of samples do not explicitly perform proper alignment of manifolds. We present the new Manifold Aligning GAN (MAGAN) that aligns two manifolds such that related points in each measurement space are aligned. We demonstrate applications of MAGAN in single-cell biology in integrating two different measurement types together: cells from the same tissue are measured with both genomic (single-cell RNA-sequencing) and proteomic (mass cytometry) technologies. We show that MAGAN successfully aligns manifolds such that known correlations between measured markers are improved compared to other recently proposed models.'
volume: 80
URL: https://proceedings.mlr.press/v80/amodio18a.html
PDF: http://proceedings.mlr.press/v80/amodio18a/amodio18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-amodio18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Matthew
family: Amodio
- given: Smita
family: Krishnaswamy
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 215-223
id: amodio18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 215
lastpage: 223
published: 2018-07-03 00:00:00 +0000
- title: 'Subspace Embedding and Linear Regression with Orlicz Norm'
abstract: 'We consider a generalization of the classic linear regression problem to the case when the loss is an Orlicz norm. An Orlicz norm is parameterized by a non-negative convex function $G: \mathbb{R}_+ \to \mathbb{R}_+$ with $G(0) = 0$: the Orlicz norm of an $n$-dimensional vector $x$ is defined as $\|x\|_G = \inf\{ \alpha > 0 \mid \sum_{i=1}^n G(|x_i| / \alpha) \leq 1 \}$. We consider the cases where the function $G$ grows subquadratically. Our main result is based on a new oblivious embedding which embeds the column space of a given $n \times d$ matrix $A$ with Orlicz norm into a lower dimensional space with $\ell_2$ norm. Specifically, we show how to efficiently find an $m \times n$ embedding matrix $S$ ($m < n$), such that for every $d$-dimensional vector $x$, we have $\Omega(1/(d \log n)) \|Ax\|_G \leq \|SAx\|_2 \leq O(d^2 \log n) \|Ax\|_G$. By applying this subspace embedding technique, we show an approximation algorithm for the regression problem $\min_x \|Ax-b\|_G$, up to a $O(d \log^2 n)$ factor. As a further application of our techniques, we show how to also use them to improve on the algorithm for the $\ell_p$ low rank matrix approximation problem for $1 \leq p < 2$.'
volume: 80
URL: https://proceedings.mlr.press/v80/andoni18a.html
PDF: http://proceedings.mlr.press/v80/andoni18a/andoni18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-andoni18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Alexandr
family: Andoni
- given: Chengyu
family: Lin
- given: Ying
family: Sheng
- given: Peilin
family: Zhong
- given: Ruiqi
family: Zhong
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 224-233
id: andoni18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 224
lastpage: 233
published: 2018-07-03 00:00:00 +0000
- title: 'Efficient Gradient-Free Variational Inference using Policy Search'
abstract: 'Inference from complex distributions is a common problem in machine learning needed for many Bayesian methods. We propose an efficient, gradient-free method for learning general GMM approximations of multimodal distributions based on recent insights from stochastic search methods. Our method establishes information-geometric trust regions to ensure efficient exploration of the sampling space and stability of the GMM updates, allowing for efficient estimation of multi-variate Gaussian variational distributions. For GMMs, we apply a variational lower bound to decompose the learning objective into sub-problems given by learning the individual mixture components and the coefficients. The number of mixture components is adapted online in order to allow for arbitrarily exact approximations. We demonstrate on several domains that we can learn significantly better approximations than competing variational inference methods and that the quality of samples drawn from our approximations is on par with samples created by state-of-the-art MCMC samplers that require significantly more computational resources.'
volume: 80
URL: https://proceedings.mlr.press/v80/arenz18a.html
PDF: http://proceedings.mlr.press/v80/arenz18a/arenz18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-arenz18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Oleg
family: Arenz
- given: Gerhard
family: Neumann
- given: Mingjun
family: Zhong
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 234-243
id: arenz18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 234
lastpage: 243
published: 2018-07-03 00:00:00 +0000
- title: 'On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization'
abstract: 'Conventional wisdom in deep learning states that increasing depth improves expressiveness but complicates optimization. This paper suggests that, sometimes, increasing depth can speed up optimization. The effect of depth on optimization is decoupled from expressiveness by focusing on settings where additional layers amount to overparameterization – linear neural networks, a well-studied model. Theoretical analysis, as well as experiments, show that here depth acts as a preconditioner which may accelerate convergence. Even on simple convex problems such as linear regression with $\ell_p$ loss, $p>2$, gradient descent can benefit from transitioning to a non-convex overparameterized objective, more than it would from some common acceleration schemes. We also prove that it is mathematically impossible to obtain the acceleration effect of overparameterization via gradients of any regularizer.'
volume: 80
URL: https://proceedings.mlr.press/v80/arora18a.html
PDF: http://proceedings.mlr.press/v80/arora18a/arora18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-arora18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Sanjeev
family: Arora
- given: Nadav
family: Cohen
- given: Elad
family: Hazan
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 244-253
id: arora18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 244
lastpage: 253
published: 2018-07-03 00:00:00 +0000
- title: 'Stronger Generalization Bounds for Deep Nets via a Compression Approach'
abstract: 'Deep nets generalize well despite having more parameters than the number of training samples. Recent works try to give an explanation using PAC-Bayes and Margin-based analyses, but do not as yet result in sample complexity bounds better than naive parameter counting. The current paper shows generalization bounds that are orders of magnitude better in practice. These rely upon new succinct reparametrizations of the trained net, a compression that is explicit and efficient. These yield generalization bounds via a simple compression-based framework introduced here. Our results also provide some theoretical justification for widespread empirical success in compressing deep nets. Analysis of correctness of our compression relies upon some newly identified noise stability properties of trained deep nets, which are also experimentally verified. The study of these properties and the resulting generalization bounds is also extended to convolutional nets, which had eluded earlier attempts on proving generalization.'
volume: 80
URL: https://proceedings.mlr.press/v80/arora18b.html
PDF: http://proceedings.mlr.press/v80/arora18b/arora18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-arora18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Sanjeev
family: Arora
- given: Rong
family: Ge
- given: Behnam
family: Neyshabur
- given: Yi
family: Zhang
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 254-263
id: arora18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 254
lastpage: 263
published: 2018-07-03 00:00:00 +0000
- title: 'Lipschitz Continuity in Model-based Reinforcement Learning'
abstract: 'We examine the impact of learning Lipschitz continuous models in the context of model-based reinforcement learning. We provide a novel bound on multi-step prediction error of Lipschitz models where we quantify the error using the Wasserstein metric. We go on to prove an error bound for the value-function estimate arising from Lipschitz models and show that the estimated value function is itself Lipschitz. We conclude with empirical results that show the benefits of controlling the Lipschitz constant of neural-network models.'
volume: 80
URL: https://proceedings.mlr.press/v80/asadi18a.html
PDF: http://proceedings.mlr.press/v80/asadi18a/asadi18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-asadi18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Kavosh
family: Asadi
- given: Dipendra
family: Misra
- given: Michael
family: Littman
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 264-273
id: asadi18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 264
lastpage: 273
published: 2018-07-03 00:00:00 +0000
- title: 'Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples'
abstract: 'We identify obfuscated gradients, a kind of gradient masking, as a phenomenon that leads to a false sense of security in defenses against adversarial examples. While defenses that cause obfuscated gradients appear to defeat iterative optimization-based attacks, we find defenses relying on this effect can be circumvented. We describe characteristic behaviors of defenses exhibiting the effect, and for each of the three types of obfuscated gradients we discover, we develop attack techniques to overcome it. In a case study, examining non-certified white-box-secure defenses at ICLR 2018, we find obfuscated gradients are a common occurrence, with 7 of 9 defenses relying on obfuscated gradients. Our new attacks successfully circumvent 6 completely, and 1 partially, in the original threat model each paper considers.'
volume: 80
URL: https://proceedings.mlr.press/v80/athalye18a.html
PDF: http://proceedings.mlr.press/v80/athalye18a/athalye18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-athalye18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Anish
family: Athalye
- given: Nicholas
family: Carlini
- given: David
family: Wagner
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 274-283
id: athalye18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 274
lastpage: 283
published: 2018-07-03 00:00:00 +0000
- title: 'Synthesizing Robust Adversarial Examples'
abstract: 'Standard methods for generating adversarial examples for neural networks do not consistently fool neural network classifiers in the physical world due to a combination of viewpoint shifts, camera noise, and other natural transformations, limiting their relevance to real-world systems. We demonstrate the existence of robust 3D adversarial objects, and we present the first algorithm for synthesizing examples that are adversarial over a chosen distribution of transformations. We synthesize two-dimensional adversarial images that are robust to noise, distortion, and affine transformation. We apply our algorithm to complex three-dimensional objects, using 3D-printing to manufacture the first physical adversarial objects. Our results demonstrate the existence of 3D adversarial objects in the physical world.'
volume: 80
URL: https://proceedings.mlr.press/v80/athalye18b.html
PDF: http://proceedings.mlr.press/v80/athalye18b/athalye18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-athalye18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Anish
family: Athalye
- given: Logan
family: Engstrom
- given: Andrew
family: Ilyas
- given: Kevin
family: Kwok
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 284-293
id: athalye18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 284
lastpage: 293
published: 2018-07-03 00:00:00 +0000
- title: 'Contextual Graph Markov Model: A Deep and Generative Approach to Graph Processing'
abstract: 'We introduce the Contextual Graph Markov Model, an approach combining ideas from generative models and neural networks for the processing of graph data. It is founded on a constructive methodology to build a deep architecture comprising layers of probabilistic models that learn to encode the structured information in an incremental fashion. Context is diffused in an efficient and scalable way across the graph vertexes and edges. The resulting graph encoding is used in combination with discriminative models to address structure classification benchmarks.'
volume: 80
URL: https://proceedings.mlr.press/v80/bacciu18a.html
PDF: http://proceedings.mlr.press/v80/bacciu18a/bacciu18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-bacciu18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Davide
family: Bacciu
- given: Federico
family: Errica
- given: Alessio
family: Micheli
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 294-303
id: bacciu18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 294
lastpage: 303
published: 2018-07-03 00:00:00 +0000
- title: 'Greed is Still Good: Maximizing Monotone Submodular+Supermodular (BP) Functions'
abstract: 'We analyze the performance of the greedy algorithm, and also a discrete semi-gradient based algorithm, for maximizing the sum of a suBmodular and suPermodular (BP) function (both of which are non-negative monotone non-decreasing) under two types of constraints, either a cardinality constraint or $p\geq 1$ matroid independence constraints. These problems occur naturally in several real-world applications in data science, machine learning, and artificial intelligence. The problems are ordinarily inapproximable to any factor. Using the curvature $\kappa_f$ of the submodular term, and introducing $\kappa^g$ for the supermodular term (a natural dual curvature for supermodular functions), however, both of which are computable in linear time, we show that BP maximization can be efficiently approximated by both the greedy and the semi-gradient based algorithm. The algorithms yield multiplicative guarantees of $\frac{1}{\kappa_f}\left[1-e^{-(1-\kappa^g)\kappa_f}\right]$ and $\frac{1-\kappa^g}{(1-\kappa^g)\kappa_f + p}$ for the two types of constraints respectively. For pure monotone supermodular constrained maximization, these yield $1-\kappa^g$ and $(1-\kappa^g)/p$ for the two types of constraints respectively. We also analyze the hardness of BP maximization and show that our guarantees match hardness by a constant factor and by $O(\ln(p))$ respectively. Computational experiments are also provided supporting our analysis.'
volume: 80
URL: https://proceedings.mlr.press/v80/bai18a.html
PDF: http://proceedings.mlr.press/v80/bai18a/bai18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-bai18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Wenruo
family: Bai
- given: Jeff
family: Bilmes
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 304-313
id: bai18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 304
lastpage: 313
published: 2018-07-03 00:00:00 +0000
- title: 'Comparing Dynamics: Deep Neural Networks versus Glassy Systems'
abstract: 'We analyze numerically the training dynamics of deep neural networks (DNN) by using methods developed in statistical physics of glassy systems. The two main issues we address are the complexity of the loss-landscape and of the dynamics within it, and to what extent DNNs share similarities with glassy systems. Our findings, obtained for different architectures and data-sets, suggest that during the training process the dynamics slows down because of an increasingly large number of flat directions. At large times, when the loss is approaching zero, the system diffuses at the bottom of the landscape. Despite some similarities with the dynamics of mean-field glassy systems, in particular, the absence of barrier crossing, we find distinctive dynamical behaviors in the two cases, thus showing that the statistical properties of the corresponding loss and energy landscapes are different. In contrast, when the network is under-parametrized we observe a typical glassy behavior, thus suggesting the existence of different phases depending on whether the network is under-parametrized or over-parametrized.'
volume: 80
URL: https://proceedings.mlr.press/v80/baity-jesi18a.html
PDF: http://proceedings.mlr.press/v80/baity-jesi18a/baity-jesi18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-baity-jesi18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Marco
family: Baity-Jesi
- given: Levent
family: Sagun
- given: Mario
family: Geiger
- given: Stefano
family: Spigler
- given: Gérard
family: Ben Arous
- given: Chiara
family: Cammarota
- given: Yann
family: LeCun
- given: Matthieu
family: Wyart
- given: Giulio
family: Biroli
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 314-323
id: baity-jesi18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 314
lastpage: 323
published: 2018-07-03 00:00:00 +0000
- title: 'SMAC: Simultaneous Mapping and Clustering Using Spectral Decompositions'
abstract: 'We introduce a principled approach for *simultaneous mapping and clustering* (SMAC) for establishing consistent maps across heterogeneous object collections (e.g., 2D images or 3D shapes). Our approach takes as input a heterogeneous object collection and a set of maps computed between some pairs of objects, and outputs a homogeneous object clustering together with a new set of maps possessing optimal intra- and inter-cluster consistency. Our approach is based on the spectral decomposition of a data matrix storing all pairwise maps in its blocks. We additionally provide tight theoretical guarantees on the exactness of SMAC under established noise models. We also demonstrate the usefulness of the approach on synthetic and real datasets.'
volume: 80
URL: https://proceedings.mlr.press/v80/bajaj18a.html
PDF: http://proceedings.mlr.press/v80/bajaj18a/bajaj18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-bajaj18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Chandrajit
family: Bajaj
- given: Tingran
family: Gao
- given: Zihang
family: He
- given: Qixing
family: Huang
- given: Zhenxiao
family: Liang
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 324-333
id: bajaj18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 324
lastpage: 333
published: 2018-07-03 00:00:00 +0000
- title: 'A Boo(n) for Evaluating Architecture Performance'
abstract: 'We point out important problems with the common practice of using the best single model performance for comparing deep learning architectures, and we propose a method that corrects these flaws. Each time a model is trained, one gets a different result due to random factors in the training process, which include random parameter initialization and random data shuffling. Reporting the best single model performance does not appropriately address this stochasticity. We propose a normalized expected best-out-of-$n$ performance ($\text{Boo}_n$) as a way to correct these problems.'
volume: 80
URL: https://proceedings.mlr.press/v80/bajgar18a.html
PDF: http://proceedings.mlr.press/v80/bajgar18a/bajgar18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-bajgar18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ondrej
family: Bajgar
- given: Rudolf
family: Kadlec
- given: Jan
family: Kleindienst
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 334-343
id: bajgar18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 334
lastpage: 343
published: 2018-07-03 00:00:00 +0000
- title: 'Learning to Branch'
abstract: 'Tree search algorithms, such as branch-and-bound, are the most widely used tools for solving combinatorial problems. These algorithms recursively partition the search space to find an optimal solution. To keep the tree small, it is crucial to carefully decide, when expanding a tree node, which variable to branch on at that node to partition the remaining space. Many partitioning techniques have been proposed, but no theory describes which is optimal. We show how to use machine learning to determine an optimal weighting of any set of partitioning procedures for the instance distribution at hand using samples. Via theory and experiments, we show that learning to branch is both practical and hugely beneficial.'
volume: 80
URL: https://proceedings.mlr.press/v80/balcan18a.html
PDF: http://proceedings.mlr.press/v80/balcan18a/balcan18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-balcan18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Maria-Florina
family: Balcan
- given: Travis
family: Dick
- given: Tuomas
family: Sandholm
- given: Ellen
family: Vitercik
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 344-353
id: balcan18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 344
lastpage: 353
published: 2018-07-03 00:00:00 +0000
- title: 'The Mechanics of n-Player Differentiable Games'
abstract: 'The cornerstone underpinning deep learning is the guarantee that gradient descent on an objective converges to local minima. Unfortunately, this guarantee fails in settings, such as generative adversarial nets, where there are multiple interacting losses. The behavior of gradient-based methods in games is not well understood – and is becoming increasingly important as adversarial and multi-objective architectures proliferate. In this paper, we develop new techniques to understand and control the dynamics in general games. The key result is to decompose the second-order dynamics into two components. The first is related to potential games, which reduce to gradient descent on an implicit function; the second relates to Hamiltonian games, a new class of games that obey a conservation law, akin to conservation laws in classical mechanical systems. The decomposition motivates Symplectic Gradient Adjustment (SGA), a new algorithm for finding stable fixed points in general games. Basic experiments show SGA is competitive with recently proposed algorithms for finding local Nash equilibria in GANs – whilst at the same time being applicable to – and having guarantees in – much more general games.'
volume: 80
URL: https://proceedings.mlr.press/v80/balduzzi18a.html
PDF: http://proceedings.mlr.press/v80/balduzzi18a/balduzzi18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-balduzzi18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: David
family: Balduzzi
- given: Sebastien
family: Racaniere
- given: James
family: Martens
- given: Jakob
family: Foerster
- given: Karl
family: Tuyls
- given: Thore
family: Graepel
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 354-363
id: balduzzi18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 354
lastpage: 363
published: 2018-07-03 00:00:00 +0000
- title: 'Spline Filters For End-to-End Deep Learning'
abstract: 'We propose to tackle the problem of end-to-end learning for raw waveform signals by introducing learnable continuous time-frequency atoms. The derivation of these filters is achieved by defining a functional space with a given smoothness order and boundary conditions. From this space, we derive the parametric analytical filters. Their differentiability property allows gradient-based optimization. As such, one can utilize any Deep Neural Network (DNN) with these filters. This enables us to tackle in a front-end fashion a large-scale bird detection task based on the freefield1010 dataset known to contain key challenges, such as the dimensionality of the input data ($>100,000$) and the presence of additional noise: multiple sources and soundscapes.'
volume: 80
URL: https://proceedings.mlr.press/v80/balestriero18a.html
PDF: http://proceedings.mlr.press/v80/balestriero18a/balestriero18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-balestriero18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Randall
family: Balestriero
- given: Romain
family: Cosentino
- given: Herve
family: Glotin
- given: Richard
family: Baraniuk
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 364-373
id: balestriero18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 364
lastpage: 373
published: 2018-07-03 00:00:00 +0000
- title: 'A Spline Theory of Deep Learning'
abstract: 'We build a rigorous bridge between deep networks (DNs) and approximation theory via spline functions and operators. Our key result is that a large class of DNs can be written as a composition of *max-affine spline operators* (MASOs), which provide a powerful portal through which to view and analyze their inner workings. For instance, conditioned on the input signal, the output of a MASO DN can be written as a simple affine transformation of the input. This implies that a DN constructs a set of signal-dependent, class-specific templates against which the signal is compared via a simple inner product; we explore the links to the classical theory of optimal classification via matched filters and the effects of data memorization. Going further, we propose a simple penalty term that can be added to the cost function of any DN learning algorithm to force the templates to be orthogonal with each other; this leads to significantly improved classification performance and reduced overfitting with no change to the DN architecture. The spline partition of the input signal space opens up a new geometric avenue to study how DNs organize signals in a hierarchical fashion. As an application, we develop and validate a new distance metric for signals that quantifies the difference between their partition encodings.'
volume: 80
URL: https://proceedings.mlr.press/v80/balestriero18b.html
PDF: http://proceedings.mlr.press/v80/balestriero18b/balestriero18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-balestriero18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Randall
family: Balestriero
- given:
family: Baraniuk
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 374-383
id: balestriero18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 374
lastpage: 383
published: 2018-07-03 00:00:00 +0000
- title: 'Approximation Guarantees for Adaptive Sampling'
abstract: 'In this paper we analyze an adaptive sampling approach for submodular maximization. Adaptive sampling is a technique that has recently been shown to achieve a constant factor approximation guarantee for submodular maximization under a cardinality constraint with exponentially fewer adaptive rounds than any previously studied constant factor approximation algorithm for this problem. Adaptivity quantifies the number of sequential rounds that an algorithm makes when function evaluations can be executed in parallel and is the parallel running time of an algorithm, up to low order terms. Adaptive sampling achieves its exponential speedup at the expense of approximation. In theory, it is guaranteed to produce a solution that is a 1/3 approximation to the optimum. Nevertheless, experiments show that adaptive sampling techniques achieve far better values in practice. In this paper we provide theoretical justification for this phenomenon. In particular, we show that under very mild conditions of curvature of a function, adaptive sampling techniques achieve an approximation arbitrarily close to 1/2 while maintaining their low adaptivity. Furthermore, we show that the approximation ratio approaches 1 in direct relationship to a homogeneity property of the submodular function. In addition, we conduct experiments on real data sets in which the curvature and homogeneity properties can be easily manipulated and demonstrate the relationship between approximation and curvature, as well as the effectiveness of adaptive sampling in practice.'
volume: 80
URL: https://proceedings.mlr.press/v80/balkanski18a.html
PDF: http://proceedings.mlr.press/v80/balkanski18a/balkanski18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-balkanski18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Eric
family: Balkanski
- given: Yaron
family: Singer
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 384-393
id: balkanski18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 384
lastpage: 393
published: 2018-07-03 00:00:00 +0000
- title: 'Improving the Gaussian Mechanism for Differential Privacy: Analytical Calibration and Optimal Denoising'
abstract: 'The Gaussian mechanism is an essential building block used in a multitude of differentially private data analysis algorithms. In this paper we revisit the Gaussian mechanism and show that the original analysis has several important limitations. Our analysis reveals that the variance formula for the original mechanism is far from tight in the high privacy regime ($\varepsilon \to 0$) and it cannot be extended to the low privacy regime ($\varepsilon \to \infty$). We address these limitations by developing an optimal Gaussian mechanism whose variance is calibrated directly using the Gaussian cumulative density function instead of a tail bound approximation. We also propose to equip the Gaussian mechanism with a post-processing step based on adaptive estimation techniques by leveraging that the distribution of the perturbation is known. Our experiments show that analytical calibration removes at least a third of the variance of the noise compared to the classical Gaussian mechanism, and that denoising dramatically improves the accuracy of the Gaussian mechanism in the high-dimensional regime.'
volume: 80
URL: https://proceedings.mlr.press/v80/balle18a.html
PDF: http://proceedings.mlr.press/v80/balle18a/balle18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-balle18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Borja
family: Balle
- given: Yu-Xiang
family: Wang
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 394-403
id: balle18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 394
lastpage: 403
published: 2018-07-03 00:00:00 +0000
- title: 'Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients'
abstract: 'The ADAM optimizer is exceedingly popular in the deep learning community. Often it works very well, sometimes it doesn’t. Why? We interpret ADAM as a combination of two aspects: for each weight, the update direction is determined by the sign of stochastic gradients, whereas the update magnitude is determined by an estimate of their relative variance. We disentangle these two aspects and analyze them in isolation, gaining insight into the mechanisms underlying ADAM. This analysis also extends recent results on adverse effects of ADAM on generalization, isolating the sign aspect as the problematic one. Transferring the variance adaptation to SGD gives rise to a novel method, completing the practitioner’s toolbox for problems where ADAM fails.'
volume: 80
URL: https://proceedings.mlr.press/v80/balles18a.html
PDF: http://proceedings.mlr.press/v80/balles18a/balles18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-balles18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lukas
family: Balles
- given: Philipp
family: Hennig
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 404-413
id: balles18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 404
lastpage: 413
published: 2018-07-03 00:00:00 +0000
- title: 'Differentially Private Database Release via Kernel Mean Embeddings'
abstract: 'We lay theoretical foundations for new database release mechanisms that allow third-parties to construct consistent estimators of population statistics, while ensuring that the privacy of each individual contributing to the database is protected. The proposed framework rests on two main ideas. First, releasing (an estimate of) the kernel mean embedding of the data generating random variable instead of the database itself still allows third-parties to construct consistent estimators of a wide class of population statistics. Second, the algorithm can satisfy the definition of differential privacy by basing the released kernel mean embedding on entirely synthetic data points, while controlling accuracy through the metric available in a Reproducing Kernel Hilbert Space. We describe two instantiations of the proposed framework, suitable under different scenarios, and prove theoretical results guaranteeing differential privacy of the resulting algorithms and the consistency of estimators constructed from their outputs.'
volume: 80
URL: https://proceedings.mlr.press/v80/balog18a.html
PDF: http://proceedings.mlr.press/v80/balog18a/balog18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-balog18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Matej
family: Balog
- given: Ilya
family: Tolstikhin
- given: Bernhard
family: Schölkopf
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 414-422
id: balog18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 414
lastpage: 422
published: 2018-07-03 00:00:00 +0000
- title: 'Improving Optimization for Models With Continuous Symmetry Breaking'
abstract: 'Many loss functions in representation learning are invariant under a continuous symmetry transformation. For example, the loss function of word embeddings (Mikolov et al., 2013) remains unchanged if we simultaneously rotate all word and context embedding vectors. We show that representation learning models for time series possess an approximate continuous symmetry that leads to slow convergence of gradient descent. We propose a new optimization algorithm that speeds up convergence using ideas from gauge theory in physics. Our algorithm leads to orders of magnitude faster convergence and to more interpretable representations, as we show for dynamic extensions of matrix factorization and word embedding models. We further present an example application of our proposed algorithm that translates modern words into their historic equivalents.'
volume: 80
URL: https://proceedings.mlr.press/v80/bamler18a.html
PDF: http://proceedings.mlr.press/v80/bamler18a/bamler18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-bamler18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Robert
family: Bamler
- given: Stephan
family: Mandt
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 423-432
id: bamler18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 423
lastpage: 432
published: 2018-07-03 00:00:00 +0000
- title: 'Improved Training of Generative Adversarial Networks Using Representative Features'
abstract: 'Despite the success of generative adversarial networks (GANs) for image generation, the trade-off between visual quality and image diversity remains a significant issue. This paper achieves both aims simultaneously by improving the stability of training GANs. The key idea of the proposed approach is to implicitly regularize the discriminator using representative features. Focusing on the fact that standard GAN minimizes reverse Kullback-Leibler (KL) divergence, we transfer the representative feature, which is extracted from the data distribution using a pre-trained autoencoder (AE), to the discriminator of standard GANs. Because the AE learns to minimize forward KL divergence, our GAN training with representative features is influenced by both reverse and forward KL divergence. Consequently, the proposed approach is verified to improve visual quality and diversity of state-of-the-art GANs using extensive evaluations.'
volume: 80
URL: https://proceedings.mlr.press/v80/bang18a.html
PDF: http://proceedings.mlr.press/v80/bang18a/bang18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-bang18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Duhyeon
family: Bang
- given: Hyunjung
family: Shim
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 433-442
id: bang18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 433
lastpage: 442
published: 2018-07-03 00:00:00 +0000
- title: 'Using Inherent Structures to design Lean 2-layer RBMs'
abstract: 'Understanding the representational power of Restricted Boltzmann Machines (RBMs) with multiple layers is an ill-understood problem and is an area of active research. Motivated by the approach of *Inherent Structure formalism* (Stillinger & Weber, 1982), extensively used in analysing Spin Glasses, we propose a novel measure called *Inherent Structure Capacity* (ISC), which characterizes the representation capacity of a fixed architecture RBM by the expected number of modes of distributions emanating from the RBM with parameters drawn from a prior distribution. Though ISC is intractable, we show that for a single layer RBM architecture ISC approaches a finite constant as the number of hidden units is increased, and that to further improve the ISC one needs to add a second layer. Furthermore, we introduce *Lean* RBMs, which are multi-layer RBMs where each layer can have at most O(n) units with the number of visible units being n. We show that for every single layer RBM with Omega(n^{2+r}), r >= 0, hidden units there exists a two-layered *lean* RBM with Theta(n^2) parameters with the same ISC, establishing that two-layer RBMs can achieve the same representational power as single-layer RBMs but using far fewer parameters. To the best of our knowledge, this is the first result which quantitatively establishes the need for layering.'
volume: 80
URL: https://proceedings.mlr.press/v80/bansal18a.html
PDF: http://proceedings.mlr.press/v80/bansal18a/bansal18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-bansal18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Abhishek
family: Bansal
- given: Abhinav
family: Anand
- given: Chiranjib
family: Bhattacharyya
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 443-451
id: bansal18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 443
lastpage: 451
published: 2018-07-03 00:00:00 +0000
- title: 'Classification from Pairwise Similarity and Unlabeled Data'
abstract: 'Supervised learning needs a huge amount of labeled data, which can be a big bottleneck under the situation where there is a privacy concern or labeling cost is high. To overcome this problem, we propose a new weakly-supervised learning setting where only similar (S) data pairs (two examples belong to the same class) and unlabeled (U) data points are needed instead of fully labeled data, which is called SU classification. We show that an unbiased estimator of the classification risk can be obtained only from SU data, and the estimation error of its empirical risk minimizer achieves the optimal parametric convergence rate. Finally, we demonstrate the effectiveness of the proposed method through experiments.'
volume: 80
URL: https://proceedings.mlr.press/v80/bao18a.html
PDF: http://proceedings.mlr.press/v80/bao18a/bao18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-bao18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Han
family: Bao
- given: Gang
family: Niu
- given: Masashi
family: Sugiyama
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 452-461
id: bao18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 452
lastpage: 461
published: 2018-07-03 00:00:00 +0000
- title: 'Bayesian Optimization of Combinatorial Structures'
abstract: 'The optimization of expensive-to-evaluate black-box functions over combinatorial structures is a ubiquitous task in machine learning, engineering and the natural sciences. The combinatorial explosion of the search space and costly evaluations pose challenges for current techniques in discrete optimization and machine learning, and critically require new algorithmic ideas. This article proposes, to the best of our knowledge, the first algorithm to overcome these challenges, based on an adaptive, scalable model that identifies useful combinatorial structure even when data is scarce. Our acquisition function pioneers the use of semidefinite programming to achieve efficiency and scalability. Experimental evaluations demonstrate that this algorithm consistently outperforms other methods from combinatorial and Bayesian optimization.'
volume: 80
URL: https://proceedings.mlr.press/v80/baptista18a.html
PDF: http://proceedings.mlr.press/v80/baptista18a/baptista18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-baptista18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ricardo
family: Baptista
- given: Matthias
family: Poloczek
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 462-471
id: baptista18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 462
lastpage: 471
published: 2018-07-03 00:00:00 +0000
- title: 'Geodesic Convolutional Shape Optimization'
abstract: 'Aerodynamic shape optimization has many industrial applications. Existing methods, however, are so computationally demanding that typical engineering practices are to either simply try a limited number of hand-designed shapes or restrict oneself to shapes that can be parameterized using only a few degrees of freedom. In this work, we introduce a new way to optimize complex shapes fast and accurately. To this end, we train Geodesic Convolutional Neural Networks to emulate a fluid dynamics simulator. The key to making this approach practical is remeshing the original shape using a poly-cube map, which makes it possible to perform the computations on GPUs instead of CPUs. The neural net is then used to formulate an objective function that is differentiable with respect to the shape parameters, which can then be optimized using a gradient-based technique. This outperforms state-of-the-art methods by 5 to 20% for standard problems and, even more importantly, our approach applies to cases that previous methods cannot handle.'
volume: 80
URL: https://proceedings.mlr.press/v80/baque18a.html
PDF: http://proceedings.mlr.press/v80/baque18a/baque18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-baque18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Pierre
family: Baque
- given: Edoardo
family: Remelli
- given: Francois
family: Fleuret
- given: Pascal
family: Fua
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 472-481
id: baque18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 472
lastpage: 481
published: 2018-07-03 00:00:00 +0000
- title: 'Learning to Coordinate with Coordination Graphs in Repeated Single-Stage Multi-Agent Decision Problems'
abstract: 'Learning to coordinate between multiple agents is an important problem in many reinforcement learning problems. Key to learning to coordinate is exploiting loose couplings, i.e., conditional independences between agents. In this paper we study learning in repeated fully cooperative games, multi-agent multi-armed bandits (MAMABs), in which the expected rewards can be expressed as a coordination graph. We propose multi-agent upper confidence exploration (MAUCE), a new algorithm for MAMABs that exploits loose couplings, which enables us to prove a regret bound that is logarithmic in the number of arm pulls and only linear in the number of agents. We empirically compare MAUCE to sparse cooperative Q-learning, and a state-of-the-art combinatorial bandit approach, and show that it performs much better on a variety of settings, including learning control policies for wind farms.'
volume: 80
URL: https://proceedings.mlr.press/v80/bargiacchi18a.html
PDF: http://proceedings.mlr.press/v80/bargiacchi18a/bargiacchi18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-bargiacchi18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Eugenio
family: Bargiacchi
- given: Timothy
family: Verstraeten
- given: Diederik
family: Roijers
- given: Ann
family: Nowé
- given: Hado
family: Hasselt
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 482-490
id: bargiacchi18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 482
lastpage: 490
published: 2018-07-03 00:00:00 +0000
- title: 'Testing Sparsity over Known and Unknown Bases'
abstract: 'Sparsity is a basic property of real vectors that is exploited in a wide variety of machine learning applications. In this work, we describe property testing algorithms for sparsity that observe a low-dimensional projection of the input. We consider two settings. In the first setting, we test sparsity with respect to an unknown basis: given input vectors $y_1, ..., y_p \in R^d$ whose concatenation as columns forms $Y \in R^{d \times p}$, does $Y = AX$ for matrices $A \in R^{d \times m}$ and $X \in R^{m \times p}$ such that each column of $X$ is $k$-sparse, or is $Y$ “far” from having such a decomposition? In the second setting, we test sparsity with respect to a known basis: for a fixed design matrix $A \in R^{d \times m}$, given input vector $y \in R^d$, is $y = Ax$ for some $k$-sparse vector $x$ or is $y$ “far” from having such a decomposition? We analyze our algorithms using tools from high-dimensional geometry and probability.'
volume: 80
URL: https://proceedings.mlr.press/v80/barman18a.html
PDF: http://proceedings.mlr.press/v80/barman18a/barman18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-barman18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Siddharth
family: Barman
- given: Arnab
family: Bhattacharyya
- given: Suprovat
family: Ghoshal
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 491-500
id: barman18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 491
lastpage: 500
published: 2018-07-03 00:00:00 +0000
- title: 'Transfer in Deep Reinforcement Learning Using Successor Features and Generalised Policy Improvement'
abstract: 'The ability to transfer skills across tasks has the potential to scale up reinforcement learning (RL) agents to environments currently out of reach. Recently, a framework based on two ideas, successor features (SFs) and generalised policy improvement (GPI), has been introduced as a principled way of transferring skills. In this paper we extend the SF&GPI framework in two ways. One of the basic assumptions underlying the original formulation of SF&GPI is that rewards for all tasks of interest can be computed as linear combinations of a fixed set of features. We relax this constraint and show that the theoretical guarantees supporting the framework can be extended to any set of tasks that only differ in the reward function. Our second contribution is to show that one can use the reward functions themselves as features for future tasks, without any loss of expressiveness, thus removing the need to specify a set of features beforehand. This makes it possible to combine SF&GPI with deep learning in a more stable way. We empirically verify this claim on a complex 3D environment where observations are images from a first-person perspective. We show that the transfer promoted by SF&GPI leads to very good policies on unseen tasks almost instantaneously. We also describe how to learn policies specialised to the new tasks in a way that allows them to be added to the agent’s set of skills, and thus be reused in the future.'
volume: 80
URL: https://proceedings.mlr.press/v80/barreto18a.html
PDF: http://proceedings.mlr.press/v80/barreto18a/barreto18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-barreto18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Andre
family: Barreto
- given: Diana
family: Borsa
- given: John
family: Quan
- given: Tom
family: Schaul
- given: David
family: Silver
- given: Matteo
family: Hessel
- given: Daniel
family: Mankowitz
- given: Augustin
family: Zidek
- given: Remi
family: Munos
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 501-510
id: barreto18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 501
lastpage: 510
published: 2018-07-03 00:00:00 +0000
- title: 'Measuring abstract reasoning in neural networks'
abstract: 'Whether neural networks can learn abstract reasoning or whether they merely rely on superficial statistics is a topic of recent debate. Here, we propose a dataset and challenge designed to probe abstract reasoning, inspired by a well-known human IQ test. To succeed at this challenge, models must cope with various generalisation ‘regimes’ in which the training data and test questions differ in clearly-defined ways. We show that popular models such as ResNets perform poorly, even when the training and test sets differ only minimally, and we present a novel architecture, with structure designed to encourage reasoning, that does significantly better. When we vary the way in which the test questions and training data differ, we find that our model is notably proficient at certain forms of generalisation, but notably weak at others. We further show that the model’s ability to generalise improves markedly if it is trained to predict symbolic explanations for its answers. Altogether, we introduce and explore ways to both measure and induce stronger abstract reasoning in neural networks. Our freely-available dataset should motivate further progress in this direction.'
volume: 80
URL: https://proceedings.mlr.press/v80/barrett18a.html
PDF: http://proceedings.mlr.press/v80/barrett18a/barrett18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-barrett18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: David
family: Barrett
- given: Felix
family: Hill
- given: Adam
family: Santoro
- given: Ari
family: Morcos
- given: Timothy
family: Lillicrap
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 511-520
id: barrett18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 511
lastpage: 520
published: 2018-07-03 00:00:00 +0000
- title: 'Gradient descent with identity initialization efficiently learns positive definite linear transformations by deep residual networks'
abstract: 'We analyze algorithms for approximating a function $f(x) = \Phi x$ mapping $\Re^d$ to $\Re^d$ using deep linear neural networks, i.e. that learn a function $h$ parameterized by matrices $\Theta_1,...,\Theta_L$ and defined by $h(x) = \Theta_L \Theta_{L-1} ... \Theta_1 x$. We focus on algorithms that learn through gradient descent on the population quadratic loss in the case that the distribution over the inputs is isotropic. We provide polynomial bounds on the number of iterations for gradient descent to approximate the least squares matrix $\Phi$, in the case where the initial hypothesis $\Theta_1 = ... = \Theta_L = I$ has excess loss bounded by a small enough constant. On the other hand, we show that gradient descent fails to converge for $\Phi$ whose distance from the identity is a larger constant, and we show that some forms of regularization toward the identity in each layer do not help. If $\Phi$ is symmetric positive definite, we show that an algorithm that initializes $\Theta_i = I$ learns an $\epsilon$-approximation of $f$ using a number of updates polynomial in $L$, the condition number of $\Phi$, and $\log(d/\epsilon)$. In contrast, we show that if the least squares matrix $\Phi$ is symmetric and has a negative eigenvalue, then all members of a class of algorithms that perform gradient descent with identity initialization, and optionally regularize toward the identity in each layer, fail to converge. We analyze an algorithm for the case that $\Phi$ satisfies $u^{\top} \Phi u > 0$ for all $u$, but may not be symmetric. This algorithm uses two regularizers: one that maintains the invariant $u^{\top} \Theta_L \Theta_{L-1} ... \Theta_1 u > 0$ for all $u$, and another that "balances" $\Theta_1, ..., \Theta_L$ so that they have the same singular values.'
volume: 80
URL: https://proceedings.mlr.press/v80/bartlett18a.html
PDF: http://proceedings.mlr.press/v80/bartlett18a/bartlett18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-bartlett18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Peter
family: Bartlett
- given: Dave
family: Helmbold
- given: Philip
family: Long
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 521-530
id: bartlett18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 521
lastpage: 530
published: 2018-07-03 00:00:00 +0000
- title: 'Mutual Information Neural Estimation'
abstract: 'We argue that the estimation of mutual information between high dimensional continuous random variables can be achieved by gradient descent over neural networks. We present a Mutual Information Neural Estimator (MINE) that is linearly scalable in dimensionality as well as in sample size, trainable through back-prop, and strongly consistent. We present a handful of applications on which MINE can be used to minimize or maximize mutual information. We apply MINE to improve adversarially trained generative models. We also use MINE to implement the Information Bottleneck, applying it to supervised classification; our results demonstrate substantial improvement in flexibility and performance in these settings.'
volume: 80
URL: https://proceedings.mlr.press/v80/belghazi18a.html
PDF: http://proceedings.mlr.press/v80/belghazi18a/belghazi18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-belghazi18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mohamed Ishmael
family: Belghazi
- given: Aristide
family: Baratin
- given: Sai
family: Rajeshwar
- given: Sherjil
family: Ozair
- given: Yoshua
family: Bengio
- given: Aaron
family: Courville
- given: Devon
family: Hjelm
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 531-540
id: belghazi18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 531
lastpage: 540
published: 2018-07-03 00:00:00 +0000
- title: 'To Understand Deep Learning We Need to Understand Kernel Learning'
abstract: 'Generalization performance of classifiers in deep learning has recently become a subject of intense study. Deep models, which are typically heavily over-parametrized, tend to fit the training data exactly. Despite this “overfitting”, they perform well on test data, a phenomenon not yet fully understood. The first point of our paper is that strong performance of overfitted classifiers is not a unique feature of deep learning. Using six real-world and two synthetic datasets, we establish experimentally that kernel machines trained to have zero classification error or near zero regression error (interpolation) perform very well on test data. We proceed to give a lower bound on the norm of zero loss solutions for smooth kernels, showing that they increase nearly exponentially with data size. None of the existing bounds produce non-trivial results for interpolating solutions. We also show experimentally that (non-smooth) Laplacian kernels easily fit random labels, a finding that parallels results recently reported for ReLU neural networks. In contrast, fitting noisy data requires many more epochs for smooth Gaussian kernels. Similar performance of overfitted Laplacian and Gaussian classifiers on test data suggests that generalization is tied to the properties of the kernel function rather than the optimization process. Some key phenomena of deep learning are manifested similarly in kernel methods in the modern “overfitted” regime. The combination of the experimental and theoretical results presented in this paper indicates a need for new theoretical ideas for understanding properties of classical kernel methods. We argue that progress on understanding deep learning will be difficult until more tractable “shallow” kernel methods are better understood.'
volume: 80
URL: https://proceedings.mlr.press/v80/belkin18a.html
PDF: http://proceedings.mlr.press/v80/belkin18a/belkin18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-belkin18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mikhail
family: Belkin
- given: Siyuan
family: Ma
- given: Soumik
family: Mandal
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 541-549
id: belkin18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 541
lastpage: 549
published: 2018-07-03 00:00:00 +0000
- title: 'Understanding and Simplifying One-Shot Architecture Search'
abstract: 'There is growing interest in automating neural network architecture design. Existing architecture search methods can be computationally expensive, requiring thousands of different architectures to be trained from scratch. Recent work has explored *weight sharing* across models to amortize the cost of training. Although previous methods reduced the cost of architecture search by orders of magnitude, they remain complex, requiring hypernetworks or reinforcement learning controllers. We aim to understand weight sharing for one-shot architecture search. With careful experimental analysis, we show that it is possible to efficiently identify promising architectures from a complex search space without either hypernetworks or RL.'
volume: 80
URL: https://proceedings.mlr.press/v80/bender18a.html
PDF: http://proceedings.mlr.press/v80/bender18a/bender18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-bender18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Gabriel
family: Bender
- given: Pieter-Jan
family: Kindermans
- given: Barret
family: Zoph
- given: Vijay
family: Vasudevan
- given: Quoc
family: Le
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 550-559
id: bender18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 550
lastpage: 559
published: 2018-07-03 00:00:00 +0000
- title: 'signSGD: Compressed Optimisation for Non-Convex Problems'
abstract: 'Training large neural networks requires distributing learning across multiple workers, where the cost of communicating gradients can be a significant bottleneck. signSGD alleviates this problem by transmitting just the sign of each minibatch stochastic gradient. We prove that it can get the best of both worlds: compressed gradients and SGD-level convergence rate. The relative $\ell_1/\ell_2$ geometry of gradients, noise and curvature informs whether signSGD or SGD is theoretically better suited to a particular problem. On the practical side we find that the momentum counterpart of signSGD is able to match the accuracy and convergence speed of Adam on deep Imagenet models. We extend our theory to the distributed setting, where the parameter server uses majority vote to aggregate gradient signs from each worker enabling 1-bit compression of worker-server communication in both directions. Using a theorem by Gauss we prove that majority vote can achieve the same reduction in variance as full precision distributed SGD. Thus, there is great promise for sign-based optimisation schemes to achieve fast communication and fast convergence. Code to reproduce experiments is to be found at https://github.com/jxbz/signSGD.'
volume: 80
URL: https://proceedings.mlr.press/v80/bernstein18a.html
PDF: http://proceedings.mlr.press/v80/bernstein18a/bernstein18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-bernstein18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jeremy
family: Bernstein
- given: Yu-Xiang
family: Wang
- given: Kamyar
family: Azizzadenesheli
- given: Animashree
family: Anandkumar
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 560-569
id: bernstein18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 560
lastpage: 569
published: 2018-07-03 00:00:00 +0000
- title: 'Distributed Clustering via LSH Based Data Partitioning'
abstract: 'Given the importance of clustering in the analysis of large scale data, distributed algorithms for formulations such as k-means, k-median, etc. have been extensively studied. A successful approach here has been the “reduce and merge” paradigm, in which each machine reduces its input size to Õ(k), and this data reduction continues (possibly iteratively) until all the data fits on one machine, at which point the problem is solved locally. This approach has the intrinsic bottleneck that each machine must solve a problem of size $\geq$ k, and needs to communicate at least $\Omega$(k) points to the other machines. We propose a novel data partitioning idea to overcome this bottleneck, and in effect, have different machines focus on “finding different clusters”. Under the assumption that we know the optimum value of the objective up to a poly(n) factor (arbitrary polynomial), we establish worst-case approximation guarantees for our method. We see that our algorithm results in lower communication as well as a near-optimal number of ‘rounds’ of computation (in the popular MapReduce framework).'
volume: 80
URL: https://proceedings.mlr.press/v80/bhaskara18a.html
PDF: http://proceedings.mlr.press/v80/bhaskara18a/bhaskara18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-bhaskara18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Aditya
family: Bhaskara
- given: Maheshakya
family: Wijewardena
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 570-579
id: bhaskara18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 570
lastpage: 579
published: 2018-07-03 00:00:00 +0000
- title: 'Autoregressive Convolutional Neural Networks for Asynchronous Time Series'
abstract: 'We propose Significance-Offset Convolutional Neural Network, a deep convolutional network architecture for regression of multivariate asynchronous time series. The model is inspired by standard autoregressive (AR) models and gating mechanisms used in recurrent neural networks. It involves an AR-like weighting system, where the final predictor is obtained as a weighted sum of adjusted regressors, while the weights are data-dependent functions learnt through a convolutional network. The architecture was designed for applications on asynchronous time series and is evaluated on such datasets: a hedge fund proprietary dataset of over 2 million quotes for a credit derivative index, an artificially generated noisy autoregressive series and UCI household electricity consumption dataset. The proposed architecture achieves promising results as compared to convolutional and recurrent neural networks.'
volume: 80
URL: https://proceedings.mlr.press/v80/binkowski18a.html
PDF: http://proceedings.mlr.press/v80/binkowski18a/binkowski18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-binkowski18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mikolaj
family: Binkowski
- given: Gautier
family: Marti
- given: Philippe
family: Donnat
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 580-589
id: binkowski18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 580
lastpage: 589
published: 2018-07-03 00:00:00 +0000
- title: 'Adaptive Sampled Softmax with Kernel Based Sampling'
abstract: 'Softmax is the most commonly used output function for multiclass problems and is widely used in areas such as vision, natural language processing, and recommendation. A softmax model has linear costs in the number of classes which makes it too expensive for many real-world problems. A common approach to speed up training involves sampling only some of the classes at each training step. It is known that this method is biased and that the bias increases the more the sampling distribution deviates from the output distribution. Nevertheless, almost all recent work uses simple sampling distributions that require a large sample size to mitigate the bias. In this work, we propose a new class of kernel based sampling methods and develop an efficient sampling algorithm. Kernel based sampling adapts to the model as it is trained, thus resulting in low bias. It can also be easily applied to many models because it relies only on the model’s last hidden layer. We empirically study the trade-off of bias, sampling distribution and sample size and show that kernel based sampling results in low bias with few samples.'
volume: 80
URL: https://proceedings.mlr.press/v80/blanc18a.html
PDF: http://proceedings.mlr.press/v80/blanc18a/blanc18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-blanc18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Guy
family: Blanc
- given: Steffen
family: Rendle
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 590-599
id: blanc18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 590
lastpage: 599
published: 2018-07-03 00:00:00 +0000
- title: 'Optimizing the Latent Space of Generative Networks'
abstract: 'Generative Adversarial Networks (GANs) have achieved remarkable results in the task of generating realistic natural images. In most successful applications, GAN models share two common aspects: solving a challenging saddle point optimization problem, interpreted as an adversarial game between generator and discriminator functions; and parameterizing the generator and the discriminator as deep convolutional neural networks. The goal of this paper is to disentangle the contribution of these two factors to the success of GANs. In particular, we introduce Generative Latent Optimization (GLO), a framework to train deep convolutional generators using simple reconstruction losses. Throughout a variety of experiments, we show that GLO enjoys many of the desirable properties of GANs: synthesizing visually-appealing samples, interpolating meaningfully between samples, and performing linear arithmetic with noise vectors; all of this without the adversarial optimization scheme.'
volume: 80
URL: https://proceedings.mlr.press/v80/bojanowski18a.html
PDF: http://proceedings.mlr.press/v80/bojanowski18a/bojanowski18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-bojanowski18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Piotr
family: Bojanowski
- given: Armand
family: Joulin
- given: David
family: Lopez-Paz
- given: Arthur
family: Szlam
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 600-609
id: bojanowski18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 600
lastpage: 609
published: 2018-07-03 00:00:00 +0000
- title: 'NetGAN: Generating Graphs via Random Walks'
abstract: 'We propose NetGAN - the first implicit generative model for graphs able to mimic real-world networks. We pose the problem of graph generation as learning the distribution of biased random walks over the input graph. The proposed model is based on a stochastic neural network that generates discrete output samples and is trained using the Wasserstein GAN objective. NetGAN is able to produce graphs that exhibit well-known network patterns without explicitly specifying them in the model definition. At the same time, our model exhibits strong generalization properties, as highlighted by its competitive link prediction performance, despite not being trained specifically for this task. Being the first approach to combine both of these desirable properties, NetGAN opens exciting avenues for further research.'
volume: 80
URL: https://proceedings.mlr.press/v80/bojchevski18a.html
PDF: http://proceedings.mlr.press/v80/bojchevski18a/bojchevski18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-bojchevski18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Aleksandar
family: Bojchevski
- given: Oleksandr
family: Shchur
- given: Daniel
family: Zügner
- given: Stephan
family: Günnemann
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 610-619
id: bojchevski18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 610
lastpage: 619
published: 2018-07-03 00:00:00 +0000
- title: 'A Progressive Batching L-BFGS Method for Machine Learning'
abstract: 'The standard L-BFGS method relies on gradient approximations that are not dominated by noise, so that search directions are descent directions, the line search is reliable, and quasi-Newton updating yields useful quadratic models of the objective function. All of this appears to call for a full batch approach, but since small batch sizes give rise to faster algorithms with better generalization properties, L-BFGS is currently not considered an algorithm of choice for large-scale machine learning applications. One need not, however, choose between the two extremes represented by the full batch or highly stochastic regimes, and may instead follow a progressive batching approach in which the sample size increases during the course of the optimization. In this paper, we present a new version of the L-BFGS algorithm that combines three basic components - progressive batching, a stochastic line search, and stable quasi-Newton updating - and that performs well on training logistic regression and deep neural networks. We provide supporting convergence theory for the method.'
volume: 80
URL: https://proceedings.mlr.press/v80/bollapragada18a.html
PDF: http://proceedings.mlr.press/v80/bollapragada18a/bollapragada18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-bollapragada18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Raghu
family: Bollapragada
- given: Jorge
family: Nocedal
- given: Dheevatsa
family: Mudigere
- given: Hao-Jun
family: Shi
- given: Ping Tak Peter
family: Tang
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 620-629
id: bollapragada18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 620
lastpage: 629
published: 2018-07-03 00:00:00 +0000
- title: 'Prediction Rule Reshaping'
abstract: 'Two methods are proposed for high-dimensional shape-constrained regression and classification. These methods reshape pre-trained prediction rules to satisfy shape constraints like monotonicity and convexity. The first method can be applied to any pre-trained prediction rule, while the second method deals specifically with random forests. In both cases, efficient algorithms are developed for computing the estimators, and experiments are performed to demonstrate their performance on four datasets. We find that reshaping methods enforce shape constraints without compromising predictive accuracy.'
volume: 80
URL: https://proceedings.mlr.press/v80/bonakdarpour18a.html
PDF: http://proceedings.mlr.press/v80/bonakdarpour18a/bonakdarpour18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-bonakdarpour18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Matt
family: Bonakdarpour
- given: Sabyasachi
family: Chatterjee
- given: Rina Foygel
family: Barber
- given: John
family: Lafferty
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 630-638
id: bonakdarpour18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 630
lastpage: 638
published: 2018-07-03 00:00:00 +0000
- title: 'QuantTree: Histograms for Change Detection in Multivariate Data Streams'
abstract: 'We address the problem of detecting distribution changes in multivariate data streams by means of histograms. Histograms are very general and flexible models, which have been relatively ignored in the change-detection literature as they often require a number of bins that grows unfeasibly with the data dimension. We present QuantTree, a recursive binary splitting scheme that adaptively defines the histogram bins to ease the detection of any distribution change. Our design scheme implies that i) we can easily control the overall number of bins and ii) the bin probabilities do not depend on the distribution of stationary data. The latter is a very relevant aspect in change detection, since thresholds of test statistics based on these histograms (e.g., the Pearson statistic or the total variation) can be numerically computed from univariate and synthetically generated data, while still guaranteeing a controlled false positive rate. Our experiments show that the proposed histograms are very effective in detecting changes in high dimensional data streams, and that the resulting thresholds can effectively control the false positive rate, even when the number of training samples is relatively small.'
volume: 80
URL: https://proceedings.mlr.press/v80/boracchi18a.html
PDF: http://proceedings.mlr.press/v80/boracchi18a/boracchi18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-boracchi18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Giacomo
family: Boracchi
- given: Diego
family: Carrera
- given: Cristiano
family: Cervellera
- given: Danilo
family: Macciò
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 639-648
id: boracchi18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 639
lastpage: 648
published: 2018-07-03 00:00:00 +0000
- title: 'Matrix Norms in Data Streams: Faster, Multi-Pass and Row-Order'
abstract: 'A central problem in mining massive data streams is characterizing which functions of an underlying frequency vector can be approximated efficiently. Given the prevalence of large scale linear algebra problems in machine learning, recently there has been considerable effort in extending this data stream problem to that of estimating functions of a matrix. This setting generalizes classical problems to the analogous ones for matrices. For example, instead of estimating frequent-item counts, we now wish to estimate “frequent-direction” counts. A related example is to estimate norms, which now correspond to estimating a vector norm on the singular values of the matrix. Despite recent efforts, the current understanding for such matrix problems is considerably weaker than that for vector problems. We study a number of aspects of estimating matrix norms in a stream that have not previously been considered: (1) multi-pass algorithms, (2) algorithms that see the underlying matrix one row at a time, and (3) time-efficient algorithms. Our multi-pass and row-order algorithms use less memory than what is provably required in the single-pass and entrywise-update models, and thus give separations between these models (in terms of memory). Moreover, all of our algorithms are considerably faster than previous ones. We also prove a number of lower bounds, and obtain for instance, a near-complete characterization of the memory required of row-order algorithms for estimating Schatten $p$-norms of sparse matrices. We complement our results with numerical experiments.'
volume: 80
URL: https://proceedings.mlr.press/v80/braverman18a.html
PDF: http://proceedings.mlr.press/v80/braverman18a/braverman18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-braverman18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Vladimir
family: Braverman
- given: Stephen
family: Chestnut
- given: Robert
family: Krauthgamer
- given: Yi
family: Li
- given: David
family: Woodruff
- given: Lin
family: Yang
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 649-658
id: braverman18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 649
lastpage: 658
published: 2018-07-03 00:00:00 +0000
- title: 'Predict and Constrain: Modeling Cardinality in Deep Structured Prediction'
abstract: 'Many machine learning problems require the prediction of multi-dimensional labels. Such structured prediction models can benefit from modeling dependencies between labels. Recently, several deep learning approaches to structured prediction have been proposed. Here we focus on capturing cardinality constraints in such models. Namely, constraining the number of non-zero labels that the model outputs. Such constraints have proven very useful in previous structured prediction methods, but it is a challenge to introduce them into a deep learning approach. Here we show how to do this via a novel deep architecture. Our approach outperforms strong baselines, achieving state-of-the-art results on multi-label classification benchmarks.'
volume: 80
URL: https://proceedings.mlr.press/v80/brukhim18a.html
PDF: http://proceedings.mlr.press/v80/brukhim18a/brukhim18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-brukhim18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Nataly
family: Brukhim
- given: Amir
family: Globerson
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 659-667
id: brukhim18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 659
lastpage: 667
published: 2018-07-03 00:00:00 +0000
- title: 'Quasi-Monte Carlo Variational Inference'
abstract: 'Many machine learning problems involve Monte Carlo gradient estimators. As a prominent example, we focus on Monte Carlo variational inference (MCVI) in this paper. The performance of MCVI crucially depends on the variance of its stochastic gradients. We propose variance reduction by means of Quasi-Monte Carlo (QMC) sampling. QMC replaces N i.i.d. samples from a uniform probability distribution by a deterministic sequence of samples of length N. This sequence covers the underlying random variable space more evenly than i.i.d. draws, reducing the variance of the gradient estimator. With our novel approach, both the score function and the reparameterization gradient estimators lead to much faster convergence. We also propose a new algorithm for Monte Carlo objectives, where we operate with a constant learning rate and increase the number of QMC samples per iteration. We prove that this way, our algorithm can converge asymptotically at a faster rate than SGD. We furthermore provide theoretical guarantees on QMC for Monte Carlo objectives that go beyond MCVI, and support our findings by several experiments on large-scale data sets from various domains.'
volume: 80
URL: https://proceedings.mlr.press/v80/buchholz18a.html
PDF: http://proceedings.mlr.press/v80/buchholz18a/buchholz18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-buchholz18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Alexander
family: Buchholz
- given: Florian
family: Wenzel
- given: Stephan
family: Mandt
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 668-677
id: buchholz18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 668
lastpage: 677
published: 2018-07-03 00:00:00 +0000
- title: 'Path-Level Network Transformation for Efficient Architecture Search'
abstract: 'We introduce a new function-preserving transformation for efficient neural architecture search. This network transformation allows reusing previously trained networks and existing successful architectures, which improves sample efficiency. We aim to address the limitation of current network transformation operations that can only perform layer-level architecture modifications, such as adding (pruning) filters or inserting (removing) a layer, which fails to change the topology of connection paths. Our proposed path-level transformation operations enable the meta-controller to modify the path topology of the given network while keeping the merits of reusing weights, and thus allow efficiently designing effective structures with complex path topologies like Inception models. We further propose a bidirectional tree-structured reinforcement learning meta-controller to explore a simple yet highly expressive tree-structured architecture space that can be viewed as a generalization of multi-branch architectures. We experimented on the image classification datasets with limited computational resources (about 200 GPU-hours), where we observed improved parameter efficiency and better test results (97.70% test accuracy on CIFAR-10 with 14.3M parameters and 74.6% top-1 accuracy on ImageNet in the mobile setting), demonstrating the effectiveness and transferability of our designed architectures.'
volume: 80
URL: https://proceedings.mlr.press/v80/cai18a.html
PDF: http://proceedings.mlr.press/v80/cai18a/cai18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-cai18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Han
family: Cai
- given: Jiacheng
family: Yang
- given: Weinan
family: Zhang
- given: Song
family: Han
- given: Yong
family: Yu
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 678-687
id: cai18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 678
lastpage: 687
published: 2018-07-03 00:00:00 +0000
- title: 'Improved large-scale graph learning through ridge spectral sparsification'
abstract: 'The representation and learning benefits of methods based on graph Laplacians, such as Laplacian smoothing or harmonic function solution for semi-supervised learning (SSL), are empirically and theoretically well supported. Nonetheless, the exact versions of these methods scale poorly with the number of nodes $n$ of the graph. In this paper, we combine a spectral sparsification routine with Laplacian learning. Given a graph $G$ as input, our algorithm computes a sparsifier in a distributed way in $O(n\log^3(n))$ time, $O(m\log^3(n))$ work and $O(n\log(n))$ memory, using only $\log(n)$ rounds of communication. Furthermore, motivated by the regularization often employed in learning algorithms, we show that constructing sparsifiers that preserve the spectrum of the Laplacian only up to the regularization level may drastically reduce the size of the final graph. By constructing a spectrally-similar graph, we are able to bound the error induced by the sparsification for a variety of downstream tasks (e.g., SSL). We empirically validate the theoretical guarantees on the Amazon co-purchase graph and compare to the state-of-the-art heuristics.'
volume: 80
URL: https://proceedings.mlr.press/v80/calandriello18a.html
PDF: http://proceedings.mlr.press/v80/calandriello18a/calandriello18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-calandriello18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Daniele
family: Calandriello
- given: Alessandro
family: Lazaric
- given: Ioannis
family: Koutis
- given: Michal
family: Valko
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 688-697
id: calandriello18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 688
lastpage: 697
published: 2018-07-03 00:00:00 +0000
- title: 'Bayesian Coreset Construction via Greedy Iterative Geodesic Ascent'
abstract: 'Coherent uncertainty quantification is a key strength of Bayesian methods. But modern algorithms for approximate Bayesian posterior inference often sacrifice accurate posterior uncertainty estimation in the pursuit of scalability. This work shows that previous Bayesian coreset construction algorithms—which build a small, weighted subset of the data that approximates the full dataset—are no exception. We demonstrate that these algorithms scale the coreset log-likelihood suboptimally, resulting in underestimated posterior uncertainty. To address this shortcoming, we develop greedy iterative geodesic ascent (GIGA), a novel algorithm for Bayesian coreset construction that scales the coreset log-likelihood optimally. GIGA provides geometric decay in posterior approximation error as a function of coreset size, and maintains the fast running time of its predecessors. The paper concludes with validation of GIGA on both synthetic and real datasets, demonstrating that it reduces posterior approximation error by orders of magnitude compared with previous coreset constructions.'
volume: 80
URL: https://proceedings.mlr.press/v80/campbell18a.html
PDF: http://proceedings.mlr.press/v80/campbell18a/campbell18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-campbell18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Trevor
family: Campbell
- given: Tamara
family: Broderick
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 698-706
id: campbell18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 698
lastpage: 706
published: 2018-07-03 00:00:00 +0000
- title: 'Adversarial Learning with Local Coordinate Coding'
abstract: 'Generative adversarial networks (GANs) aim to generate realistic data from some prior distribution (e.g., Gaussian noise). However, such a prior distribution is often independent of the real data and thus may lose semantic information (e.g., geometric structure or content in images) of the data. In practice, the semantic information might be represented by some latent distribution learned from data, which, however, is hard to use for sampling in GANs. In this paper, rather than sampling from the pre-defined prior distribution, we propose a Local Coordinate Coding (LCC) based sampling method to improve GANs. We derive a generalization bound for LCC based GANs and prove that a low-dimensional input is sufficient to achieve good generalization. Extensive experiments on various real-world datasets demonstrate the effectiveness of the proposed method.'
volume: 80
URL: https://proceedings.mlr.press/v80/cao18a.html
PDF: http://proceedings.mlr.press/v80/cao18a/cao18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-cao18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jiezhang
family: Cao
- given: Yong
family: Guo
- given: Qingyao
family: Wu
- given: Chunhua
family: Shen
- given: Junzhou
family: Huang
- given: Mingkui
family: Tan
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 707-715
id: cao18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 707
lastpage: 715
published: 2018-07-03 00:00:00 +0000
- title: 'Fair and Diverse DPP-Based Data Summarization'
abstract: 'Sampling methods that choose a subset of the data proportional to its diversity in the feature space are popular for data summarization. However, recent studies have noted the occurrence of bias (e.g., under- or over-representation of a particular gender or ethnicity) in such data summarization methods. In this paper we initiate a study of the problem of outputting a diverse and fair summary of a given dataset. We work with a well-studied determinantal measure of diversity and corresponding distributions (DPPs) and present a framework that allows us to incorporate a general class of fairness constraints into such distributions. Designing efficient algorithms to sample from these constrained determinantal distributions, however, suffers from a complexity barrier; we present a fast sampler that is provably good when the input vectors satisfy a natural property. Our empirical results on both real-world and synthetic datasets show that the diversity of the samples produced by adding fairness constraints is not too far from the unconstrained case.'
volume: 80
URL: https://proceedings.mlr.press/v80/celis18a.html
PDF: http://proceedings.mlr.press/v80/celis18a/celis18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-celis18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Elisa
family: Celis
- given: Vijay
family: Keswani
- given: Damian
family: Straszak
- given: Amit
family: Deshpande
- given: Tarun
family: Kathuria
- given: Nisheeth
family: Vishnoi
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 716-725
id: celis18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 716
lastpage: 725
published: 2018-07-03 00:00:00 +0000
- title: 'Conditional Noise-Contrastive Estimation of Unnormalised Models'
abstract: 'Many parametric statistical models are not properly normalised and only specified up to an intractable partition function, which renders parameter estimation difficult. Examples of unnormalised models are Gibbs distributions, Markov random fields, and neural network models in unsupervised deep learning. In previous work, the estimation principle called noise-contrastive estimation (NCE) was introduced where unnormalised models are estimated by learning to distinguish between data and auxiliary noise. An open question is how to best choose the auxiliary noise distribution. We here propose a new method that addresses this issue. The proposed method shares with NCE the idea of formulating density estimation as a supervised learning problem but in contrast to NCE, the proposed method leverages the observed data when generating noise samples. The noise can thus be generated in a semi-automated manner. We first present the underlying theory of the new method, show that score matching emerges as a limiting case, validate the method on continuous and discrete valued synthetic data, and show that we can expect an improved performance compared to NCE when the data lie in a lower-dimensional manifold. Then we demonstrate its applicability in unsupervised deep learning by estimating a four-layer neural image model.'
volume: 80
URL: https://proceedings.mlr.press/v80/ceylan18a.html
PDF: http://proceedings.mlr.press/v80/ceylan18a/ceylan18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-ceylan18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ciwan
family: Ceylan
- given: Michael U.
family: Gutmann
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 726-734
id: ceylan18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 726
lastpage: 734
published: 2018-07-03 00:00:00 +0000
- title: 'Adversarial Time-to-Event Modeling'
abstract: 'Modern health data science applications leverage abundant molecular and electronic health data, providing opportunities for machine learning to build statistical models to support clinical practice. Time-to-event analysis, also called survival analysis, stands as one of the most representative examples of such statistical models. We present a deep-network-based approach that leverages adversarial learning to address a key challenge in modern time-to-event modeling: nonparametric estimation of event-time distributions. We also introduce a principled cost function to exploit information from censored events (events that occur subsequent to the observation window). Unlike most time-to-event models, we focus on the estimation of time-to-event distributions, rather than time ordering. We validate our model on both benchmark and real datasets, demonstrating that the proposed formulation yields significant performance gains relative to a parametric alternative, which we also propose.'
volume: 80
URL: https://proceedings.mlr.press/v80/chapfuwa18a.html
PDF: http://proceedings.mlr.press/v80/chapfuwa18a/chapfuwa18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-chapfuwa18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Paidamoyo
family: Chapfuwa
- given: Chenyang
family: Tao
- given: Chunyuan
family: Li
- given: Courtney
family: Page
- given: Benjamin
family: Goldstein
- given: Lawrence
family: Carin
- given: Ricardo
family: Henao
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 735-744
id: chapfuwa18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 735
lastpage: 744
published: 2018-07-03 00:00:00 +0000
- title: 'Stability and Generalization of Learning Algorithms that Converge to Global Optima'
abstract: 'We establish novel generalization bounds for learning algorithms that converge to global minima. We derive black-box stability results that only depend on the convergence of a learning algorithm and the geometry around the minimizers of the empirical risk function. The results are shown for non-convex loss functions satisfying the Polyak-Lojasiewicz (PL) and the quadratic growth (QG) conditions, which we show arise for 1-layer neural networks with leaky ReLU activations and deep neural networks with linear activations. We use our results to establish the stability of first-order methods such as stochastic gradient descent (SGD), gradient descent (GD), randomized coordinate descent (RCD), and the stochastic variance reduced gradient method (SVRG), in both the PL and the strongly convex setting. Our results match or improve state-of-the-art generalization bounds and can easily extend to similar optimization algorithms. Finally, although our results imply comparable stability for SGD and GD in the PL setting, we show that there exist simple quadratic models with multiple local minima where SGD is stable but GD is not.'
volume: 80
URL: https://proceedings.mlr.press/v80/charles18a.html
PDF: http://proceedings.mlr.press/v80/charles18a/charles18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-charles18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Zachary
family: Charles
- given: Dimitris
family: Papailiopoulos
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 745-754
id: charles18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 745
lastpage: 754
published: 2018-07-03 00:00:00 +0000
- title: 'Learning and Memorization'
abstract: 'In the machine learning research community, it is generally believed that there is a tension between memorization and generalization. In this work we examine to what extent this tension exists by exploring if it is possible to generalize by memorizing alone. Although direct memorization with a lookup table obviously does not generalize, we find that introducing depth in the form of a network of support-limited lookup tables leads to generalization that is significantly above chance and closer to that obtained by standard learning algorithms on several tasks derived from MNIST and CIFAR-10. Furthermore, we demonstrate through a series of empirical results that our approach allows for a smooth tradeoff between memorization and generalization and exhibits some of the most salient characteristics of neural networks: depth improves performance; random data can be memorized and yet there is generalization on real data; and memorizing random data is harder in a certain sense than memorizing real data. The extreme simplicity of the algorithm and potential connections with generalization theory point to several interesting directions for future research.'
volume: 80
URL: https://proceedings.mlr.press/v80/chatterjee18a.html
PDF: http://proceedings.mlr.press/v80/chatterjee18a/chatterjee18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-chatterjee18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Satrajit
family: Chatterjee
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 755-763
id: chatterjee18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 755
lastpage: 763
published: 2018-07-03 00:00:00 +0000
- title: 'On the Theory of Variance Reduction for Stochastic Gradient Monte Carlo'
abstract: 'We provide convergence guarantees in Wasserstein distance for a variety of variance-reduction methods: SAGA Langevin diffusion, SVRG Langevin diffusion and control-variate underdamped Langevin diffusion. We analyze these methods under a uniform set of assumptions on the log-posterior distribution, assuming it to be smooth, strongly convex and Hessian Lipschitz. This is achieved by a new proof technique combining ideas from finite-sum optimization and the analysis of sampling methods. Our sharp theoretical bounds allow us to identify regimes of interest where each method performs better than the others. Our theory is verified with experiments on real-world and synthetic datasets.'
volume: 80
URL: https://proceedings.mlr.press/v80/chatterji18a.html
PDF: http://proceedings.mlr.press/v80/chatterji18a/chatterji18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-chatterji18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Niladri
family: Chatterji
- given: Nicolas
family: Flammarion
- given: Yian
family: Ma
- given: Peter
family: Bartlett
- given: Michael
family: Jordan
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 764-773
id: chatterji18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 764
lastpage: 773
published: 2018-07-03 00:00:00 +0000
- title: 'Hierarchical Clustering with Structural Constraints'
abstract: 'Hierarchical clustering is a popular unsupervised data analysis method. For many real-world applications, we would like to exploit prior information about the data that imposes constraints on the clustering hierarchy, and is not captured by the set of features available to the algorithm. This gives rise to the problem of hierarchical clustering with structural constraints. Structural constraints pose major challenges for bottom-up approaches like average/single linkage, and even though they can be naturally incorporated into top-down divisive algorithms, no formal guarantees exist on the quality of their output. In this paper, we provide provable approximation guarantees for two simple top-down algorithms, using a recently introduced optimization viewpoint of hierarchical clustering with pairwise similarity information (Dasgupta, 2016). We show how to find good solutions even in the presence of conflicting prior information, by formulating a constraint-based regularization of the objective. Furthermore, we explore a variation of this objective for dissimilarity information (Cohen-Addad et al., 2018) and improve upon current techniques. Finally, we demonstrate our approach on a real dataset for the taxonomy application.'
volume: 80
URL: https://proceedings.mlr.press/v80/chatziafratis18a.html
PDF: http://proceedings.mlr.press/v80/chatziafratis18a/chatziafratis18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-chatziafratis18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Vaggos
family: Chatziafratis
- given: Rad
family: Niazadeh
- given: Moses
family: Charikar
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 774-783
id: chatziafratis18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 774
lastpage: 783
published: 2018-07-03 00:00:00 +0000
- title: 'Hierarchical Deep Generative Models for Multi-Rate Multivariate Time Series'
abstract: 'Multi-Rate Multivariate Time Series (MR-MTS) are multivariate time series observations that come with various sampling rates and encode multiple temporal dependencies. State-space models such as Kalman filters and deep learning models such as deep Markov models are mainly designed for time series data with the same sampling rate and cannot capture all the dependencies present in the MR-MTS data. To address this challenge, we propose the Multi-Rate Hierarchical Deep Markov Model (MR-HDMM), a novel deep generative model which uses the latent hierarchical structure with a learnable switch mechanism to capture the temporal dependencies of MR-MTS. Experimental results on two real-world datasets demonstrate that our MR-HDMM model outperforms the existing state-of-the-art deep learning and state-space models on forecasting and interpolation tasks. In addition, the latent hierarchies in our model provide a way to show and interpret the multiple temporal dependencies.'
volume: 80
URL: https://proceedings.mlr.press/v80/che18a.html
PDF: http://proceedings.mlr.press/v80/che18a/che18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-che18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Zhengping
family: Che
- given: Sanjay
family: Purushotham
- given: Guangyu
family: Li
- given: Bo
family: Jiang
- given: Yan
family: Liu
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 784-793
id: che18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 784
lastpage: 793
published: 2018-07-03 00:00:00 +0000
- title: 'GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks'
abstract: 'Deep multitask networks, in which one neural network produces multiple predictive outputs, can offer better speed and performance than their single-task counterparts but are challenging to train properly. We present a gradient normalization (GradNorm) algorithm that automatically balances training in deep multitask models by dynamically tuning gradient magnitudes. We show that for various network architectures, for both regression and classification tasks, and on both synthetic and real datasets, GradNorm improves accuracy and reduces overfitting across multiple tasks when compared to single-task networks, static baselines, and other adaptive multitask loss balancing techniques. GradNorm also matches or surpasses the performance of exhaustive grid search methods, despite only involving a single asymmetry hyperparameter $\alpha$. Thus, what was once a tedious search process that incurred exponentially more compute for each task added can now be accomplished within a few training runs, irrespective of the number of tasks. Ultimately, we will demonstrate that gradient manipulation affords us great control over the training dynamics of multitask networks and may be one of the keys to unlocking the potential of multitask learning.'
volume: 80
URL: https://proceedings.mlr.press/v80/chen18a.html
PDF: http://proceedings.mlr.press/v80/chen18a/chen18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-chen18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Zhao
family: Chen
- given: Vijay
family: Badrinarayanan
- given: Chen-Yu
family: Lee
- given: Andrew
family: Rabinovich
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 794-803
id: chen18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 794
lastpage: 803
published: 2018-07-03 00:00:00 +0000
- title: 'Weakly Submodular Maximization Beyond Cardinality Constraints: Does Randomization Help Greedy?'
abstract: 'Submodular functions are a broad class of set functions that naturally arise in many machine learning applications. Due to their combinatorial structures, there has been a myriad of algorithms for maximizing such functions under various constraints. Unfortunately, once a function deviates from submodularity (even slightly), the known algorithms may perform arbitrarily poorly. Amending this issue, by obtaining approximation results for functions obeying properties that generalize submodularity, has been the focus of several recent works. One such class, known as weakly submodular functions, has received a lot of recent attention from the machine learning community due to its strong connections to restricted strong convexity and sparse reconstruction. In this paper, we prove that a randomized version of the greedy algorithm achieves an approximation ratio of $(1 + 1/\gamma )^{-2}$ for weakly submodular maximization subject to a general matroid constraint, where $\gamma$ is a parameter measuring the distance from submodularity. To the best of our knowledge, this is the first algorithm with a non-trivial approximation guarantee for this constrained optimization problem. Moreover, our experimental results show that our proposed algorithm performs well in a variety of real-world problems, including regression, video summarization, splice site detection, and black-box interpretation.'
volume: 80
URL: https://proceedings.mlr.press/v80/chen18b.html
PDF: http://proceedings.mlr.press/v80/chen18b/chen18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-chen18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lin
family: Chen
- given: Moran
family: Feldman
- given: Amin
family: Karbasi
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 804-813
id: chen18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 804
lastpage: 813
published: 2018-07-03 00:00:00 +0000
- title: 'Projection-Free Online Optimization with Stochastic Gradient: From Convexity to Submodularity'
abstract: 'Online optimization has been a successful framework for solving large-scale problems under computational constraints and partial information. Current methods for online convex optimization require either a projection or exact gradient computation at each step, both of which can be prohibitively expensive for large-scale applications. At the same time, there is a growing trend of non-convex optimization in the machine learning community and a need for online methods. Continuous DR-submodular functions, which exhibit a natural diminishing returns condition, have recently been proposed as a broad class of non-convex functions which may be efficiently optimized. Although online methods have been introduced, they suffer from similar problems. In this work, we propose Meta-Frank-Wolfe, the first online projection-free algorithm that uses stochastic gradient estimates. The algorithm relies on a careful sampling of gradients in each round and achieves the optimal $O( \sqrt{T})$ adversarial regret bounds for convex and continuous submodular optimization. We also propose One-Shot Frank-Wolfe, a simpler algorithm which requires only a single stochastic gradient estimate in each round and achieves an $O(T^{2/3})$ stochastic regret bound for convex and continuous submodular optimization. We apply our methods to develop a novel "lifting" framework for online discrete submodular maximization and also see that they outperform current state-of-the-art techniques on various experiments.'
volume: 80
URL: https://proceedings.mlr.press/v80/chen18c.html
PDF: http://proceedings.mlr.press/v80/chen18c/chen18c.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-chen18c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lin
family: Chen
- given: Christopher
family: Harshaw
- given: Hamed
family: Hassani
- given: Amin
family: Karbasi
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 814-823
id: chen18c
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 814
lastpage: 823
published: 2018-07-03 00:00:00 +0000
- title: 'Continuous-Time Flows for Efficient Inference and Density Estimation'
abstract: 'Two fundamental problems in unsupervised learning are efficient inference for latent-variable models and robust density estimation based on large amounts of unlabeled data. Algorithms for the two tasks, such as normalizing flows and generative adversarial networks (GANs), are often developed independently. In this paper, we propose the concept of *continuous-time flows* (CTFs), a family of diffusion-based methods that are able to asymptotically approach a target distribution. Distinct from normalizing flows and GANs, CTFs can be adopted to achieve the above two goals in one framework, with theoretical guarantees. Our framework includes distilling knowledge from a CTF for efficient inference, and learning an explicit energy-based distribution with CTFs for density estimation. Both tasks rely on a new technique for distribution matching within amortized learning. Experiments on various tasks demonstrate promising performance of the proposed CTF framework, compared to related techniques.'
volume: 80
URL: https://proceedings.mlr.press/v80/chen18d.html
PDF: http://proceedings.mlr.press/v80/chen18d/chen18d.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-chen18d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Changyou
family: Chen
- given: Chunyuan
family: Li
- given: Liqun
family: Chen
- given: Wenlin
family: Wang
- given: Yunchen
family: Pu
- given: Lawrence
family: Carin
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 824-833
id: chen18d
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 824
lastpage: 833
published: 2018-07-03 00:00:00 +0000
- title: 'Scalable Bilinear Pi Learning Using State and Action Features'
abstract: 'Approximate linear programming (ALP) represents one of the major algorithmic families to solve large-scale Markov decision processes (MDP). In this work, we study a primal-dual formulation of the ALP, and develop a scalable, model-free algorithm called bilinear $\pi$ learning for reinforcement learning when a sampling oracle is provided. This algorithm enjoys a number of advantages. First, it adopts linear and bilinear models to represent the high-dimensional value function and state-action distributions, respectively, using given state and action features. Its run-time complexity depends on the number of features, not the size of the underlying MDPs. Second, it operates in a fully online fashion without having to store any sample, thus having a minimal memory footprint. Third, we prove that it is sample-efficient, solving for the optimal policy to high precision with a sample complexity linear in the dimension of the parameter space.'
volume: 80
URL: https://proceedings.mlr.press/v80/chen18e.html
PDF: http://proceedings.mlr.press/v80/chen18e/chen18e.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-chen18e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yichen
family: Chen
- given: Lihong
family: Li
- given: Mengdi
family: Wang
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 834-843
id: chen18e
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 834
lastpage: 843
published: 2018-07-03 00:00:00 +0000
- title: 'Stein Points'
abstract: 'An important task in computational statistics and machine learning is to approximate a posterior distribution $p(x)$ with an empirical measure supported on a set of representative points $\{x_i\}_{i=1}^n$. This paper focuses on methods where the selection of points is essentially deterministic, with an emphasis on achieving accurate approximation when $n$ is small. To this end, we present Stein Points. The idea is to exploit either a greedy or a conditional gradient method to iteratively minimise a kernel Stein discrepancy between the empirical measure and $p(x)$. Our empirical results demonstrate that Stein Points enable accurate approximation of the posterior at modest computational cost. In addition, theoretical results are provided to establish convergence of the method.'
volume: 80
URL: https://proceedings.mlr.press/v80/chen18f.html
PDF: http://proceedings.mlr.press/v80/chen18f/chen18f.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-chen18f.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Wilson Ye
family: Chen
- given: Lester
family: Mackey
- given: Jackson
family: Gorham
- given: Francois-Xavier
family: Briol
- given: Chris
family: Oates
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 844-853
id: chen18f
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 844
lastpage: 853
published: 2018-07-03 00:00:00 +0000
- title: 'Learning K-way D-dimensional Discrete Codes for Compact Embedding Representations'
abstract: 'Conventional embedding methods directly associate each symbol with a continuous embedding vector, which is equivalent to applying a linear transformation based on a “one-hot” encoding of the discrete symbols. Despite its simplicity, such an approach yields a number of parameters that grows linearly with the vocabulary size and can lead to overfitting. In this work, we propose a much more compact K-way D-dimensional discrete encoding scheme to replace the “one-hot” encoding. In the proposed “KD encoding”, each symbol is represented by a $D$-dimensional code with a cardinality of $K$, and the final symbol embedding vector is generated by composing the code embedding vectors. To learn semantically meaningful codes end-to-end, we derive a relaxed discrete optimization approach based on stochastic gradient descent, which can be generally applied to any differentiable computational graph with an embedding layer. In our experiments with various applications from natural language processing to graph convolutional networks, the total size of the embedding layer can be reduced by up to 98% while achieving similar or better performance.'
volume: 80
URL: https://proceedings.mlr.press/v80/chen18g.html
PDF: http://proceedings.mlr.press/v80/chen18g/chen18g.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-chen18g.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ting
family: Chen
- given: Martin Renqiang
family: Min
- given: Yizhou
family: Sun
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 854-863
id: chen18g
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 854
lastpage: 863
published: 2018-07-03 00:00:00 +0000
- title: 'PixelSNAIL: An Improved Autoregressive Generative Model'
abstract: 'Autoregressive generative models achieve the best results in density estimation tasks involving high-dimensional data, such as images or audio. They pose density estimation as a sequence modeling task, where a recurrent neural network (RNN) models the conditional distribution over the next element conditioned on all previous elements. In this paradigm, the bottleneck is the extent to which the RNN can model long-range dependencies, and the most successful approaches rely on causal convolutions. Taking inspiration from recent work in meta reinforcement learning, where dealing with long-range dependencies is also essential, we introduce a new generative model architecture that combines causal convolutions with self-attention. In this paper, we describe the resulting model and present state-of-the-art log-likelihood results on heavily benchmarked datasets: CIFAR-10, $32 \times 32$ ImageNet and $64 \times 64$ ImageNet. Our implementation will be made available at https://github.com/neocxi/pixelsnail-public.'
volume: 80
URL: https://proceedings.mlr.press/v80/chen18h.html
PDF: http://proceedings.mlr.press/v80/chen18h/chen18h.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-chen18h.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Xi
family: Chen
- given: Nikhil
family: Mishra
- given: Mostafa
family: Rohaninejad
- given: Pieter
family: Abbeel
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 864-872
id: chen18h
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 864
lastpage: 872
published: 2018-07-03 00:00:00 +0000
- title: 'Dynamical Isometry and a Mean Field Theory of RNNs: Gating Enables Signal Propagation in Recurrent Neural Networks'
abstract: 'Recurrent neural networks have gained widespread use in modeling sequence data across various domains. While many successful recurrent architectures employ a notion of gating, the exact mechanism that enables such remarkable performance is not well understood. We develop a theory for signal propagation in recurrent networks after random initialization using a combination of mean field theory and random matrix theory. To simplify our discussion, we introduce a new RNN cell with a simple gating mechanism that we call the minimalRNN and compare it with vanilla RNNs. Our theory allows us to define a maximum timescale over which RNNs can remember an input. We show that this theory predicts trainability for both recurrent architectures. We show that gated recurrent networks feature a much broader, more robust, trainable region than vanilla RNNs, which corroborates recent experimental findings. We also develop a closed-form critical initialization scheme that achieves dynamical isometry in both vanilla RNNs and minimalRNNs, and show that this results in significantly improved training dynamics. Finally, we demonstrate that the minimalRNN achieves comparable performance to its more complex counterparts, such as LSTMs or GRUs, on a language modeling task.'
volume: 80
URL: https://proceedings.mlr.press/v80/chen18i.html
PDF: http://proceedings.mlr.press/v80/chen18i/chen18i.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-chen18i.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Minmin
family: Chen
- given: Jeffrey
family: Pennington
- given: Samuel
family: Schoenholz
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 873-882
id: chen18i
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 873
lastpage: 882
published: 2018-07-03 00:00:00 +0000
- title: 'Learning to Explain: An Information-Theoretic Perspective on Model Interpretation'
abstract: 'We introduce instancewise feature selection as a methodology for model interpretation. Our method is based on learning a function to extract a subset of features that are most informative for each given example. This feature selector is trained to maximize the mutual information between selected features and the response variable, where the conditional distribution of the response variable given the input is the model to be explained. We develop an efficient variational approximation to the mutual information, and show the effectiveness of our method on a variety of synthetic and real data sets using both quantitative metrics and human evaluation.'
volume: 80
URL: https://proceedings.mlr.press/v80/chen18j.html
PDF: http://proceedings.mlr.press/v80/chen18j/chen18j.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-chen18j.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jianbo
family: Chen
- given: Le
family: Song
- given: Martin
family: Wainwright
- given: Michael
family: Jordan
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 883-892
id: chen18j
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 883
lastpage: 892
published: 2018-07-03 00:00:00 +0000
- title: 'Variational Inference and Model Selection with Generalized Evidence Bounds'
abstract: 'Recent advances in the scalability and flexibility of variational inference have made it successful at unravelling hidden patterns in complex data. In this work we propose a new variational bound formulation, yielding an estimator that extends beyond the conventional variational bound. It naturally subsumes the importance-weighted and Renyi bounds as special cases, and it is provably sharper than these counterparts. We also present an improved estimator for variational learning, and advocate a novel high signal-to-variance ratio update rule for the variational parameters. We discuss model-selection issues associated with existing evidence-lower-bound-based variational inference procedures, and show how to leverage the flexibility of our new formulation to address them. Empirical evidence is provided to validate our claims.'
volume: 80
URL: https://proceedings.mlr.press/v80/chen18k.html
PDF: http://proceedings.mlr.press/v80/chen18k/chen18k.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-chen18k.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Liqun
family: Chen
- given: Chenyang
family: Tao
- given: Ruiyi
family: Zhang
- given: Ricardo
family: Henao
- given: Lawrence
family: Carin
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 893-902
id: chen18k
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 893
lastpage: 902
published: 2018-07-03 00:00:00 +0000
- title: 'DRACO: Byzantine-resilient Distributed Training via Redundant Gradients'
abstract: 'Distributed model training is vulnerable to Byzantine system failures and adversarial compute nodes, i.e., nodes that use malicious updates to corrupt the global model stored at a parameter server (PS). To guarantee some form of robustness, recent work suggests using variants of the geometric median as an aggregation rule, in place of gradient averaging. Unfortunately, median-based rules can incur a prohibitive computational overhead in large-scale settings, and their convergence guarantees often require strong assumptions. In this work, we present DRACO, a scalable framework for robust distributed training that uses ideas from coding theory. In DRACO, each compute node evaluates redundant gradients that are used by the parameter server to eliminate the effects of adversarial updates. DRACO comes with problem-independent robustness guarantees, and the model that it trains is identical to the one trained in the adversary-free setup. We provide extensive experiments on real datasets and distributed setups across a variety of large-scale models, where we show that DRACO is several times to orders of magnitude faster than median-based approaches.'
volume: 80
URL: https://proceedings.mlr.press/v80/chen18l.html
PDF: http://proceedings.mlr.press/v80/chen18l/chen18l.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-chen18l.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lingjiao
family: Chen
- given: Hongyi
family: Wang
- given: Zachary
family: Charles
- given: Dimitris
family: Papailiopoulos
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 903-912
id: chen18l
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 903
lastpage: 912
published: 2018-07-03 00:00:00 +0000
- title: 'SADAGRAD: Strongly Adaptive Stochastic Gradient Methods'
abstract: 'Although the convergence rates of existing variants of ADAGRAD have a better dependence on the number of iterations under the strong convexity condition, their iteration complexities have an explicitly linear dependence on the dimensionality of the problem. To alleviate this bad dependence, we propose a simple yet novel variant of ADAGRAD for stochastic (weakly) strongly convex optimization. Different from existing variants, the proposed variant (referred to as SADAGRAD) uses an adaptive restarting scheme in which (i) ADAGRAD serves as a sub-routine and is restarted periodically; (ii) the number of iterations for restarting ADAGRAD depends on the history of learning that incorporates knowledge of the geometry of the data. In addition to the adaptive proximal functions and adaptive number of iterations for restarting, we also develop a variant that is adaptive to the (implicit) strong convexity from the data, which together makes the proposed algorithm strongly adaptive. In terms of iteration complexity, in the worst case SADAGRAD has an $O(1/\epsilon)$ iteration complexity for finding an $\epsilon$-optimal solution, similar to other variants. However, it could enjoy faster convergence and much better dependence on the problem’s dimensionality when stochastic gradients are sparse. Extensive experiments on large-scale data sets demonstrate the efficiency of the proposed algorithms in comparison with several variants of ADAGRAD and the stochastic gradient method.'
volume: 80
URL: https://proceedings.mlr.press/v80/chen18m.html
PDF: http://proceedings.mlr.press/v80/chen18m/chen18m.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-chen18m.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Zaiyi
family: Chen
- given: Yi
family: Xu
- given: Enhong
family: Chen
- given: Tianbao
family: Yang
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 913-921
id: chen18m
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 913
lastpage: 921
published: 2018-07-03 00:00:00 +0000
- title: 'Covariate Adjusted Precision Matrix Estimation via Nonconvex Optimization'
abstract: 'We propose a nonconvex estimator for the covariate adjusted precision matrix estimation problem in the high dimensional regime, under sparsity constraints. To solve this estimator, we propose an alternating gradient descent algorithm with hard thresholding. Compared with existing methods along this line of research, which lack theoretical guarantees in optimization error and/or statistical error, the proposed algorithm not only is computationally much more efficient with a linear rate of convergence, but also attains the optimal statistical rate up to a logarithmic factor. Thorough experiments on both synthetic and real data support our theory.'
volume: 80
URL: https://proceedings.mlr.press/v80/chen18n.html
PDF: http://proceedings.mlr.press/v80/chen18n/chen18n.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-chen18n.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jinghui
family: Chen
- given: Pan
family: Xu
- given: Lingxiao
family: Wang
- given: Jian
family: Ma
- given: Quanquan
family: Gu
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 922-931
id: chen18n
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 922
lastpage: 931
published: 2018-07-03 00:00:00 +0000
- title: 'End-to-End Learning for the Deep Multivariate Probit Model'
abstract: 'The multivariate probit model (MVP) is a popular classic model for studying binary responses of multiple entities. Nevertheless, the computational challenge of learning the MVP model, given that its likelihood involves integrating over a multidimensional constrained space of latent variables, significantly limits its application in practice. We propose a flexible deep generalization of the classic MVP, the Deep Multivariate Probit Model (DMVP), which is an end-to-end learning scheme that uses an efficient parallel sampling process of the multivariate probit model to exploit GPU-boosted deep neural networks. We present both theoretical and empirical analysis of the convergence behavior of DMVP’s sampling process with respect to the resolution of the correlation structure. We provide convergence guarantees for DMVP and our empirical analysis demonstrates the advantages of DMVP’s sampling compared with standard MCMC-based methods. We also show that when applied to multi-entity modelling problems, which are natural DMVP applications, DMVP trains faster than classical MVP, by at least an order of magnitude, captures rich correlations among entities, and further improves the joint likelihood of entities compared with several competitive models.'
volume: 80
URL: https://proceedings.mlr.press/v80/chen18o.html
PDF: http://proceedings.mlr.press/v80/chen18o/chen18o.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-chen18o.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Di
family: Chen
- given: Yexiang
family: Xue
- given: Carla
family: Gomes
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 932-941
id: chen18o
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 932
lastpage: 941
published: 2018-07-03 00:00:00 +0000
- title: 'Stochastic Training of Graph Convolutional Networks with Variance Reduction'
abstract: 'Graph convolutional networks (GCNs) are powerful deep neural networks for graph-structured data. However, a GCN computes the representation of a node recursively from its neighbors, making the receptive field size grow exponentially with the number of layers. Previous attempts at reducing the receptive field size by subsampling neighbors do not have convergence guarantees, and their receptive field size per node is still on the order of hundreds. In this paper, we develop control-variate-based algorithms with a new theoretical guarantee of convergence to a local optimum of the GCN regardless of the neighbor sampling size. Empirical results show that our algorithms enjoy a convergence rate and model quality similar to those of the exact algorithm while using only two neighbors per node. The running time of our algorithms on a large Reddit dataset is only one seventh of that of previous neighbor sampling algorithms.'
volume: 80
URL: https://proceedings.mlr.press/v80/chen18p.html
PDF: http://proceedings.mlr.press/v80/chen18p/chen18p.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-chen18p.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jianfei
family: Chen
- given: Jun
family: Zhu
- given: Le
family: Song
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 942-950
id: chen18p
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 942
lastpage: 950
published: 2018-07-03 00:00:00 +0000
- title: 'Extreme Learning to Rank via Low Rank Assumption'
abstract: 'We consider the setting where we wish to perform ranking for hundreds of thousands of users, which is common in recommender systems and web search ranking. Learning a single ranking function is unlikely to capture the variability across all users, while learning a ranking function for each person is time-consuming and requires large amounts of data from each user. To address this situation, we propose a Factorization RankSVM algorithm which learns a series of k basic ranking functions and then constructs for each user a local ranking function that is a combination of them. We develop a fast algorithm to reduce the time complexity of the gradient descent solver by exploiting the low-rank structure, and the resulting algorithm is much faster than existing methods. Furthermore, we prove that the generalization error of the proposed method can be significantly better than training individual RankSVMs. Finally, we present some interesting patterns in the principal ranking functions learned by our algorithms.'
volume: 80
URL: https://proceedings.mlr.press/v80/cheng18a.html
PDF: http://proceedings.mlr.press/v80/cheng18a/cheng18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-cheng18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Minhao
family: Cheng
- given: Ian
family: Davidson
- given: Cho-Jui
family: Hsieh
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 951-960
id: cheng18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 951
lastpage: 960
published: 2018-07-03 00:00:00 +0000
- title: 'Learning a Mixture of Two Multinomial Logits'
abstract: 'The classical Multinomial Logit (MNL) is a behavioral model for user choice. In this model, a user is offered a slate of choices (a subset of a finite universe of $n$ items), and selects exactly one item from the slate, each with probability proportional to its (positive) weight. Given a set of observed slates and choices, the likelihood-maximizing item weights are easy to learn at scale, and easy to interpret. However, the model fails to represent common real-world behavior. As a result, researchers in user choice often turn to mixtures of MNLs, which are known to approximate a large class of models of rational user behavior. Unfortunately, the only known algorithms for this problem have been heuristic in nature. In this paper we give the first polynomial-time algorithms for exact learning of uniform mixtures of two MNLs. Interestingly, the parameters of the model can be learned for any $n$ by sampling the behavior of random users only on slates of sizes 2 and 3; in contrast, we show that slates of size 2 are insufficient by themselves.'
volume: 80
URL: https://proceedings.mlr.press/v80/chierichetti18a.html
PDF: http://proceedings.mlr.press/v80/chierichetti18a/chierichetti18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-chierichetti18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Flavio
family: Chierichetti
- given: Ravi
family: Kumar
- given: Andrew
family: Tomkins
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 961-969
id: chierichetti18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 961
lastpage: 969
published: 2018-07-03 00:00:00 +0000
- title: 'Structured Evolution with Compact Architectures for Scalable Policy Optimization'
abstract: 'We present a new method of blackbox optimization via gradient approximation with the use of structured random orthogonal matrices, providing more accurate estimators than baselines and with provable theoretical guarantees. We show that this algorithm can be successfully applied to learn better quality compact policies than those using standard gradient estimation techniques. The compact policies we learn have several advantages over unstructured ones, including faster training algorithms and faster inference. These benefits are important when the policy is deployed on real hardware with limited resources. Further, compact policies provide more scalable architectures for derivative-free optimization (DFO) in high-dimensional spaces. We show that most robotics tasks from the OpenAI Gym can be solved using neural networks with fewer than 300 parameters, with almost linear time complexity of the inference phase, with up to 13x fewer parameters relative to the Evolution Strategies (ES) algorithm introduced by Salimans et al. (2017). We do not need heuristics such as fitness shaping to learn good quality policies, resulting in a simple and theoretically motivated training mechanism.'
volume: 80
URL: https://proceedings.mlr.press/v80/choromanski18a.html
PDF: http://proceedings.mlr.press/v80/choromanski18a/choromanski18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-choromanski18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Krzysztof
family: Choromanski
- given: Mark
family: Rowland
- given: Vikas
family: Sindhwani
- given: Richard
family: Turner
- given: Adrian
family: Weller
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 970-978
id: choromanski18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 970
lastpage: 978
published: 2018-07-03 00:00:00 +0000
- title: 'Path Consistency Learning in Tsallis Entropy Regularized MDPs'
abstract: 'We study the sparse entropy-regularized reinforcement learning (ERL) problem in which the entropy term is a special form of the Tsallis entropy. The optimal policy of this formulation is sparse, i.e., at each state, it has non-zero probability for only a small number of actions. This addresses the main drawback of the standard Shannon entropy-regularized RL (soft ERL) formulation, in which the optimal policy is softmax, and thus, may assign a non-negligible probability mass to non-optimal actions. This problem is aggravated as the number of actions is increased. In this paper, we follow the work of Nachum et al. (2017) in the soft ERL setting, and propose a class of novel path consistency learning (PCL) algorithms, called sparse PCL, for the sparse ERL problem that can work with both on-policy and off-policy data. We first derive a sparse consistency equation that specifies a relationship between the optimal value function and policy of the sparse ERL along any system trajectory. Crucially, a weak form of the converse is also true, and we quantify the sub-optimality of a policy which satisfies sparse consistency, and show that as we increase the number of actions, this sub-optimality is better than that of the soft ERL optimal policy. We then use this result to derive the sparse PCL algorithms. We empirically compare sparse PCL with its soft counterpart, and show its advantage, especially in problems with a large number of actions.'
volume: 80
URL: https://proceedings.mlr.press/v80/chow18a.html
PDF: http://proceedings.mlr.press/v80/chow18a/chow18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-chow18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yinlam
family: Chow
- given: Ofir
family: Nachum
- given: Mohammad
family: Ghavamzadeh
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 979-988
id: chow18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 979
lastpage: 988
published: 2018-07-03 00:00:00 +0000
- title: 'An Iterative, Sketching-based Framework for Ridge Regression'
abstract: 'Ridge regression is a variant of regularized least squares regression that is particularly suitable in settings where the number of predictor variables greatly exceeds the number of observations. We present a simple, iterative, sketching-based algorithm for ridge regression that guarantees high-quality approximations to the optimal solution vector. Our analysis builds upon two simple structural results that boil down to randomized matrix multiplication, a fundamental and well-understood primitive of randomized linear algebra. An important contribution of our work is the analysis of the behavior of subsampled ridge regression problems when the ridge leverage scores are used: we prove that accurate approximations can be achieved by a sample whose size depends on the degrees of freedom of the ridge-regression problem rather than the dimensions of the design matrix. Our experimental evaluations verify our theoretical results on both real and synthetic data.'
volume: 80
URL: https://proceedings.mlr.press/v80/chowdhury18a.html
PDF: http://proceedings.mlr.press/v80/chowdhury18a/chowdhury18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-chowdhury18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Agniva
family: Chowdhury
- given: Jiasen
family: Yang
- given: Petros
family: Drineas
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 989-998
id: chowdhury18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 989
lastpage: 998
published: 2018-07-03 00:00:00 +0000
- title: 'Stochastic Wasserstein Barycenters'
abstract: 'We present a stochastic algorithm to compute the barycenter of a set of probability distributions under the Wasserstein metric from optimal transport. Unlike previous approaches, our method extends to continuous input distributions and allows the support of the barycenter to be adjusted in each iteration. We tackle the problem without regularization, allowing us to recover a sharp output whose support is contained within the support of the true barycenter. We give examples where our algorithm recovers a more meaningful barycenter than previous work. Our method is versatile and can be extended to applications such as generating super samples from a given distribution and recovering blue noise approximations.'
volume: 80
URL: https://proceedings.mlr.press/v80/claici18a.html
PDF: http://proceedings.mlr.press/v80/claici18a/claici18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-claici18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Sebastian
family: Claici
- given: Edward
family: Chien
- given: Justin
family: Solomon
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 999-1008
id: claici18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 999
lastpage: 1008
published: 2018-07-03 00:00:00 +0000
- title: 'Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings'
abstract: 'In this work, we take a representation learning perspective on hierarchical reinforcement learning, where the problem of learning lower layers in a hierarchy is transformed into the problem of learning trajectory-level generative models. We show that we can learn continuous latent representations of trajectories, which are effective in solving temporally extended and multi-stage problems. Our proposed model, SeCTAR, draws inspiration from variational autoencoders, and learns latent representations of trajectories. A key component of this method is to learn both a latent-conditioned policy and a latent-conditioned model which are consistent with each other. Given the same latent, the policy generates a trajectory which should match the trajectory predicted by the model. This model provides a built-in prediction mechanism, by predicting the outcome of closed loop policy behavior. We propose a novel algorithm for performing hierarchical RL with this model, combining model-based planning in the learned latent space with an unsupervised exploration objective. We show that our model is effective at reasoning over long horizons with sparse rewards for several simulated tasks, outperforming standard reinforcement learning methods and prior methods for hierarchical reasoning, model-based planning, and exploration.'
volume: 80
URL: https://proceedings.mlr.press/v80/co-reyes18a.html
PDF: http://proceedings.mlr.press/v80/co-reyes18a/co-reyes18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-co-reyes18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: John
family: Co-Reyes
- given: YuXuan
family: Liu
- given: Abhishek
family: Gupta
- given: Benjamin
family: Eysenbach
- given: Pieter
family: Abbeel
- given: Sergey
family: Levine
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1009-1018
id: co-reyes18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1009
lastpage: 1018
published: 2018-07-03 00:00:00 +0000
- title: 'On Acceleration with Noise-Corrupted Gradients'
abstract: 'Accelerated algorithms have broad applications in large-scale optimization, due to their generality and fast convergence. However, their stability in the practical setting of noise-corrupted gradient oracles is not well understood. This paper provides two main technical contributions: (i) a new accelerated method AGDP that generalizes Nesterov’s AGD and improves on the recent method AXGD (Diakonikolas & Orecchia, 2018), and (ii) a theoretical study of accelerated algorithms under noisy and inexact gradient oracles, which is supported by numerical experiments. This study leverages the simplicity of AGDP and its analysis to clarify the interaction between noise and acceleration and to suggest modifications to the algorithm that reduce the mean and variance of the error incurred due to the gradient noise.'
volume: 80
URL: https://proceedings.mlr.press/v80/cohen18a.html
PDF: http://proceedings.mlr.press/v80/cohen18a/cohen18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-cohen18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Michael
family: Cohen
- given: Jelena
family: Diakonikolas
- given: Lorenzo
family: Orecchia
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1019-1028
id: cohen18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1019
lastpage: 1028
published: 2018-07-03 00:00:00 +0000
- title: 'Online Linear Quadratic Control'
abstract: 'We study the problem of controlling linear time-invariant systems with known noisy dynamics and adversarially chosen quadratic losses. We present the first efficient online learning algorithms in this setting that guarantee $O(\sqrt{T})$ regret under mild assumptions, where $T$ is the time horizon. Our algorithms rely on a novel SDP relaxation for the steady-state distribution of the system. Crucially, and in contrast to previously proposed relaxations, the feasible solutions of our SDP all correspond to “strongly stable” policies that mix exponentially fast to a steady state.'
volume: 80
URL: https://proceedings.mlr.press/v80/cohen18b.html
PDF: http://proceedings.mlr.press/v80/cohen18b/cohen18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-cohen18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Alon
family: Cohen
- given: Avinatan
family: Hasidim
- given: Tomer
family: Koren
- given: Nevena
family: Lazic
- given: Yishay
family: Mansour
- given: Kunal
family: Talwar
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1029-1038
id: cohen18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1029
lastpage: 1038
published: 2018-07-03 00:00:00 +0000
- title: 'GEP-PG: Decoupling Exploration and Exploitation in Deep Reinforcement Learning Algorithms'
abstract: 'In continuous action domains, standard deep reinforcement learning algorithms like DDPG suffer from inefficient exploration when facing sparse or deceptive reward problems. Conversely, evolutionary and developmental methods focusing on exploration like Novelty Search, Quality-Diversity or Goal Exploration Processes explore more robustly but are less efficient at fine-tuning policies using gradient-descent. In this paper, we present the GEP-PG approach, taking the best of both worlds by sequentially combining a Goal Exploration Process and two variants of DDPG. We study the learning performance of these components and their combination on a low dimensional deceptive reward problem and on the larger Half-Cheetah benchmark. We show that DDPG fails on the former and that GEP-PG improves over the best DDPG variant in both environments.'
volume: 80
URL: https://proceedings.mlr.press/v80/colas18a.html
PDF: http://proceedings.mlr.press/v80/colas18a/colas18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-colas18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Cédric
family: Colas
- given: Olivier
family: Sigaud
- given: Pierre-Yves
family: Oudeyer
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1039-1048
id: colas18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1039
lastpage: 1048
published: 2018-07-03 00:00:00 +0000
- title: 'Efficient Model-Based Deep Reinforcement Learning with Variational State Tabulation'
abstract: 'Modern reinforcement learning algorithms reach super-human performance on many board and video games, but they are sample inefficient, i.e. they typically require significantly more playing experience than humans to reach an equal performance level. To improve sample efficiency, an agent may build a model of the environment and use planning methods to update its policy. In this article we introduce Variational State Tabulation (VaST), which maps an environment with a high-dimensional state space (e.g. the space of visual inputs) to an abstract tabular model. Prioritized sweeping with small backups, a highly efficient planning method, can then be used to update state-action values. We show how VaST can rapidly learn to maximize reward in tasks like 3D navigation and efficiently adapt to sudden changes in rewards or transition probabilities.'
volume: 80
URL: https://proceedings.mlr.press/v80/corneil18a.html
PDF: http://proceedings.mlr.press/v80/corneil18a/corneil18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-corneil18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Dane
family: Corneil
- given: Wulfram
family: Gerstner
- given: Johanni
family: Brea
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1049-1058
id: corneil18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1049
lastpage: 1058
published: 2018-07-03 00:00:00 +0000
- title: 'Online Learning with Abstention'
abstract: 'We present an extensive study of a key problem in online learning where the learner can opt to abstain from making a prediction, at a certain cost. In the adversarial setting, we show how existing online algorithms and guarantees can be adapted to this problem. In the stochastic setting, we first point out a bias problem that limits the straightforward extension of algorithms such as UCB-N to this context. Next, we give a new algorithm, UCB-GT, that exploits historical data and time-varying feedback graphs. We show that this algorithm benefits from more favorable regret guarantees than a natural extension of UCB-N. We further report the results of a series of experiments demonstrating that UCB-GT largely outperforms that extension of UCB-N, as well as other standard baselines.'
volume: 80
URL: https://proceedings.mlr.press/v80/cortes18a.html
PDF: http://proceedings.mlr.press/v80/cortes18a/cortes18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-cortes18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Corinna
family: Cortes
- given: Giulia
family: DeSalvo
- given: Claudio
family: Gentile
- given: Mehryar
family: Mohri
- given: Scott
family: Yang
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1059-1067
id: cortes18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1059
lastpage: 1067
published: 2018-07-03 00:00:00 +0000
- title: 'Constrained Interacting Submodular Groupings'
abstract: 'We introduce the problem of grouping a finite ground set into blocks where each block is a subset of the ground set and where: (i) the blocks are individually highly valued by a submodular function (both robustly and in the average case) while satisfying block-specific matroid constraints; and (ii) block scores interact where blocks are jointly scored highly, thus making the blocks mutually non-redundant. Submodular functions are good models of information and diversity; thus, the above can be seen as grouping the ground set into matroid constrained blocks that are both intra- and inter-diverse. Potential applications include forming ensembles of classification/regression models, partitioning data for parallel processing, and summarization. In the non-robust case, we reduce the problem to non-monotone submodular maximization subject to multiple matroid constraints. In the mixed robust/average case, we offer a bi-criterion guarantee for a polynomial time deterministic algorithm and a probabilistic guarantee for a randomized algorithm, as long as the involved submodular functions (including the inter-block interaction terms) are monotone. We close with a case study in which we use these algorithms to find high quality diverse ensembles of classifiers, showing good results.'
volume: 80
URL: https://proceedings.mlr.press/v80/cotter18a.html
PDF: http://proceedings.mlr.press/v80/cotter18a/cotter18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-cotter18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Andrew
family: Cotter
- given: Mahdi Milani
family: Fard
- given: Seungil
family: You
- given: Maya
family: Gupta
- given: Jeff
family: Bilmes
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1068-1077
id: cotter18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1068
lastpage: 1077
published: 2018-07-03 00:00:00 +0000
- title: 'Inference Suboptimality in Variational Autoencoders'
abstract: 'Amortized inference allows latent-variable models trained via variational learning to scale to large datasets. The quality of approximate inference is determined by two factors: a) the capacity of the variational distribution to match the true posterior and b) the ability of the recognition network to produce good variational parameters for each datapoint. We examine approximate inference in variational autoencoders in terms of these factors. We find that divergence from the true posterior is often due to imperfect recognition networks, rather than the limited complexity of the approximating distribution. We show that this is due partly to the generator learning to accommodate the choice of approximation. Furthermore, we show that the parameters used to increase the expressiveness of the approximation play a role in generalizing inference rather than simply improving the complexity of the approximation.'
volume: 80
URL: https://proceedings.mlr.press/v80/cremer18a.html
PDF: http://proceedings.mlr.press/v80/cremer18a/cremer18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-cremer18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Chris
family: Cremer
- given: Xuechen
family: Li
- given: David
family: Duvenaud
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1078-1086
id: cremer18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1078
lastpage: 1086
published: 2018-07-03 00:00:00 +0000
- title: 'Mix & Match Agent Curricula for Reinforcement Learning'
abstract: 'We introduce Mix and match (M&M) – a training framework designed to facilitate rapid and effective learning in RL agents that would be too slow or too challenging to train otherwise. The key innovation is a procedure that allows us to automatically form a curriculum over agents. Through such a curriculum we can progressively train more complex agents by, effectively, bootstrapping from solutions found by simpler agents. In contradistinction to typical curriculum learning approaches, we do not gradually modify the tasks or environments presented, but instead use a process to gradually alter how the policy is represented internally. We show the broad applicability of our method by demonstrating significant performance gains in three different experimental setups: (1) We train an agent able to control more than 700 actions in a challenging 3D first-person task; using our method to progress through an action-space curriculum we achieve both faster training and better final performance than one obtains using traditional methods. (2) We further show that M&M can be used successfully to progress through a curriculum of architectural variants defining an agent’s internal state. (3) Finally, we illustrate how a variant of our method can be used to improve agent performance in a multitask setting.'
volume: 80
URL: https://proceedings.mlr.press/v80/czarnecki18a.html
PDF: http://proceedings.mlr.press/v80/czarnecki18a/czarnecki18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-czarnecki18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Wojciech
family: Czarnecki
- given: Siddhant
family: Jayakumar
- given: Max
family: Jaderberg
- given: Leonard
family: Hasenclever
- given: Yee Whye
family: Teh
- given: Nicolas
family: Heess
- given: Simon
family: Osindero
- given: Razvan
family: Pascanu
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1087-1095
id: czarnecki18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1087
lastpage: 1095
published: 2018-07-03 00:00:00 +0000
- title: 'Implicit Quantile Networks for Distributional Reinforcement Learning'
abstract: 'In this work, we build on recent advances in distributional reinforcement learning to give a generally applicable, flexible, and state-of-the-art distributional variant of DQN. We achieve this by using quantile regression to approximate the full quantile function for the state-action return distribution. By reparameterizing a distribution over the sample space, this yields an implicitly defined return distribution and gives rise to a large class of risk-sensitive policies. We demonstrate improved performance on the 57 Atari 2600 games in the ALE, and use our algorithm’s implicitly defined distributions to study the effects of risk-sensitive policies in Atari games.'
volume: 80
URL: https://proceedings.mlr.press/v80/dabney18a.html
PDF: http://proceedings.mlr.press/v80/dabney18a/dabney18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-dabney18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Will
family: Dabney
- given: Georg
family: Ostrovski
- given: David
family: Silver
- given: Remi
family: Munos
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1096-1105
id: dabney18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1096
lastpage: 1105
published: 2018-07-03 00:00:00 +0000
- title: 'Learning Steady-States of Iterative Algorithms over Graphs'
abstract: 'Many graph analytics problems can be solved via iterative algorithms where the solutions are often characterized by a set of steady-state conditions. Different algorithms respect different sets of fixed-point constraints, so instead of using these traditional algorithms, can we learn an algorithm which can obtain the same steady-state solutions automatically from examples, in an effective and scalable way? How should the meta learner for such an algorithm be represented, and how should the learning be carried out? In this paper, we propose an embedding representation for iterative algorithms over graphs, and design a learning method which alternates between updating the embeddings and projecting them onto the steady-state constraints. We demonstrate the effectiveness of our framework using a few commonly used graph algorithms, and show that in some cases, the learned algorithm can handle graphs with more than 100,000,000 nodes on a single machine.'
volume: 80
URL: https://proceedings.mlr.press/v80/dai18a.html
PDF: http://proceedings.mlr.press/v80/dai18a/dai18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-dai18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hanjun
family: Dai
- given: Zornitsa
family: Kozareva
- given: Bo
family: Dai
- given: Alex
family: Smola
- given: Le
family: Song
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1106-1114
id: dai18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1106
lastpage: 1114
published: 2018-07-03 00:00:00 +0000
- title: 'Adversarial Attack on Graph Structured Data'
abstract: 'Deep learning on graph structures has shown exciting results in various applications. However, little attention has been paid to the robustness of such models, in contrast to the numerous research works on adversarial attack and defense for images or text. In this paper, we focus on the adversarial attacks that fool deep learning models by modifying the combinatorial structure of data. We first propose a reinforcement learning based attack method that learns a generalizable attack policy, while only requiring prediction labels from the target classifier. We further propose attack methods based on genetic algorithms and gradient descent in the scenario where additional prediction confidence or gradients are available. We use both synthetic and real-world data to show that a family of Graph Neural Network models is vulnerable to these attacks, in both graph-level and node-level classification tasks. We also show such attacks can be used to diagnose the learned classifiers.'
volume: 80
URL: https://proceedings.mlr.press/v80/dai18b.html
PDF: http://proceedings.mlr.press/v80/dai18b/dai18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-dai18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hanjun
family: Dai
- given: Hui
family: Li
- given: Tian
family: Tian
- given: Xin
family: Huang
- given: Lin
family: Wang
- given: Jun
family: Zhu
- given: Le
family: Song
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1115-1124
id: dai18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1115
lastpage: 1124
published: 2018-07-03 00:00:00 +0000
- title: 'SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation'
abstract: 'When function approximation is used, solving the Bellman optimality equation with stability guarantees has remained a major open problem in reinforcement learning for decades. The fundamental difficulty is that the Bellman operator may become an expansion in general, resulting in oscillating and even divergent behavior of popular algorithms like Q-learning. In this paper, we revisit the Bellman equation, and reformulate it into a novel primal-dual optimization problem using Nesterov’s smoothing technique and the Legendre-Fenchel transformation. We then develop a new algorithm, called Smoothed Bellman Error Embedding, to solve this optimization problem where any differentiable function class may be used. We provide what we believe to be the first convergence guarantee for general nonlinear function approximation, and analyze the algorithm’s sample complexity. Empirically, our algorithm compares favorably to state-of-the-art baselines in several benchmark control problems.'
volume: 80
URL: https://proceedings.mlr.press/v80/dai18c.html
PDF: http://proceedings.mlr.press/v80/dai18c/dai18c.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-dai18c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Bo
family: Dai
- given: Albert
family: Shaw
- given: Lihong
family: Li
- given: Lin
family: Xiao
- given: Niao
family: He
- given: Zhen
family: Liu
- given: Jianshu
family: Chen
- given: Le
family: Song
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1125-1134
id: dai18c
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1125
lastpage: 1134
published: 2018-07-03 00:00:00 +0000
- title: 'Compressing Neural Networks using the Variational Information Bottleneck'
abstract: 'Neural networks can be compressed to reduce memory and computational requirements, or to increase accuracy by facilitating the use of a larger base architecture. In this paper we focus on pruning individual neurons, which can simultaneously trim model size, FLOPs, and run-time memory. To improve upon the performance of existing compression algorithms we utilize the information bottleneck principle instantiated via a tractable variational bound. Minimization of this information theoretic bound reduces the redundancy between adjacent layers by aggregating useful information into a subset of neurons that can be preserved. In contrast, the activations of disposable neurons are shut off via an attractive form of sparse regularization that emerges naturally from this framework, providing tangible advantages over traditional sparsity penalties without contributing additional tuning parameters to the energy landscape. We demonstrate state-of-the-art compression rates across an array of datasets and network architectures.'
volume: 80
URL: https://proceedings.mlr.press/v80/dai18d.html
PDF: http://proceedings.mlr.press/v80/dai18d/dai18d.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-dai18d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Bin
family: Dai
- given: Chen
family: Zhu
- given: Baining
family: Guo
- given: David
family: Wipf
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1135-1144
id: dai18d
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1135
lastpage: 1144
published: 2018-07-03 00:00:00 +0000
- title: 'Asynchronous Byzantine Machine Learning (the case of SGD)'
abstract: 'Asynchronous distributed machine learning solutions have proven very effective so far, but always assuming perfectly functioning workers. In practice, some of the workers can however exhibit Byzantine behavior, caused by hardware failures, software bugs, corrupt data, or even malicious attacks. We introduce Kardam, the first distributed asynchronous stochastic gradient descent (SGD) algorithm that copes with Byzantine workers. Kardam consists of two complementary components: a filtering and a dampening component. The first is scalar-based and ensures resilience against 1/3 Byzantine workers. Essentially, this filter leverages the Lipschitzness of cost functions and acts as a self-stabilizer against Byzantine workers that would attempt to corrupt the progress of SGD. The dampening component bounds the convergence rate by adjusting to stale information through a generic gradient weighting scheme. We prove that Kardam guarantees almost sure convergence in the presence of asynchrony and Byzantine behavior, and we derive its convergence rate. We evaluate Kardam on the CIFAR100 and EMNIST datasets and measure its overhead with respect to non Byzantine-resilient solutions. We empirically show that Kardam does not introduce additional noise to the learning procedure but does induce a slowdown (the cost of Byzantine resilience) that we both theoretically and empirically show to be less than f/n, where f is the number of Byzantine failures tolerated and n the total number of workers. Interestingly, we also empirically observe that the dampening component is interesting in its own right, for it enables building an SGD algorithm that outperforms alternative staleness-aware asynchronous competitors in environments with honest workers.'
volume: 80
URL: https://proceedings.mlr.press/v80/damaskinos18a.html
PDF: http://proceedings.mlr.press/v80/damaskinos18a/damaskinos18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-damaskinos18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Georgios
family: Damaskinos
- given: El Mahdi
family: El Mhamdi
- given: Rachid
family: Guerraoui
- given: Rhicheek
family: Patra
- given: Mahsa
family: Taziki
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1145-1154
id: damaskinos18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1145
lastpage: 1154
published: 2018-07-03 00:00:00 +0000
- title: 'Escaping Saddles with Stochastic Gradients'
abstract: 'We analyze the variance of stochastic gradients along negative curvature directions in certain non-convex machine learning models and show that stochastic gradients indeed exhibit a strong component along these directions. Furthermore, we show that - contrary to the case of isotropic noise - this variance is proportional to the magnitude of the corresponding eigenvalues and not decreasing in the dimensionality. Based upon this observation we propose a new assumption under which we show that the injection of explicit, isotropic noise usually applied to make gradient descent escape saddle points can successfully be replaced by a simple SGD step. Additionally - and under the same condition - we derive the first convergence rate for plain SGD to a second-order stationary point in a number of iterations that is independent of the problem dimension.'
volume: 80
URL: https://proceedings.mlr.press/v80/daneshmand18a.html
PDF: http://proceedings.mlr.press/v80/daneshmand18a/daneshmand18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-daneshmand18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hadi
family: Daneshmand
- given: Jonas
family: Kohler
- given: Aurelien
family: Lucchi
- given: Thomas
family: Hofmann
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1155-1164
id: daneshmand18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1155
lastpage: 1164
published: 2018-07-03 00:00:00 +0000
- title: 'Minibatch Gibbs Sampling on Large Graphical Models'
abstract: 'Gibbs sampling is the de facto Markov chain Monte Carlo method used for inference and learning on large scale graphical models. For complicated factor graphs with lots of factors, the performance of Gibbs sampling can be limited by the computational cost of executing a single update step of the Markov chain. This cost is proportional to the degree of the graph, the number of factors adjacent to each variable. In this paper, we show how this cost can be reduced by using minibatching: subsampling the factors to form an estimate of their sum. We introduce several minibatched variants of Gibbs, show that they can be made unbiased, prove bounds on their convergence rates, and show that under some conditions they can result in asymptotic single-update-run-time speedups over plain Gibbs sampling.'
volume: 80
URL: https://proceedings.mlr.press/v80/desa18a.html
PDF: http://proceedings.mlr.press/v80/desa18a/desa18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-desa18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Chris
family: De Sa
- given: Vincent
family: Chen
- given: Wing
family: Wong
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1165-1173
id: desa18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1165
lastpage: 1173
published: 2018-07-03 00:00:00 +0000
- title: 'Stochastic Video Generation with a Learned Prior'
abstract: 'Generating video frames that accurately predict future world states is challenging. Existing approaches either fail to capture the full distribution of outcomes, or yield blurry generations, or both. In this paper we introduce a video generation model with a learned prior over stochastic latent variables at each time step. Video frames are generated by drawing samples from this prior and combining them with a deterministic estimate of the future frame. The approach is simple and easily trained end-to-end on a variety of datasets. Sample generations are both varied and sharp, even many frames into the future, and compare favorably to those from existing approaches.'
volume: 80
URL: https://proceedings.mlr.press/v80/denton18a.html
PDF: http://proceedings.mlr.press/v80/denton18a/denton18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-denton18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Emily
family: Denton
- given: Rob
family: Fergus
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1174-1183
id: denton18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1174
lastpage: 1183
published: 2018-07-03 00:00:00 +0000
- title: 'Decomposition of Uncertainty in Bayesian Deep Learning for Efficient and Risk-sensitive Learning'
abstract: 'Bayesian neural networks with latent variables are scalable and flexible probabilistic models: they account for uncertainty in the estimation of the network weights and, by making use of latent variables, can capture complex noise patterns in the data. Using these models we show how to perform and utilize a decomposition of uncertainty into aleatoric and epistemic components for decision making purposes. This allows us to successfully identify informative points for active learning of functions with heteroscedastic and bimodal noise. Using the decomposition we further define a novel risk-sensitive criterion for reinforcement learning to identify policies that balance expected cost, model-bias and noise aversion.'
volume: 80
URL: https://proceedings.mlr.press/v80/depeweg18a.html
PDF: http://proceedings.mlr.press/v80/depeweg18a/depeweg18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-depeweg18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Stefan
family: Depeweg
- given: Jose-Miguel
family: Hernandez-Lobato
- given: Finale
family: Doshi-Velez
- given: Steffen
family: Udluft
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1184-1193
id: depeweg18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1184
lastpage: 1193
published: 2018-07-03 00:00:00 +0000
- title: 'Accurate Inference for Adaptive Linear Models'
abstract: 'Estimators computed from adaptively collected data do not behave like their non-adaptive brethren. Rather, the sequential dependence of the collection policy can lead to severe distributional biases that persist even in the infinite data limit. We develop a general method – *$\mathbf{W}$-decorrelation* – for transforming the bias of adaptive linear regression estimators into variance. The method uses only coarse-grained information about the data collection policy and does not need access to propensity scores or exact knowledge of the policy. We bound the finite-sample bias and variance of the $\mathbf{W}$-estimator and develop asymptotically correct confidence intervals based on a novel martingale central limit theorem. We then demonstrate the empirical benefits of the generic $\mathbf{W}$-decorrelation procedure in two different adaptive data settings: the multi-armed bandit and the autoregressive time series.'
volume: 80
URL: https://proceedings.mlr.press/v80/deshpande18a.html
PDF: http://proceedings.mlr.press/v80/deshpande18a/deshpande18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-deshpande18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yash
family: Deshpande
- given: Lester
family: Mackey
- given: Vasilis
family: Syrgkanis
- given: Matt
family: Taddy
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1194-1203
id: deshpande18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1194
lastpage: 1203
published: 2018-07-03 00:00:00 +0000
- title: 'Variational Network Inference: Strong and Stable with Concrete Support'
abstract: 'Traditional methods for the discovery of latent network structures are limited in two ways: they either assume that all the signal comes from the network (i.e. there is no source of signal outside the network) or they place constraints on the network parameters to ensure model or algorithmic stability. We address these limitations by proposing a model that incorporates a Gaussian process prior on a network-independent component and formally proving that we get algorithmic stability for free while providing a novel perspective on model stability as well as robustness results and precise intervals for key inference parameters. We show that, on three applications, our approach outperforms previous methods consistently.'
volume: 80
URL: https://proceedings.mlr.press/v80/dezfouli18a.html
PDF: http://proceedings.mlr.press/v80/dezfouli18a/dezfouli18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-dezfouli18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Amir
family: Dezfouli
- given: Edwin
family: Bonilla
- given: Richard
family: Nock
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1204-1213
id: dezfouli18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1204
lastpage: 1213
published: 2018-07-03 00:00:00 +0000
- title: 'Modeling Sparse Deviations for Compressed Sensing using Generative Models'
abstract: 'In compressed sensing, a small number of linear measurements can be used to reconstruct an unknown signal. Existing approaches leverage assumptions on the structure of these signals, such as sparsity or the availability of a generative model. A domain-specific generative model can provide a stronger prior and thus allow for recovery with far fewer measurements. However, unlike sparsity-based approaches, existing methods based on generative models guarantee exact recovery only over their support, which is typically only a small subset of the space on which the signals are defined. We propose Sparse-Gen, a framework that allows for sparse deviations from the support set, thereby achieving the best of both worlds by using a domain specific prior and allowing reconstruction over the full space of signals. Theoretically, our framework provides a new class of signals that can be acquired using compressed sensing, reducing classic sparse vector recovery to a special case and avoiding the restrictive support due to a generative model prior. Empirically, we observe consistent improvements in reconstruction accuracy over competing approaches, especially in the more practical setting of transfer compressed sensing where a generative model for a data-rich, source domain aids sensing on a data-scarce, target domain.'
volume: 80
URL: https://proceedings.mlr.press/v80/dhar18a.html
PDF: http://proceedings.mlr.press/v80/dhar18a/dhar18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-dhar18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Manik
family: Dhar
- given: Aditya
family: Grover
- given: Stefano
family: Ermon
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1214-1223
id: dhar18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1214
lastpage: 1223
published: 2018-07-03 00:00:00 +0000
- title: 'Alternating Randomized Block Coordinate Descent'
abstract: 'Block-coordinate descent algorithms and alternating minimization methods are fundamental optimization algorithms and an important primitive in large-scale optimization and machine learning. While various block-coordinate-descent-type methods have been studied extensively, only alternating minimization – which applies to the setting of only two blocks – is known to have convergence time that scales independently of the least smooth block. A natural question is then: is the setting of two blocks special? We show that the answer is “no” as long as the least smooth block can be optimized exactly – an assumption that is also needed in the setting of alternating minimization. We do so by introducing a novel algorithm AR-BCD, whose convergence time scales independently of the least smooth (possibly non-smooth) block. The basic algorithm generalizes both alternating minimization and randomized block coordinate (gradient) descent, and we also provide its accelerated version – AAR-BCD.'
volume: 80
URL: https://proceedings.mlr.press/v80/diakonikolas18a.html
PDF: http://proceedings.mlr.press/v80/diakonikolas18a/diakonikolas18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-diakonikolas18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jelena
family: Diakonikolas
- given: Lorenzo
family: Orecchia
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1224-1232
id: diakonikolas18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1224
lastpage: 1232
published: 2018-07-03 00:00:00 +0000
- title: 'Learning to Act in Decentralized Partially Observable MDPs'
abstract: 'We address a long-standing open problem of reinforcement learning in decentralized partially observable Markov decision processes. Previous attempts focussed on different forms of generalized policy iteration, which at best led to local optima. In this paper, we restrict attention to plans, which are simpler to store and update than policies. We derive, under certain conditions, the first near-optimal cooperative multi-agent reinforcement learning algorithm. To achieve significant scalability gains, we replace the greedy maximization by mixed-integer linear programming. Experiments show our approach can learn to act near-optimally in many finite domains from the literature.'
volume: 80
URL: https://proceedings.mlr.press/v80/dibangoye18a.html
PDF: http://proceedings.mlr.press/v80/dibangoye18a/dibangoye18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-dibangoye18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jilles
family: Dibangoye
- given: Olivier
family: Buffet
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1233-1242
id: dibangoye18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1233
lastpage: 1242
published: 2018-07-03 00:00:00 +0000
- title: 'Leveraging Well-Conditioned Bases: Streaming and Distributed Summaries in Minkowski $p$-Norms'
abstract: 'Work on approximate linear algebra has led to efficient distributed and streaming algorithms for problems such as approximate matrix multiplication, low rank approximation, and regression, primarily for the Euclidean norm $\ell_2$. We study other $\ell_p$ norms, which are more robust for $p < 2$, and can be used to find outliers for $p > 2$. Unlike previous algorithms for such norms, we give algorithms that are (1) deterministic, (2) work simultaneously for every $p \geq 1$, including $p = \infty$, and (3) can be implemented in both distributed and streaming environments. We study $\ell_p$-regression, entrywise $\ell_p$-low rank approximation, and versions of approximate matrix multiplication.'
volume: 80
URL: https://proceedings.mlr.press/v80/dickens18a.html
PDF: http://proceedings.mlr.press/v80/dickens18a/dickens18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-dickens18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Charlie
family: Dickens
- given: Graham
family: Cormode
- given: David
family: Woodruff
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1243-1251
id: dickens18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1243
lastpage: 1251
published: 2018-07-03 00:00:00 +0000
- title: 'Noisin: Unbiased Regularization for Recurrent Neural Networks'
abstract: 'Recurrent neural networks (RNNs) are powerful models of sequential data. They have been successfully used in domains such as text and speech. However, RNNs are susceptible to overfitting; regularization is important. In this paper we develop Noisin, a new method for regularizing RNNs. Noisin injects random noise into the hidden states of the RNN and then maximizes the corresponding marginal likelihood of the data. We show how Noisin applies to any RNN and we study many different types of noise. Noisin is unbiased: it preserves the underlying RNN on average. We characterize how Noisin regularizes its RNN both theoretically and empirically. On language modeling benchmarks, Noisin improves over dropout by as much as 12.2% on the Penn Treebank and 9.4% on the Wikitext-2 dataset. We also compared the state-of-the-art language model of Yang et al. (2017), both with and without Noisin. On the Penn Treebank, the method with Noisin more quickly reaches state-of-the-art performance.'
volume: 80
URL: https://proceedings.mlr.press/v80/dieng18a.html
PDF: http://proceedings.mlr.press/v80/dieng18a/dieng18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-dieng18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Adji Bousso
family: Dieng
- given: Rajesh
family: Ranganath
- given: Jaan
family: Altosaar
- given: David
family: Blei
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1252-1261
id: dieng18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1252
lastpage: 1261
published: 2018-07-03 00:00:00 +0000
- title: 'Discovering and Removing Exogenous State Variables and Rewards for Reinforcement Learning'
abstract: 'Exogenous state variables and rewards can slow down reinforcement learning by injecting uncontrolled variation into the reward signal. We formalize exogenous state variables and rewards and identify conditions under which an MDP with exogenous state can be decomposed into an exogenous Markov Reward Process involving only the exogenous state+reward and an endogenous Markov Decision Process defined with respect to only the endogenous rewards. We also derive a variance-covariance condition under which Monte Carlo policy evaluation on the endogenous MDP is accelerated compared to using the full MDP. Similar speedups are likely to carry over to all RL algorithms. We develop two algorithms for discovering the exogenous variables and test them on several MDPs. Results show that the algorithms are practical and can significantly speed up reinforcement learning.'
volume: 80
URL: https://proceedings.mlr.press/v80/dietterich18a.html
PDF: http://proceedings.mlr.press/v80/dietterich18a/dietterich18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-dietterich18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Thomas
family: Dietterich
- given: George
family: Trimponias
- given: Zhitang
family: Chen
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1262-1270
id: dietterich18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1262
lastpage: 1270
published: 2018-07-03 00:00:00 +0000
- title: 'Coordinated Exploration in Concurrent Reinforcement Learning'
abstract: 'We consider a team of reinforcement learning agents that concurrently learn to operate in a common environment. We identify three properties - adaptivity, commitment, and diversity - which are necessary for efficient coordinated exploration and demonstrate that straightforward extensions to single-agent optimistic and posterior sampling approaches fail to satisfy them. As an alternative, we propose seed sampling, which extends posterior sampling in a manner that meets these requirements. Simulation results investigate how per-agent regret decreases as the number of agents grows, establishing substantial advantages of seed sampling over alternative exploration schemes.'
volume: 80
URL: https://proceedings.mlr.press/v80/dimakopoulou18a.html
PDF: http://proceedings.mlr.press/v80/dimakopoulou18a/dimakopoulou18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-dimakopoulou18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Maria
family: Dimakopoulou
- given: Benjamin
family: Van Roy
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1271-1279
id: dimakopoulou18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1271
lastpage: 1279
published: 2018-07-03 00:00:00 +0000
- title: 'Probabilistic Recurrent State-Space Models'
abstract: 'State-space models (SSMs) are a highly expressive model class for learning patterns in time series data and for system identification. Deterministic versions of SSMs (e.g., LSTMs) proved extremely successful in modeling complex time series data. Fully probabilistic SSMs, however, are often found hard to train, even for smaller problems. We propose a novel model formulation and a scalable training algorithm based on doubly stochastic variational inference and Gaussian processes. This combination allows efficient incorporation of latent state temporal correlations, which we found to be key to robust training. The effectiveness of the proposed PR-SSM is evaluated on a set of real-world benchmark datasets in comparison to state-of-the-art probabilistic model learning methods. Scalability and robustness are demonstrated on a high dimensional problem.'
volume: 80
URL: https://proceedings.mlr.press/v80/doerr18a.html
PDF: http://proceedings.mlr.press/v80/doerr18a/doerr18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-doerr18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Andreas
family: Doerr
- given: Christian
family: Daniel
- given: Martin
family: Schiegg
- given: Duy
family: Nguyen-Tuong
- given: Stefan
family: Schaal
- given: Marc
family: Toussaint
- given: Sebastian
family: Trimpe
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1280-1289
id: doerr18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1280
lastpage: 1289
published: 2018-07-03 00:00:00 +0000
- title: 'Randomized Block Cubic Newton Method'
abstract: 'We study the problem of minimizing the sum of three convex functions: a differentiable, a twice-differentiable, and a non-smooth term in a high dimensional setting. To this effect we propose and analyze a randomized block cubic Newton (RBCN) method, which in each iteration builds a model of the objective function formed as the sum of the natural models of its three components: a linear model with a quadratic regularizer for the differentiable term, a quadratic model with a cubic regularizer for the twice-differentiable term, and a perfect (proximal) model for the non-smooth term. Our method in each iteration minimizes the model over a random subset of blocks of the search variable. RBCN is the first algorithm with these properties, generalizing several existing methods and matching the best known bounds in all special cases. We establish ${\cal O}(1/\epsilon)$, ${\cal O}(1/\sqrt{\epsilon})$ and ${\cal O}(\log (1/\epsilon))$ rates under different assumptions on the component functions. Lastly, we show numerically that our method outperforms the state-of-the-art on a variety of machine learning problems, including cubically regularized least-squares, logistic regression with constraints, and Poisson regression.'
volume: 80
URL: https://proceedings.mlr.press/v80/doikov18a.html
PDF: http://proceedings.mlr.press/v80/doikov18a/doikov18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-doikov18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Nikita
family: Doikov
- given: Peter
family: Richtarik
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1290-1298
id: doikov18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1290
lastpage: 1298
published: 2018-07-03 00:00:00 +0000
- title: 'Low-Rank Riemannian Optimization on Positive Semidefinite Stochastic Matrices with Applications to Graph Clustering'
abstract: 'This paper develops a Riemannian optimization framework for solving optimization problems on the set of symmetric positive semidefinite stochastic matrices. The paper first reformulates the problem by factorizing the optimization variable as $\mathbf{X}=\mathbf{Y}\mathbf{Y}^T$ and deriving conditions on $p$, i.e., the number of columns of $\mathbf{Y}$, under which the factorization yields a satisfactory solution. The reparameterization of the problem allows its formulation as an optimization over either an embedded or quotient Riemannian manifold whose geometries are investigated. In particular, the paper explicitly derives the tangent space, Riemannian gradients and retraction operator that allow the design of efficient optimization methods on the proposed manifolds. The numerical results reveal that, when the optimal solution has a known low rank, the resulting algorithms present a clear complexity advantage when compared with state-of-the-art Euclidean and Riemannian approaches for graph clustering applications.'
volume: 80
URL: https://proceedings.mlr.press/v80/douik18a.html
PDF: http://proceedings.mlr.press/v80/douik18a/douik18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-douik18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ahmed
family: Douik
- given: Babak
family: Hassibi
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1299-1308
id: douik18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1299
lastpage: 1308
published: 2018-07-03 00:00:00 +0000
- title: 'Essentially No Barriers in Neural Network Energy Landscape'
abstract: 'Training neural networks involves finding minima of a high-dimensional non-convex loss function. Relaxing from linear interpolations, we construct continuous paths between minima of recent neural network architectures on CIFAR10 and CIFAR100. Surprisingly, the paths are essentially flat in both the training and test landscapes. This implies that minima are perhaps best seen as points on a single connected manifold of low loss, rather than as the bottoms of distinct valleys.'
volume: 80
URL: https://proceedings.mlr.press/v80/draxler18a.html
PDF: http://proceedings.mlr.press/v80/draxler18a/draxler18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-draxler18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Felix
family: Draxler
- given: Kambis
family: Veschgini
- given: Manfred
family: Salmhofer
- given: Fred
family: Hamprecht
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1309-1318
id: draxler18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1309
lastpage: 1318
published: 2018-07-03 00:00:00 +0000
- title: 'Weakly Consistent Optimal Pricing Algorithms in Repeated Posted-Price Auctions with Strategic Buyer'
abstract: 'We study revenue optimization learning algorithms for repeated posted-price auctions where a seller interacts with a single strategic buyer that holds a fixed private valuation for a good and seeks to maximize his cumulative discounted surplus. We propose a novel algorithm that never decreases offered prices and has a tight strategic regret bound of $\Theta(\log\log T)$. This result closes the open research question on the existence of a no-regret horizon-independent weakly consistent pricing. We also show that the property of non-decreasing prices is nearly necessary for a weakly consistent algorithm to be a no-regret one.'
volume: 80
URL: https://proceedings.mlr.press/v80/drutsa18a.html
PDF: http://proceedings.mlr.press/v80/drutsa18a/drutsa18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-drutsa18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Alexey
family: Drutsa
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1319-1328
id: drutsa18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1319
lastpage: 1328
published: 2018-07-03 00:00:00 +0000
- title: 'On the Power of Over-parametrization in Neural Networks with Quadratic Activation'
abstract: 'We provide new theoretical insights on why over-parametrization is effective in learning neural networks. For a $k$ hidden node shallow network with quadratic activation and $n$ training data points, we show that, as long as $k \ge \sqrt{2n}$, over-parametrization enables local search algorithms to find a *globally* optimal solution for general smooth and convex loss functions. Further, although the number of parameters may exceed the sample size, we use the theory of Rademacher complexity to show that, with weight decay, the solution also generalizes well if the data is sampled from a regular distribution such as a Gaussian. To prove that the loss function has benign landscape properties when $k \ge \sqrt{2n}$, we adopt an idea from smoothed analysis, which may have other applications in studying loss surfaces of neural networks.'
volume: 80
URL: https://proceedings.mlr.press/v80/du18a.html
PDF: http://proceedings.mlr.press/v80/du18a/du18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-du18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Simon
family: Du
- given: Jason
family: Lee
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1329-1338
id: du18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1329
lastpage: 1338
published: 2018-07-03 00:00:00 +0000
- title: 'Gradient Descent Learns One-hidden-layer CNN: Don’t be Afraid of Spurious Local Minima'
abstract: 'We consider the problem of learning a one-hidden-layer neural network with a non-overlapping convolutional layer and ReLU activation function, i.e., $f(Z; w, a) = \sum_j a_j\sigma(w^\top Z_j)$, in which both the convolutional weights $w$ and the output weights $a$ are parameters to be learned. We prove that with Gaussian input $\mathbf{Z}$ there is a spurious local minimizer. Surprisingly, in the presence of the spurious local minimizer, starting from randomly initialized weights, gradient descent with weight normalization can still be proven to recover the true parameters with constant probability (which can be boosted to probability $1$ with multiple restarts). We also show that with constant probability, the same procedure could also converge to the spurious local minimum, showing that the local minimum plays a non-trivial role in the dynamics of gradient descent. Furthermore, a quantitative analysis shows that the gradient descent dynamics has two phases: it starts off slow, but converges much faster after several iterations.'
volume: 80
URL: https://proceedings.mlr.press/v80/du18b.html
PDF: http://proceedings.mlr.press/v80/du18b/du18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-du18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Simon
family: Du
- given: Jason
family: Lee
- given: Yuandong
family: Tian
- given: Aarti
family: Singh
- given: Barnabas
family: Poczos
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1339-1348
id: du18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1339
lastpage: 1348
published: 2018-07-03 00:00:00 +0000
- title: 'Investigating Human Priors for Playing Video Games'
abstract: 'What makes humans so good at solving seemingly complex video games? Unlike computers, humans bring in a great deal of prior knowledge about the world, enabling efficient decision making. This paper investigates the role of human priors for solving video games. Given a sample game, we conduct a series of ablation studies to quantify the importance of various priors on human performance. We do this by modifying the video game environment to systematically mask different types of visual information that could be used by humans as priors. We find that removal of some prior knowledge causes a drastic degradation in the speed with which human players solve the game, e.g. from 2 minutes to over 20 minutes. Furthermore, our results indicate that general priors, such as the importance of objects and visual consistency, are critical for efficient game-play. Videos and the game manipulations are available at https://rach0012.github.io/humanRL_website/'
volume: 80
URL: https://proceedings.mlr.press/v80/dubey18a.html
PDF: http://proceedings.mlr.press/v80/dubey18a/dubey18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-dubey18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Rachit
family: Dubey
- given: Pulkit
family: Agrawal
- given: Deepak
family: Pathak
- given: Tom
family: Griffiths
- given: Alexei
family: Efros
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1349-1357
id: dubey18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1349
lastpage: 1357
published: 2018-07-03 00:00:00 +0000
- title: 'A Distributed Second-Order Algorithm You Can Trust'
abstract: 'Due to the rapid growth of data and computational resources, distributed optimization has become an active research area in recent years. While first-order methods seem to dominate the field, second-order methods are nevertheless attractive as they potentially require fewer communication rounds to converge. However, there are significant drawbacks that impede their wide adoption, such as the computation and the communication of a large Hessian matrix. In this paper we present a new algorithm for distributed training of generalized linear models that only requires the computation of diagonal blocks of the Hessian matrix on the individual workers. To deal with this approximate information we propose an adaptive approach that - akin to trust-region methods - dynamically adapts the auxiliary model to compensate for modeling errors. We provide theoretical rates of convergence for a wide class of problems including $L_1$-regularized objectives. We also demonstrate that our approach achieves state-of-the-art results on multiple large benchmark datasets.'
volume: 80
URL: https://proceedings.mlr.press/v80/duenner18a.html
PDF: http://proceedings.mlr.press/v80/duenner18a/duenner18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-duenner18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Celestine
family: Duenner
- given: Aurelien
family: Lucchi
- given: Matilde
family: Gargiani
- given: An
family: Bian
- given: Thomas
family: Hofmann
- given: Martin
family: Jaggi
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1358-1366
id: duenner18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1358
lastpage: 1366
published: 2018-07-03 00:00:00 +0000
- title: 'Computational Optimal Transport: Complexity by Accelerated Gradient Descent Is Better Than by Sinkhorn’s Algorithm'
abstract: 'We analyze two algorithms for approximating the general optimal transport (OT) distance between two discrete distributions of size $n$, up to accuracy $\varepsilon$. For the first algorithm, which is based on the celebrated Sinkhorn’s algorithm, we prove the complexity bound $\widetilde{O}\left(\frac{n^2}{\varepsilon^2}\right)$ arithmetic operations ($\widetilde{O}$ hides polylogarithmic factors $(\ln n)^c$, $c>0$). For the second one, which is based on our novel Adaptive Primal-Dual Accelerated Gradient Descent (APDAGD) algorithm, we prove the complexity bound $\widetilde{O}\left(\min\left\{\frac{n^{9/4}}{\varepsilon}, \frac{n^{2}}{\varepsilon^2} \right\}\right)$ arithmetic operations. Both bounds have better dependence on $\varepsilon$ than the state-of-the-art result given by $\widetilde{O}\left(\frac{n^2}{\varepsilon^3}\right)$. Our second algorithm not only has better dependence on $\varepsilon$ in the complexity bound, but also is not specific to entropic regularization and can solve the OT problem with different regularizers.'
volume: 80
URL: https://proceedings.mlr.press/v80/dvurechensky18a.html
PDF: http://proceedings.mlr.press/v80/dvurechensky18a/dvurechensky18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-dvurechensky18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Pavel
family: Dvurechensky
- given: Alexander
family: Gasnikov
- given: Alexey
family: Kroshnin
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1367-1376
id: dvurechensky18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1367
lastpage: 1376
published: 2018-07-03 00:00:00 +0000
- title: 'Entropy-SGD optimizes the prior of a PAC-Bayes bound: Generalization properties of Entropy-SGD and data-dependent priors'
abstract: 'We show that Entropy-SGD (Chaudhari et al., 2017), when viewed as a learning algorithm, optimizes a PAC-Bayes bound on the risk of a Gibbs (posterior) classifier, i.e., a randomized classifier obtained by a risk-sensitive perturbation of the weights of a learned classifier. Entropy-SGD works by optimizing the bound’s prior, violating the hypothesis of the PAC-Bayes theorem that the prior is chosen independently of the data. Indeed, available implementations of Entropy-SGD rapidly obtain zero training error on random labels and the same holds for the Gibbs posterior. In order to obtain a valid generalization bound, we rely on a result showing that data-dependent priors obtained by stochastic gradient Langevin dynamics (SGLD) yield valid PAC-Bayes bounds provided the target distribution of SGLD is $\epsilon$-differentially private. We observe that test error on MNIST and CIFAR10 falls within the (empirically nonvacuous) risk bounds computed under the assumption that SGLD reaches stationarity. In particular, Entropy-SGLD can be configured to yield relatively tight generalization bounds and still fit real labels, although these same settings do not obtain state-of-the-art performance.'
volume: 80
URL: https://proceedings.mlr.press/v80/dziugaite18a.html
PDF: http://proceedings.mlr.press/v80/dziugaite18a/dziugaite18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-dziugaite18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Gintare Karolina
family: Dziugaite
- given: Daniel
family: Roy
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1377-1386
id: dziugaite18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1377
lastpage: 1386
published: 2018-07-03 00:00:00 +0000
- title: 'Beyond the One-Step Greedy Approach in Reinforcement Learning'
abstract: 'The famous Policy Iteration algorithm alternates between policy improvement and policy evaluation. Implementations of this algorithm with several variants of the latter evaluation stage, e.g., n-step and trace-based returns, have been analyzed in previous works. However, the case of multiple-step lookahead policy improvement, despite the recent increase in empirical evidence of its strength, has to our knowledge not been carefully analyzed yet. In this work, we introduce the first such analysis. Namely, we formulate variants of multiple-step policy improvement, derive new algorithms using these definitions and prove their convergence. Moreover, we show that recent prominent Reinforcement Learning algorithms are, in fact, instances of our framework. We thus shed light on their empirical success and give a recipe for deriving new algorithms for future study.'
volume: 80
URL: https://proceedings.mlr.press/v80/efroni18a.html
PDF: http://proceedings.mlr.press/v80/efroni18a/efroni18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-efroni18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yonathan
family: Efroni
- given: Gal
family: Dalal
- given: Bruno
family: Scherrer
- given: Shie
family: Mannor
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1387-1396
id: efroni18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1387
lastpage: 1396
published: 2018-07-03 00:00:00 +0000
- title: 'Parallel and Streaming Algorithms for K-Core Decomposition'
abstract: 'The k-core decomposition is a fundamental primitive in many machine learning and data mining applications. We present the first distributed and the first streaming algorithms to compute and maintain an approximate k-core decomposition with provable guarantees. Our algorithms achieve rigorous bounds on space complexity while bounding the number of passes or number of rounds of computation. We do so by presenting a new powerful sketching technique for k-core decomposition, and then by showing it can be computed efficiently in both streaming and MapReduce models. Finally, we confirm the effectiveness of our sketching technique empirically on a number of publicly available graphs.'
volume: 80
URL: https://proceedings.mlr.press/v80/esfandiari18a.html
PDF: http://proceedings.mlr.press/v80/esfandiari18a/esfandiari18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-esfandiari18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hossein
family: Esfandiari
- given: Silvio
family: Lattanzi
- given: Vahab
family: Mirrokni
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1397-1406
id: esfandiari18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1397
lastpage: 1406
published: 2018-07-03 00:00:00 +0000
- title: 'IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures'
abstract: 'In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. A key challenge is to handle the increased amount of data and extended training time. We have developed a new distributed agent IMPALA (Importance Weighted Actor-Learner Architecture) that not only uses resources more efficiently in single-machine training but also scales to thousands of machines without sacrificing data efficiency or resource utilisation. We achieve stable learning at high throughput by combining decoupled acting and learning with a novel off-policy correction method called V-trace. We demonstrate the effectiveness of IMPALA for multi-task reinforcement learning on DMLab-30 (a set of 30 tasks from the DeepMind Lab environment (Beattie et al., 2016)) and Atari57 (all available Atari games in Arcade Learning Environment (Bellemare et al., 2013a)). Our results show that IMPALA is able to achieve better performance than previous agents with less data, and crucially exhibits positive transfer between tasks as a result of its multi-task approach.'
volume: 80
URL: https://proceedings.mlr.press/v80/espeholt18a.html
PDF: http://proceedings.mlr.press/v80/espeholt18a/espeholt18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-espeholt18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lasse
family: Espeholt
- given: Hubert
family: Soyer
- given: Remi
family: Munos
- given: Karen
family: Simonyan
- given: Vlad
family: Mnih
- given: Tom
family: Ward
- given: Yotam
family: Doron
- given: Vlad
family: Firoiu
- given: Tim
family: Harley
- given: Iain
family: Dunning
- given: Shane
family: Legg
- given: Koray
family: Kavukcuoglu
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1407-1416
id: espeholt18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1407
lastpage: 1416
published: 2018-07-03 00:00:00 +0000
- title: 'Scalable Gaussian Processes with Grid-Structured Eigenfunctions (GP-GRIEF)'
abstract: 'We introduce a kernel approximation strategy that enables computation of the Gaussian process log marginal likelihood and all hyperparameter derivatives in O(p) time. Our GRIEF kernel consists of p eigenfunctions found using a Nystrom approximation from a dense Cartesian product grid of inducing points. By exploiting algebraic properties of Kronecker and Khatri-Rao tensor products, computational complexity of the training procedure can be practically independent of the number of inducing points. This allows us to use arbitrarily many inducing points to achieve a globally accurate kernel approximation, even in high-dimensional problems. The fast likelihood evaluation enables type-I or II Bayesian inference on large-scale datasets. We benchmark our algorithms on real-world problems with up to two-million training points and 10^33 inducing points.'
volume: 80
URL: https://proceedings.mlr.press/v80/evans18a.html
PDF: http://proceedings.mlr.press/v80/evans18a/evans18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-evans18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Trefor
family: Evans
- given: Prasanth
family: Nair
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1417-1426
id: evans18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1417
lastpage: 1426
published: 2018-07-03 00:00:00 +0000
- title: 'The Limits of Maxing, Ranking, and Preference Learning'
abstract: 'We present a comprehensive understanding of three important problems in PAC preference learning: maximum selection (maxing), ranking, and estimating *all* pairwise preference probabilities, in the adaptive setting. With just Weak Stochastic Transitivity, we show that maxing requires $\Omega(n^2)$ comparisons and with slightly more restrictive Medium Stochastic Transitivity, we present a linear complexity maxing algorithm. With Strong Stochastic Transitivity and Stochastic Triangle Inequality, we derive a ranking algorithm with optimal $\mathcal{O}(n\log n)$ complexity and an optimal algorithm that estimates all pairwise preference probabilities.'
volume: 80
URL: https://proceedings.mlr.press/v80/falahatgar18a.html
PDF: http://proceedings.mlr.press/v80/falahatgar18a/falahatgar18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-falahatgar18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Moein
family: Falahatgar
- given: Ayush
family: Jain
- given: Alon
family: Orlitsky
- given: Venkatadheeraj
family: Pichapati
- given: Vaishakh
family: Ravindrakumar
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1427-1436
id: falahatgar18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1427
lastpage: 1436
published: 2018-07-03 00:00:00 +0000
- title: 'BOHB: Robust and Efficient Hyperparameter Optimization at Scale'
abstract: 'Modern deep learning methods are very sensitive to many hyperparameters, and, due to the long training times of state-of-the-art models, vanilla Bayesian hyperparameter optimization is typically computationally infeasible. On the other hand, bandit-based configuration evaluation approaches based on random search lack guidance and do not converge to the best configurations as quickly. Here, we propose to combine the benefits of both Bayesian optimization and bandit-based methods, in order to achieve the best of both worlds: strong anytime performance and fast convergence to optimal configurations. We propose a new practical state-of-the-art hyperparameter optimization method, which consistently outperforms both Bayesian optimization and Hyperband on a wide range of problem types, including high-dimensional toy functions, support vector machines, feed-forward neural networks, Bayesian neural networks, deep reinforcement learning, and convolutional neural networks. Our method is robust and versatile, while at the same time being conceptually simple and easy to implement.'
volume: 80
URL: https://proceedings.mlr.press/v80/falkner18a.html
PDF: http://proceedings.mlr.press/v80/falkner18a/falkner18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-falkner18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Stefan
family: Falkner
- given: Aaron
family: Klein
- given: Frank
family: Hutter
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1437-1446
id: falkner18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1437
lastpage: 1446
published: 2018-07-03 00:00:00 +0000
- title: 'More Robust Doubly Robust Off-policy Evaluation'
abstract: 'We study the problem of off-policy evaluation (OPE) in reinforcement learning (RL), where the goal is to estimate the performance of a policy from the data generated by one or more other policies. In particular, we focus on the doubly robust (DR) estimators that consist of an importance sampling (IS) component and a performance model, and utilize the low (or zero) bias of IS and low variance of the model at the same time. Although the accuracy of the model has a huge impact on the overall performance of DR, most of the work on using the DR estimators in OPE has been focused on improving the IS part, and not much on how to learn the model. In this paper, we propose alternative DR estimators, called more robust doubly robust (MRDR), that learn the model parameter by minimizing the variance of the DR estimator. We first present a formulation for learning the DR model in RL. We then derive formulas for the variance of the DR estimator in both contextual bandits and RL, such that their gradients w.r.t. the model parameters can be estimated from the samples, and propose methods to efficiently minimize the variance. We prove that the MRDR estimators are strongly consistent and asymptotically optimal. Finally, we evaluate MRDR in bandits and RL benchmark problems, and compare its performance with the existing methods.'
volume: 80
URL: https://proceedings.mlr.press/v80/farajtabar18a.html
PDF: http://proceedings.mlr.press/v80/farajtabar18a/farajtabar18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-farajtabar18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mehrdad
family: Farajtabar
- given: Yinlam
family: Chow
- given: Mohammad
family: Ghavamzadeh
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1447-1456
id: farajtabar18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1447
lastpage: 1456
published: 2018-07-03 00:00:00 +0000
- title: 'Efficient and Consistent Adversarial Bipartite Matching'
abstract: 'Many important structured prediction problems, including learning to rank items, correspondence-based natural language processing, and multi-object tracking, can be formulated as weighted bipartite matching optimizations. Existing structured prediction approaches have significant drawbacks when applied under the constraints of perfect bipartite matchings. Exponential family probabilistic models, such as the conditional random field (CRF), provide statistical consistency guarantees, but suffer computationally from the need to compute the normalization term of its distribution over matchings, which is a #P-hard matrix permanent computation. In contrast, the structured support vector machine (SSVM) provides computational efficiency, but lacks Fisher consistency, meaning that there are distributions of data for which it cannot learn the optimal matching even under ideal learning conditions (i.e., given the true distribution and selecting from all measurable potential functions). We propose adversarial bipartite matching to avoid both of these limitations. We develop this approach algorithmically, establish its computational efficiency and Fisher consistency properties, and apply it to matching problems that demonstrate its empirical benefits.'
volume: 80
URL: https://proceedings.mlr.press/v80/fathony18a.html
PDF: http://proceedings.mlr.press/v80/fathony18a/fathony18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-fathony18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Rizal
family: Fathony
- given: Sima
family: Behpour
- given: Xinhua
family: Zhang
- given: Brian
family: Ziebart
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1457-1466
id: fathony18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1457
lastpage: 1466
published: 2018-07-03 00:00:00 +0000
- title: 'Global Convergence of Policy Gradient Methods for the Linear Quadratic Regulator'
abstract: 'Direct policy gradient methods for reinforcement learning and continuous control problems are a popular approach for a variety of reasons: 1) they are easy to implement without explicit knowledge of the underlying model, 2) they are an “end-to-end” approach, directly optimizing the performance metric of interest, 3) they inherently allow for richly parameterized policies. A notable drawback is that even in the most basic continuous control problem (that of linear quadratic regulators), these methods must solve a non-convex optimization problem, where little is understood about their efficiency from both computational and statistical perspectives. In contrast, system identification and model based planning in optimal control theory have a much more solid theoretical footing, where much is known with regards to their computational and statistical properties. This work bridges this gap showing that (model free) policy gradient methods globally converge to the optimal solution and are efficient (polynomially so in relevant problem dependent quantities) with regards to their sample and computational complexities.'
volume: 80
URL: https://proceedings.mlr.press/v80/fazel18a.html
PDF: http://proceedings.mlr.press/v80/fazel18a/fazel18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-fazel18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Maryam
family: Fazel
- given: Rong
family: Ge
- given: Sham
family: Kakade
- given: Mehran
family: Mesbahi
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1467-1476
id: fazel18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1467
lastpage: 1476
published: 2018-07-03 00:00:00 +0000
- title: 'CRVI: Convex Relaxation for Variational Inference'
abstract: 'We present a new technique for solving non-convex variational inference optimization problems. Variational inference is a widely used method for posterior approximation in which the inference problem is transformed into an optimization problem. For most models, this optimization is highly non-convex and so hard to solve. In this paper, we introduce a new approach to solving the variational inference optimization based on convex relaxation and semidefinite programming. Our theoretical results guarantee very tight relaxation bounds that get nearer to the global optimal solution than traditional coordinate ascent. We evaluate the performance of our approach on regression and sparse coding.'
volume: 80
URL: https://proceedings.mlr.press/v80/fazelnia18a.html
PDF: http://proceedings.mlr.press/v80/fazelnia18a/fazelnia18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-fazelnia18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ghazal
family: Fazelnia
- given: John
family: Paisley
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1477-1485
id: fazelnia18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1477
lastpage: 1485
published: 2018-07-03 00:00:00 +0000
- title: 'Fourier Policy Gradients'
abstract: 'We propose a new way of deriving policy gradient updates for reinforcement learning. Our technique, based on Fourier analysis, recasts integrals that arise with expected policy gradients as convolutions and turns them into multiplications. The obtained analytical solutions allow us to capture the low variance benefits of EPG in a broad range of settings. For the critic, we treat trigonometric and radial basis functions, two function families with the universal approximation property. The choice of policy can be almost arbitrary, including mixtures or hybrid continuous-discrete probability distributions. Moreover, we derive a general family of sample-based estimators for stochastic policy gradients, which unifies existing results on sample-based approximation. We believe that this technique has the potential to shape the next generation of policy gradient approaches, powered by analytical results.'
volume: 80
URL: https://proceedings.mlr.press/v80/fellows18a.html
PDF: http://proceedings.mlr.press/v80/fellows18a/fellows18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-fellows18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Matthew
family: Fellows
- given: Kamil
family: Ciosek
- given: Shimon
family: Whiteson
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1486-1495
id: fellows18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1486
lastpage: 1495
published: 2018-07-03 00:00:00 +0000
- title: 'Nonparametric variable importance using an augmented neural network with multi-task learning'
abstract: 'In predictive modeling applications, it is often of interest to determine the relative contribution of subsets of features in explaining the variability of an outcome. It is useful to consider this variable importance as a function of the unknown, underlying data-generating mechanism rather than the specific predictive algorithm used to fit the data. In this paper, we connect these ideas in nonparametric variable importance to machine learning, and provide a method for efficient estimation of variable importance when building a predictive model using a neural network. We show how a single augmented neural network with multi-task learning simultaneously estimates the importance of many feature subsets, improving on previous procedures for estimating importance. We demonstrate on simulated data that our method is both accurate and computationally efficient, and apply our method both to a study of heart disease and to predicting mortality in ICU patients.'
volume: 80
URL: https://proceedings.mlr.press/v80/feng18a.html
PDF: http://proceedings.mlr.press/v80/feng18a/feng18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-feng18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jean
family: Feng
- given: Brian
family: Williamson
- given: Noah
family: Simon
- given: Marco
family: Carone
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1496-1505
id: feng18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1496
lastpage: 1505
published: 2018-07-03 00:00:00 +0000
- title: 'Closed-form Marginal Likelihood in Gamma-Poisson Matrix Factorization'
abstract: 'We present novel understandings of the Gamma-Poisson (GaP) model, a probabilistic matrix factorization model for count data. We show that GaP can be rewritten free of the score/activation matrix. This gives us new insights about the estimation of the topic/dictionary matrix by maximum marginal likelihood estimation. In particular, this explains the robustness of this estimator to over-specified values of the factorization rank, especially its ability to automatically prune irrelevant dictionary columns, as empirically observed in previous work. The marginalization of the activation matrix leads in turn to a new Monte Carlo Expectation-Maximization algorithm with favorable properties.'
volume: 80
URL: https://proceedings.mlr.press/v80/filstroff18a.html
PDF: http://proceedings.mlr.press/v80/filstroff18a/filstroff18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-filstroff18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Louis
family: Filstroff
- given: Alberto
family: Lumbreras
- given: Cédric
family: Févotte
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1506-1514
id: filstroff18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1506
lastpage: 1514
published: 2018-07-03 00:00:00 +0000
- title: 'Automatic Goal Generation for Reinforcement Learning Agents'
abstract: 'Reinforcement learning (RL) is a powerful technique to train an agent to perform a task; however, an agent that is trained using RL is only capable of achieving the single task that is specified via its reward function. Such an approach does not scale well to settings in which an agent needs to perform a diverse set of tasks, such as navigating to varying positions in a room or moving objects to varying locations. Instead, we propose a method that allows an agent to automatically discover the range of tasks that it is capable of performing in its environment. We use a generator network to propose tasks for the agent to try to accomplish, each task being specified as reaching a certain parametrized subset of the state-space. The generator network is optimized using adversarial training to produce tasks that are always at the appropriate level of difficulty for the agent, thus automatically producing a curriculum. We show that, by using this framework, an agent can efficiently and automatically learn to perform a wide set of tasks without requiring any prior knowledge of its environment, even when only sparse rewards are available. Videos and code available at https://sites.google.com/view/goalgeneration4rl.'
volume: 80
URL: https://proceedings.mlr.press/v80/florensa18a.html
PDF: http://proceedings.mlr.press/v80/florensa18a/florensa18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-florensa18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Carlos
family: Florensa
- given: David
family: Held
- given: Xinyang
family: Geng
- given: Pieter
family: Abbeel
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1515-1528
id: florensa18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1515
lastpage: 1528
published: 2018-07-03 00:00:00 +0000
- title: 'DiCE: The Infinitely Differentiable Monte Carlo Estimator'
abstract: 'The score function estimator is widely used for estimating gradients of stochastic objectives in stochastic computation graphs (SCG), e.g., in reinforcement learning and meta-learning. While deriving the first-order gradient estimators by differentiating a surrogate loss (SL) objective is computationally and conceptually simple, using the same approach for higher-order derivatives is more challenging. Firstly, analytically deriving and implementing such estimators is laborious and not compliant with automatic differentiation. Secondly, repeatedly applying SL to construct new objectives for each order derivative involves increasingly cumbersome graph manipulations. Lastly, to match the first-order gradient under differentiation, SL treats part of the cost as a fixed sample, which we show leads to missing and wrong terms for estimators of higher-order derivatives. To address all these shortcomings in a unified way, we introduce DiCE, which provides a single objective that can be differentiated repeatedly, generating correct estimators of derivatives of any order in SCGs. Unlike SL, DiCE relies on automatic differentiation for performing the requisite graph manipulations. We verify the correctness of DiCE both through a proof and numerical evaluation of the DiCE derivative estimates. We also use DiCE to propose and evaluate a novel approach for multi-agent learning. Our code is available at https://github.com/alshedivat/lola.'
volume: 80
URL: https://proceedings.mlr.press/v80/foerster18a.html
PDF: http://proceedings.mlr.press/v80/foerster18a/foerster18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-foerster18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jakob
family: Foerster
- given: Gregory
family: Farquhar
- given: Maruan
family: Al-Shedivat
- given: Tim
family: Rocktäschel
- given: Eric
family: Xing
- given: Shimon
family: Whiteson
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1529-1538
id: foerster18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1529
lastpage: 1538
published: 2018-07-03 00:00:00 +0000
- title: 'Practical Contextual Bandits with Regression Oracles'
abstract: 'A major challenge in contextual bandits is to design general-purpose algorithms that are both practically useful and theoretically well-founded. We present a new technique that has the empirical and computational advantages of realizability-based approaches combined with the flexibility of agnostic methods. Our algorithms leverage the availability of a regression oracle for the value-function class, a more realistic and reasonable oracle than the classification oracles over policies typically assumed by agnostic methods. Our approach generalizes both UCB and LinUCB to far more expressive possible model classes and achieves low regret under certain distributional assumptions. In an extensive empirical evaluation, we find that our approach typically matches or outperforms both realizability-based and agnostic baselines.'
volume: 80
URL: https://proceedings.mlr.press/v80/foster18a.html
PDF: http://proceedings.mlr.press/v80/foster18a/foster18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-foster18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Dylan
family: Foster
- given: Alekh
family: Agarwal
- given: Miroslav
family: Dudik
- given: Haipeng
family: Luo
- given: Robert
family: Schapire
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1539-1548
id: foster18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1539
lastpage: 1548
published: 2018-07-03 00:00:00 +0000
- title: 'Generative Temporal Models with Spatial Memory for Partially Observed Environments'
abstract: 'In model-based reinforcement learning, generative and temporal models of environments can be leveraged to boost agent performance, either by tuning the agent’s representations during training or via use as part of an explicit planning mechanism. However, their application in practice has been limited to simplistic environments, due to the difficulty of training such models in larger, potentially partially-observed and 3D environments. In this work we introduce a novel action-conditioned generative model of such challenging environments. The model features a non-parametric spatial memory system in which we store learned, disentangled representations of the environment. Low-dimensional spatial updates are computed using a state-space model that makes use of knowledge on the prior dynamics of the moving agent, and high-dimensional visual observations are modelled with a Variational Auto-Encoder. The result is a scalable architecture capable of performing coherent predictions over hundreds of time steps across a range of partially observed 2D and 3D environments.'
volume: 80
URL: https://proceedings.mlr.press/v80/fraccaro18a.html
PDF: http://proceedings.mlr.press/v80/fraccaro18a/fraccaro18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-fraccaro18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Marco
family: Fraccaro
- given: Danilo
family: Rezende
- given: Yori
family: Zwols
- given: Alexander
family: Pritzel
- given: S. M. Ali
family: Eslami
- given: Fabio
family: Viola
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1549-1558
id: fraccaro18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1549
lastpage: 1558
published: 2018-07-03 00:00:00 +0000
- title: 'ADMM and Accelerated ADMM as Continuous Dynamical Systems'
abstract: 'Recently, there has been an increasing interest in using tools from dynamical systems to analyze the behavior of simple optimization algorithms such as gradient descent and accelerated variants. This paper strengthens such connections by deriving the differential equations that model the continuous limit of the sequence of iterates generated by the alternating direction method of multipliers, as well as an accelerated variant. We employ the direct method of Lyapunov to analyze the stability of critical points of the dynamical systems and to obtain associated convergence rates.'
volume: 80
URL: https://proceedings.mlr.press/v80/franca18a.html
PDF: http://proceedings.mlr.press/v80/franca18a/franca18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-franca18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Guilherme
family: Franca
- given: Daniel
family: Robinson
- given: Rene
family: Vidal
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1559-1567
id: franca18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1559
lastpage: 1567
published: 2018-07-03 00:00:00 +0000
- title: 'Bilevel Programming for Hyperparameter Optimization and Meta-Learning'
abstract: 'We introduce a framework based on bilevel programming that unifies gradient-based hyperparameter optimization and meta-learning. We show that an approximate version of the bilevel problem can be solved by taking into explicit account the optimization dynamics for the inner objective. Depending on the specific setting, the outer variables take either the meaning of hyperparameters in a supervised learning problem or parameters of a meta-learner. We provide sufficient conditions under which solutions of the approximate problem converge to those of the exact problem. We instantiate our approach for meta-learning in the case of deep learning where representation layers are treated as hyperparameters shared across a set of training episodes. In experiments, we confirm our theoretical findings, present encouraging results for few-shot learning and contrast the bilevel approach against classical approaches for learning-to-learn.'
volume: 80
URL: https://proceedings.mlr.press/v80/franceschi18a.html
PDF: http://proceedings.mlr.press/v80/franceschi18a/franceschi18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-franceschi18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Luca
family: Franceschi
- given: Paolo
family: Frasconi
- given: Saverio
family: Salzo
- given: Riccardo
family: Grazzi
- given: Massimiliano
family: Pontil
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1568-1577
id: franceschi18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1568
lastpage: 1577
published: 2018-07-03 00:00:00 +0000
- title: 'Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning'
abstract: 'We introduce SCAL, an algorithm designed to perform efficient exploration-exploitation in any unknown weakly-communicating Markov Decision Process (MDP) for which an upper bound c on the span of the optimal bias function is known. For an MDP with $S$ states, $A$ actions and $\Gamma \leq S$ possible next states, we prove a regret bound of $O(c\sqrt{\Gamma SAT})$, which significantly improves over existing algorithms (e.g., UCRL and PSRL), whose regret scales linearly with the MDP diameter $D$. In fact, the optimal bias span is finite and often much smaller than $D$ (e.g., $D=+\infty$ in non-communicating MDPs). A similar result was originally derived by Bartlett and Tewari (2009) for REGAL.C, for which no tractable algorithm is available. In this paper, we relax the optimization problem at the core of REGAL.C, we carefully analyze its properties, and we provide the first computationally efficient algorithm to solve it. Finally, we report numerical simulations supporting our theoretical findings and showing how SCAL significantly outperforms UCRL in MDPs with large diameter and small span.'
volume: 80
URL: https://proceedings.mlr.press/v80/fruit18a.html
PDF: http://proceedings.mlr.press/v80/fruit18a/fruit18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-fruit18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ronan
family: Fruit
- given: Matteo
family: Pirotta
- given: Alessandro
family: Lazaric
- given: Ronald
family: Ortner
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1578-1586
id: fruit18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1578
lastpage: 1586
published: 2018-07-03 00:00:00 +0000
- title: 'Addressing Function Approximation Error in Actor-Critic Methods'
abstract: 'In value-based reinforcement learning methods such as deep Q-learning, function approximation errors are known to lead to overestimated value estimates and suboptimal policies. We show that this problem persists in an actor-critic setting and propose novel mechanisms to minimize its effects on both the actor and the critic. Our algorithm builds on Double Q-learning, by taking the minimum value between a pair of critics to limit overestimation. We draw the connection between target networks and overestimation bias, and suggest delaying policy updates to reduce per-update error and further improve performance. We evaluate our method on the suite of OpenAI gym tasks, outperforming the state of the art in every environment tested.'
volume: 80
URL: https://proceedings.mlr.press/v80/fujimoto18a.html
PDF: http://proceedings.mlr.press/v80/fujimoto18a/fujimoto18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-fujimoto18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Scott
family: Fujimoto
- given: Herke
family: Hoof
- given: David
family: Meger
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1587-1596
id: fujimoto18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1587
lastpage: 1596
published: 2018-07-03 00:00:00 +0000
- title: 'Clipped Action Policy Gradient'
abstract: 'Many continuous control tasks have bounded action spaces. When policy gradient methods are applied to such tasks, out-of-bound actions need to be clipped before execution, while policies are usually optimized as if the actions are not clipped. We propose a policy gradient estimator that exploits the knowledge of actions being clipped to reduce the variance in estimation. We prove that our estimator, named clipped action policy gradient (CAPG), is unbiased and achieves lower variance than the conventional estimator that ignores action bounds. Experimental results demonstrate that CAPG generally outperforms the conventional estimator, indicating that it is a better policy gradient estimator for continuous control tasks. The source code is available at https://github.com/pfnet-research/capg.'
volume: 80
URL: https://proceedings.mlr.press/v80/fujita18a.html
PDF: http://proceedings.mlr.press/v80/fujita18a/fujita18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-fujita18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yasuhiro
family: Fujita
- given: Shin-ichi
family: Maeda
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1597-1606
id: fujita18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1597
lastpage: 1606
published: 2018-07-03 00:00:00 +0000
- title: 'Born Again Neural Networks'
abstract: 'Knowledge Distillation (KD) consists of transferring “knowledge” from one machine learning model (the teacher) to another (the student). Commonly, the teacher is a high-capacity model with formidable performance, while the student is more compact. By transferring knowledge, one hopes to benefit from the student’s compactness, without sacrificing too much performance. We study KD from a new perspective: rather than compressing models, we train students parameterized identically to their teachers. Surprisingly, these Born-Again Networks (BANs) outperform their teachers significantly, both on computer vision and language modeling tasks. Our experiments with BANs based on DenseNets demonstrate state-of-the-art performance on the CIFAR-10 (3.5%) and CIFAR-100 (15.5%) datasets, by validation error. Additional experiments explore two distillation objectives: (i) Confidence-Weighted by Teacher Max (CWTM) and (ii) Dark Knowledge with Permuted Predictions (DKPP). Both methods elucidate the essential components of KD, demonstrating the effect of the teacher outputs on both predicted and non-predicted classes.'
volume: 80
URL: https://proceedings.mlr.press/v80/furlanello18a.html
PDF: http://proceedings.mlr.press/v80/furlanello18a/furlanello18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-furlanello18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Tommaso
family: Furlanello
- given: Zachary
family: Lipton
- given: Michael
family: Tschannen
- given: Laurent
family: Itti
- given: Anima
family: Anandkumar
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1607-1616
id: furlanello18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1607
lastpage: 1616
published: 2018-07-03 00:00:00 +0000
- title: 'The Generalization Error of Dictionary Learning with Moreau Envelopes'
abstract: 'This is a theoretical study on the sample complexity of dictionary learning with a general type of reconstruction loss. The goal is to estimate an $m \times d$ matrix $D$ of unit-norm columns when the only available information is a set of training samples. Points $x$ in $\mathbb{R}^m$ are subsequently approximated by the linear combination $Da$ after solving the problem $\min_{a \in \mathbb{R}^d} \Phi(x - Da) + g(a)$; function $g:\mathbb{R}^d \to [0,+\infty)$ is either an indicator function or a sparsity promoting regularizer. Here is considered the case where $\Phi(x) = \inf_{z \in \mathbb{R}^m} \{ ||x-z||_2^2 + h(||z||_2) \}$ and $h$ is an even and univariate function on the real line. Connections are drawn between $\Phi$ and the Moreau envelope of $h$. A new sample complexity result concerning the $k$-sparse dictionary problem removes the spurious condition on the coherence of $D$ appearing in previous works. Finally, comments are made on the approximation error of certain families of losses. The derived generalization bounds are of order $\mathcal{O}(\sqrt{\log n /n})$ and valid without any further restrictions on the set of dictionaries with unit-norm columns.'
volume: 80
URL: https://proceedings.mlr.press/v80/georgogiannis18a.html
PDF: http://proceedings.mlr.press/v80/georgogiannis18a/georgogiannis18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-georgogiannis18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Alexandros
family: Georgogiannis
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1617-1625
id: georgogiannis18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1617
lastpage: 1625
published: 2018-07-03 00:00:00 +0000
- title: 'Local Private Hypothesis Testing: Chi-Square Tests'
abstract: 'The local model for differential privacy is emerging as the reference model for practical applications of collecting and sharing sensitive information while satisfying strong privacy guarantees. In the local model, there is no trusted entity which is allowed to have each individual’s raw data as is assumed in the traditional curator model. Individuals’ data are usually perturbed before sharing them. We explore the design of private hypothesis tests in the local model, where each data entry is perturbed to ensure the privacy of each participant. Specifically, we analyze locally private chi-square tests for goodness of fit and independence testing.'
volume: 80
URL: https://proceedings.mlr.press/v80/gaboardi18a.html
PDF: http://proceedings.mlr.press/v80/gaboardi18a/gaboardi18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-gaboardi18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Marco
family: Gaboardi
- given: Ryan
family: Rogers
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1626-1635
id: gaboardi18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1626
lastpage: 1635
published: 2018-07-03 00:00:00 +0000
- title: 'Inductive Two-Layer Modeling with Parametric Bregman Transfer'
abstract: 'Latent prediction models, exemplified by multi-layer networks, employ hidden variables that automate abstract feature discovery. They typically pose nonconvex optimization problems and effective semi-definite programming (SDP) relaxations have been developed to enable global solutions (Aslan et al., 2014). However, these models rely on nonparametric training of layer-wise kernel representations, and are therefore restricted to transductive learning which slows down test prediction. In this paper, we develop a new inductive learning framework for parametric transfer functions using matching losses. The result for ReLU utilizes completely positive matrices, and the inductive learner not only delivers superior accuracy but also offers an order of magnitude speedup over SDP with constant approximation guarantees.'
volume: 80
URL: https://proceedings.mlr.press/v80/ganapathiraman18a.html
PDF: http://proceedings.mlr.press/v80/ganapathiraman18a/ganapathiraman18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-ganapathiraman18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Vignesh
family: Ganapathiraman
- given: Zhan
family: Shi
- given: Xinhua
family: Zhang
- given: Yaoliang
family: Yu
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1636-1645
id: ganapathiraman18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1636
lastpage: 1645
published: 2018-07-03 00:00:00 +0000
- title: 'Hyperbolic Entailment Cones for Learning Hierarchical Embeddings'
abstract: 'Learning graph representations via low-dimensional embeddings that preserve relevant network properties is an important class of problems in machine learning. We here present a novel method to embed directed acyclic graphs. Following prior work, we first advocate for using hyperbolic spaces which provably model tree-like structures better than Euclidean geometry. Second, we view hierarchical relations as partial orders defined using a family of nested geodesically convex cones. We prove that these entailment cones admit an optimal shape with a closed form expression both in the Euclidean and hyperbolic spaces, and they canonically define the embedding learning process. Experiments show significant improvements of our method over strong recent baselines both in terms of representational capacity and generalization.'
volume: 80
URL: https://proceedings.mlr.press/v80/ganea18a.html
PDF: http://proceedings.mlr.press/v80/ganea18a/ganea18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-ganea18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Octavian
family: Ganea
- given: Gary
family: Becigneul
- given: Thomas
family: Hofmann
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1646-1655
id: ganea18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1646
lastpage: 1655
published: 2018-07-03 00:00:00 +0000
- title: 'Parameterized Algorithms for the Matrix Completion Problem'
abstract: 'We consider two matrix completion problems, in which we are given a matrix with missing entries and the task is to complete the matrix in a way that (1) minimizes the rank, or (2) minimizes the number of distinct rows. We study the parameterized complexity of the two aforementioned problems with respect to several parameters of interest, including the minimum number of matrix rows, columns, and rows plus columns needed to cover all missing entries. We obtain new algorithmic results showing that, for the bounded domain case, both problems are fixed-parameter tractable with respect to all aforementioned parameters. We complement these results with a lower-bound result for the unbounded domain case that rules out fixed-parameter tractability w.r.t. some of the parameters under consideration.'
volume: 80
URL: https://proceedings.mlr.press/v80/ganian18a.html
PDF: http://proceedings.mlr.press/v80/ganian18a/ganian18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-ganian18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Robert
family: Ganian
- given: Iyad
family: Kanj
- given: Sebastian
family: Ordyniak
- given: Stefan
family: Szeider
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1656-1665
id: ganian18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1656
lastpage: 1665
published: 2018-07-03 00:00:00 +0000
- title: 'Synthesizing Programs for Images using Reinforced Adversarial Learning'
abstract: 'Advances in deep generative networks have led to impressive results in recent years. Nevertheless, such models can often waste their capacity on the minutiae of datasets, presumably due to weak inductive biases in their decoders. This is where graphics engines may come in handy since they abstract away low-level details and represent images as high-level programs. Current methods that combine deep learning and renderers are limited by hand-crafted likelihood or distance functions, a need for large amounts of supervision, or difficulties in scaling their inference algorithms to richer datasets. To mitigate these issues, we present SPIRAL, an adversarially trained agent that generates a program which is executed by a graphics engine to interpret and sample images. The goal of this agent is to fool a discriminator network that distinguishes between real and rendered data, trained with a distributed reinforcement learning setup without any supervision. A surprising finding is that using the discriminator’s output as a reward signal is the key to allow the agent to make meaningful progress at matching the desired output rendering. To the best of our knowledge, this is the first demonstration of an end-to-end, unsupervised and adversarial inverse graphics agent on challenging real world (MNIST, Omniglot, CelebA) and synthetic 3D datasets. A video of the agent can be found at https://youtu.be/iSyvwAwa7vk.'
volume: 80
URL: https://proceedings.mlr.press/v80/ganin18a.html
PDF: http://proceedings.mlr.press/v80/ganin18a/ganin18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-ganin18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yaroslav
family: Ganin
- given: Tejas
family: Kulkarni
- given: Igor
family: Babuschkin
- given: S. M. Ali
family: Eslami
- given: Oriol
family: Vinyals
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1666-1675
id: ganin18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1666
lastpage: 1675
published: 2018-07-03 00:00:00 +0000
- title: 'Spotlight: Optimizing Device Placement for Training Deep Neural Networks'
abstract: 'Training deep neural networks (DNNs) requires an increasing amount of computation resources, and it becomes typical to use a mixture of GPU and CPU devices. Due to the heterogeneity of these devices, a recent challenge is how each operation in a neural network can be optimally placed on these devices, so that the training process can take the shortest amount of time possible. The current state-of-the-art solution uses reinforcement learning based on the policy gradient method, and it suffers from suboptimal training times. In this paper, we propose Spotlight, a new reinforcement learning algorithm based on proximal policy optimization, designed specifically for finding an optimal device placement for training DNNs. The design of our new algorithm relies upon a new model of the device placement problem: by modeling it as a Markov decision process with multiple stages, we are able to prove that Spotlight achieves a theoretical guarantee on performance improvements. We have implemented Spotlight in the CIFAR-10 benchmark and deployed it on the Google Cloud platform. Extensive experiments have demonstrated that the training time with placements recommended by Spotlight is 60.9% of that recommended by the policy gradient method.'
volume: 80
URL: https://proceedings.mlr.press/v80/gao18a.html
PDF: http://proceedings.mlr.press/v80/gao18a/gao18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-gao18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yuanxiang
family: Gao
- given: Li
family: Chen
- given: Baochun
family: Li
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1676-1684
id: gao18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1676
lastpage: 1684
published: 2018-07-03 00:00:00 +0000
- title: 'Parallel Bayesian Network Structure Learning'
abstract: 'Recent advances in Bayesian Network (BN) structure learning have focused on local-to-global learning, where the graph structure is learned via one local subgraph at a time. As a natural progression, we investigate parallel learning of BN structures via multiple learning agents simultaneously, where each agent learns one local subgraph at a time. We find that parallel learning can reduce the number of subgraphs requiring structure learning by storing previously queried results and communicating (even partial) results among agents. More specifically, by using novel rules on query subset and superset inference, many subgraph structures can be inferred without learning. We provide a sound and complete parallel structure learning (PSL) algorithm, and demonstrate its improved efficiency over state-of-the-art single-thread learning algorithms.'
volume: 80
URL: https://proceedings.mlr.press/v80/gao18b.html
PDF: http://proceedings.mlr.press/v80/gao18b/gao18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-gao18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Tian
family: Gao
- given: Dennis
family: Wei
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1685-1694
id: gao18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1685
lastpage: 1694
published: 2018-07-03 00:00:00 +0000
- title: 'Structured Output Learning with Abstention: Application to Accurate Opinion Prediction'
abstract: 'Motivated by Supervised Opinion Analysis, we propose a novel framework devoted to Structured Output Learning with Abstention (SOLA). The structure prediction model is able to abstain from predicting some labels in the structured output at a cost chosen by the user in a flexible way. For that purpose, we decompose the problem into the learning of a pair of predictors, one devoted to structured abstention and the other, to structured output prediction. To compare fully labeled training data with predictions potentially containing abstentions, we define a wide class of asymmetric abstention-aware losses. Learning is achieved by surrogate regression in an appropriate feature space while prediction with abstention is performed by solving a new pre-image problem. Thus, SOLA extends recent ideas about Structured Output Prediction via surrogate problems and calibration theory and enjoys statistical guarantees on the resulting excess risk. Instantiated on a hierarchical abstention-aware loss, SOLA is shown to be relevant for fine-grained opinion mining and gives state-of-the-art results on this task. Moreover, the abstention-aware representations can be used to competitively predict user-review ratings based on a sentence-level opinion predictor.'
volume: 80
URL: https://proceedings.mlr.press/v80/garcia18a.html
PDF: http://proceedings.mlr.press/v80/garcia18a/garcia18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-garcia18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Alexandre
family: Garcia
- given: Chloé
family: Clavel
- given: Slim
family: Essid
- given: Florence
family: d’Alche-Buc
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1695-1703
id: garcia18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1695
lastpage: 1703
published: 2018-07-03 00:00:00 +0000
- title: 'Conditional Neural Processes'
abstract: 'Deep neural networks excel at function approximation, yet they are typically trained from scratch for each new function. On the other hand, Bayesian methods, such as Gaussian Processes (GPs), exploit prior knowledge to quickly infer the shape of a new function at test time. Yet, GPs are computationally expensive, and it can be hard to design appropriate priors. In this paper we propose a family of neural models, Conditional Neural Processes (CNPs), that combine the benefits of both. CNPs are inspired by the flexibility of stochastic processes such as GPs, but are structured as neural networks and trained via gradient descent. CNPs make accurate predictions after observing only a handful of training data points, yet scale to complex functions and large datasets. We demonstrate the performance and versatility of the approach on a range of canonical machine learning tasks, including regression, classification and image completion.'
volume: 80
URL: https://proceedings.mlr.press/v80/garnelo18a.html
PDF: http://proceedings.mlr.press/v80/garnelo18a/garnelo18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-garnelo18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Marta
family: Garnelo
- given: Dan
family: Rosenbaum
- given: Christopher
family: Maddison
- given: Tiago
family: Ramalho
- given: David
family: Saxton
- given: Murray
family: Shanahan
- given: Yee Whye
family: Teh
- given: Danilo
family: Rezende
- given: S. M. Ali
family: Eslami
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1704-1713
id: garnelo18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1704
lastpage: 1713
published: 2018-07-03 00:00:00 +0000
- title: 'Temporal Poisson Square Root Graphical Models'
abstract: 'We propose temporal Poisson square root graphical models (TPSQRs), a generalization of Poisson square root graphical models (PSQRs) specifically designed for modeling longitudinal event data. By estimating the temporal relationships for all possible pairs of event types, TPSQRs can offer a holistic perspective about whether the occurrences of any given event type could excite or inhibit any other type. A TPSQR is learned by estimating a collection of interrelated PSQRs that share the same template parameterization. These PSQRs are estimated jointly in a pseudo-likelihood fashion, where Poisson pseudo-likelihood is used to approximate the original more computationally intensive pseudo-likelihood problem stemming from PSQRs. Theoretically, we demonstrate that under mild assumptions, the Poisson pseudo-likelihood approximation is sparsistent for recovering the underlying PSQR. Empirically, we learn TPSQRs from a real-world large-scale electronic health record (EHR) with millions of drug prescription and condition diagnosis events, for adverse drug reaction (ADR) detection. Experimental results demonstrate that the learned TPSQRs can recover ADR signals from the EHR effectively and efficiently.'
volume: 80
URL: https://proceedings.mlr.press/v80/geng18a.html
PDF: http://proceedings.mlr.press/v80/geng18a/geng18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-geng18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Sinong
family: Geng
- given: Zhaobin
family: Kuang
- given: Peggy
family: Peissig
- given: David
family: Page
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1714-1723
id: geng18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1714
lastpage: 1723
published: 2018-07-03 00:00:00 +0000
- title: 'Budgeted Experiment Design for Causal Structure Learning'
abstract: 'We study the problem of causal structure learning when the experimenter is limited to performing at most $k$ non-adaptive experiments of size $1$. We formulate the problem of finding the best intervention target set as an optimization problem, which aims to maximize the average number of edges whose directions are resolved. We prove that the corresponding objective function is submodular and a greedy algorithm suffices to achieve $(1-\frac{1}{e})$-approximation of the optimal value. We further present an accelerated variant of the greedy algorithm, which can lead to orders of magnitude performance speedup. We validate our proposed approach on synthetic and real graphs. The results show that compared to the purely observational setting, our algorithm orients the majority of the edges through a considerably small number of interventions.'
volume: 80
URL: https://proceedings.mlr.press/v80/ghassami18a.html
PDF: http://proceedings.mlr.press/v80/ghassami18a/ghassami18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-ghassami18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: AmirEmad
family: Ghassami
- given: Saber
family: Salehkaleybar
- given: Negar
family: Kiyavash
- given: Elias
family: Bareinboim
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1724-1733
id: ghassami18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1724
lastpage: 1733
published: 2018-07-03 00:00:00 +0000
- title: 'Linear Spectral Estimators and an Application to Phase Retrieval'
abstract: 'Phase retrieval refers to the problem of recovering real- or complex-valued vectors from magnitude measurements. The best-known algorithms for this problem are iterative in nature and rely on so-called spectral initializers that provide accurate initialization vectors. We propose a novel class of estimators suitable for general nonlinear measurement systems, called linear spectral estimators (LSPEs), which can be used to compute accurate initialization vectors for phase retrieval problems. The proposed LSPEs not only provide accurate initialization vectors for noisy phase retrieval systems with structured or random measurement matrices, but also enable the derivation of sharp and nonasymptotic mean-squared error bounds. We demonstrate the efficacy of LSPEs on synthetic and real-world phase retrieval problems, and we show that our estimators significantly outperform existing methods for structured measurement systems that arise in practice.'
volume: 80
URL: https://proceedings.mlr.press/v80/ghods18a.html
PDF: http://proceedings.mlr.press/v80/ghods18a/ghods18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-ghods18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ramina
family: Ghods
- given: Andrew
family: Lan
- given: Tom
family: Goldstein
- given: Christoph
family: Studer
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1734-1743
id: ghods18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1734
lastpage: 1743
published: 2018-07-03 00:00:00 +0000
- title: 'Structured Variational Learning of Bayesian Neural Networks with Horseshoe Priors'
abstract: 'Bayesian Neural Networks (BNNs) have recently received increasing attention for their ability to provide well-calibrated posterior uncertainties. However, model selection—even choosing the number of nodes—remains an open question. Recent work has proposed the use of a horseshoe prior over node pre-activations of a Bayesian neural network, which effectively turns off nodes that do not help explain the data. In this work, we propose several modeling and inference advances that consistently improve the compactness of the model learned while maintaining predictive performance, especially in smaller-sample settings including reinforcement learning.'
volume: 80
URL: https://proceedings.mlr.press/v80/ghosh18a.html
PDF: http://proceedings.mlr.press/v80/ghosh18a/ghosh18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-ghosh18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Soumya
family: Ghosh
- given: Jiayu
family: Yao
- given: Finale
family: Doshi-Velez
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1744-1753
id: ghosh18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1744
lastpage: 1753
published: 2018-07-03 00:00:00 +0000
- title: 'Learning Maximum-A-Posteriori Perturbation Models for Structured Prediction in Polynomial Time'
abstract: 'MAP perturbation models have emerged as a powerful framework for inference in structured prediction. Such models provide a way to efficiently sample from the Gibbs distribution and facilitate predictions that are robust to random noise. In this paper, we propose a provably polynomial time randomized algorithm for learning the parameters of perturbed MAP predictors. Our approach is based on minimizing a novel Rademacher-based generalization bound on the expected loss of a perturbed MAP predictor, which can be computed in polynomial time. We obtain conditions under which our randomized learning algorithm can guarantee generalization to unseen examples.'
volume: 80
URL: https://proceedings.mlr.press/v80/ghoshal18a.html
PDF: http://proceedings.mlr.press/v80/ghoshal18a/ghoshal18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-ghoshal18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Asish
family: Ghoshal
- given: Jean
family: Honorio
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1754-1762
id: ghoshal18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1754
lastpage: 1762
published: 2018-07-03 00:00:00 +0000
- title: 'Robust and Scalable Models of Microbiome Dynamics'
abstract: 'Microbes are everywhere, including in and on our bodies, and have been shown to play key roles in a variety of prevalent human diseases. Consequently, there has been intense interest in the design of bacteriotherapies or "bugs as drugs," which are communities of bacteria administered to patients for specific therapeutic applications. Central to the design of such therapeutics is an understanding of the causal microbial interaction network and the population dynamics of the organisms. In this work we present a Bayesian nonparametric model and associated efficient inference algorithm that addresses the key conceptual and practical challenges of learning microbial dynamics from time series microbe abundance data. These challenges include high-dimensional (300+ strains of bacteria in the gut) but temporally sparse and non-uniformly sampled data; high measurement noise; and, nonlinear and physically non-negative dynamics. Our contributions include a new type of dynamical systems model for microbial dynamics based on what we term interaction modules, or learned clusters of latent variables with redundant interaction structure (reducing the expected number of interaction coefficients from O(n^2) to O((log n)^2)); a fully Bayesian formulation of the stochastic dynamical systems model that propagates measurement and latent state uncertainty throughout the model; and introduction of a temporally varying auxiliary variable technique to enable efficient inference by relaxing the hard non-negativity constraint on states. We apply our method to simulated and real data, and demonstrate the utility of our technique for system identification from limited data and gaining new biological insights into bacteriotherapy design.'
volume: 80
URL: https://proceedings.mlr.press/v80/gibson18a.html
PDF: http://proceedings.mlr.press/v80/gibson18a/gibson18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-gibson18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Travis
family: Gibson
- given: Georg
family: Gerber
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1763-1772
id: gibson18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1763
lastpage: 1772
published: 2018-07-03 00:00:00 +0000
- title: 'Non-linear motor control by local learning in spiking neural networks'
abstract: 'Learning weights in a spiking neural network with hidden neurons, using local, stable and online rules, to control non-linear body dynamics is an open problem. Here, we employ a supervised scheme, Feedback-based Online Local Learning Of Weights (FOLLOW), to train a heterogeneous network of spiking neurons with hidden layers, to control a two-link arm so as to reproduce a desired state trajectory. We show that the network learns an inverse model of the non-linear dynamics, i.e. it infers from state trajectory as input to the network, the continuous-time command that produced the trajectory. Connection weights are adjusted via a local plasticity rule that involves pre-synaptic firing and post-synaptic feedback of the error in the inferred command. We propose a network architecture, termed differential feedforward, and show that it gives a lower test error than other feedforward and recurrent architectures. We demonstrate the performance of the inverse model to control a two-link arm along a desired trajectory.'
volume: 80
URL: https://proceedings.mlr.press/v80/gilra18a.html
PDF: http://proceedings.mlr.press/v80/gilra18a/gilra18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-gilra18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Aditya
family: Gilra
- given: Wulfram
family: Gerstner
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1773-1782
id: gilra18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1773
lastpage: 1782
published: 2018-07-03 00:00:00 +0000
- title: 'Learning One Convolutional Layer with Overlapping Patches'
abstract: 'We give the first provably efficient algorithm for learning a one hidden layer convolutional network with respect to a general class of (potentially overlapping) patches under mild conditions on the underlying distribution. We prove that our framework captures commonly used schemes from computer vision, including one-dimensional and two-dimensional “patch and stride” convolutions. Our algorithm, *Convotron*, is inspired by recent work applying isotonic regression to learning neural networks. Convotron uses a simple, iterative update rule that is stochastic in nature and tolerant to noise (requires only that the conditional mean function is a one layer convolutional network, as opposed to the realizable setting). In contrast to gradient descent, Convotron requires no special initialization or learning-rate tuning to converge to the global optimum. We also point out that learning one hidden convolutional layer with respect to a Gaussian distribution and just *one* disjoint patch $P$ (the other patches may be arbitrary) is *easy* in the following sense: Convotron can efficiently recover the hidden weight vector by updating *only* in the direction of $P$.'
volume: 80
URL: https://proceedings.mlr.press/v80/goel18a.html
PDF: http://proceedings.mlr.press/v80/goel18a/goel18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-goel18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Surbhi
family: Goel
- given: Adam
family: Klivans
- given: Raghu
family: Meka
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1783-1791
id: goel18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1783
lastpage: 1791
published: 2018-07-03 00:00:00 +0000
- title: 'Visualizing and Understanding Atari Agents'
abstract: 'While deep reinforcement learning (deep RL) agents are effective at maximizing rewards, it is often unclear what strategies they use to do so. In this paper, we take a step toward explaining deep RL agents through a case study using Atari 2600 environments. In particular, we focus on using saliency maps to understand how an agent learns and executes a policy. We introduce a method for generating useful saliency maps and use it to show 1) what strong agents attend to, 2) whether agents are making decisions for the right or wrong reasons, and 3) how agents evolve during learning. We also test our method on non-expert human subjects and find that it improves their ability to reason about these agents. Overall, our results show that saliency information can provide significant insight into an RL agent’s decisions and learning behavior.'
volume: 80
URL: https://proceedings.mlr.press/v80/greydanus18a.html
PDF: http://proceedings.mlr.press/v80/greydanus18a/greydanus18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-greydanus18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Samuel
family: Greydanus
- given: Anurag
family: Koul
- given: Jonathan
family: Dodge
- given: Alan
family: Fern
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1792-1801
id: greydanus18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1792
lastpage: 1801
published: 2018-07-03 00:00:00 +0000
- title: 'Learning Policy Representations in Multiagent Systems'
abstract: 'Modeling agent behavior is central to understanding the emergence of complex phenomena in multiagent systems. Prior work in agent modeling has largely been task-specific and driven by hand-engineering domain-specific prior knowledge. We propose a general learning framework for modeling agent behavior in any multiagent system using only a handful of interaction data. Our framework casts agent modeling as a representation learning problem. Consequently, we construct a novel objective inspired by imitation learning and agent identification and design an algorithm for unsupervised learning of representations of agent policies. We demonstrate empirically the utility of the proposed framework in (i) a challenging high-dimensional competitive environment for continuous control and (ii) a cooperative environment for communication, on supervised predictive tasks, unsupervised clustering, and policy optimization using deep reinforcement learning.'
volume: 80
URL: https://proceedings.mlr.press/v80/grover18a.html
PDF: http://proceedings.mlr.press/v80/grover18a/grover18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-grover18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Aditya
family: Grover
- given: Maruan
family: Al-Shedivat
- given: Jayesh
family: Gupta
- given: Yuri
family: Burda
- given: Harrison
family: Edwards
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1802-1811
id: grover18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1802
lastpage: 1811
published: 2018-07-03 00:00:00 +0000
- title: 'Faster Derivative-Free Stochastic Algorithm for Shared Memory Machines'
abstract: 'Asynchronous parallel stochastic gradient optimization has been playing a pivotal role in solving large-scale machine learning problems in big data applications. Zeroth-order (derivative-free) methods estimate the gradient using only two function evaluations, and thus have been applied to problems where explicit gradient calculations are computationally expensive or infeasible. Recently, the first asynchronous parallel stochastic zeroth-order algorithm (AsySZO) was proposed. However, its convergence rate is $O(1/\sqrt{T})$ for smooth, possibly non-convex learning problems, which is significantly slower than $O(1/T)$, the best convergence rate of (asynchronous) stochastic gradient algorithms. To fill this gap, in this paper, we first point out the fundamental reason leading to the slow convergence rate of AsySZO, and then propose a new asynchronous stochastic zeroth-order algorithm (AsySZO+). We provide a faster convergence rate $O(1/bT)$ ($b$ is the mini-batch size) for AsySZO+ by rigorous theoretical analysis, which is a significant improvement over $O(1/\sqrt{T})$. The experimental results on the application of ensemble learning confirm that our AsySZO+ has a faster convergence rate than the existing (asynchronous) stochastic zeroth-order algorithms.'
volume: 80
URL: https://proceedings.mlr.press/v80/gu18a.html
PDF: http://proceedings.mlr.press/v80/gu18a/gu18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-gu18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Bin
family: Gu
- given: Zhouyuan
family: Huo
- given: Cheng
family: Deng
- given: Heng
family: Huang
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1812-1821
id: gu18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1812
lastpage: 1821
published: 2018-07-03 00:00:00 +0000
- title: 'Learning to search with MCTSnets'
abstract: 'Planning problems are among the most important and well-studied problems in artificial intelligence. They are most typically solved by tree search algorithms that simulate ahead into the future, evaluate future states, and back-up those evaluations to the root of a search tree. Among these algorithms, Monte-Carlo tree search (MCTS) is one of the most general, powerful and widely used. A typical implementation of MCTS uses cleverly designed rules, optimised to the particular characteristics of the domain. These rules control where the simulation traverses, what to evaluate in the states that are reached, and how to back-up those evaluations. In this paper we instead learn where, what and how to search. Our architecture, which we call an MCTSnet, incorporates simulation-based search inside a neural network, by expanding, evaluating and backing-up a vector embedding. The parameters of the network are trained end-to-end using gradient-based optimisation. When applied to small searches in the well-known planning problem Sokoban, the learned search algorithm significantly outperformed MCTS baselines.'
volume: 80
URL: https://proceedings.mlr.press/v80/guez18a.html
PDF: http://proceedings.mlr.press/v80/guez18a/guez18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-guez18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Arthur
family: Guez
- given: Theophane
family: Weber
- given: Ioannis
family: Antonoglou
- given: Karen
family: Simonyan
- given: Oriol
family: Vinyals
- given: Daan
family: Wierstra
- given: Remi
family: Munos
- given: David
family: Silver
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1822-1831
id: guez18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1822
lastpage: 1831
published: 2018-07-03 00:00:00 +0000
- title: 'Characterizing Implicit Bias in Terms of Optimization Geometry'
abstract: 'We study the bias of generic optimization methods, including Mirror Descent, Natural Gradient Descent and Steepest Descent with respect to different potentials and norms, when optimizing underdetermined linear models or separable linear classification problems. We ask the question of whether the global minimum (among the many possible global minima) reached by optimization can be characterized in terms of the potential or norm, and independently of hyper-parameter choices, such as stepsize and momentum.'
volume: 80
URL: https://proceedings.mlr.press/v80/gunasekar18a.html
PDF: http://proceedings.mlr.press/v80/gunasekar18a/gunasekar18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-gunasekar18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Suriya
family: Gunasekar
- given: Jason
family: Lee
- given: Daniel
family: Soudry
- given: Nathan
family: Srebro
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1832-1841
id: gunasekar18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1832
lastpage: 1841
published: 2018-07-03 00:00:00 +0000
- title: 'Shampoo: Preconditioned Stochastic Tensor Optimization'
abstract: 'Preconditioned gradient methods are among the most general and powerful tools in optimization. However, preconditioning requires storing and manipulating prohibitively large matrices. We describe and analyze a new structure-aware preconditioning algorithm, called Shampoo, for stochastic optimization over tensor spaces. Shampoo maintains a set of preconditioning matrices, each of which operates on a single dimension, contracting over the remaining dimensions. We establish convergence guarantees in the stochastic convex setting, the proof of which builds upon matrix trace inequalities. Our experiments with state-of-the-art deep learning models show that Shampoo is capable of converging considerably faster than commonly used optimizers. Surprisingly, although it involves a more complex update rule, Shampoo’s runtime per step is comparable in practice to that of simple gradient methods such as SGD, AdaGrad, and Adam.'
volume: 80
URL: https://proceedings.mlr.press/v80/gupta18a.html
PDF: http://proceedings.mlr.press/v80/gupta18a/gupta18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-gupta18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Vineet
family: Gupta
- given: Tomer
family: Koren
- given: Yoram
family: Singer
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1842-1850
id: gupta18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1842
lastpage: 1850
published: 2018-07-03 00:00:00 +0000
- title: 'Latent Space Policies for Hierarchical Reinforcement Learning'
abstract: 'We address the problem of learning hierarchical deep neural network policies for reinforcement learning. In contrast to methods that explicitly restrict or cripple lower layers of a hierarchy to force them to use higher-level modulating signals, each layer in our framework is trained to directly solve the task, but acquires a range of diverse strategies via a maximum entropy reinforcement learning objective. Each layer is also augmented with latent random variables, which are sampled from a prior distribution during the training of that layer. The maximum entropy objective causes these latent variables to be incorporated into the layer’s policy, and the higher level layer can directly control the behavior of the lower layer through this latent space. Furthermore, by constraining the mapping from latent variables to actions to be invertible, higher layers retain full expressivity: neither the higher layers nor the lower layers are constrained in their behavior. Our experimental evaluation demonstrates that we can improve on the performance of single-layer policies on standard benchmark tasks simply by adding additional layers, and that our method can solve more complex sparse-reward tasks by learning higher-level policies on top of high-entropy skills optimized for simple low-level objectives.'
volume: 80
URL: https://proceedings.mlr.press/v80/haarnoja18a.html
PDF: http://proceedings.mlr.press/v80/haarnoja18a/haarnoja18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-haarnoja18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Tuomas
family: Haarnoja
- given: Kristian
family: Hartikainen
- given: Pieter
family: Abbeel
- given: Sergey
family: Levine
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1851-1860
id: haarnoja18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1851
lastpage: 1860
published: 2018-07-03 00:00:00 +0000
- title: 'Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor'
abstract: 'Model-free deep reinforcement learning (RL) algorithms have been demonstrated on a range of challenging decision making and control tasks. However, these methods typically suffer from two major challenges: very high sample complexity and brittle convergence properties, which necessitate meticulous hyperparameter tuning. Both of these challenges severely limit the applicability of such methods to complex, real-world domains. In this paper, we propose soft actor-critic, an off-policy actor-critic deep RL algorithm based on the maximum entropy reinforcement learning framework. In this framework, the actor aims to maximize expected reward while also maximizing entropy, that is, to succeed at the task while acting as randomly as possible. Prior deep RL methods based on this framework have been formulated as Q-learning methods. By combining off-policy updates with a stable stochastic actor-critic formulation, our method achieves state-of-the-art performance on a range of continuous control benchmark tasks, outperforming prior on-policy and off-policy methods. Furthermore, we demonstrate that, in contrast to other off-policy algorithms, our approach is very stable, achieving very similar performance across different random seeds.'
volume: 80
URL: https://proceedings.mlr.press/v80/haarnoja18b.html
PDF: http://proceedings.mlr.press/v80/haarnoja18b/haarnoja18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-haarnoja18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Tuomas
family: Haarnoja
- given: Aurick
family: Zhou
- given: Pieter
family: Abbeel
- given: Sergey
family: Levine
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1861-1870
id: haarnoja18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1861
lastpage: 1870
published: 2018-07-03 00:00:00 +0000
- title: 'Comparison-Based Random Forests'
abstract: 'Assume we are given a set of items from a general metric space, but we neither have access to the representation of the data nor to the distances between data points. Instead, suppose that we can actively choose a triplet of items (A, B, C) and ask an oracle whether item A is closer to item B or to item C. In this paper, we propose a novel random forest algorithm for regression and classification that relies only on such triplet comparisons. In the theory part of this paper, we establish sufficient conditions for the consistency of such a forest. In a set of comprehensive experiments, we then demonstrate that the proposed random forest is efficient both for classification and regression. In particular, it is even competitive with other methods that have direct access to the metric representation of the data.'
volume: 80
URL: https://proceedings.mlr.press/v80/haghiri18a.html
PDF: http://proceedings.mlr.press/v80/haghiri18a/haghiri18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-haghiri18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Siavash
family: Haghiri
- given: Damien
family: Garreau
- given: Ulrike
family: Luxburg
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1871-1880
id: haghiri18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1871
lastpage: 1880
published: 2018-07-03 00:00:00 +0000
- title: 'K-Beam Minimax: Efficient Optimization for Deep Adversarial Learning'
abstract: 'Minimax optimization plays a key role in adversarial training of machine learning algorithms, such as learning generative models, domain adaptation, privacy preservation, and robust learning. In this paper, we demonstrate the failure of alternating gradient descent in minimax optimization problems due to the discontinuity of solutions of the inner maximization. To address this, we propose a new $\epsilon$-subgradient descent algorithm that addresses this problem by simultaneously tracking $K$ candidate solutions. Practically, the algorithm can find solutions that previous saddle-point algorithms cannot find, with only a sublinear increase of complexity in $K$. We analyze the conditions under which the algorithm converges to the true solution in detail. A significant improvement in stability and convergence speed of the algorithm is observed in simple representative problems, GAN training, and domain-adaptation problems.'
volume: 80
URL: https://proceedings.mlr.press/v80/hamm18a.html
PDF: http://proceedings.mlr.press/v80/hamm18a/hamm18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-hamm18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jihun
family: Hamm
- given: Yung-Kyun
family: Noh
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1881-1889
id: hamm18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1881
lastpage: 1889
published: 2018-07-03 00:00:00 +0000
- title: 'Candidates vs. Noises Estimation for Large Multi-Class Classification Problem'
abstract: 'This paper proposes a method for multi-class classification problems, where the number of classes K is large. The method, referred to as Candidates vs. Noises Estimation (CANE), selects a small subset of candidate classes and samples the remaining classes. We show that CANE is always consistent and computationally efficient. Moreover, the resulting estimator has low statistical variance approaching that of the maximum likelihood estimator, when the observed label belongs to the selected candidates with high probability. In practice, we use a tree structure with leaves as classes to promote fast beam search for candidate selection. We further apply the CANE method to estimate word probabilities in learning large neural language models. Extensive experimental results show that CANE achieves better prediction accuracy over the Noise-Contrastive Estimation (NCE), its variants and a number of the state-of-the-art tree classifiers, while it gains significant speedup compared to standard O(K) methods.'
volume: 80
URL: https://proceedings.mlr.press/v80/han18a.html
PDF: http://proceedings.mlr.press/v80/han18a/han18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-han18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lei
family: Han
- given: Yiheng
family: Huang
- given: Tong
family: Zhang
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1890-1899
id: han18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1890
lastpage: 1899
published: 2018-07-03 00:00:00 +0000
- title: 'Stein Variational Gradient Descent Without Gradient'
abstract: 'Stein variational gradient descent (SVGD) has been shown to be a powerful approximate inference algorithm for complex distributions. However, the standard SVGD requires calculating the gradient of the target density and cannot be applied when the gradient is unavailable. In this work, we develop a gradient-free variant of SVGD (GF-SVGD), which replaces the true gradient with a surrogate gradient, and corrects the introduced bias by re-weighting the gradients in a proper form. We show that our GF-SVGD can be viewed as the standard SVGD with a special choice of kernel, and hence directly inherits all the theoretical properties of SVGD. We shed light on the empirical choice of the surrogate gradient and further propose an annealed GF-SVGD that consistently outperforms a number of recent advanced gradient-free MCMC methods in our empirical studies.'
volume: 80
URL: https://proceedings.mlr.press/v80/han18b.html
PDF: http://proceedings.mlr.press/v80/han18b/han18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-han18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jun
family: Han
- given: Qiang
family: Liu
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1900-1908
id: han18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1900
lastpage: 1908
published: 2018-07-03 00:00:00 +0000
- title: 'Deep Models of Interactions Across Sets'
abstract: 'We use deep learning to model interactions across two or more sets of objects, such as user–movie ratings or protein–drug bindings. The canonical representation of such interactions is a matrix (or tensor) with an exchangeability property: the encoding’s meaning is not changed by permuting rows or columns. We argue that models should hence be Permutation Equivariant (PE): constrained to make the same predictions across such permutations. We present a parameter-sharing scheme and prove that it is maximally expressive under the PE constraint. This scheme yields three benefits. First, we demonstrate performance competitive with the state of the art on multiple matrix completion benchmarks. Second, our models require a number of parameters independent of the numbers of objects and thus scale well to large datasets. Third, models can be queried about new objects that were not available at training time, but for which interactions have since been observed. We observed surprisingly good generalization performance on this matrix extrapolation task, both within domains (e.g., new users and new movies drawn from the same distribution used for training) and even across domains (e.g., predicting music ratings after training on movie ratings).'
volume: 80
URL: https://proceedings.mlr.press/v80/hartford18a.html
PDF: http://proceedings.mlr.press/v80/hartford18a/hartford18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-hartford18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jason
family: Hartford
- given: Devon
family: Graham
- given: Kevin
family: Leyton-Brown
- given: Siamak
family: Ravanbakhsh
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1909-1918
id: hartford18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1909
lastpage: 1918
published: 2018-07-03 00:00:00 +0000
- title: 'Learning Memory Access Patterns'
abstract: 'The explosion in workload complexity and the recent slow-down in Moore’s law scaling call for new approaches towards efficient computing. Researchers are now beginning to use recent advances in machine learning in software optimizations; augmenting or replacing traditional heuristics and data structures. However, the space of machine learning for computer hardware architecture is only lightly explored. In this paper, we demonstrate the potential of deep learning to address the von Neumann bottleneck of memory performance. We focus on the critical problem of learning memory access patterns, with the goal of constructing accurate and efficient memory prefetchers. We relate contemporary prefetching strategies to n-gram models in natural language processing, and show how recurrent neural networks can serve as a drop-in replacement. On a suite of challenging benchmark datasets, we find that neural networks consistently demonstrate superior performance in terms of precision and recall. This work represents the first step towards practical neural-network based prefetching, and opens a wide range of exciting directions for machine learning in computer architecture research.'
volume: 80
URL: https://proceedings.mlr.press/v80/hashemi18a.html
PDF: http://proceedings.mlr.press/v80/hashemi18a/hashemi18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-hashemi18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Milad
family: Hashemi
- given: Kevin
family: Swersky
- given: Jamie
family: Smith
- given: Grant
family: Ayers
- given: Heiner
family: Litz
- given: Jichuan
family: Chang
- given: Christos
family: Kozyrakis
- given: Parthasarathy
family: Ranganathan
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1919-1928
id: hashemi18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1919
lastpage: 1928
published: 2018-07-03 00:00:00 +0000
- title: 'Fairness Without Demographics in Repeated Loss Minimization'
abstract: 'Machine learning models (e.g., speech recognizers) trained on average loss suffer from representation disparity—minority groups (e.g., non-native speakers) carry less weight in the training objective, and thus tend to suffer higher loss. Worse, as model accuracy affects user retention, a minority group can shrink over time. In this paper, we first show that the status quo of empirical risk minimization (ERM) amplifies representation disparity over time, which can even turn initially fair models unfair. To mitigate this, we develop an approach based on distributionally robust optimization (DRO), which minimizes the worst case risk over all distributions close to the empirical distribution. We prove that this approach controls the risk of the minority group at each time step, in the spirit of Rawlsian distributive justice, while remaining oblivious to the identity of the groups. We demonstrate that DRO prevents disparity amplification on examples where ERM fails, and show improvements in minority group user satisfaction in a real-world text autocomplete task.'
volume: 80
URL: https://proceedings.mlr.press/v80/hashimoto18a.html
PDF: http://proceedings.mlr.press/v80/hashimoto18a/hashimoto18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-hashimoto18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Tatsunori
family: Hashimoto
- given: Megha
family: Srivastava
- given: Hongseok
family: Namkoong
- given: Percy
family: Liang
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1929-1938
id: hashimoto18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1929
lastpage: 1938
published: 2018-07-03 00:00:00 +0000
- title: 'Multicalibration: Calibration for the (Computationally-Identifiable) Masses'
abstract: 'We develop and study multicalibration as a new measure of fairness in machine learning that aims to mitigate inadvertent or malicious discrimination that is introduced at training time (even from ground truth data). Multicalibration guarantees meaningful (calibrated) predictions for every subpopulation that can be identified within a specified class of computations. The specified class can be quite rich; in particular, it can contain many overlapping subgroups of a protected group. We demonstrate that in many settings this strong notion of protection from discrimination is provably attainable and aligned with the goal of obtaining accurate predictions. Along the way, we present algorithms for learning a multicalibrated predictor, study the computational complexity of this task, and illustrate tight connections to the agnostic learning model.'
volume: 80
URL: https://proceedings.mlr.press/v80/hebert-johnson18a.html
PDF: http://proceedings.mlr.press/v80/hebert-johnson18a/hebert-johnson18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-hebert-johnson18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ursula
family: Hebert-Johnson
- given: Michael
family: Kim
- given: Omer
family: Reingold
- given: Guy
family: Rothblum
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1939-1948
id: hebert-johnson18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1939
lastpage: 1948
published: 2018-07-03 00:00:00 +0000
- title: 'Recurrent Predictive State Policy Networks'
abstract: 'We introduce Recurrent Predictive State Policy (RPSP) networks, a recurrent architecture that brings insights from predictive state representations to reinforcement learning in partially observable environments. Predictive state policy networks consist of a recursive filter, which keeps track of a belief about the state of the environment, and a reactive policy that directly maps beliefs to actions, to maximize the cumulative reward. The recursive filter leverages predictive state representations (PSRs) (Rosencrantz & Gordon, 2004; Sun et al., 2016) by modeling predictive state, a prediction of the distribution of future observations conditioned on history and future actions. This representation gives rise to a rich class of statistically consistent algorithms (Hefny et al., 2017) to initialize the recursive filter. Predictive state serves as an equivalent representation of a belief state. Therefore, the policy component of the RPSP-network can be purely reactive, simplifying training while still allowing optimal behavior. Moreover, we use the PSR interpretation during training as well, by incorporating prediction error in the loss function. The entire network (recursive filter and reactive policy) is still differentiable and can be trained using gradient-based methods. We optimize our policy using a combination of policy gradient based on rewards (Williams, 1992) and gradient descent based on prediction error. We show the efficacy of RPSP-networks on a set of robotic control tasks from OpenAI Gym. We empirically show that RPSP-networks perform well compared with memory-preserving networks such as GRUs, as well as finite memory models, being the overall best performing method.'
volume: 80
URL: https://proceedings.mlr.press/v80/hefny18a.html
PDF: http://proceedings.mlr.press/v80/hefny18a/hefny18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-hefny18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ahmed
family: Hefny
- given: Zita
family: Marinho
- given: Wen
family: Sun
- given: Siddhartha
family: Srinivasa
- given: Geoffrey
family: Gordon
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1949-1958
id: hefny18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1949
lastpage: 1958
published: 2018-07-03 00:00:00 +0000
- title: 'Learning unknown ODE models with Gaussian processes'
abstract: 'In conventional ODE modelling, the coefficients of an equation driving the system state forward in time are estimated. However, for many complex systems it is practically impossible to determine the equations or interactions governing the underlying dynamics. In these settings, a parametric ODE model cannot be formulated. Here, we overcome this issue by introducing a novel paradigm of nonparametric ODE modelling that can learn the underlying dynamics of arbitrary continuous-time systems without prior knowledge. We propose to learn non-linear, unknown differential functions from state observations using Gaussian process vector fields within the exact ODE formalism. We demonstrate the model’s capabilities to infer dynamics from sparse data and to simulate the system forward into the future.'
volume: 80
URL: https://proceedings.mlr.press/v80/heinonen18a.html
PDF: http://proceedings.mlr.press/v80/heinonen18a/heinonen18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-heinonen18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Markus
family: Heinonen
- given: Cagatay
family: Yildiz
- given: Henrik
family: Mannerström
- given: Jukka
family: Intosalmi
- given: Harri
family: Lähdesmäki
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1959-1968
id: heinonen18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1959
lastpage: 1968
published: 2018-07-03 00:00:00 +0000
- title: 'Orthogonal Recurrent Neural Networks with Scaled Cayley Transform'
abstract: 'Recurrent Neural Networks (RNNs) are designed to handle sequential data but suffer from vanishing or exploding gradients. Unitary Recurrent Neural Networks (uRNNs) have recently been used to address this issue and, in some cases, exceed the capabilities of Long Short-Term Memory networks (LSTMs). We propose a simpler and novel update scheme to maintain orthogonal recurrent weight matrices without using complex valued matrices. This is done by parametrizing with a skew-symmetric matrix using the Cayley transform; such a parametrization is unable to represent matrices with negative one eigenvalues, but this limitation is overcome by scaling the recurrent weight matrix by a diagonal matrix consisting of ones and negative ones. The proposed training scheme involves a straightforward gradient calculation and update step. In several experiments, the proposed scaled Cayley orthogonal recurrent neural network (scoRNN) achieves superior results with fewer trainable parameters than other unitary RNNs.'
volume: 80
URL: https://proceedings.mlr.press/v80/helfrich18a.html
PDF: http://proceedings.mlr.press/v80/helfrich18a/helfrich18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-helfrich18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Kyle
family: Helfrich
- given: Devin
family: Willmott
- given: Qiang
family: Ye
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1969-1978
id: helfrich18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1969
lastpage: 1978
published: 2018-07-03 00:00:00 +0000
- title: 'Fast Bellman Updates for Robust MDPs'
abstract: 'We describe two efficient, and exact, algorithms for computing Bellman updates in robust Markov decision processes (MDPs). The first algorithm uses a homotopy continuation method to compute updates for L1-constrained s,a-rectangular ambiguity sets. It runs in quasi-linear time for plain L1-norms and also generalizes to weighted L1-norms. The second algorithm uses bisection to compute updates for robust MDPs with s-rectangular ambiguity sets. This algorithm, when combined with the homotopy method, also has a quasi-linear runtime. Unlike previous methods, our algorithms compute the primal solution in addition to the optimal objective value, which makes them useful in policy iteration methods. Our experimental results indicate that the proposed methods are over 1,000 times faster than Gurobi, a state-of-the-art commercial optimization package, for small instances, and the performance gap grows considerably with problem size.'
volume: 80
URL: https://proceedings.mlr.press/v80/ho18a.html
PDF: http://proceedings.mlr.press/v80/ho18a/ho18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-ho18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Chin Pang
family: Ho
- given: Marek
family: Petrik
- given: Wolfram
family: Wiesemann
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1979-1988
id: ho18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1979
lastpage: 1988
published: 2018-07-03 00:00:00 +0000
- title: 'CyCADA: Cycle-Consistent Adversarial Domain Adaptation'
abstract: 'Domain adaptation is critical for success in new, unseen environments. Adversarial adaptation models have shown tremendous progress towards adapting to new environments by focusing either on discovering domain invariant representations or by mapping between unpaired image domains. While feature space methods are difficult to interpret and sometimes fail to capture pixel-level and low-level domain shifts, image space methods sometimes fail to incorporate high level semantic knowledge relevant for the end task. We propose a model which adapts between domains using both generative image space alignment and latent representation space alignment. Our approach, Cycle-Consistent Adversarial Domain Adaptation (CyCADA), guides transfer between domains according to a specific discriminatively trained task and avoids divergence by enforcing consistency of the relevant semantics before and after adaptation. We evaluate our method on a variety of visual recognition and prediction settings, including digit classification and semantic segmentation of road scenes, advancing state-of-the-art performance for unsupervised adaptation from synthetic to real world driving domains.'
volume: 80
URL: https://proceedings.mlr.press/v80/hoffman18a.html
PDF: http://proceedings.mlr.press/v80/hoffman18a/hoffman18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-hoffman18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Judy
family: Hoffman
- given: Eric
family: Tzeng
- given: Taesung
family: Park
- given: Jun-Yan
family: Zhu
- given: Phillip
family: Isola
- given: Kate
family: Saenko
- given: Alexei
family: Efros
- given: Trevor
family: Darrell
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1989-1998
id: hoffman18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1989
lastpage: 1998
published: 2018-07-03 00:00:00 +0000
- title: 'Sound Abstraction and Decomposition of Probabilistic Programs'
abstract: 'Probabilistic programming languages are a flexible tool for specifying statistical models, but this flexibility comes at the cost of efficient analysis. It is currently difficult to compactly represent the subtle independence properties of a probabilistic program, and exploit independence properties to decompose inference. Classical graphical model abstractions do capture some properties of the underlying distribution, enabling inference algorithms to operate at the level of the graph topology. However, we observe that graph-based abstractions are often too coarse to capture interesting properties of programs. We propose a form of sound abstraction for probabilistic programs wherein the abstractions are themselves simplified programs. We provide a theoretical foundation for these abstractions, as well as an algorithm to generate them. Experimentally, we also illustrate the practical benefits of our framework as a tool to decompose probabilistic program inference.'
volume: 80
URL: https://proceedings.mlr.press/v80/holtzen18a.html
PDF: http://proceedings.mlr.press/v80/holtzen18a/holtzen18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-holtzen18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Steven
family: Holtzen
- given: Guy
family: Broeck
- given: Todd
family: Millstein
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 1999-2008
id: holtzen18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 1999
lastpage: 2008
published: 2018-07-03 00:00:00 +0000
- title: 'Gradient Primal-Dual Algorithm Converges to Second-Order Stationary Solution for Nonconvex Distributed Optimization Over Networks'
abstract: 'In this work, we study two first-order primal-dual based algorithms, the Gradient Primal-Dual Algorithm (GPDA) and the Gradient Alternating Direction Method of Multipliers (GADMM), for solving a class of linearly constrained non-convex optimization problems. We show that with random initialization of the primal and dual variables, both algorithms are able to compute second-order stationary solutions (ss2) with probability one. This is the first result showing that a primal-dual algorithm is capable of finding ss2 when only using first-order information; it also extends the existing results for first-order, but primal-only, algorithms. An important implication of our result is that it also gives rise to the first global convergence result to the ss2, for two classes of unconstrained distributed non-convex learning problems over multi-agent networks.'
volume: 80
URL: https://proceedings.mlr.press/v80/hong18a.html
PDF: http://proceedings.mlr.press/v80/hong18a/hong18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-hong18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mingyi
family: Hong
- given: Meisam
family: Razaviyayn
- given: Jason
family: Lee
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2009-2018
id: hong18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2009
lastpage: 2018
published: 2018-07-03 00:00:00 +0000
- title: 'Variational Bayesian dropout: pitfalls and fixes'
abstract: 'Dropout, a stochastic regularisation technique for training of neural networks, has recently been reinterpreted as a specific type of approximate inference algorithm for Bayesian neural networks. The main contribution of the reinterpretation is in providing a theoretical framework useful for analysing and extending the algorithm. We show that the proposed framework suffers from several issues; from undefined or pathological behaviour of the true posterior related to use of improper priors, to an ill-defined variational objective due to singularity of the approximating distribution relative to the true posterior. Our analysis of the improper log uniform prior used in variational Gaussian dropout suggests the pathologies are generally irredeemable, and that the algorithm still works only because the variational formulation annuls some of the pathologies. To address the singularity issue, we proffer Quasi-KL (QKL) divergence, a new approximate inference objective for approximation of high-dimensional distributions. We show that motivations for variational Bernoulli dropout based on discretisation and noise have QKL as a limit. Properties of QKL are studied both theoretically and on a simple practical example which shows that the QKL-optimal approximation of a full rank Gaussian with a degenerate one naturally leads to the Principal Component Analysis solution.'
volume: 80
URL: https://proceedings.mlr.press/v80/hron18a.html
PDF: http://proceedings.mlr.press/v80/hron18a/hron18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-hron18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jiri
family: Hron
- given: Alex
family: Matthews
- given: Zoubin
family: Ghahramani
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2019-2028
id: hron18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2019
lastpage: 2028
published: 2018-07-03 00:00:00 +0000
- title: 'Does Distributionally Robust Supervised Learning Give Robust Classifiers?'
abstract: 'Distributionally Robust Supervised Learning (DRSL) is necessary for building reliable machine learning systems. When machine learning is deployed in the real world, its performance can be significantly degraded because test data may follow a different distribution from training data. DRSL with f-divergences explicitly considers the worst-case distribution shift by minimizing the adversarially reweighted training loss. In this paper, we analyze this DRSL, focusing on the classification scenario. Since the DRSL is explicitly formulated for a distribution shift scenario, we naturally expect it to give a robust classifier that can aggressively handle shifted distributions. However, surprisingly, we prove that the DRSL just ends up giving a classifier that exactly fits the given training distribution, which is too pessimistic. This pessimism comes from two sources: the particular losses used in classification and the fact that the variety of distributions to which the DRSL tries to be robust is too wide. Motivated by our analysis, we propose a simple DRSL that overcomes this pessimism and empirically demonstrate its effectiveness.'
volume: 80
URL: https://proceedings.mlr.press/v80/hu18a.html
PDF: http://proceedings.mlr.press/v80/hu18a/hu18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-hu18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Weihua
family: Hu
- given: Gang
family: Niu
- given: Issei
family: Sato
- given: Masashi
family: Sugiyama
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2029-2037
id: hu18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2029
lastpage: 2037
published: 2018-07-03 00:00:00 +0000
- title: 'Dissipativity Theory for Accelerating Stochastic Variance Reduction: A Unified Analysis of SVRG and Katyusha Using Semidefinite Programs'
abstract: 'Techniques for reducing the variance of gradient estimates used in stochastic programming algorithms for convex finite-sum problems have received a great deal of attention in recent years. By leveraging dissipativity theory from control, we provide a new perspective on two important variance-reduction algorithms: SVRG and its direct accelerated variant Katyusha. Our perspective provides a physically intuitive understanding of the behavior of SVRG-like methods via a principle of energy conservation. The tools discussed here allow us to automate the convergence analysis of SVRG-like methods by capturing their essential properties in small semidefinite programs amenable to standard analysis and computational techniques. Our approach recovers existing convergence results for SVRG and Katyusha and generalizes the theory to alternative parameter choices. We also discuss how our approach complements the linear coupling technique. Our combination of perspectives leads to a better understanding of accelerated variance-reduced stochastic methods for finite-sum problems.'
volume: 80
URL: https://proceedings.mlr.press/v80/hu18b.html
PDF: http://proceedings.mlr.press/v80/hu18b/hu18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-hu18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Bin
family: Hu
- given: Stephen
family: Wright
- given: Laurent
family: Lessard
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2038-2047
id: hu18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2038
lastpage: 2047
published: 2018-07-03 00:00:00 +0000
- title: 'Near Optimal Frequent Directions for Sketching Dense and Sparse Matrices'
abstract: 'Given a large matrix $A\in\mathbb{R}^{n\times d}$, we consider the problem of computing a sketch matrix $B\in\mathbb{R}^{\ell\times d}$ which is significantly smaller than but still well approximates $A$. We are interested in minimizing the *covariance error* $\lVert A^TA-B^TB\rVert_2$. We consider the problems in the streaming model, where the algorithm can only make one pass over the input with limited working space. The popular Frequent Directions algorithm of Liberty (2013) and its variants achieve optimal space-error tradeoff. However, whether the running time can be improved remains an unanswered question. In this paper, we almost settle the time complexity of this problem. In particular, we provide new space-optimal algorithms with faster running times. Moreover, we also show that the running times of our algorithms are near-optimal unless the state-of-the-art running time of matrix multiplication can be improved significantly.'
volume: 80
URL: https://proceedings.mlr.press/v80/huang18a.html
PDF: http://proceedings.mlr.press/v80/huang18a/huang18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-huang18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Zengfeng
family: Huang
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2048-2057
id: huang18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2048
lastpage: 2057
published: 2018-07-03 00:00:00 +0000
- title: 'Learning Deep ResNet Blocks Sequentially using Boosting Theory'
abstract: 'We prove a *multi-channel telescoping sum boosting* theory for the ResNet architectures which simultaneously creates a new technique for boosting over features (in contrast with labels) and provides a new algorithm for ResNet-style architectures. Our proposed training algorithm, *BoostResNet*, is particularly suitable in non-differentiable architectures. Our method only requires the relatively inexpensive sequential training of $T$ “shallow ResNets”. We prove that the training error decays exponentially with the depth $T$ if the weak module classifiers that we train perform slightly better than some weak baseline. In other words, we propose a weak learning condition and prove a boosting theory for ResNet under the weak learning condition. A generalization error bound based on margin theory is proved and suggests that ResNet could be resistant to overfitting using a network with $l_1$ norm bounded weights.'
volume: 80
URL: https://proceedings.mlr.press/v80/huang18b.html
PDF: http://proceedings.mlr.press/v80/huang18b/huang18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-huang18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Furong
family: Huang
- given: Jordan
family: Ash
- given: John
family: Langford
- given: Robert
family: Schapire
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2058-2067
id: huang18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2058
lastpage: 2067
published: 2018-07-03 00:00:00 +0000
- title: 'Learning Hidden Markov Models from Pairwise Co-occurrences with Application to Topic Modeling'
abstract: 'We present a new algorithm for identifying the transition and emission probabilities of a hidden Markov model (HMM) from the emitted data. Expectation-maximization becomes computationally prohibitive for long observation records, which are often required for identification. The new algorithm is particularly suitable for cases where the available sample size is large enough to accurately estimate second-order output probabilities, but not higher-order ones. We show that if one is only able to obtain a reliable estimate of the pairwise co-occurrence probabilities of the emissions, it is still possible to uniquely identify the HMM if the emission probability is *sufficiently scattered*. We apply our method to hidden topic Markov modeling, and demonstrate that we can learn topics with higher quality if documents are modeled as observations of HMMs sharing the same emission (topic) probability, compared to the simple but widely used bag-of-words model.'
volume: 80
URL: https://proceedings.mlr.press/v80/huang18c.html
PDF: http://proceedings.mlr.press/v80/huang18c/huang18c.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-huang18c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Kejun
family: Huang
- given: Xiao
family: Fu
- given: Nicholas
family: Sidiropoulos
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2068-2077
id: huang18c
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2068
lastpage: 2077
published: 2018-07-03 00:00:00 +0000
- title: 'Neural Autoregressive Flows'
abstract: 'Normalizing flows and autoregressive models have been successfully combined to produce state-of-the-art results in density estimation, via Masked Autoregressive Flows (MAF) (Papamakarios et al., 2017), and to accelerate state-of-the-art WaveNet-based speech synthesis to 20x faster than real-time (Oord et al., 2017), via Inverse Autoregressive Flows (IAF) (Kingma et al., 2016). We unify and generalize these approaches, replacing the (conditionally) affine univariate transformations of MAF/IAF with a more general class of invertible univariate transformations expressed as monotonic neural networks. We demonstrate that the proposed neural autoregressive flows (NAF) are universal approximators for continuous probability distributions, and their greater expressivity allows them to better capture multimodal target distributions. Experimentally, NAF yields state-of-the-art performance on a suite of density estimation tasks and outperforms IAF in variational autoencoders trained on binarized MNIST.'
volume: 80
URL: https://proceedings.mlr.press/v80/huang18d.html
PDF: http://proceedings.mlr.press/v80/huang18d/huang18d.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-huang18d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Chin-Wei
family: Huang
- given: David
family: Krueger
- given: Alexandre
family: Lacoste
- given: Aaron
family: Courville
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2078-2087
id: huang18d
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2078
lastpage: 2087
published: 2018-07-03 00:00:00 +0000
- title: 'Topological mixture estimation'
abstract: 'We introduce topological mixture estimation, a completely nonparametric and computationally efficient solution to the problem of estimating a one-dimensional mixture with generic unimodal components. We repeatedly perturb the unimodal decomposition of Baryshnikov and Ghrist to produce a topologically and information-theoretically optimal unimodal mixture. We also detail a smoothing process that optimally exploits topological persistence of the unimodal category in a natural way when working directly with sample data. Finally, we illustrate these techniques through examples.'
volume: 80
URL: https://proceedings.mlr.press/v80/huntsman18a.html
PDF: http://proceedings.mlr.press/v80/huntsman18a/huntsman18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-huntsman18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Steve
family: Huntsman
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2088-2097
id: huntsman18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2088
lastpage: 2097
published: 2018-07-03 00:00:00 +0000
- title: 'Decoupled Parallel Backpropagation with Convergence Guarantee'
abstract: 'The backpropagation algorithm is indispensable for the training of feedforward neural networks. It requires propagating error gradients sequentially from the output layer all the way back to the input layer. The backward locking in the backpropagation algorithm prevents us from updating network layers in parallel and fully leveraging the computing resources. Recently, several algorithms have been proposed for breaking the backward locking. However, their performance degrades seriously when networks are deep. In this paper, we propose a decoupled parallel backpropagation algorithm for deep learning optimization with convergence guarantee. Firstly, we decouple the backpropagation algorithm using delayed gradients, and show that the backward locking is removed when we split the networks into multiple modules. Then, we utilize decoupled parallel backpropagation in two stochastic methods and prove that our method guarantees convergence to critical points for the non-convex problem. Finally, we perform experiments for training deep convolutional neural networks on benchmark datasets. The experimental results not only confirm our theoretical analysis, but also demonstrate that the proposed method can achieve significant speedup without loss of accuracy.'
volume: 80
URL: https://proceedings.mlr.press/v80/huo18a.html
PDF: http://proceedings.mlr.press/v80/huo18a/huo18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-huo18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Zhouyuan
family: Huo
- given: Bin
family: Gu
- given:
family: Yang
- given: Heng
family: Huang
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2098-2106
id: huo18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2098
lastpage: 2106
published: 2018-07-03 00:00:00 +0000
- title: 'Using Reward Machines for High-Level Task Specification and Decomposition in Reinforcement Learning'
abstract: 'In this paper we propose Reward Machines, a type of finite state machine that supports the specification of reward functions while exposing reward function structure to the learner and supporting decomposition. We then present Q-Learning for Reward Machines (QRM), an algorithm which appropriately decomposes the reward machine and uses off-policy q-learning to simultaneously learn subpolicies for the different components. QRM is guaranteed to converge to an optimal policy in the tabular case, in contrast to Hierarchical Reinforcement Learning methods which might converge to suboptimal policies. We demonstrate this behavior experimentally in two discrete domains. We also show how function approximation methods like neural networks can be incorporated into QRM, and that doing so can find better policies more quickly than hierarchical methods in a domain with a continuous state space.'
volume: 80
URL: https://proceedings.mlr.press/v80/icarte18a.html
PDF: http://proceedings.mlr.press/v80/icarte18a/icarte18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-icarte18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Rodrigo Toro
family: Icarte
- given: Toryn
family: Klassen
- given: Richard
family: Valenzano
- given: Sheila
family: McIlraith
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2107-2116
id: icarte18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2107
lastpage: 2116
published: 2018-07-03 00:00:00 +0000
- title: 'Deep Variational Reinforcement Learning for POMDPs'
abstract: 'Many real-world sequential decision making problems are partially observable by nature, and the environment model is typically unknown. Consequently, there is great need for reinforcement learning methods that can tackle such problems given only a stream of rewards and incomplete and noisy observations. In this paper, we propose deep variational reinforcement learning (DVRL), which introduces an inductive bias that allows an agent to learn a generative model of the environment and perform inference in that model to effectively aggregate the available information. We develop an n-step approximation to the evidence lower bound (ELBO), allowing the model to be trained jointly with the policy. This ensures that the latent state representation is suitable for the control task. In experiments on Mountain Hike and flickering Atari we show that our method outperforms previous approaches relying on recurrent neural networks to encode the past.'
volume: 80
URL: https://proceedings.mlr.press/v80/igl18a.html
PDF: http://proceedings.mlr.press/v80/igl18a/igl18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-igl18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Maximilian
family: Igl
- given: Luisa
family: Zintgraf
- given: Tuan Anh
family: Le
- given: Frank
family: Wood
- given: Shimon
family: Whiteson
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2117-2126
id: igl18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2117
lastpage: 2126
published: 2018-07-03 00:00:00 +0000
- title: 'Attention-based Deep Multiple Instance Learning'
abstract: 'Multiple instance learning (MIL) is a variation of supervised learning where a single class label is assigned to a bag of instances. In this paper, we state the MIL problem as learning the Bernoulli distribution of the bag label where the bag label probability is fully parameterized by neural networks. Furthermore, we propose a neural network-based permutation-invariant aggregation operator that corresponds to the attention mechanism. Notably, an application of the proposed attention-based operator provides insight into the contribution of each instance to the bag label. We show empirically that our approach achieves comparable performance to the best MIL methods on benchmark MIL datasets and that it outperforms other methods on an MNIST-based MIL dataset and two real-life histopathology datasets without sacrificing interpretability.'
volume: 80
URL: https://proceedings.mlr.press/v80/ilse18a.html
PDF: http://proceedings.mlr.press/v80/ilse18a/ilse18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-ilse18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Maximilian
family: Ilse
- given: Jakub
family: Tomczak
- given: Max
family: Welling
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2127-2136
id: ilse18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2127
lastpage: 2136
published: 2018-07-03 00:00:00 +0000
- title: 'Black-box Adversarial Attacks with Limited Queries and Information'
abstract: 'Current neural network-based classifiers are susceptible to adversarial examples even in the black-box setting, where the attacker only has query access to the model. In practice, the threat model for real-world systems is often more restrictive than the typical black-box model where the adversary can observe the full output of the network on arbitrarily many chosen inputs. We define three realistic threat models that more accurately characterize many real-world classifiers: the query-limited setting, the partial-information setting, and the label-only setting. We develop new attacks that fool classifiers under these more restrictive threat models, where previous methods would be impractical or ineffective. We demonstrate that our methods are effective against an ImageNet classifier under our proposed threat models. We also demonstrate a targeted black-box attack against a commercial classifier, overcoming the challenges of limited query access, partial information, and other practical issues to break the Google Cloud Vision API.'
volume: 80
URL: https://proceedings.mlr.press/v80/ilyas18a.html
PDF: http://proceedings.mlr.press/v80/ilyas18a/ilyas18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-ilyas18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Andrew
family: Ilyas
- given: Logan
family: Engstrom
- given: Anish
family: Athalye
- given: Jessy
family: Lin
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2137-2146
id: ilyas18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2137
lastpage: 2146
published: 2018-07-03 00:00:00 +0000
- title: 'Analysis of Minimax Error Rate for Crowdsourcing and Its Application to Worker Clustering Model'
abstract: 'While crowdsourcing has become an important means to label data, there is great interest in estimating the ground truth from unreliable labels produced by crowdworkers. The Dawid and Skene (DS) model is one of the most well-known models in the study of crowdsourcing. Despite its practical popularity, theoretical error analysis for the DS model has been conducted only under restrictive assumptions on class priors, confusion matrices, or the number of labels each worker provides. In this paper, we derive a minimax error rate under a more practical setting for a broader class of crowdsourcing models including the DS model as a special case. We further propose the worker clustering model, which is more practical than the DS model under real crowdsourcing settings. The wide applicability of our theoretical analysis allows us to immediately investigate the behavior of this proposed model, which cannot be analyzed by existing studies. Experimental results showed that there is a strong similarity between the lower bound of the minimax error rate derived by our theoretical analysis and the empirical error of the estimated value.'
volume: 80
URL: https://proceedings.mlr.press/v80/imamura18a.html
PDF: http://proceedings.mlr.press/v80/imamura18a/imamura18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-imamura18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hideaki
family: Imamura
- given: Issei
family: Sato
- given: Masashi
family: Sugiyama
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2147-2156
id: imamura18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2147
lastpage: 2156
published: 2018-07-03 00:00:00 +0000
- title: 'Improving Regression Performance with Distributional Losses'
abstract: 'There is growing evidence that converting targets to soft targets in supervised learning can provide considerable gains in performance. Much of this work has considered classification, converting hard zero-one values to soft labels—such as by adding label noise, incorporating label ambiguity or using distillation. In parallel, there is some evidence from a regression setting in reinforcement learning that learning distributions can improve performance. In this work, we investigate the reasons for this improvement, in a regression setting. We introduce a novel distributional regression loss, and similarly find it significantly improves prediction accuracy. We investigate several common hypotheses, around reducing overfitting and improved representations. We instead find evidence for an alternative hypothesis: this loss is easier to optimize, with better behaved gradients, resulting in improved generalization. We provide theoretical support for this alternative hypothesis, by characterizing the norm of the gradients of this loss.'
volume: 80
URL: https://proceedings.mlr.press/v80/imani18a.html
PDF: http://proceedings.mlr.press/v80/imani18a/imani18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-imani18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ehsan
family: Imani
- given: Martha
family: White
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2157-2166
id: imani18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2157
lastpage: 2166
published: 2018-07-03 00:00:00 +0000
- title: 'Deep Density Destructors'
abstract: 'We propose a unified framework for deep density models by formally defining density destructors. A density destructor is an invertible function that transforms a given density to the uniform density—essentially destroying any structure in the original density. This destructive transformation generalizes Gaussianization via ICA and more recent autoregressive models such as MAF and Real NVP. Informally, this transformation can be seen as a generalized whitening procedure or a multivariate generalization of the univariate CDF function. Unlike Gaussianization, our destructive transformation has the elegant property that the density function is equal to the absolute value of the Jacobian determinant. Thus, each layer of a deep density can be seen as a shallow density—uncovering a fundamental connection between shallow and deep densities. In addition, our framework provides a common interface for all previous methods enabling them to be systematically combined, evaluated and improved. Leveraging the connection to shallow densities, we also propose a novel tree destructor based on tree densities and an image-specific destructor based on pixel locality. We illustrate our framework on a 2D dataset, MNIST, and CIFAR-10. Code is available on first author’s website.'
volume: 80
URL: https://proceedings.mlr.press/v80/inouye18a.html
PDF: http://proceedings.mlr.press/v80/inouye18a/inouye18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-inouye18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: David
family: Inouye
- given: Pradeep
family: Ravikumar
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2167-2175
id: inouye18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2167
lastpage: 2175
published: 2018-07-03 00:00:00 +0000
- title: 'Unbiased Objective Estimation in Predictive Optimization'
abstract: 'For data-driven decision-making, one promising approach, called predictive optimization, is to solve maximization problems in which the objective function to be maximized is estimated from data. Predictive optimization, however, suffers from the problem of a calculated optimal solution’s being evaluated too optimistically, i.e., the value of the objective function is overestimated. This paper investigates such optimistic bias and presents two methods for correcting it. The first, which is analogous to cross-validation, successfully corrects the optimistic bias but results in underestimation of the true value. Our second method employs resampling techniques to avoid both overestimation and underestimation. We show that the second method, referred to as the parameter perturbation method, achieves asymptotically unbiased estimation. Empirical results for both artificial and real-world datasets demonstrate that our proposed approach successfully corrects the optimistic bias.'
volume: 80
URL: https://proceedings.mlr.press/v80/ito18a.html
PDF: http://proceedings.mlr.press/v80/ito18a/ito18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-ito18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Shinji
family: Ito
- given: Akihiro
family: Yabe
- given: Ryohei
family: Fujimaki
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2176-2185
id: ito18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2176
lastpage: 2185
published: 2018-07-03 00:00:00 +0000
- title: 'Anonymous Walk Embeddings'
abstract: 'The task of representing entire graphs has seen a surge of prominent results, mainly due to learning convolutional neural networks (CNNs) on graph-structured data. While CNNs demonstrate state-of-the-art performance in the graph classification task, such methods are supervised and therefore steer away from the original problem of network representation in a task-agnostic manner. Here, we coherently propose an approach for embedding entire graphs and show that our feature representations with an SVM classifier increase classification accuracy of CNN algorithms and traditional graph kernels. For this we describe a recently discovered graph object, *anonymous walk*, on which we design task-independent algorithms for learning graph representations in an explicit and distributed way. Overall, our work represents a new scalable unsupervised learning of state-of-the-art representations of entire graphs.'
volume: 80
URL: https://proceedings.mlr.press/v80/ivanov18a.html
PDF: http://proceedings.mlr.press/v80/ivanov18a/ivanov18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-ivanov18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Sergey
family: Ivanov
- given: Evgeny
family: Burnaev
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2186-2195
id: ivanov18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2186
lastpage: 2195
published: 2018-07-03 00:00:00 +0000
- title: 'Learning Binary Latent Variable Models: A Tensor Eigenpair Approach'
abstract: 'Latent variable models with hidden binary units appear in various applications. Learning such models, in particular in the presence of noise, is a challenging computational problem. In this paper we propose a novel spectral approach to this problem, based on the eigenvectors of both the second order moment matrix and third order moment tensor of the observed data. We prove that under mild non-degeneracy conditions, our method consistently estimates the model parameters at the optimal parametric rate. Our tensor-based method generalizes previous orthogonal tensor decomposition approaches, where the hidden units were assumed to be either statistically independent or mutually exclusive. We illustrate the consistency of our method on simulated data and demonstrate its usefulness in learning a common model for population mixtures in genetics.'
volume: 80
URL: https://proceedings.mlr.press/v80/jaffe18a.html
PDF: http://proceedings.mlr.press/v80/jaffe18a/jaffe18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-jaffe18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ariel
family: Jaffe
- given: Roi
family: Weiss
- given: Boaz
family: Nadler
- given: Shai
family: Carmi
- given: Yuval
family: Kluger
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2196-2205
id: jaffe18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2196
lastpage: 2205
published: 2018-07-03 00:00:00 +0000
- title: 'Firing Bandits: Optimizing Crowdfunding'
abstract: 'In this paper, we model the problem of optimizing crowdfunding platforms, such as the non-profit Kiva or for-profit KickStarter, as a variant of the multi-armed bandit problem. In our setting, Bernoulli arms emit no rewards until their cumulative number of successes over any number of trials exceeds a fixed threshold and then provide no additional reward for any additional trials - a process reminiscent of a neuron firing once it reaches the action potential and then saturating. In the spirit of an infinite armed bandit problem, the player can add new arms whose expected probability of success is drawn iid from an unknown distribution – this endless supply of projects models the harsh reality that the number of projects seeking funding greatly exceeds the total capital available from lenders. Crowdfunding platforms naturally fall under this setting where the arms are potential projects, and their probability of success is the probability that a potential funder decides to fund it after reviewing it. The goal is to play arms (prioritize the display of projects on a webpage) to maximize the number of arms that reach the firing threshold (meet their goal amount) using as few total trials (number of impressions) as possible over all the played arms. We provide an algorithm for this setting and prove sublinear regret bounds.'
volume: 80
URL: https://proceedings.mlr.press/v80/jain18a.html
PDF: http://proceedings.mlr.press/v80/jain18a/jain18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-jain18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lalit
family: Jain
- given: Kevin
family: Jamieson
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2206-2214
id: jain18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2206
lastpage: 2214
published: 2018-07-03 00:00:00 +0000
- title: 'Differentially Private Matrix Completion Revisited'
abstract: 'We provide the first provably joint differentially private algorithm with formal utility guarantees for the problem of user-level privacy-preserving collaborative filtering. Our algorithm is based on the Frank-Wolfe method, and it consistently estimates the underlying preference matrix as long as the number of users $m$ is $\omega(n^{5/4})$, where $n$ is the number of items, and each user provides her preference for at least $\sqrt{n}$ randomly selected items. Along the way, we provide an optimal differentially private algorithm for singular vector computation, based on the celebrated Oja’s method, that provides significant savings in terms of space and time while operating on sparse matrices. We also empirically evaluate our algorithm on a suite of datasets, and show that it consistently outperforms the state-of-the-art private algorithms.'
volume: 80
URL: https://proceedings.mlr.press/v80/jain18b.html
PDF: http://proceedings.mlr.press/v80/jain18b/jain18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-jain18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Prateek
family: Jain
- given: Om Dipakbhai
family: Thakkar
- given: Abhradeep
family: Thakurta
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2215-2224
id: jain18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2215
lastpage: 2224
published: 2018-07-03 00:00:00 +0000
- title: 'Video Prediction with Appearance and Motion Conditions'
abstract: 'Video prediction aims to generate realistic future frames by learning dynamic visual patterns. One fundamental challenge is to deal with future uncertainty: How should a model behave when there are multiple correct, equally probable futures? We propose an Appearance-Motion Conditional GAN to address this challenge. We provide appearance and motion information as conditions that specify what the future may look like, reducing the level of uncertainty. Our model consists of a generator, two discriminators taking charge of appearance and motion pathways, and a perceptual ranking module that encourages videos of similar conditions to look similar. To train our model, we develop a novel conditioning scheme that consists of different combinations of appearance and motion conditions. We evaluate our model using facial expression and human action datasets and report favorable results compared to existing methods.'
volume: 80
URL: https://proceedings.mlr.press/v80/jang18a.html
PDF: http://proceedings.mlr.press/v80/jang18a/jang18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-jang18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yunseok
family: Jang
- given: Gunhee
family: Kim
- given: Yale
family: Song
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2225-2234
id: jang18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2225
lastpage: 2234
published: 2018-07-03 00:00:00 +0000
- title: 'Pathwise Derivatives Beyond the Reparameterization Trick'
abstract: 'We observe that gradients computed via the reparameterization trick are in direct correspondence with solutions of the transport equation in the formalism of optimal transport. We use this perspective to compute (approximate) pathwise gradients for probability distributions not directly amenable to the reparameterization trick: Gamma, Beta, and Dirichlet. We further observe that when the reparameterization trick is applied to the Cholesky-factorized multivariate Normal distribution, the resulting gradients are suboptimal in the sense of optimal transport. We derive the optimal gradients and show that they have reduced variance in a Gaussian Process regression task. We demonstrate with a variety of synthetic experiments and stochastic variational inference tasks that our pathwise gradients are competitive with other methods.'
volume: 80
URL: https://proceedings.mlr.press/v80/jankowiak18a.html
PDF: http://proceedings.mlr.press/v80/jankowiak18a/jankowiak18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-jankowiak18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Martin
family: Jankowiak
- given: Fritz
family: Obermeyer
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2235-2244
id: jankowiak18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2235
lastpage: 2244
published: 2018-07-03 00:00:00 +0000
- title: 'Detecting non-causal artifacts in multivariate linear regression models'
abstract: 'We consider linear models where $d$ potential causes $X_1,\dots,X_d$ are correlated with one target quantity $Y$ and propose a method to infer whether the association is causal or whether it is an artifact caused by overfitting or hidden common causes. We employ the idea that in the former case the vector of regression coefficients has ‘generic’ orientation relative to the covariance matrix $\Sigma_{XX}$ of $X$. Using an ICA-based model for confounding, we show that both confounding and overfitting yield regression vectors that concentrate mainly in the space of low eigenvalues of $\Sigma_{XX}$.'
volume: 80
URL: https://proceedings.mlr.press/v80/janzing18a.html
PDF: http://proceedings.mlr.press/v80/janzing18a/janzing18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-janzing18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Dominik
family: Janzing
- given: Bernhard
family: Schölkopf
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2245-2253
id: janzing18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2245
lastpage: 2253
published: 2018-07-03 00:00:00 +0000
- title: 'A Unified Framework for Structured Low-rank Matrix Learning'
abstract: 'We consider the problem of learning a low-rank matrix, constrained to lie in a linear subspace, and introduce a novel factorization for modeling such matrices. A salient feature of the proposed factorization scheme is that it decouples the low-rank and the structural constraints onto separate factors. We formulate the optimization problem on the Riemannian spectrahedron manifold, where the Riemannian framework allows us to develop computationally efficient conjugate gradient and trust-region algorithms. Experiments on problems such as standard/robust/non-negative matrix completion, Hankel matrix learning and multi-task learning demonstrate the efficacy of our approach.'
volume: 80
URL: https://proceedings.mlr.press/v80/jawanpuria18a.html
PDF: http://proceedings.mlr.press/v80/jawanpuria18a/jawanpuria18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-jawanpuria18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Pratik
family: Jawanpuria
- given: Bamdev
family: Mishra
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2254-2263
id: jawanpuria18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2254
lastpage: 2263
published: 2018-07-03 00:00:00 +0000
- title: 'Efficient end-to-end learning for quantizable representations'
abstract: 'Embedding representation learning via neural networks is at the core of modern similarity-based search. While much effort has been put into developing algorithms for learning binary Hamming code representations for search efficiency, this still requires a linear scan of the entire dataset for each query and trades off search accuracy through binarization. To this end, we consider the problem of directly learning a quantizable embedding representation and the sparse binary hash code end-to-end, which can be used to construct an efficient hash table that not only significantly reduces the number of data points searched but also achieves state-of-the-art search accuracy, outperforming previous state-of-the-art deep metric learning methods. We also show that finding the optimal sparse binary hash code in a mini-batch can be computed exactly in polynomial time by solving a minimum cost flow problem. Our results on CIFAR-100 and ImageNet datasets show state-of-the-art search accuracy in precision@k and NMI metrics while providing up to 98X and 478X search speedup respectively over exhaustive linear search. The source code is available at https://github.com/maestrojeong/Deep-Hash-Table-ICML18.'
volume: 80
URL: https://proceedings.mlr.press/v80/jeong18a.html
PDF: http://proceedings.mlr.press/v80/jeong18a/jeong18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-jeong18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yeonwoo
family: Jeong
- given: Hyun Oh
family: Song
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2264-2273
id: jeong18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2264
lastpage: 2273
published: 2018-07-03 00:00:00 +0000
- title: 'Exploring Hidden Dimensions in Accelerating Convolutional Neural Networks'
abstract: 'The past few years have witnessed growth in the computational requirements for training deep convolutional neural networks. Current approaches parallelize training onto multiple devices by applying a single parallelization strategy (e.g., data or model parallelism) to all layers in a network. Although easy to reason about, these approaches result in suboptimal runtime performance in large-scale distributed training, since different layers in a network may prefer different parallelization strategies. In this paper, we propose layer-wise parallelism that allows each layer in a network to use an individual parallelization strategy. We jointly optimize how each layer is parallelized by solving a graph search problem. Our evaluation shows that layer-wise parallelism outperforms state-of-the-art approaches by increasing training throughput, reducing communication costs, achieving better scalability to multiple GPUs, while maintaining original network accuracy.'
volume: 80
URL: https://proceedings.mlr.press/v80/jia18a.html
PDF: http://proceedings.mlr.press/v80/jia18a/jia18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-jia18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Zhihao
family: Jia
- given: Sina
family: Lin
- given: Charles R.
family: Qi
- given: Alex
family: Aiken
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2274-2283
id: jia18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2274
lastpage: 2283
published: 2018-07-03 00:00:00 +0000
- title: 'Feedback-Based Tree Search for Reinforcement Learning'
abstract: 'Inspired by recent successes of Monte-Carlo tree search (MCTS) in a number of artificial intelligence (AI) application domains, we propose a reinforcement learning (RL) technique that iteratively applies MCTS on batches of small, finite-horizon versions of the original infinite-horizon Markov decision process. The terminal condition of the finite-horizon problems, or the leaf-node evaluator of the decision tree generated by MCTS, is specified using a combination of an estimated value function and an estimated policy function. The recommendations generated by the MCTS procedure are then provided as feedback in order to refine, through classification and regression, the leaf-node evaluator for the next iteration. We provide the first sample complexity bounds for a tree search-based RL algorithm. In addition, we show that a deep neural network implementation of the technique can create a competitive AI agent for the popular multi-player online battle arena (MOBA) game King of Glory.'
volume: 80
URL: https://proceedings.mlr.press/v80/jiang18a.html
PDF: http://proceedings.mlr.press/v80/jiang18a/jiang18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-jiang18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Daniel
family: Jiang
- given: Emmanuel
family: Ekwedike
- given: Han
family: Liu
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2284-2293
id: jiang18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2284
lastpage: 2293
published: 2018-07-03 00:00:00 +0000
- title: 'Quickshift++: Provably Good Initializations for Sample-Based Mean Shift'
abstract: 'We provide initial seedings to the Quick Shift clustering algorithm, which approximate the locally high-density regions of the data. Such seedings act as more stable and expressive cluster-cores than the singleton modes found by Quick Shift. We establish statistical consistency guarantees for this modification. We then show strong clustering performance on real datasets as well as promising applications to image segmentation.'
volume: 80
URL: https://proceedings.mlr.press/v80/jiang18b.html
PDF: http://proceedings.mlr.press/v80/jiang18b/jiang18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-jiang18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Heinrich
family: Jiang
- given: Jennifer
family: Jang
- given: Samory
family: Kpotufe
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2294-2303
id: jiang18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2294
lastpage: 2303
published: 2018-07-03 00:00:00 +0000
- title: 'MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels'
abstract: 'Recent deep networks are capable of memorizing the entire data even when the labels are completely random. To overcome the overfitting on corrupted labels, we propose a novel technique of learning another neural network, called MentorNet, to supervise the training of the base deep networks, namely, StudentNet. During training, MentorNet provides a curriculum (sample weighting scheme) for StudentNet to focus on the sample the label of which is probably correct. Unlike the existing curriculum that is usually predefined by human experts, MentorNet learns a data-driven curriculum dynamically with StudentNet. Experimental results demonstrate that our approach can significantly improve the generalization performance of deep networks trained on corrupted training data. Notably, to the best of our knowledge, we achieve the best-published result on WebVision, a large benchmark containing 2.2 million images of real-world noisy labels.'
volume: 80
URL: https://proceedings.mlr.press/v80/jiang18c.html
PDF: http://proceedings.mlr.press/v80/jiang18c/jiang18c.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-jiang18c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lu
family: Jiang
- given: Zhengyuan
family: Zhou
- given: Thomas
family: Leung
- given: Li-Jia
family: Li
- given: Li
family: Fei-Fei
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2304-2313
id: jiang18c
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2304
lastpage: 2313
published: 2018-07-03 00:00:00 +0000
- title: 'The Weighted Kendall and High-order Kernels for Permutations'
abstract: 'We propose new positive definite kernels for permutations. First we introduce a weighted version of the Kendall kernel, which allows unequal weighting of the contributions of different item pairs in the permutations depending on their ranks. Like the Kendall kernel, we show that the weighted version is invariant to relabeling of items and can be computed efficiently in O(n ln(n)) operations, where n is the number of items in the permutation. Second, we propose a supervised approach to learn the weights by jointly optimizing them with the function estimated by a kernel machine. Third, while the Kendall kernel considers pairwise comparison between items, we extend it by considering higher-order comparisons among tuples of items and show that the supervised approach of learning the weights can be systematically generalized to higher-order permutation kernels.'
volume: 80
URL: https://proceedings.mlr.press/v80/jiao18a.html
PDF: http://proceedings.mlr.press/v80/jiao18a/jiao18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-jiao18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yunlong
family: Jiao
- given: Jean-Philippe
family: Vert
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2314-2322
id: jiao18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2314
lastpage: 2322
published: 2018-07-03 00:00:00 +0000
- title: 'Junction Tree Variational Autoencoder for Molecular Graph Generation'
abstract: 'We seek to automate the design of molecules based on specific chemical properties. In computational terms, this task involves continuous embedding and generation of molecular graphs. Our primary contribution is the direct realization of molecular graphs, a task previously approached by generating linear SMILES strings instead of graphs. Our junction tree variational autoencoder generates molecular graphs in two phases, by first generating a tree-structured scaffold over chemical substructures, and then combining them into a molecule with a graph message passing network. This approach allows us to incrementally expand molecules while maintaining chemical validity at every step. We evaluate our model on multiple tasks ranging from molecular generation to optimization. Across these tasks, our model outperforms previous state-of-the-art baselines by a significant margin.'
volume: 80
URL: https://proceedings.mlr.press/v80/jin18a.html
PDF: http://proceedings.mlr.press/v80/jin18a/jin18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-jin18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Wengong
family: Jin
- given: Regina
family: Barzilay
- given: Tommi
family: Jaakkola
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2323-2332
id: jin18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2323
lastpage: 2332
published: 2018-07-03 00:00:00 +0000
- title: 'Network Global Testing by Counting Graphlets'
abstract: 'Consider a large social network with possibly severe degree heterogeneity and mixed-memberships. We are interested in testing whether the network has only one community or more than one community. The problem is known to be non-trivial, partially due to the presence of severe degree heterogeneity. We construct a class of test statistics using the numbers of short paths and short cycles, and the key to our approach is a general framework for canceling the effects of degree heterogeneity. The tests compare favorably with existing methods. We support our methods with careful analysis and numerical study with simulated data and a real data example.'
volume: 80
URL: https://proceedings.mlr.press/v80/jin18b.html
PDF: http://proceedings.mlr.press/v80/jin18b/jin18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-jin18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jiashun
family: Jin
- given: Zheng
family: Ke
- given: Shengming
family: Luo
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2333-2341
id: jin18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2333
lastpage: 2341
published: 2018-07-03 00:00:00 +0000
- title: 'Regret Minimization for Partially Observable Deep Reinforcement Learning'
abstract: 'Deep reinforcement learning algorithms that estimate state and state-action value functions have been shown to be effective in a variety of challenging domains, including learning control strategies from raw image pixels. However, algorithms that estimate state and state-action value functions typically assume a fully observed state and must compensate for partial observations by using finite length observation histories or recurrent networks. In this work, we propose a new deep reinforcement learning algorithm based on counterfactual regret minimization that iteratively updates an approximation to an advantage-like function and is robust to partially observed state. We demonstrate that this new algorithm can substantially outperform strong baseline methods on several partially observed reinforcement learning tasks: learning first-person 3D navigation in Doom and Minecraft, and acting in the presence of partially observed objects in Doom and Pong.'
volume: 80
URL: https://proceedings.mlr.press/v80/jin18c.html
PDF: http://proceedings.mlr.press/v80/jin18c/jin18c.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-jin18c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Peter
family: Jin
- given: Kurt
family: Keutzer
- given: Sergey
family: Levine
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2342-2351
id: jin18c
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2342
lastpage: 2351
published: 2018-07-03 00:00:00 +0000
- title: 'WSNet: Compact and Efficient Networks Through Weight Sampling'
abstract: 'We present a new approach and a novel architecture, termed WSNet, for learning compact and efficient deep neural networks. Existing approaches conventionally learn full model parameters independently and then compress them via ad hoc processing such as model pruning or filter factorization. Alternatively, WSNet proposes learning model parameters by sampling from a compact set of learnable parameters, which naturally enforces parameter sharing throughout the learning process. We demonstrate that such a novel weight sampling approach (and induced WSNet) promotes both weights and computation sharing favorably. By employing this method, we can more efficiently learn much smaller networks with competitive performance compared to baseline networks with equal numbers of convolution filters. Specifically, we consider learning compact and efficient 1D convolutional neural networks for audio classification. Extensive experiments on multiple audio classification datasets verify the effectiveness of WSNet. Combined with weight quantization, the resulting models are up to 180x smaller and theoretically up to 16x faster than the well-established baselines, without noticeable performance drop.'
volume: 80
URL: https://proceedings.mlr.press/v80/jin18d.html
PDF: http://proceedings.mlr.press/v80/jin18d/jin18d.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-jin18d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Xiaojie
family: Jin
- given: Yingzhen
family: Yang
- given: Ning
family: Xu
- given: Jianchao
family: Yang
- given: Nebojsa
family: Jojic
- given: Jiashi
family: Feng
- given: Shuicheng
family: Yan
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2352-2361
id: jin18d
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2352
lastpage: 2361
published: 2018-07-03 00:00:00 +0000
- title: 'Large-Scale Cox Process Inference using Variational Fourier Features'
abstract: 'Gaussian process modulated Poisson processes provide a flexible framework for modeling spatiotemporal point patterns. So far this had been restricted to one dimension, binning to a pre-determined grid, or small data sets of up to a few thousand data points. Here we introduce Cox process inference based on Fourier features. This sparse representation induces global rather than local constraints on the function space and is computationally efficient. This allows us to formulate a grid-free approximation that scales well with the number of data points and the size of the domain. We demonstrate that this allows MCMC approximations to the non-Gaussian posterior. In practice, we find that Fourier features have more consistent optimization behavior than previous approaches. Our approximate Bayesian method can fit over 100 000 events with complex spatiotemporal patterns in three dimensions on a single GPU.'
volume: 80
URL: https://proceedings.mlr.press/v80/john18a.html
PDF: http://proceedings.mlr.press/v80/john18a/john18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-john18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: ST
family: John
- given: James
family: Hensman
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2362-2370
id: john18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2362
lastpage: 2370
published: 2018-07-03 00:00:00 +0000
- title: 'Composite Functional Gradient Learning of Generative Adversarial Models'
abstract: 'This paper first presents a theory for generative adversarial methods that does not rely on the traditional minimax formulation. It shows that with a strong discriminator, a good generator can be learned so that the KL divergence between the distributions of real data and generated data improves after each functional gradient step until it converges to zero. Based on the theory, we propose a new stable generative adversarial method. A theoretical insight into the original GAN from this new viewpoint is also provided. The experiments on image generation show the effectiveness of our new method.'
volume: 80
URL: https://proceedings.mlr.press/v80/johnson18a.html
PDF: http://proceedings.mlr.press/v80/johnson18a/johnson18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-johnson18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Rie
family: Johnson
- given: Tong
family: Zhang
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2371-2379
id: johnson18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2371
lastpage: 2379
published: 2018-07-03 00:00:00 +0000
- title: 'Kronecker Recurrent Units'
abstract: 'Our work addresses two important issues with recurrent neural networks: (1) they are over-parametrized, and (2) the recurrent weight matrix is ill-conditioned. The former increases the sample complexity of learning and the training time. The latter causes the vanishing and exploding gradient problem. We present a flexible recurrent neural network model called Kronecker Recurrent Units (KRU). KRU achieves parameter efficiency in RNNs through a Kronecker factored recurrent matrix. It overcomes the ill-conditioning of the recurrent matrix by enforcing soft unitary constraints on the factors. Thanks to the small dimensionality of the factors, maintaining these constraints is computationally efficient. Our experimental results on seven standard data-sets reveal that KRU can reduce the number of parameters by three orders of magnitude in the recurrent weight matrix compared to the existing recurrent models, without trading the statistical performance. These results in particular show that while there are advantages in having a high dimensional recurrent space, the capacity of the recurrent part of the model can be dramatically reduced.'
volume: 80
URL: https://proceedings.mlr.press/v80/jose18a.html
PDF: http://proceedings.mlr.press/v80/jose18a/jose18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-jose18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Cijo
family: Jose
- given: Moustapha
family: Cisse
- given: Francois
family: Fleuret
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2380-2389
id: jose18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2380
lastpage: 2389
published: 2018-07-03 00:00:00 +0000
- title: 'Fast Decoding in Sequence Models Using Discrete Latent Variables'
abstract: 'Autoregressive sequence models based on deep neural networks, such as RNNs, WaveNet and Transformer are the state-of-the-art on many tasks. However, they lack parallelism and are thus slow for long sequences. RNNs lack parallelism both during training and decoding, while architectures like WaveNet and Transformer are much more parallel during training, but still lack parallelism during decoding. We present a method to extend sequence models using discrete latent variables that makes decoding much more parallel. The main idea behind this approach is to first autoencode the target sequence into a shorter discrete latent sequence, which is generated autoregressively, and finally decode the full sequence from this shorter latent sequence in a parallel manner. To this end, we introduce a new method for constructing discrete latent variables and compare it with previously introduced methods. Finally, we verify that our model works on the task of neural machine translation, where our models are an order of magnitude faster than comparable autoregressive models and, while lower in BLEU than purely autoregressive models, better than previously proposed non-autoregressive translation.'
volume: 80
URL: https://proceedings.mlr.press/v80/kaiser18a.html
PDF: http://proceedings.mlr.press/v80/kaiser18a/kaiser18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-kaiser18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lukasz
family: Kaiser
- given: Samy
family: Bengio
- given: Aurko
family: Roy
- given: Ashish
family: Vaswani
- given: Niki
family: Parmar
- given: Jakob
family: Uszkoreit
- given: Noam
family: Shazeer
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2390-2399
id: kaiser18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2390
lastpage: 2399
published: 2018-07-03 00:00:00 +0000
- title: 'Kernel Recursive ABC: Point Estimation with Intractable Likelihood'
abstract: 'We propose a novel approach to parameter estimation for simulator-based statistical models with intractable likelihood. Our proposed method involves recursive application of kernel ABC and kernel herding to the same observed data. We provide a theoretical explanation regarding why the approach works, showing (for the population setting) that, under a certain assumption, point estimates obtained with this method converge to the true parameter, as recursion proceeds. We have conducted a variety of numerical experiments, including parameter estimation for a real-world pedestrian flow simulator, and show that in most cases our method outperforms existing approaches.'
volume: 80
URL: https://proceedings.mlr.press/v80/kajihara18a.html
PDF: http://proceedings.mlr.press/v80/kajihara18a/kajihara18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-kajihara18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Takafumi
family: Kajihara
- given: Motonobu
family: Kanagawa
- given: Keisuke
family: Yamazaki
- given: Kenji
family: Fukumizu
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2400-2409
id: kajihara18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2400
lastpage: 2409
published: 2018-07-03 00:00:00 +0000
- title: 'Efficient Neural Audio Synthesis'
abstract: 'Sequential models achieve state-of-the-art results in audio, visual and textual domains with respect to both estimating the data distribution and generating desired samples. Efficient sampling for this class of models at the cost of little to no loss in quality has however remained an elusive problem. With a focus on text-to-speech synthesis, we describe a set of general techniques for reducing sampling time while maintaining high output quality. We first describe a single-layer recurrent neural network, the WaveRNN, with a dual softmax layer that matches the quality of the state-of-the-art WaveNet model. The compact form of the network makes it possible to generate 24 kHz 16-bit audio 4 times faster than real time on a GPU. Secondly, we apply a weight pruning technique to reduce the number of weights in the WaveRNN. We find that, for a constant number of parameters, large sparse networks perform better than small dense networks and this relationship holds past sparsity levels of more than 96%. The small number of weights in a Sparse WaveRNN makes it possible to sample high-fidelity audio on a mobile phone CPU in real time. Finally, we describe a new dependency scheme for sampling that lets us trade a constant number of non-local, distant dependencies for the ability to generate samples in batches. The Batch WaveRNN produces 8 samples per step without loss of quality and offers orthogonal ways of further increasing sampling efficiency.'
volume: 80
URL: https://proceedings.mlr.press/v80/kalchbrenner18a.html
PDF: http://proceedings.mlr.press/v80/kalchbrenner18a/kalchbrenner18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-kalchbrenner18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Nal
family: Kalchbrenner
- given: Erich
family: Elsen
- given: Karen
family: Simonyan
- given: Seb
family: Noury
- given: Norman
family: Casagrande
- given: Edward
family: Lockhart
- given: Florian
family: Stimberg
- given: Aaron
family: Oord
- given: Sander
family: Dieleman
- given: Koray
family: Kavukcuoglu
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2410-2419
id: kalchbrenner18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2410
lastpage: 2419
published: 2018-07-03 00:00:00 +0000
- title: 'Learning Diffusion using Hyperparameters'
abstract: 'In this paper we advocate for a hyperparametric approach to learn diffusion in the independent cascade (IC) model. The sample complexity of this model is a function of the number of edges in the network and consequently learning becomes infeasible when the network is large. We study a natural restriction of the hypothesis class using additional information available in order to dramatically reduce the sample complexity of the learning process. In particular we assume that diffusion probabilities can be described as a function of a global hyperparameter and features of the individuals in the network. One of the main challenges with this approach is that training a model reduces to optimizing a non-convex objective. Despite this obstacle, we can shrink the best-known sample complexity bound for learning IC by a factor of |E|/d where |E| is the number of edges in the graph and d is the dimension of the hyperparameter. We show that under mild assumptions about the distribution generating the samples one can provably train a model with low generalization error. Finally, we use large-scale diffusion data from Facebook to show that a hyperparametric model using approximately 20 features per node achieves remarkably high accuracy.'
volume: 80
URL: https://proceedings.mlr.press/v80/kalimeris18a.html
PDF: http://proceedings.mlr.press/v80/kalimeris18a/kalimeris18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-kalimeris18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Dimitris
family: Kalimeris
- given: Yaron
family: Singer
- given: Karthik
family: Subbian
- given: Udi
family: Weinsberg
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2420-2428
id: kalimeris18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2420
lastpage: 2428
published: 2018-07-03 00:00:00 +0000
- title: 'Signal and Noise Statistics Oblivious Orthogonal Matching Pursuit'
abstract: 'Orthogonal matching pursuit (OMP) is a widely used algorithm for recovering sparse high dimensional vectors in linear regression models. The optimal performance of OMP requires a priori knowledge of either the sparsity of regression vector or noise statistics. Both these statistics are rarely known a priori and are very difficult to estimate. In this paper, we present a novel technique called residual ratio thresholding (RRT) to operate OMP without any a priori knowledge of sparsity and noise statistics and establish finite sample and large sample support recovery guarantees for the same. Both analytical results and numerical simulations in real and synthetic data sets indicate that RRT has a performance comparable to OMP with a priori knowledge of sparsity and noise statistics.'
volume: 80
URL: https://proceedings.mlr.press/v80/kallummil18a.html
PDF: http://proceedings.mlr.press/v80/kallummil18a/kallummil18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-kallummil18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Sreejith
family: Kallummil
- given: Sheetal
family: Kalyani
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2429-2438
id: kallummil18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2429
lastpage: 2438
published: 2018-07-03 00:00:00 +0000
- title: 'Residual Unfairness in Fair Machine Learning from Prejudiced Data'
abstract: 'Recent work in fairness in machine learning has proposed adjusting for fairness by equalizing accuracy metrics across groups and has also studied how datasets affected by historical prejudices may lead to unfair decision policies. We connect these lines of work and study the residual unfairness that arises when a fairness-adjusted predictor is not actually fair on the target population due to systematic censoring of training data by existing biased policies. This scenario is particularly common in the same applications where fairness is a concern. We characterize theoretically the impact of such censoring on standard fairness metrics for binary classifiers and provide criteria for when residual unfairness may or may not appear. We prove that, under certain conditions, fairness-adjusted classifiers will in fact induce residual unfairness that perpetuates the same injustices, against the same groups, that biased the data to begin with, thus showing that even state-of-the-art fair machine learning can have a "bias in, bias out" property. When certain benchmark data is available, we show how sample reweighting can estimate and adjust fairness metrics while accounting for censoring. We use this to study the case of Stop, Question, and Frisk (SQF) and demonstrate that attempting to adjust for fairness perpetuates the same injustices that the policy is infamous for.'
volume: 80
URL: https://proceedings.mlr.press/v80/kallus18a.html
PDF: http://proceedings.mlr.press/v80/kallus18a/kallus18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-kallus18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Nathan
family: Kallus
- given: Angela
family: Zhou
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2439-2448
id: kallus18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2439
lastpage: 2448
published: 2018-07-03 00:00:00 +0000
- title: 'Learn from Your Neighbor: Learning Multi-modal Mappings from Sparse Annotations'
abstract: 'Many structured prediction problems (particularly in vision and language domains) are ambiguous, with multiple outputs being ‘correct’ for an input – e.g. there are many ways of describing an image, multiple ways of translating a sentence; however, exhaustively annotating the applicability of all possible outputs is intractable due to exponentially large output spaces (e.g. all English sentences). In practice, these problems are cast as multi-class prediction, with the likelihood of only a sparse set of annotations being maximized – unfortunately penalizing for placing beliefs on plausible but unannotated outputs. We make and test the following hypothesis – for a given input, the annotations of its neighbors may serve as an additional supervisory signal. Specifically, we propose an objective that transfers supervision from neighboring examples. We first study the properties of our developed method in a controlled toy setup before reporting results on multi-label classification and two image-grounded sequence modeling tasks – captioning and question generation. We evaluate using standard task-specific metrics and measures of output diversity, finding consistent improvements over standard maximum likelihood training and other baselines.'
volume: 80
URL: https://proceedings.mlr.press/v80/kalyan18a.html
PDF: http://proceedings.mlr.press/v80/kalyan18a/kalyan18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-kalyan18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ashwin
family: Kalyan
- given: Stefan
family: Lee
- given: Anitha
family: Kannan
- given: Dhruv
family: Batra
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2449-2458
id: kalyan18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2449
lastpage: 2458
published: 2018-07-03 00:00:00 +0000
- title: 'Semi-Supervised Learning via Compact Latent Space Clustering'
abstract: 'We present a novel cost function for semi-supervised learning of neural networks that encourages compact clustering of the latent space to facilitate separation. The key idea is to dynamically create a graph over embeddings of labeled and unlabeled samples of a training batch to capture underlying structure in feature space, and use label propagation to estimate its high and low density regions. We then devise a cost function based on Markov chains on the graph that regularizes the latent space to form a single compact cluster per class, while avoiding disturbing existing clusters during optimization. We evaluate our approach on three benchmarks and compare to the state of the art with promising results. Our approach combines the benefits of graph-based regularization with efficient, inductive inference, does not require modifications to a network architecture, and can thus be easily applied to existing networks to enable an effective use of unlabeled data.'
volume: 80
URL: https://proceedings.mlr.press/v80/kamnitsas18a.html
PDF: http://proceedings.mlr.press/v80/kamnitsas18a/kamnitsas18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-kamnitsas18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Konstantinos
family: Kamnitsas
- given: Daniel
family: Castro
- given: Loic
family: Le Folgoc
- given: Ian
family: Walker
- given: Ryutaro
family: Tanno
- given: Daniel
family: Rueckert
- given: Ben
family: Glocker
- given: Antonio
family: Criminisi
- given: Aditya
family: Nori
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2459-2468
id: kamnitsas18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2459
lastpage: 2468
published: 2018-07-03 00:00:00 +0000
- title: 'Policy Optimization with Demonstrations'
abstract: 'Exploration remains a significant challenge to reinforcement learning methods, especially in environments where reward signals are sparse. Recent methods of learning from demonstrations have been shown to be promising in overcoming exploration difficulties but typically require a considerable number of high-quality demonstrations that are difficult to collect. We propose to effectively leverage available demonstrations to guide exploration by enforcing occupancy measure matching between the learned policy and current demonstrations, and develop a novel Policy Optimization from Demonstration (POfD) method. We show that POfD induces implicit dynamic reward shaping and brings provable benefits for policy improvement. Furthermore, it can be combined with policy gradient methods to produce state-of-the-art results, as demonstrated experimentally on a range of popular benchmark sparse-reward tasks, even when the demonstrations are few and imperfect.'
volume: 80
URL: https://proceedings.mlr.press/v80/kang18a.html
PDF: http://proceedings.mlr.press/v80/kang18a/kang18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-kang18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Bingyi
family: Kang
- given: Zequn
family: Jie
- given: Jiashi
family: Feng
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2469-2478
id: kang18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2469
lastpage: 2478
published: 2018-07-03 00:00:00 +0000
- title: 'Improving Sign Random Projections With Additional Information'
abstract: 'Sign random projections (SRP) is a technique which allows the user to quickly estimate the angular similarity and inner products between data. We propose using additional information to improve these estimates, an approach that is easy to implement and cost-efficient. We prove that the variance of our estimator is lower than the variance of SRP. Our proposed method can also be used together with other modifications of SRP, such as Super-Bit LSH (SBLSH). We demonstrate the effectiveness of our method on the MNIST test dataset and the Gisette dataset. We discuss how our proposed method can be extended to random projections or even other hashing algorithms.'
volume: 80
URL: https://proceedings.mlr.press/v80/kang18b.html
PDF: http://proceedings.mlr.press/v80/kang18b/kang18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-kang18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Keegan
family: Kang
- given: Weipin
family: Wong
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2479-2487
id: kang18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2479
lastpage: 2487
published: 2018-07-03 00:00:00 +0000
- title: 'Let’s be Honest: An Optimal No-Regret Framework for Zero-Sum Games'
abstract: 'We revisit the problem of solving two-player zero-sum games in the decentralized setting. We propose a simple algorithmic framework that simultaneously achieves the best rates for honest regret as well as adversarial regret, and in addition resolves the open problem of removing the logarithmic terms in convergence to the value of the game. We achieve this goal in three steps. First, we provide a novel analysis of optimistic mirror descent (OMD), showing that it can be modified to guarantee fast convergence for both honest regret and value of the game when the players are playing collaboratively. Second, we propose a new algorithm, dubbed robust optimistic mirror descent (ROMD), which attains optimal adversarial regret without knowing the time horizon beforehand. Finally, we propose a simple signaling scheme, which enables us to bridge OMD and ROMD to achieve the best of both worlds. Numerical examples are presented to support our theoretical claims and show that our non-adaptive ROMD algorithm can be competitive with OMD with adaptive step-size selection.'
volume: 80
URL: https://proceedings.mlr.press/v80/kangarshahi18a.html
PDF: http://proceedings.mlr.press/v80/kangarshahi18a/kangarshahi18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-kangarshahi18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ehsan Asadi
family: Kangarshahi
- given: Ya-Ping
family: Hsieh
- given: Mehmet Fatih
family: Sahin
- given: Volkan
family: Cevher
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2488-2496
id: kangarshahi18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2488
lastpage: 2496
published: 2018-07-03 00:00:00 +0000
- title: 'Continual Reinforcement Learning with Complex Synapses'
abstract: 'Unlike humans, who are capable of continual learning over their lifetimes, artificial neural networks have long been known to suffer from a phenomenon known as catastrophic forgetting, whereby new learning can lead to abrupt erasure of previously acquired knowledge. Whereas in a neural network the parameters are typically modelled as scalar values, an individual synapse in the brain comprises a complex network of interacting biochemical components that evolve at different timescales. In this paper, we show that by equipping tabular and deep reinforcement learning agents with a synaptic model that incorporates this biological complexity (Benna & Fusi, 2016), catastrophic forgetting can be mitigated at multiple timescales. In particular, we find that as well as enabling continual learning across sequential training of two simple tasks, it can also be used to overcome within-task forgetting by reducing the need for an experience replay database.'
volume: 80
URL: https://proceedings.mlr.press/v80/kaplanis18a.html
PDF: http://proceedings.mlr.press/v80/kaplanis18a/kaplanis18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-kaplanis18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Christos
family: Kaplanis
- given: Murray
family: Shanahan
- given: Claudia
family: Clopath
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2497-2506
id: kaplanis18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2497
lastpage: 2506
published: 2018-07-03 00:00:00 +0000
- title: 'LaVAN: Localized and Visible Adversarial Noise'
abstract: 'Most works on adversarial examples for deep-learning based image classifiers use noise that, while small, covers the entire image. We explore the case where the noise is allowed to be visible but confined to a small, localized patch of the image, without covering any of the main object(s) in the image. We show that it is possible to generate localized adversarial noises that cover only 2% of the pixels in the image, none of them over the main object, and that are transferable across images and locations, and successfully fool a state-of-the-art Inception v3 model with very high success rates.'
volume: 80
URL: https://proceedings.mlr.press/v80/karmon18a.html
PDF: http://proceedings.mlr.press/v80/karmon18a/karmon18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-karmon18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Danny
family: Karmon
- given: Daniel
family: Zoran
- given: Yoav
family: Goldberg
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2507-2515
id: karmon18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2507
lastpage: 2515
published: 2018-07-03 00:00:00 +0000
- title: 'Riemannian Stochastic Recursive Gradient Algorithm'
abstract: 'Stochastic variance reduction algorithms have recently become popular for minimizing the average of a large, but finite number of loss functions on a Riemannian manifold. The present paper proposes a Riemannian stochastic recursive gradient algorithm (R-SRG), which does not require the inverse of retraction between two distant iterates on the manifold. Convergence analyses of R-SRG are performed on both retraction-convex and non-convex functions under computationally efficient retraction and vector transport operations. The key challenge is analysis of the influence of vector transport along the retraction curve. Numerical evaluations reveal that R-SRG competes well with state-of-the-art Riemannian batch and stochastic gradient algorithms.'
volume: 80
URL: https://proceedings.mlr.press/v80/kasai18a.html
PDF: http://proceedings.mlr.press/v80/kasai18a/kasai18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-kasai18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hiroyuki
family: Kasai
- given: Hiroyuki
family: Sato
- given: Bamdev
family: Mishra
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2516-2524
id: kasai18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2516
lastpage: 2524
published: 2018-07-03 00:00:00 +0000
- title: 'Not All Samples Are Created Equal: Deep Learning with Importance Sampling'
abstract: 'Deep Neural Network training spends most of the computation on examples that are properly handled, and could be ignored. We propose to mitigate this phenomenon with a principled importance sampling scheme that focuses computation on "informative" examples, and reduces the variance of the stochastic gradients during training. Our contribution is twofold: first, we derive a tractable upper bound to the per-sample gradient norm, and second we derive an estimator of the variance reduction achieved with importance sampling, which enables us to switch it on when it will result in an actual speedup. The resulting scheme can be used by changing a few lines of code in a standard SGD procedure, and we demonstrate experimentally on image classification, CNN fine-tuning, and RNN training, that for a fixed wall-clock time budget, it provides a reduction of the train losses of up to an order of magnitude and a relative improvement of test errors between 5% and 17%.'
volume: 80
URL: https://proceedings.mlr.press/v80/katharopoulos18a.html
PDF: http://proceedings.mlr.press/v80/katharopoulos18a/katharopoulos18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-katharopoulos18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Angelos
family: Katharopoulos
- given: Francois
family: Fleuret
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2525-2534
id: katharopoulos18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2525
lastpage: 2534
published: 2018-07-03 00:00:00 +0000
- title: 'Feasible Arm Identification'
abstract: 'We introduce the feasible arm identification problem, a pure exploration multi-armed bandit problem where the agent is given a set of $D$-dimensional arms and a polyhedron $P = \{x : A x \leq b \} \subset \mathbb{R}^D$. Pulling an arm gives a random vector and the goal is to determine, using a fixed budget of $T$ pulls, which of the arms have means belonging to $P$. We propose three algorithms, MD-UCBE, MD-SAR, and MD-APT, and provide a unified analysis establishing upper bounds for each of them. We also establish a lower bound that matches up to constants the upper bounds of MD-UCBE and MD-APT. Finally, we demonstrate the effectiveness of our algorithms on synthetic and real-world datasets.'
volume: 80
URL: https://proceedings.mlr.press/v80/katz-samuels18a.html
PDF: http://proceedings.mlr.press/v80/katz-samuels18a/katz-samuels18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-katz-samuels18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Julian
family: Katz-Samuels
- given: Clay
family: Scott
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2535-2543
id: katz-samuels18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2535
lastpage: 2543
published: 2018-07-03 00:00:00 +0000
- title: 'Scalable Deletion-Robust Submodular Maximization: Data Summarization with Privacy and Fairness Constraints'
abstract: 'Can we efficiently extract useful information from a large user-generated dataset while protecting the privacy of the users and/or ensuring fairness in representation? We cast this problem as an instance of deletion-robust submodular maximization where part of the data may be deleted or masked due to privacy concerns or fairness criteria. We propose the first memory-efficient centralized, streaming, and distributed methods with constant-factor approximation guarantees against *any* number of adversarial deletions. We extensively evaluate the performance of our algorithms on real-world applications, including (i) Uber pick-up locations with location privacy constraints; (ii) feature selection with fairness constraints for income prediction and crime rate prediction; and (iii) deletion-robust summarization of census data, consisting of 2,458,285 feature vectors. Our experiments show that our solution is robust against even $80\%$ of data deletion.'
volume: 80
URL: https://proceedings.mlr.press/v80/kazemi18a.html
PDF: http://proceedings.mlr.press/v80/kazemi18a/kazemi18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-kazemi18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ehsan
family: Kazemi
- given: Morteza
family: Zadimoghaddam
- given: Amin
family: Karbasi
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2544-2553
id: kazemi18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2544
lastpage: 2553
published: 2018-07-03 00:00:00 +0000
- title: 'Focused Hierarchical RNNs for Conditional Sequence Processing'
abstract: 'Recurrent Neural Networks (RNNs) with attention mechanisms have obtained state-of-the-art results for many sequence processing tasks. Most of these models use a simple form of encoder with attention that looks over the entire sequence and assigns a weight to each token independently. We present a mechanism for focusing RNN encoders for sequence modelling tasks which allows them to attend to key parts of the input as needed. We formulate this using a multi-layer conditional hierarchical sequence encoder that reads in one token at a time and makes a discrete decision on whether the token is relevant to the context or question being asked. The discrete gating mechanism takes in the context embedding and the current hidden state as inputs and controls information flow into the layer above. We train it using policy gradient methods. We evaluate this method on several types of tasks with different attributes. First, we evaluate the method on synthetic tasks which allow us to evaluate the model for its generalization ability and probe the behavior of the gates in more controlled settings. We then evaluate this approach on large-scale Question Answering tasks including the challenging MS MARCO and SearchQA tasks. Our models show consistent improvements for both tasks over prior work and our baselines. They have also been shown to generalize significantly better on synthetic tasks as compared to the baselines.'
volume: 80
URL: https://proceedings.mlr.press/v80/ke18a.html
PDF: http://proceedings.mlr.press/v80/ke18a/ke18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-ke18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Nan Rosemary
family: Ke
- given: Konrad
family: Żołna
- given: Alessandro
family: Sordoni
- given: Zhouhan
family: Lin
- given: Adam
family: Trischler
- given: Yoshua
family: Bengio
- given: Joelle
family: Pineau
- given: Laurent
family: Charlin
- given: Christopher
family: Pal
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2554-2563
id: ke18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2554
lastpage: 2563
published: 2018-07-03 00:00:00 +0000
- title: 'Preventing Fairness Gerrymandering: Auditing and Learning for Subgroup Fairness'
abstract: 'The most prevalent notions of fairness in machine learning fix a small collection of pre-defined groups (such as race or gender), and then ask for approximate parity of some statistic of the classifier (such as false positive rate) across these groups. Constraints of this form are susceptible to fairness gerrymandering, in which a classifier is fair on each individual group, but badly violates the fairness constraint on structured subgroups, such as certain combinations of protected attribute values. We thus consider fairness across exponentially or infinitely many subgroups, defined by a structured class of functions over the protected attributes. We first prove that the problem of auditing subgroup fairness for both equality of false positive rates and statistical parity is computationally equivalent to the problem of weak agnostic learning — which means it is hard in the worst case, even for simple structured subclasses. However, it also suggests that common heuristics for learning can be applied to successfully solve the auditing problem in practice. We then derive an algorithm that provably converges in a polynomial number of steps to the best subgroup-fair distribution over classifiers, given access to an oracle which can solve the agnostic learning problem. The algorithm is based on a formulation of subgroup fairness as a zero-sum game between a Learner (the primal player) and an Auditor (the dual player). We implement a variant of this algorithm using heuristic oracles, and show that we can effectively both audit and learn fair classifiers on a real dataset.'
volume: 80
URL: https://proceedings.mlr.press/v80/kearns18a.html
PDF: http://proceedings.mlr.press/v80/kearns18a/kearns18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-kearns18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Michael
family: Kearns
- given: Seth
family: Neel
- given: Aaron
family: Roth
- given: Zhiwei Steven
family: Wu
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2564-2572
id: kearns18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2564
lastpage: 2572
published: 2018-07-03 00:00:00 +0000
- title: 'Improved nearest neighbor search using auxiliary information and priority functions'
abstract: 'Nearest neighbor search using random projection trees has recently been shown to achieve superior performance, in terms of better accuracy while retrieving fewer data points, compared to locality sensitive hashing based methods. However, to achieve acceptable nearest neighbor search accuracy for large-scale applications, where the number of data points and/or the number of features can be very large, it requires users to maintain, store and search through a large number of such independent random projection trees, which may be undesirable for many practical applications. To address this issue, in this paper we present different search strategies to improve the nearest neighbor search performance of a single random projection tree. Our approach exploits properties of single and multiple random projections, which allows us to store meaningful auxiliary information at internal nodes of a random projection tree as well as to design priority functions to guide the search process, resulting in improved nearest neighbor search performance. Empirical results on multiple real-world datasets show that our proposed method improves the search accuracy of a single tree compared to baseline methods.'
volume: 80
URL: https://proceedings.mlr.press/v80/keivani18a.html
PDF: http://proceedings.mlr.press/v80/keivani18a/keivani18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-keivani18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Omid
family: Keivani
- given: Kaushik
family: Sinha
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2573-2581
id: keivani18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2573
lastpage: 2581
published: 2018-07-03 00:00:00 +0000
- title: 'ContextNet: Deep learning for Star Galaxy Classification'
abstract: 'We present a framework to compose artificial neural networks in cases where the data cannot be treated as independent events. Our particular motivation is star galaxy classification for ground-based optical surveys. Due to a turbulent atmosphere and imperfect instruments, a single image of an astronomical object is not enough to definitively classify it as a star or galaxy. Instead, the context of the surrounding objects imaged at the same time needs to be considered in order to make an optimal classification. The model we present is divided into three distinct ANNs: one designed to capture local features about each object, the second to compare these features across all objects in an image, and the third to make a final prediction for each object based on the local and compared features. By exploiting the ability to replicate the weights of an ANN, the model can handle an arbitrary and variable number of individual objects embedded in a larger exposure. We train and test our model on simulations of a large upcoming ground-based survey, the Large Synoptic Survey Telescope (LSST). We compare to the state-of-the-art approach, showing improved overall performance as well as better performance for a specific class of objects that is important for the LSST.'
volume: 80
URL: https://proceedings.mlr.press/v80/kennamer18a.html
PDF: http://proceedings.mlr.press/v80/kennamer18a/kennamer18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-kennamer18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Noble
family: Kennamer
- given: David
family: Kirkby
- given: Alexander
family: Ihler
- given: Francisco Javier
family: Sanchez-Lopez
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2582-2590
id: kennamer18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2582
lastpage: 2590
published: 2018-07-03 00:00:00 +0000
- title: 'Frank-Wolfe with Subsampling Oracle'
abstract: 'We analyze two novel randomized variants of the Frank-Wolfe (FW) or conditional gradient algorithm. While classical FW algorithms require solving a linear minimization problem over the domain at each iteration, the proposed methods only require solving a linear minimization problem over a small *subset* of the original domain. The first algorithm that we propose is a randomized variant of the original FW algorithm and achieves an $\mathcal{O}(1/t)$ sublinear convergence rate as in the deterministic counterpart. The second algorithm is a randomized variant of the Away-step FW algorithm and, like its deterministic counterpart, reaches a linear (i.e., exponential) convergence rate, making it the first provably convergent randomized variant of Away-step FW. In both cases, while subsampling reduces the convergence rate by a constant factor, the linear minimization step can be a fraction of the cost of that of the deterministic versions, especially when the data is streamed. We illustrate computational gains of both algorithms on regression problems, involving both $\ell_1$ and latent group lasso penalties.'
volume: 80
URL: https://proceedings.mlr.press/v80/kerdreux18a.html
PDF: http://proceedings.mlr.press/v80/kerdreux18a/kerdreux18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-kerdreux18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Thomas
family: Kerdreux
- given: Fabian
family: Pedregosa
- given: Alexandre
family: d’Aspremont
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2591-2600
id: kerdreux18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2591
lastpage: 2600
published: 2018-07-03 00:00:00 +0000
- title: 'Convergence guarantees for a class of non-convex and non-smooth optimization problems'
abstract: 'Non-convex optimization problems arise frequently in machine learning, including feature selection, structured matrix learning, mixture modeling, and neural network training. We consider the problem of finding critical points of a broad class of non-convex problems with non-smooth components. We analyze the behavior of two gradient-based methods—namely a sub-gradient method, and a proximal method. Our main results are to establish rates of convergence for general problems, and also exhibit faster rates for sub-analytic functions. As an application of our theory, we obtain a simplification of the popular CCCP algorithm, which retains all the desirable convergence properties of the original method, along with a significantly lower cost per iteration. We illustrate our methods and theory via application to the problems of best subset selection, robust estimation, and shape from shading reconstruction.'
volume: 80
URL: https://proceedings.mlr.press/v80/khamaru18a.html
PDF: http://proceedings.mlr.press/v80/khamaru18a/khamaru18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-khamaru18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Koulik
family: Khamaru
- given: Martin
family: Wainwright
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2601-2610
id: khamaru18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2601
lastpage: 2610
published: 2018-07-03 00:00:00 +0000
- title: 'Fast and Scalable Bayesian Deep Learning by Weight-Perturbation in Adam'
abstract: 'Uncertainty computation in deep learning is essential to design robust and reliable systems. Variational inference (VI) is a promising approach for such computation, but requires more effort to implement and execute compared to maximum-likelihood methods. In this paper, we propose new natural-gradient algorithms to reduce such efforts for Gaussian mean-field VI. Our algorithms can be implemented within the Adam optimizer by perturbing the network weights during gradient evaluations, and uncertainty estimates can be cheaply obtained by using the vector that adapts the learning rate. This requires lower memory, computation, and implementation effort than existing VI methods, while obtaining uncertainty estimates of comparable quality. Our empirical results confirm this and further suggest that the weight-perturbation in our algorithm could be useful for exploration in reinforcement learning and stochastic optimization.'
volume: 80
URL: https://proceedings.mlr.press/v80/khan18a.html
PDF: http://proceedings.mlr.press/v80/khan18a/khan18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-khan18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mohammad
family: Khan
- given: Didrik
family: Nielsen
- given: Voot
family: Tangkaratt
- given: Wu
family: Lin
- given: Yarin
family: Gal
- given: Akash
family: Srivastava
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2611-2620
id: khan18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2611
lastpage: 2620
published: 2018-07-03 00:00:00 +0000
- title: 'Geometry Score: A Method For Comparing Generative Adversarial Networks'
abstract: 'One of the biggest challenges in the research of generative adversarial networks (GANs) is assessing the quality of generated samples and detecting various levels of mode collapse. In this work, we construct a novel measure of performance of a GAN by comparing geometrical properties of the underlying data manifold and the generated one, which provides both qualitative and quantitative means for evaluation. Our algorithm can be applied to datasets of an arbitrary nature and is not limited to visual data. We test the obtained metric on various real-life models and datasets and demonstrate that our method provides new insights into properties of GANs.'
volume: 80
URL: https://proceedings.mlr.press/v80/khrulkov18a.html
PDF: http://proceedings.mlr.press/v80/khrulkov18a/khrulkov18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-khrulkov18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Valentin
family: Khrulkov
- given: Ivan
family: Oseledets
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2621-2629
id: khrulkov18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2621
lastpage: 2629
published: 2018-07-03 00:00:00 +0000
- title: 'Blind Justice: Fairness with Encrypted Sensitive Attributes'
abstract: 'Recent work has explored how to train machine learning models which do not discriminate against any subgroup of the population as determined by sensitive attributes such as gender or race. To avoid disparate treatment, sensitive attributes should not be considered. On the other hand, in order to avoid disparate impact, sensitive attributes must be examined, e.g., in order to learn a fair model, or to check if a given model is fair. We introduce methods from secure multi-party computation which allow us to avoid both. By encrypting sensitive attributes, we show how an outcome-based fair model may be learned, checked, or have its outputs verified and held to account, without users revealing their sensitive attributes.'
volume: 80
URL: https://proceedings.mlr.press/v80/kilbertus18a.html
PDF: http://proceedings.mlr.press/v80/kilbertus18a/kilbertus18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-kilbertus18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Niki
family: Kilbertus
- given: Adria
family: Gascon
- given: Matt
family: Kusner
- given: Michael
family: Veale
- given: Krishna
family: Gummadi
- given: Adrian
family: Weller
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2630-2639
id: kilbertus18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2630
lastpage: 2639
published: 2018-07-03 00:00:00 +0000
- title: 'Markov Modulated Gaussian Cox Processes for Semi-Stationary Intensity Modeling of Events Data'
abstract: 'The Cox process is a flexible event model that can account for uncertainty of the intensity function in the Poisson process. However, previous approaches make strong assumptions in terms of time stationarity, potentially failing to generalize when the data do not conform to the assumed stationarity conditions. In this paper we consider the two most popular Cox models, which represent two extremes, and propose a novel semi-stationary Cox process model that benefits from both. Our model has a set of Gaussian process latent functions governed by a latent stationary Markov process, for which we provide analytic derivations for the variational inference. Empirical evaluations on several synthetic and real-world event datasets, including football shot attempts and daily earthquakes, demonstrate that the proposed model is promising and can yield improved generalization performance over existing approaches.'
volume: 80
URL: https://proceedings.mlr.press/v80/kim18a.html
PDF: http://proceedings.mlr.press/v80/kim18a/kim18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-kim18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Minyoung
family: Kim
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2640-2648
id: kim18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2640
lastpage: 2648
published: 2018-07-03 00:00:00 +0000
- title: 'Disentangling by Factorising'
abstract: 'We define and address the problem of unsupervised learning of disentangled representations on data generated from independent factors of variation. We propose FactorVAE, a method that disentangles by encouraging the distribution of representations to be factorial and hence independent across the dimensions. We show that it improves upon beta-VAE by providing a better trade-off between disentanglement and reconstruction quality and being more robust to the number of training iterations. Moreover, we highlight the problems of a commonly used disentanglement metric and introduce a new metric that does not suffer from them.'
volume: 80
URL: https://proceedings.mlr.press/v80/kim18b.html
PDF: http://proceedings.mlr.press/v80/kim18b/kim18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-kim18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hyunjik
family: Kim
- given: Andriy
family: Mnih
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2649-2658
id: kim18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2649
lastpage: 2658
published: 2018-07-03 00:00:00 +0000
- title: 'Self-Bounded Prediction Suffix Tree via Approximate String Matching'
abstract: 'Prediction suffix trees (PST) provide an effective tool for sequence modelling and prediction. Current prediction techniques for PSTs rely on exact matching between the suffix of the current sequence and the previously observed sequence. We present a provably correct algorithm for learning a PST with approximate suffix matching by relaxing the exact matching condition. We then present a self-bounded enhancement of our algorithm where the depth of suffix tree grows automatically in response to the model performance on a training sequence. Through experiments on synthetic datasets as well as three real-world datasets, we show that the approximate matching PST results in better predictive performance than the other variants of PST.'
volume: 80
URL: https://proceedings.mlr.press/v80/kim18c.html
PDF: http://proceedings.mlr.press/v80/kim18c/kim18c.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-kim18c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Dongwoo
family: Kim
- given: Christian
family: Walder
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2659-2667
id: kim18c
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2659
lastpage: 2667
published: 2018-07-03 00:00:00 +0000
- title: 'Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)'
abstract: 'The interpretation of deep learning models is a challenge due to their size, complexity, and often opaque internal state. In addition, many systems, such as image classifiers, operate on low-level features rather than high-level concepts. To address these challenges, we introduce Concept Activation Vectors (CAVs), which provide an interpretation of a neural net’s internal state in terms of human-friendly concepts. The key idea is to view the high-dimensional internal state of a neural net as an aid, not an obstacle. We show how to use CAVs as part of a technique, Testing with CAVs (TCAV), that uses directional derivatives to quantify the degree to which a user-defined concept is important to a classification result–for example, how sensitive a prediction of “zebra” is to the presence of stripes. Using the domain of image classification as a testing ground, we describe how CAVs may be used to explore hypotheses and generate insights for a standard image classification network as well as a medical application.'
volume: 80
URL: https://proceedings.mlr.press/v80/kim18d.html
PDF: http://proceedings.mlr.press/v80/kim18d/kim18d.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-kim18d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Been
family: Kim
- given: Martin
family: Wattenberg
- given: Justin
family: Gilmer
- given: Carrie
family: Cai
- given: James
family: Wexler
- given: Fernanda
family: Viegas
- given: Rory
family: Sayres
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2668-2677
id: kim18d
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2668
lastpage: 2677
published: 2018-07-03 00:00:00 +0000
- title: 'Semi-Amortized Variational Autoencoders'
abstract: 'Amortized variational inference (AVI) replaces instance-specific local inference with a global inference network. While AVI has enabled efficient training of deep generative models such as variational autoencoders (VAE), recent empirical work suggests that inference networks can produce suboptimal variational parameters. We propose a hybrid approach, to use AVI to initialize the variational parameters and run stochastic variational inference (SVI) to refine them. Crucially, the local SVI procedure is itself differentiable, so the inference network and generative model can be trained end-to-end with gradient-based optimization. This semi-amortized approach enables the use of rich generative models without experiencing the posterior-collapse phenomenon common in training VAEs for problems like text generation. Experiments show this approach outperforms strong autoregressive and variational baselines on standard text and image datasets.'
volume: 80
URL: https://proceedings.mlr.press/v80/kim18e.html
PDF: http://proceedings.mlr.press/v80/kim18e/kim18e.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-kim18e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yoon
family: Kim
- given: Sam
family: Wiseman
- given: Andrew
family: Miller
- given: David
family: Sontag
- given: Alexander
family: Rush
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2678-2687
id: kim18e
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2678
lastpage: 2687
published: 2018-07-03 00:00:00 +0000
- title: 'Neural Relational Inference for Interacting Systems'
abstract: 'Interacting systems are prevalent in nature, from dynamical systems in physics to complex societal dynamics. The interplay of components can give rise to complex behavior, which can often be explained using a simple model of the system’s constituent parts. In this work, we introduce the neural relational inference (NRI) model: an unsupervised model that learns to infer interactions while simultaneously learning the dynamics purely from observational data. Our model takes the form of a variational auto-encoder, in which the latent code represents the underlying interaction graph and the reconstruction is based on graph neural networks. In experiments on simulated physical systems, we show that our NRI model can accurately recover ground-truth interactions in an unsupervised manner. We further demonstrate that we can find an interpretable structure and predict complex dynamics in real motion capture and sports tracking data.'
volume: 80
URL: https://proceedings.mlr.press/v80/kipf18a.html
PDF: http://proceedings.mlr.press/v80/kipf18a/kipf18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-kipf18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Thomas
family: Kipf
- given: Ethan
family: Fetaya
- given: Kuan-Chieh
family: Wang
- given: Max
family: Welling
- given: Richard
family: Zemel
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2688-2697
id: kipf18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2688
lastpage: 2697
published: 2018-07-03 00:00:00 +0000
- title: 'An Alternative View: When Does SGD Escape Local Minima?'
abstract: 'Stochastic gradient descent (SGD) is widely used in machine learning. Although it is commonly viewed as a fast but inaccurate version of gradient descent (GD), it always finds better solutions than GD for modern neural networks. In order to understand this phenomenon, we take an alternative view that SGD is working on the convolved (thus smoothed) version of the loss function. We show that, even if the function $f$ has many bad local minima or saddle points, as long as for every point $x$, the weighted average of the gradients of its neighborhoods is one point convex with respect to the desired solution $x^*$, SGD will get close to, and then stay around $x^*$ with constant probability. Our result identifies a set of functions on which SGD provably works, which is much larger than the set of convex functions. Empirically, we observe that the loss surface of neural networks enjoys nice one point convexity properties locally, therefore our theorem helps explain why SGD works so well for neural networks.'
volume: 80
URL: https://proceedings.mlr.press/v80/kleinberg18a.html
PDF: http://proceedings.mlr.press/v80/kleinberg18a/kleinberg18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-kleinberg18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Bobby
family: Kleinberg
- given: Yuanzhi
family: Li
- given: Yang
family: Yuan
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2698-2707
id: kleinberg18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2698
lastpage: 2707
published: 2018-07-03 00:00:00 +0000
- title: 'Crowdsourcing with Arbitrary Adversaries'
abstract: 'Most existing works on crowdsourcing assume that the workers follow the Dawid-Skene model, or the one-coin model as its special case, where every worker makes mistakes independently of other workers and with the same error probability for every task. We study a significant extension of this restricted model. We allow almost half of the workers to deviate from the one-coin model and for those workers, their probabilities of making an error to be task-dependent and to be arbitrarily correlated. In other words, we allow for arbitrary adversaries, for which not only error probabilities can be high, but which can also perfectly collude. In this adversarial scenario, we design an efficient algorithm to consistently estimate the workers’ error probabilities.'
volume: 80
URL: https://proceedings.mlr.press/v80/kleindessner18a.html
PDF: http://proceedings.mlr.press/v80/kleindessner18a/kleindessner18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-kleindessner18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Matthaeus
family: Kleindessner
- given: Pranjal
family: Awasthi
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2708-2717
id: kleindessner18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2708
lastpage: 2717
published: 2018-07-03 00:00:00 +0000
- title: 'Spatio-temporal Bayesian On-line Changepoint Detection with Model Selection'
abstract: 'Bayesian On-line Changepoint Detection is extended to on-line model selection and non-stationary spatio-temporal processes. We propose spatially structured Vector Autoregressions (VARs) for modelling the process between changepoints (CPs) and give an upper bound on the approximation error of such models. The resulting algorithm performs prediction, model selection and CP detection on-line. Its time complexity is linear and its space complexity constant, and thus it is two orders of magnitude faster than its closest competitor. In addition, it outperforms the state of the art for multivariate data.'
volume: 80
URL: https://proceedings.mlr.press/v80/knoblauch18a.html
PDF: http://proceedings.mlr.press/v80/knoblauch18a/knoblauch18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-knoblauch18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jeremias
family: Knoblauch
- given: Theodoros
family: Damoulas
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2718-2727
id: knoblauch18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2718
lastpage: 2727
published: 2018-07-03 00:00:00 +0000
- title: 'Fast Gradient-Based Methods with Exponential Rate: A Hybrid Control Framework'
abstract: 'Ordinary differential equations, and in general a dynamical system viewpoint, have seen a resurgence of interest in developing fast optimization methods, mainly thanks to the availability of well-established analysis tools. In this study, we pursue a similar objective and propose a class of hybrid control systems that adopts a 2nd-order differential equation as its continuous flow. A distinctive feature of the proposed differential equation in comparison with the existing literature is a state-dependent, time-invariant damping term that acts as a feedback control input. Given a user-defined scalar $\alpha$, it is shown that the proposed control input steers the state trajectories to the global optimizer of a desired objective function with a guaranteed rate of convergence $\mathcal{O}(e^{-\alpha t})$. Our framework requires that the objective function satisfies the so-called Polyak–Łojasiewicz inequality. Furthermore, a discretization method is introduced such that the resulting discrete dynamical system possesses an exponential rate of convergence.'
volume: 80
URL: https://proceedings.mlr.press/v80/kolarijani18a.html
PDF: http://proceedings.mlr.press/v80/kolarijani18a/kolarijani18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-kolarijani18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Arman Sharifi
family: Kolarijani
- given: Peyman Mohajerin
family: Esfahani
- given: Tamas
family: Keviczky
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2728-2736
id: kolarijani18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2728
lastpage: 2736
published: 2018-07-03 00:00:00 +0000
- title: 'Nonconvex Optimization for Regression with Fairness Constraints'
abstract: 'The unfairness of a regressor is evaluated by measuring the correlation between the estimator and the sensitive attribute (e.g., race, gender, age), and the coefficient of determination (CoD) is a natural extension of the correlation coefficient when more than one sensitive attribute exists. As is well known, there is a trade-off between fairness and accuracy of a regressor, which implies a perfectly fair optimizer does not always yield a useful prediction. Taking this into consideration, we optimize the accuracy of the estimation subject to a user-defined level of fairness. However, a fairness level as a constraint induces a nonconvexity of the feasible region, which disables the use of an off-the-shelf convex optimizer. Despite such nonconvexity, we show an exact solution is available by using tools of global optimization theory. Furthermore, we propose a nonlinear extension of the method by kernel representation. Unlike most existing fairness-aware machine learning methods, our method allows us to deal with numeric and multiple sensitive attributes.'
volume: 80
URL: https://proceedings.mlr.press/v80/komiyama18a.html
PDF: http://proceedings.mlr.press/v80/komiyama18a/komiyama18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-komiyama18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Junpei
family: Komiyama
- given: Akiko
family: Takeda
- given: Junya
family: Honda
- given: Hajime
family: Shimao
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2737-2746
id: komiyama18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2737
lastpage: 2746
published: 2018-07-03 00:00:00 +0000
- title: 'On the Generalization of Equivariance and Convolution in Neural Networks to the Action of Compact Groups'
abstract: 'Convolutional neural networks have been extremely successful in the image recognition domain because they ensure equivariance with respect to translations. There have been many recent attempts to generalize this framework to other domains, including graphs and data lying on manifolds. In this paper we give a rigorous, theoretical treatment of convolution and equivariance in neural networks with respect to not just translations, but the action of any compact group. Our main result is to prove that (given some natural constraints) convolutional structure is not just a sufficient, but also a necessary condition for equivariance to the action of a compact group. Our exposition makes use of concepts from representation theory and noncommutative harmonic analysis and derives new generalized convolution formulae.'
volume: 80
URL: https://proceedings.mlr.press/v80/kondor18a.html
PDF: http://proceedings.mlr.press/v80/kondor18a/kondor18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-kondor18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Risi
family: Kondor
- given: Shubhendu
family: Trivedi
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2747-2755
id: kondor18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2747
lastpage: 2755
published: 2018-07-03 00:00:00 +0000
- title: 'Compiling Combinatorial Prediction Games'
abstract: 'In online optimization, the goal is to iteratively choose solutions from a decision space, so as to minimize the average cost over time. As long as this decision space is described by combinatorial constraints, the problem is generally intractable. In this paper, we consider the paradigm of compiling the set of combinatorial constraints into a deterministic and Decomposable Negation Normal Form (dDNNF) circuit, for which the tasks of linear optimization and solution sampling take linear time. Based on this framework, we provide efficient characterizations of existing combinatorial prediction strategies, with a particular attention to mirror descent techniques. These strategies are compared on several real-world benchmarks for which the set of Boolean constraints is preliminarily compiled into a dDNNF circuit.'
volume: 80
URL: https://proceedings.mlr.press/v80/koriche18a.html
PDF: http://proceedings.mlr.press/v80/koriche18a/koriche18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-koriche18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Frederic
family: Koriche
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2756-2765
id: koriche18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2756
lastpage: 2765
published: 2018-07-03 00:00:00 +0000
- title: 'Dynamic Evaluation of Neural Sequence Models'
abstract: 'We explore dynamic evaluation, where sequence models are adapted to the recent sequence history using gradient descent, assigning higher probabilities to re-occurring sequential patterns. We develop a dynamic evaluation approach that outperforms existing adaptation approaches in our comparisons. We apply dynamic evaluation to outperform all previous word-level perplexities on the Penn Treebank and WikiText-2 datasets (achieving 51.1 and 44.3 respectively) and all previous character-level cross-entropies on the text8 and Hutter Prize datasets (achieving 1.19 bits/char and 1.08 bits/char respectively).'
volume: 80
URL: https://proceedings.mlr.press/v80/krause18a.html
PDF: http://proceedings.mlr.press/v80/krause18a/krause18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-krause18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ben
family: Krause
- given: Emmanuel
family: Kahembwe
- given: Iain
family: Murray
- given: Steve
family: Renals
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2766-2775
id: krause18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2766
lastpage: 2775
published: 2018-07-03 00:00:00 +0000
- title: 'Semiparametric Contextual Bandits'
abstract: 'This paper studies semiparametric contextual bandits, a generalization of the linear stochastic bandit problem where the reward for a chosen action is modeled as a linear function of known action features confounded by a non-linear action-independent term. We design new algorithms that achieve $\tilde{O}(d\sqrt{T})$ regret over $T$ rounds, when the linear function is $d$-dimensional, which matches the best known bounds for the simpler unconfounded case and improves on a recent result of Greenwald et al. (2017). Via an empirical evaluation, we show that our algorithms outperform prior approaches when there are non-linear confounding effects on the rewards. Technically, our algorithms use a new reward estimator inspired by doubly-robust approaches and our proofs require new concentration inequalities for self-normalized martingales.'
volume: 80
URL: https://proceedings.mlr.press/v80/krishnamurthy18a.html
PDF: http://proceedings.mlr.press/v80/krishnamurthy18a/krishnamurthy18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-krishnamurthy18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Akshay
family: Krishnamurthy
- given: Zhiwei Steven
family: Wu
- given: Vasilis
family: Syrgkanis
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2776-2785
id: krishnamurthy18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2776
lastpage: 2785
published: 2018-07-03 00:00:00 +0000
- title: 'Fast Maximization of Non-Submodular, Monotonic Functions on the Integer Lattice'
abstract: 'The optimization of submodular functions on the integer lattice has received much attention recently, but the objective functions of many applications are non-submodular. We provide two approximation algorithms for maximizing a non-submodular function on the integer lattice subject to a cardinality constraint; these are the first algorithms for this purpose that have polynomial query complexity. We propose a general framework for influence maximization on the integer lattice that generalizes prior works on this topic, and we demonstrate the efficiency of our algorithms in this context.'
volume: 80
URL: https://proceedings.mlr.press/v80/kuhnle18a.html
PDF: http://proceedings.mlr.press/v80/kuhnle18a/kuhnle18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-kuhnle18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Alan
family: Kuhnle
- given: J. David
family: Smith
- given: Victoria
family: Crawford
- given: My
family: Thai
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2786-2795
id: kuhnle18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2786
lastpage: 2795
published: 2018-07-03 00:00:00 +0000
- title: 'Accurate Uncertainties for Deep Learning Using Calibrated Regression'
abstract: 'Methods for reasoning under uncertainty are a key building block of accurate and reliable machine learning systems. Bayesian methods provide a general framework to quantify uncertainty. However, because of model misspecification and the use of approximate inference, Bayesian uncertainty estimates are often inaccurate: for example, a 90% credible interval may not contain the true outcome 90% of the time. Here, we propose a simple procedure for calibrating any regression algorithm; when applied to Bayesian and probabilistic models, it is guaranteed to produce calibrated uncertainty estimates given enough data. Our procedure is inspired by Platt scaling and extends previous work on classification. We evaluate this approach on Bayesian linear regression, feedforward, and recurrent neural networks, and find that it consistently outputs well-calibrated credible intervals while improving performance on time series forecasting and model-based reinforcement learning tasks.'
volume: 80
URL: https://proceedings.mlr.press/v80/kuleshov18a.html
PDF: http://proceedings.mlr.press/v80/kuleshov18a/kuleshov18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-kuleshov18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Volodymyr
family: Kuleshov
- given: Nathan
family: Fenner
- given: Stefano
family: Ermon
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2796-2804
id: kuleshov18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2796
lastpage: 2804
published: 2018-07-03 00:00:00 +0000
- title: 'Trainable Calibration Measures for Neural Networks from Kernel Mean Embeddings'
abstract: 'Modern neural networks have recently been found to be poorly calibrated, primarily in the direction of over-confidence. Methods like entropy penalty and temperature smoothing improve calibration by clamping confidence, but in doing so compromise the many legitimately confident predictions. We propose a more principled fix that minimizes an explicit calibration error during training. We present MMCE, an RKHS kernel-based measure of calibration that is efficiently trainable alongside the negative likelihood loss without careful hyper-parameter tuning. Theoretically too, MMCE is a sound measure of calibration that is minimized at perfect calibration, and whose finite sample estimates are consistent and enjoy fast convergence rates. Extensive experiments on several network architectures demonstrate that MMCE is a fast, stable, and accurate method to minimize calibration error while maximally preserving the number of high confidence predictions.'
volume: 80
URL: https://proceedings.mlr.press/v80/kumar18a.html
PDF: http://proceedings.mlr.press/v80/kumar18a/kumar18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-kumar18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Aviral
family: Kumar
- given: Sunita
family: Sarawagi
- given: Ujjwal
family: Jain
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2805-2814
id: kumar18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2805
lastpage: 2814
published: 2018-07-03 00:00:00 +0000
- title: 'Data-Dependent Stability of Stochastic Gradient Descent'
abstract: 'We establish a data-dependent notion of algorithmic stability for Stochastic Gradient Descent (SGD), and employ it to develop novel generalization bounds. This is in contrast to previous distribution-free algorithmic stability results for SGD which depend on the worst-case constants. By virtue of the data-dependent argument, our bounds provide new insights into learning with SGD on convex and non-convex problems. In the convex case, we show that the bound on the generalization error depends on the risk at the initialization point. In the non-convex case, we prove that the expected curvature of the objective function around the initialization point has crucial influence on the generalization error. In both cases, our results suggest a simple data-driven strategy to stabilize SGD by pre-screening its initialization. As a corollary, our results allow us to show optimistic generalization bounds that exhibit fast convergence rates for SGD subject to a vanishing empirical risk and low noise of stochastic gradient.'
volume: 80
URL: https://proceedings.mlr.press/v80/kuzborskij18a.html
PDF: http://proceedings.mlr.press/v80/kuzborskij18a/kuzborskij18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-kuzborskij18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ilja
family: Kuzborskij
- given: Christoph
family: Lampert
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2815-2824
id: kuzborskij18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2815
lastpage: 2824
published: 2018-07-03 00:00:00 +0000
- title: 'Explicit Inductive Bias for Transfer Learning with Convolutional Networks'
abstract: 'In inductive transfer learning, fine-tuning pre-trained convolutional networks substantially outperforms training from scratch. When using fine-tuning, the underlying assumption is that the pre-trained model extracts generic features, which are at least partially relevant for solving the target task, but would be difficult to extract from the limited amount of data available on the target task. However, besides the initialization with the pre-trained model and the early stopping, there is no mechanism in fine-tuning for retaining the features learned on the source task. In this paper, we investigate several regularization schemes that explicitly promote the similarity of the final solution with the initial model. We show the benefit of having an explicit inductive bias towards the initial model, and we eventually recommend a simple $L^2$ penalty with the pre-trained model being a reference as the baseline of penalty for transfer learning tasks.'
volume: 80
URL: https://proceedings.mlr.press/v80/li18a.html
PDF: http://proceedings.mlr.press/v80/li18a/li18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-li18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Xuhong
family: LI
- given: Yves
family: Grandvalet
- given: Franck
family: Davoine
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2825-2834
id: li18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2825
lastpage: 2834
published: 2018-07-03 00:00:00 +0000
- title: 'Understanding the Loss Surface of Neural Networks for Binary Classification'
abstract: 'It is widely conjectured that training algorithms for neural networks are successful because all local minima lead to similar performance; for example, see (LeCun et al., 2015; Choromanska et al., 2015; Dauphin et al., 2014). Performance is typically measured in terms of two metrics: training performance and generalization performance. Here we focus on the training performance of neural networks for binary classification, and provide conditions under which the training error is zero at all local minima of appropriately chosen surrogate loss functions. Our conditions are roughly in the following form: the neurons have to be increasing and strictly convex, the neural network should be either single-layered or multi-layered with a shortcut-like connection, and the surrogate loss function should be a smooth version of hinge loss. We also provide counterexamples to show that, when these conditions are relaxed, the result may not hold.'
volume: 80
URL: https://proceedings.mlr.press/v80/liang18a.html
PDF: http://proceedings.mlr.press/v80/liang18a/liang18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-liang18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: SHIYU
family: LIANG
- given: Ruoyu
family: Sun
- given: Yixuan
family: Li
- given: Rayadurgam
family: Srikant
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2835-2843
id: liang18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2835
lastpage: 2843
published: 2018-07-03 00:00:00 +0000
- title: 'Mixed batches and symmetric discriminators for GAN training'
abstract: 'Generative adversarial networks (GANs) are powerful generative models based on providing feedback to a generative network via a discriminator network. However, the discriminator usually assesses individual samples. This prevents the discriminator from accessing global distributional statistics of generated samples, and often leads to mode dropping: the generator models only part of the target distribution. We propose to feed the discriminator with mixed batches of true and fake samples, and train it to predict the ratio of true samples in the batch. The latter score does not depend on the order of samples in a batch. Rather than learning this invariance, we introduce a generic permutation-invariant discriminator architecture. This architecture is provably a universal approximator of all symmetric functions. Experimentally, our approach reduces mode collapse in GANs on two synthetic datasets, and obtains good results on the CIFAR10 and CelebA datasets, both qualitatively and quantitatively.'
volume: 80
URL: https://proceedings.mlr.press/v80/lucas18a.html
PDF: http://proceedings.mlr.press/v80/lucas18a/lucas18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-lucas18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Thomas
family: LUCAS
- given: Corentin
family: Tallec
- given: Yann
family: Ollivier
- given: Jakob
family: Verbeek
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2844-2853
id: lucas18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2844
lastpage: 2853
published: 2018-07-03 00:00:00 +0000
- title: 'Binary Partitions with Approximate Minimum Impurity'
abstract: 'The problem of splitting attributes is one of the main steps in the construction of decision trees. In order to decide the best split, impurity measures such as Entropy and Gini are widely used. In practice, decision-tree inducers use heuristics for finding splits with small impurity when they consider nominal attributes with a large number of distinct values. However, there are no known guarantees for the quality of the splits obtained by these heuristics. To fill this gap, we propose two new splitting procedures that provably achieve near-optimal impurity. We also report experiments that provide evidence that the proposed methods are interesting candidates to be employed in splitting nominal attributes with many values during decision tree/random forest induction.'
volume: 80
URL: https://proceedings.mlr.press/v80/laber18a.html
PDF: http://proceedings.mlr.press/v80/laber18a/laber18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-laber18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Eduardo
family: Laber
- given: Marco
family: Molinaro
- given: Felipe Mello
family: Pereira
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2854-2862
id: laber18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2854
lastpage: 2862
published: 2018-07-03 00:00:00 +0000
- title: 'Canonical Tensor Decomposition for Knowledge Base Completion'
abstract: 'The problem of Knowledge Base Completion can be framed as a 3rd-order binary tensor completion problem. In this light, the Canonical Tensor Decomposition (CP) seems like a natural solution; however, current implementations of CP on standard Knowledge Base Completion benchmarks are lagging behind their competitors. In this work, we attempt to understand the limits of CP for knowledge base completion. First, we motivate and test a novel regularizer, based on tensor nuclear p-norms. Then, we present a reformulation of the problem that makes it invariant to arbitrary choices in the inclusion of predicates or their reciprocals in the dataset. These two methods combined allow us to beat the current state of the art on several datasets with a CP decomposition, and obtain even better results using the more advanced ComplEx model.'
volume: 80
URL: https://proceedings.mlr.press/v80/lacroix18a.html
PDF: http://proceedings.mlr.press/v80/lacroix18a/lacroix18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-lacroix18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Timothee
family: Lacroix
- given: Nicolas
family: Usunier
- given: Guillaume
family: Obozinski
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2863-2872
id: lacroix18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2863
lastpage: 2872
published: 2018-07-03 00:00:00 +0000
- title: 'Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks'
abstract: 'Humans can understand and produce new utterances effortlessly, thanks to their compositional skills. Once a person learns the meaning of a new verb "dax," he or she can immediately understand the meaning of "dax twice" or "sing and dax." In this paper, we introduce the SCAN domain, consisting of a set of simple compositional navigation commands paired with the corresponding action sequences. We then test the zero-shot generalization capabilities of a variety of recurrent neural networks (RNNs) trained on SCAN with sequence-to-sequence methods. We find that RNNs can make successful zero-shot generalizations when the differences between training and test commands are small, so that they can apply "mix-and-match" strategies to solve the task. However, when generalization requires systematic compositional skills (as in the "dax" example above), RNNs fail spectacularly. We conclude with a proof-of-concept experiment in neural machine translation, suggesting that lack of systematicity might be partially responsible for neural networks’ notorious training data thirst.'
volume: 80
URL: https://proceedings.mlr.press/v80/lake18a.html
PDF: http://proceedings.mlr.press/v80/lake18a/lake18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-lake18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Brenden
family: Lake
- given: Marco
family: Baroni
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2873-2882
id: lake18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2873
lastpage: 2882
published: 2018-07-03 00:00:00 +0000
- title: 'An Estimation and Analysis Framework for the Rasch Model'
abstract: 'The Rasch model is widely used for item response analysis in applications ranging from recommender systems to psychology, education, and finance. While a number of estimators have been proposed for the Rasch model over the last decades, the associated analytical performance guarantees are mostly asymptotic. This paper provides a framework that relies on a novel linear minimum mean-squared error (L-MMSE) estimator which enables an exact, nonasymptotic, and closed-form analysis of the parameter estimation error under the Rasch model. The proposed framework provides guidelines on the number of items and responses required to attain low estimation errors in tests or surveys. We furthermore demonstrate its efficacy on a number of real-world collaborative filtering datasets, which reveals that the proposed L-MMSE estimator performs on par with state-of-the-art nonlinear estimators in terms of predictive performance.'
volume: 80
URL: https://proceedings.mlr.press/v80/lan18a.html
PDF: http://proceedings.mlr.press/v80/lan18a/lan18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-lan18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Andrew
family: Lan
- given: Mung
family: Chiang
- given: Christoph
family: Studer
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2883-2891
id: lan18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2883
lastpage: 2891
published: 2018-07-03 00:00:00 +0000
- title: 'Partial Optimality and Fast Lower Bounds for Weighted Correlation Clustering'
abstract: 'Weighted correlation clustering is hard to solve and hard to approximate for general graphs. Its applications in network analysis and computer vision call for efficient algorithms. To this end, we make three contributions: We establish partial optimality conditions that can be checked efficiently, and doing so recursively solves the problem for series-parallel graphs to optimality, in linear time. We exploit the packing dual of the problem to compute a heuristic, but non-trivial lower bound faster than that of a canonical linear program relaxation. We introduce a re-weighting with the dual solution by which efficient local search algorithms converge to better feasible solutions. The effectiveness of our methods is demonstrated empirically on a number of benchmark instances.'
volume: 80
URL: https://proceedings.mlr.press/v80/lange18a.html
PDF: http://proceedings.mlr.press/v80/lange18a/lange18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-lange18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jan-Hendrik
family: Lange
- given: Andreas
family: Karrenbauer
- given: Bjoern
family: Andres
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2892-2901
id: lange18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2892
lastpage: 2901
published: 2018-07-03 00:00:00 +0000
- title: 'Deep Linear Networks with Arbitrary Loss: All Local Minima Are Global'
abstract: 'We consider deep linear networks with arbitrary convex differentiable loss. We provide a short and elementary proof of the fact that all local minima are global minima if the hidden layers are either 1) at least as wide as the input layer, or 2) at least as wide as the output layer. This result is the strongest possible in the following sense: If the loss is convex and Lipschitz but not differentiable then deep linear networks can have sub-optimal local minima.'
volume: 80
URL: https://proceedings.mlr.press/v80/laurent18a.html
PDF: http://proceedings.mlr.press/v80/laurent18a/laurent18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-laurent18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Thomas
family: Laurent
- given: James
family: Brecht
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2902-2907
id: laurent18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2902
lastpage: 2907
published: 2018-07-03 00:00:00 +0000
- title: 'The Multilinear Structure of ReLU Networks'
abstract: 'We study the loss surface of neural networks equipped with a hinge loss criterion and ReLU or leaky ReLU nonlinearities. Any such network defines a piecewise multilinear form in parameter space. By appealing to harmonic analysis we show that all local minima of such a network are non-differentiable, except for those minima that occur in a region of parameter space where the loss surface is perfectly flat. Non-differentiable minima are therefore not technicalities or pathologies; they are the heart of the problem when investigating the loss of ReLU networks. As a consequence, we must employ techniques from nonsmooth analysis to study these loss surfaces. We show how to apply these techniques in some illustrative cases.'
volume: 80
URL: https://proceedings.mlr.press/v80/laurent18b.html
PDF: http://proceedings.mlr.press/v80/laurent18b/laurent18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-laurent18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Thomas
family: Laurent
- given: James
family: Brecht
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2908-2916
id: laurent18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2908
lastpage: 2916
published: 2018-07-03 00:00:00 +0000
- title: 'Hierarchical Imitation and Reinforcement Learning'
abstract: 'We study how to effectively leverage expert feedback to learn sequential decision-making policies. We focus on problems with sparse rewards and long time horizons, which typically pose significant challenges in reinforcement learning. We propose an algorithmic framework, called hierarchical guidance, that leverages the hierarchical structure of the underlying problem to integrate different modes of expert interaction. Our framework can incorporate different combinations of imitation learning (IL) and reinforcement learning (RL) at different levels, leading to dramatic reductions in both expert effort and cost of exploration. Using long-horizon benchmarks, including Montezuma’s Revenge, we demonstrate that our approach can learn significantly faster than hierarchical RL, and be significantly more label-efficient than standard IL. We also theoretically analyze labeling cost for certain instantiations of our framework.'
volume: 80
URL: https://proceedings.mlr.press/v80/le18a.html
PDF: http://proceedings.mlr.press/v80/le18a/le18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-le18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hoang
family: Le
- given: Nan
family: Jiang
- given: Alekh
family: Agarwal
- given: Miroslav
family: Dudik
- given: Yisong
family: Yue
- given: Hal
family: Daumé
suffix: III
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2917-2926
id: le18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2917
lastpage: 2926
published: 2018-07-03 00:00:00 +0000
- title: 'Gradient-Based Meta-Learning with Learned Layerwise Metric and Subspace'
abstract: 'Gradient-based meta-learning methods leverage gradient descent to learn the commonalities among various tasks. While previous such methods have been successful in meta-learning tasks, they resort to simple gradient descent during meta-testing. Our primary contribution is the *MT-net*, which enables the meta-learner to learn on each layer’s activation space a subspace that the task-specific learner performs gradient descent on. Additionally, a task-specific learner of an *MT-net* performs gradient descent with respect to a meta-learned distance metric, which warps the activation space to be more sensitive to task identity. We demonstrate that the dimension of this learned subspace reflects the complexity of the task-specific learner’s adaptation task, and also that our model is less sensitive to the choice of initial learning rates than previous gradient-based meta-learning methods. Our method achieves state-of-the-art or comparable performance on few-shot classification and regression tasks.'
volume: 80
URL: https://proceedings.mlr.press/v80/lee18a.html
PDF: http://proceedings.mlr.press/v80/lee18a/lee18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-lee18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yoonho
family: Lee
- given: Seungjin
family: Choi
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2927-2936
id: lee18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2927
lastpage: 2936
published: 2018-07-03 00:00:00 +0000
- title: 'Deep Reinforcement Learning in Continuous Action Spaces: a Case Study in the Game of Simulated Curling'
abstract: 'Many real-world applications of reinforcement learning require an agent to select optimal actions from continuous spaces. Recently, deep neural networks have successfully been applied to games with discrete action spaces. However, deep neural networks for discrete actions are not suitable for devising strategies for games where a very small change in an action can dramatically affect the outcome. In this paper, we present a new self-play reinforcement learning framework which equips a continuous search algorithm, which enables searching in continuous action spaces, with a kernel regression method. Without any hand-crafted features, our network is trained by supervised learning followed by self-play reinforcement learning with a high-fidelity simulator for the Olympic sport of curling. The program trained under our framework outperforms existing programs equipped with several hand-crafted features and won an international digital curling competition.'
volume: 80
URL: https://proceedings.mlr.press/v80/lee18b.html
PDF: http://proceedings.mlr.press/v80/lee18b/lee18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-lee18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Kyowoon
family: Lee
- given: Sol-A
family: Kim
- given: Jaesik
family: Choi
- given: Seong-Whan
family: Lee
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2937-2946
id: lee18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2937
lastpage: 2946
published: 2018-07-03 00:00:00 +0000
- title: 'Gated Path Planning Networks'
abstract: 'Value Iteration Networks (VINs) are effective differentiable path planning modules that can be used by agents to perform navigation while still maintaining end-to-end differentiability of the entire architecture. Despite their effectiveness, they suffer from several disadvantages including training instability, random seed sensitivity, and other optimization problems. In this work, we reframe VINs as recurrent-convolutional networks which demonstrates that VINs couple recurrent convolutions with an unconventional max-pooling activation. From this perspective, we argue that standard gated recurrent update equations could potentially alleviate the optimization issues plaguing VIN. The resulting architecture, which we call the Gated Path Planning Network, is shown to empirically outperform VIN on a variety of metrics such as learning speed, hyperparameter sensitivity, iteration count, and even generalization. Furthermore, we show that this performance gap is consistent across different maze transition types, maze sizes and even show success on a challenging 3D environment, where the planner is only provided with first-person RGB images.'
volume: 80
URL: https://proceedings.mlr.press/v80/lee18c.html
PDF: http://proceedings.mlr.press/v80/lee18c/lee18c.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-lee18c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lisa
family: Lee
- given: Emilio
family: Parisotto
- given: Devendra Singh
family: Chaplot
- given: Eric
family: Xing
- given: Ruslan
family: Salakhutdinov
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2947-2955
id: lee18c
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2947
lastpage: 2955
published: 2018-07-03 00:00:00 +0000
- title: 'Deep Asymmetric Multi-task Feature Learning'
abstract: 'We propose Deep Asymmetric Multitask Feature Learning (Deep-AMTFL) which can learn deep representations shared across multiple tasks while effectively preventing negative transfer that may happen in the feature sharing process. Specifically, we introduce an asymmetric autoencoder term that allows reliable predictors for the easy tasks to have high contribution to the feature learning while suppressing the influences of unreliable predictors for more difficult tasks. This allows the learning of less noisy representations, and enables unreliable predictors to exploit knowledge from the reliable predictors via the shared latent features. Such asymmetric knowledge transfer through shared features is also more scalable and efficient than inter-task asymmetric transfer. We validate our Deep-AMTFL model on multiple benchmark datasets for multitask learning and image classification, on which it significantly outperforms existing symmetric and asymmetric multitask learning models, by effectively preventing negative transfer in deep feature learning.'
volume: 80
URL: https://proceedings.mlr.press/v80/lee18d.html
PDF: http://proceedings.mlr.press/v80/lee18d/lee18d.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-lee18d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hae Beom
family: Lee
- given: Eunho
family: Yang
- given: Sung Ju
family: Hwang
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2956-2964
id: lee18d
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2956
lastpage: 2964
published: 2018-07-03 00:00:00 +0000
- title: 'Noise2Noise: Learning Image Restoration without Clean Data'
abstract: 'We apply basic statistical reasoning to signal reconstruction by machine learning - learning to map corrupted observations to clean signals - with a simple and powerful conclusion: it is possible to learn to restore images by only looking at corrupted examples, with performance at, and sometimes exceeding, that of training using clean data, without explicit image priors or likelihood models of the corruption. In practice, we show that a single model learns photographic noise removal, denoising synthetic Monte Carlo images, and reconstruction of undersampled MRI scans - all corrupted by different processes - based on noisy data only.'
volume: 80
URL: https://proceedings.mlr.press/v80/lehtinen18a.html
PDF: http://proceedings.mlr.press/v80/lehtinen18a/lehtinen18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-lehtinen18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jaakko
family: Lehtinen
- given: Jacob
family: Munkberg
- given: Jon
family: Hasselgren
- given: Samuli
family: Laine
- given: Tero
family: Karras
- given: Miika
family: Aittala
- given: Timo
family: Aila
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2965-2974
id: lehtinen18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2965
lastpage: 2974
published: 2018-07-03 00:00:00 +0000
- title: 'Out-of-sample extension of graph adjacency spectral embedding'
abstract: 'Many popular dimensionality reduction procedures have out-of-sample extensions, which allow a practitioner to apply a learned embedding to observations not seen in the initial training sample. In this work, we consider the problem of obtaining an out-of-sample extension for the adjacency spectral embedding, a procedure for embedding the vertices of a graph into Euclidean space. We present two different approaches to this problem, one based on a least-squares objective and the other based on a maximum-likelihood formulation. We show that if the graph of interest is drawn according to a certain latent position model called a random dot product graph, then both of these out-of-sample extensions estimate the true latent position of the out-of-sample vertex with the same error rate. Further, we prove a central limit theorem for the least-squares-based extension, showing that the estimate is asymptotically normal about the truth in the large-graph limit.'
volume: 80
URL: https://proceedings.mlr.press/v80/levin18a.html
PDF: http://proceedings.mlr.press/v80/levin18a/levin18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-levin18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Keith
family: Levin
- given: Fred
family: Roosta
- given: Michael
family: Mahoney
- given: Carey
family: Priebe
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2975-2984
id: levin18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2975
lastpage: 2984
published: 2018-07-03 00:00:00 +0000
- title: 'An Optimal Control Approach to Deep Learning and Applications to Discrete-Weight Neural Networks'
abstract: 'Deep learning is formulated as a discrete-time optimal control problem. This allows one to characterize necessary conditions for optimality and develop training algorithms that do not rely on gradients with respect to the trainable parameters. In particular, we introduce the discrete-time method of successive approximations (MSA), which is based on Pontryagin’s maximum principle, for training neural networks. A rigorous error estimate for the discrete MSA is obtained, which sheds light on its dynamics and the means to stabilize the algorithm. The developed methods are applied to train, in a rather principled way, neural networks with weights that are constrained to take values in a discrete set. We obtain competitive performance and, interestingly, very sparse weights in the case of ternary networks, which may be useful in model deployment in low-memory devices.'
volume: 80
URL: https://proceedings.mlr.press/v80/li18b.html
PDF: http://proceedings.mlr.press/v80/li18b/li18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-li18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Qianxiao
family: Li
- given: Shuji
family: Hao
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2985-2994
id: li18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2985
lastpage: 2994
published: 2018-07-03 00:00:00 +0000
- title: 'Towards Binary-Valued Gates for Robust LSTM Training'
abstract: 'Long Short-Term Memory (LSTM) is one of the most widely used recurrent structures in sequence modeling. It aims to use gates to control information flow (e.g., whether to skip some information or not) in the recurrent computations, although its practical implementation based on soft gates only partially achieves this goal. In this paper, we propose a new way for LSTM training, which pushes the output values of the gates towards 0 or 1. By doing so, we can better control the information flow: the gates are mostly open or closed, instead of in a middle state, which makes the results more interpretable. Empirical studies show that (1) Although it seems that we restrict the model capacity, there is no performance drop: we achieve better or comparable performances due to its better generalization ability; (2) The outputs of gates are not sensitive to their inputs: we can easily compress the LSTM unit in multiple ways, e.g., low-rank approximation and low-precision approximation. The compressed models are even better than the baseline models without compression.'
volume: 80
URL: https://proceedings.mlr.press/v80/li18c.html
PDF: http://proceedings.mlr.press/v80/li18c/li18c.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-li18c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Zhuohan
family: Li
- given: Di
family: He
- given: Fei
family: Tian
- given: Wei
family: Chen
- given: Tao
family: Qin
- given: Liwei
family: Wang
- given: Tieyan
family: Liu
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 2995-3004
id: li18c
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 2995
lastpage: 3004
published: 2018-07-03 00:00:00 +0000
- title: 'On the Limitations of First-Order Approximation in GAN Dynamics'
abstract: 'While Generative Adversarial Networks (GANs) have demonstrated promising performance on multiple vision tasks, their learning dynamics are not yet well understood, both in theory and in practice. To address this issue, we study GAN dynamics in a simple yet rich parametric model that exhibits several of the common problematic convergence behaviors such as vanishing gradients, mode collapse, and diverging or oscillatory behavior. In spite of the non-convex nature of our model, we are able to perform a rigorous theoretical analysis of its convergence behavior. Our analysis reveals an interesting dichotomy: a GAN with an optimal discriminator provably converges, while first order approximations of the discriminator steps lead to unstable GAN dynamics and mode collapse. Our result suggests that using first order discriminator steps (the de-facto standard in most existing GAN setups) might be one of the factors that makes GAN training challenging in practice.'
volume: 80
URL: https://proceedings.mlr.press/v80/li18d.html
PDF: http://proceedings.mlr.press/v80/li18d/li18d.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-li18d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jerry
family: Li
- given: Aleksander
family: Madry
- given: John
family: Peebles
- given: Ludwig
family: Schmidt
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3005-3013
id: li18d
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3005
lastpage: 3013
published: 2018-07-03 00:00:00 +0000
- title: 'Submodular Hypergraphs: p-Laplacians, Cheeger Inequalities and Spectral Clustering'
abstract: 'We introduce submodular hypergraphs, a family of hypergraphs that have different submodular weights associated with different cuts of hyperedges. Submodular hypergraphs arise in clustering applications in which higher-order structures carry relevant information. For such hypergraphs, we define the notion of p-Laplacians and derive corresponding nodal domain theorems and k-way Cheeger inequalities. We conclude with the description of algorithms for computing the spectra of 1- and 2-Laplacians that constitute the basis of new spectral hypergraph clustering methods.'
volume: 80
URL: https://proceedings.mlr.press/v80/li18e.html
PDF: http://proceedings.mlr.press/v80/li18e/li18e.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-li18e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Pan
family: Li
- given: Olgica
family: Milenkovic
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3014-3023
id: li18e
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3014
lastpage: 3023
published: 2018-07-03 00:00:00 +0000
- title: 'The Well-Tempered Lasso'
abstract: 'We study the complexity of the entire regularization path for least squares regression with 1-norm penalty, known as the Lasso. Every regression parameter in the Lasso changes linearly as a function of the regularization value. The number of changes is regarded as the Lasso’s complexity. Experimental results using exact path following exhibit polynomial complexity of the Lasso in the problem size. Alas, the path complexity of the Lasso on artificially designed regression problems is exponential. We use smoothed analysis as a mechanism for bridging the gap between worst case settings and the de facto low complexity. Our analysis assumes that the observed data has a tiny amount of intrinsic noise. We then prove that the Lasso’s complexity is polynomial in the problem size.'
volume: 80
URL: https://proceedings.mlr.press/v80/li18f.html
PDF: http://proceedings.mlr.press/v80/li18f/li18f.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-li18f.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yuanzhi
family: Li
- given: Yoram
family: Singer
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3024-3032
id: li18f
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3024
lastpage: 3032
published: 2018-07-03 00:00:00 +0000
- title: 'Estimation of Markov Chain via Rank-Constrained Likelihood'
abstract: 'This paper studies the estimation of low-rank Markov chains from empirical trajectories. We propose a non-convex estimator based on rank-constrained likelihood maximization. Statistical upper bounds are provided for the Kullback-Leibler divergence and the $\ell_2$ risk between the estimator and the true transition matrix. The estimator reveals a compressed state space of the Markov chain. We also develop a novel DC (difference of convex function) programming algorithm to tackle the rank-constrained non-smooth optimization problem. Convergence results are established. Experiments show that the proposed estimator achieves better empirical performance than other popular approaches.'
volume: 80
URL: https://proceedings.mlr.press/v80/li18g.html
PDF: http://proceedings.mlr.press/v80/li18g/li18g.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-li18g.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Xudong
family: Li
- given: Mengdi
family: Wang
- given: Anru
family: Zhang
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3033-3042
id: li18g
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3033
lastpage: 3042
published: 2018-07-03 00:00:00 +0000
- title: 'Asynchronous Decentralized Parallel Stochastic Gradient Descent'
abstract: 'Most commonly used distributed machine learning systems are either synchronous or centralized asynchronous. Synchronous algorithms like AllReduce-SGD perform poorly in a heterogeneous environment, while asynchronous algorithms using a parameter server suffer from 1) communication bottleneck at parameter servers when workers are many, and 2) significantly worse convergence when the traffic to parameter server is congested. Can we design an algorithm that is robust in a heterogeneous environment, while being communication efficient and maintaining the best-possible convergence rate? In this paper, we propose an asynchronous decentralized stochastic gradient descent algorithm (AD-PSGD) satisfying all above expectations. Our theoretical analysis shows AD-PSGD converges at the optimal $O(1/\sqrt{K})$ rate as SGD and has linear speedup w.r.t. number of workers. Empirically, AD-PSGD outperforms the best of decentralized parallel SGD (D-PSGD), asynchronous parallel SGD (A-PSGD), and standard data parallel SGD (AllReduce-SGD), often by orders of magnitude in a heterogeneous environment. When training ResNet-50 on ImageNet with up to 128 GPUs, AD-PSGD converges (w.r.t. epochs) similarly to the AllReduce-SGD, but each epoch can be up to 4-8x faster than its synchronous counterparts in a network-sharing HPC environment. To the best of our knowledge, AD-PSGD is the first asynchronous algorithm that achieves a similar epoch-wise convergence rate as AllReduce-SGD, at an over 100-GPU scale.'
volume: 80
URL: https://proceedings.mlr.press/v80/lian18a.html
PDF: http://proceedings.mlr.press/v80/lian18a/lian18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-lian18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Xiangru
family: Lian
- given: Wei
family: Zhang
- given: Ce
family: Zhang
- given: Ji
family: Liu
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3043-3052
id: lian18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3043
lastpage: 3052
published: 2018-07-03 00:00:00 +0000
- title: 'RLlib: Abstractions for Distributed Reinforcement Learning'
abstract: 'Reinforcement learning (RL) algorithms involve the deep nesting of highly irregular computation patterns, each of which typically exhibits opportunities for distributed computation. We argue for distributing RL components in a composable way by adapting algorithms for top-down hierarchical control, thereby encapsulating parallelism and resource requirements within short-running compute tasks. We demonstrate the benefits of this principle through RLlib: a library that provides scalable software primitives for RL. These primitives enable a broad range of algorithms to be implemented with high performance, scalability, and substantial code reuse. RLlib is available as part of the open source Ray project at http://rllib.io/.'
volume: 80
URL: https://proceedings.mlr.press/v80/liang18b.html
PDF: http://proceedings.mlr.press/v80/liang18b/liang18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-liang18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Eric
family: Liang
- given: Richard
family: Liaw
- given: Robert
family: Nishihara
- given: Philipp
family: Moritz
- given: Roy
family: Fox
- given: Ken
family: Goldberg
- given: Joseph
family: Gonzalez
- given: Michael
family: Jordan
- given: Ion
family: Stoica
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3053-3062
id: liang18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3053
lastpage: 3062
published: 2018-07-03 00:00:00 +0000
- title: 'On the Spectrum of Random Features Maps of High Dimensional Data'
abstract: 'Random feature maps are ubiquitous in modern statistical machine learning, where they generalize random projections by means of powerful, yet often difficult to analyze nonlinear operators. In this paper we leverage the "concentration" phenomenon induced by random matrix theory to perform a spectral analysis on the Gram matrix of these random feature maps, here for Gaussian mixture models of simultaneously large dimension and size. Our results are instrumental to a deeper understanding of the interplay of the nonlinearity and the statistics of the data, thereby allowing for a better tuning of random feature-based techniques.'
volume: 80
URL: https://proceedings.mlr.press/v80/liao18a.html
PDF: http://proceedings.mlr.press/v80/liao18a/liao18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-liao18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Zhenyu
family: Liao
- given: Romain
family: Couillet
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3063-3071
id: liao18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3063
lastpage: 3071
published: 2018-07-03 00:00:00 +0000
- title: 'The Dynamics of Learning: A Random Matrix Approach'
abstract: 'Understanding the learning dynamics of neural networks is one of the key issues for the improvement of optimization algorithms as well as for the theoretical comprehension of why deep neural nets work so well today. In this paper, we introduce a random matrix-based framework to analyze the learning dynamics of a single-layer linear network on a binary classification problem, for data of simultaneously large dimension and size, trained by gradient descent. Our results provide rich insights into common questions in neural nets, such as overfitting, early stopping and the initialization of training, thereby opening the door for future studies of more elaborate structures and models appearing in today’s neural networks.'
volume: 80
URL: https://proceedings.mlr.press/v80/liao18b.html
PDF: http://proceedings.mlr.press/v80/liao18b/liao18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-liao18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Zhenyu
family: Liao
- given: Romain
family: Couillet
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3072-3081
id: liao18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3072
lastpage: 3081
published: 2018-07-03 00:00:00 +0000
- title: 'Reviving and Improving Recurrent Back-Propagation'
abstract: 'In this paper, we revisit the recurrent back-propagation (RBP) algorithm, discuss the conditions under which it applies as well as how to satisfy them in deep neural networks. We show that RBP can be unstable and propose two variants based on conjugate gradient on the normal equations (CG-RBP) and Neumann series (Neumann-RBP). We further investigate the relationship between Neumann-RBP and back propagation through time (BPTT) and its truncated version (TBPTT). Our Neumann-RBP has the same time complexity as TBPTT but only requires constant memory, whereas TBPTT’s memory cost scales linearly with the number of truncation steps. We examine all RBP variants along with BPTT and TBPTT in three different application domains: associative memory with continuous Hopfield networks, document classification in citation networks using graph neural networks and hyperparameter optimization for fully connected networks. All experiments demonstrate that RBPs, especially the Neumann-RBP variant, are efficient and effective for optimizing convergent recurrent neural networks.'
volume: 80
URL: https://proceedings.mlr.press/v80/liao18c.html
PDF: http://proceedings.mlr.press/v80/liao18c/liao18c.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-liao18c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Renjie
family: Liao
- given: Yuwen
family: Xiong
- given: Ethan
family: Fetaya
- given: Lisa
family: Zhang
- given: KiJung
family: Yoon
- given: Xaq
family: Pitkow
- given: Raquel
family: Urtasun
- given: Richard
family: Zemel
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3082-3091
id: liao18c
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3082
lastpage: 3091
published: 2018-07-03 00:00:00 +0000
- title: 'Optimal Distributed Learning with Multi-pass Stochastic Gradient Methods'
abstract: 'We study generalization properties of distributed algorithms in the setting of nonparametric regression over a reproducing kernel Hilbert space (RKHS). We investigate distributed stochastic gradient methods (SGM), with mini-batches and multi-passes over the data. We show that optimal generalization error bounds can be retained for distributed SGM provided that the partition level is not too large. Our results are superior to the state-of-the-art theory, covering the cases that the regression function may not be in the hypothesis spaces. Particularly, our results show that distributed SGM has a smaller theoretical computational complexity, compared with distributed kernel ridge regression (KRR) and classic SGM.'
volume: 80
URL: https://proceedings.mlr.press/v80/lin18a.html
PDF: http://proceedings.mlr.press/v80/lin18a/lin18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-lin18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Junhong
family: Lin
- given: Volkan
family: Cevher
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3092-3101
id: lin18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3092
lastpage: 3101
published: 2018-07-03 00:00:00 +0000
- title: 'Optimal Rates of Sketched-regularized Algorithms for Least-Squares Regression over Hilbert Spaces'
abstract: 'We investigate regularized algorithms combined with projection for the least-squares regression problem over a Hilbert space, covering nonparametric regression over a reproducing kernel Hilbert space. We prove convergence results with respect to variants of norms, under a capacity assumption on the hypothesis space and a regularity condition on the target function. As a result, we obtain optimal rates for regularized algorithms with randomized sketches, provided that the sketch dimension is proportional to the effective dimension up to a logarithmic factor. As a byproduct, we obtain similar results for Nyström regularized algorithms. Our results provide optimal, distribution-dependent rates for sketched/Nyström regularized algorithms, considering both the attainable and non-attainable cases.'
volume: 80
URL: https://proceedings.mlr.press/v80/lin18b.html
PDF: http://proceedings.mlr.press/v80/lin18b/lin18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-lin18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Junhong
family: Lin
- given: Volkan
family: Cevher
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3102-3111
id: lin18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3102
lastpage: 3111
published: 2018-07-03 00:00:00 +0000
- title: 'Level-Set Methods for Finite-Sum Constrained Convex Optimization'
abstract: 'We consider the constrained optimization where the objective function and the constraints are defined as a summation of finitely many loss functions. This model has applications in machine learning such as Neyman-Pearson classification. We consider two level-set methods to solve this class of problems, an existing inexact Newton method and a new feasible level-set method. To update the level parameter towards the optimality, both methods require an oracle that generates upper and lower bounds as well as an affine-minorant of the level function. To construct the desired oracle, we reformulate the level function as the value of a saddle-point problem using the conjugate and perspective of the loss functions. Then a stochastic variance-reduced gradient method with a special Bregman divergence is proposed as the oracle for solving that saddle-point problem. The special divergence ensures the proximal mapping in each iteration can be solved in a closed form. The total complexity of both level-set methods using the proposed oracle is analyzed.'
volume: 80
URL: https://proceedings.mlr.press/v80/lin18c.html
PDF: http://proceedings.mlr.press/v80/lin18c/lin18c.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-lin18c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Qihang
family: Lin
- given: Runchao
family: Ma
- given: Tianbao
family: Yang
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3112-3121
id: lin18c
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3112
lastpage: 3121
published: 2018-07-03 00:00:00 +0000
- title: 'Detecting and Correcting for Label Shift with Black Box Predictors'
abstract: 'Faced with distribution shift between training and test set, we wish to detect and quantify the shift, and to correct our classifiers without test set labels. Motivated by medical diagnosis, where diseases (targets) cause symptoms (observations), we focus on label shift, where the label marginal p(y) changes but the conditional p(x|y) does not. We propose Black Box Shift Estimation (BBSE) to estimate the test distribution p(y). BBSE exploits arbitrary black box predictors to reduce dimensionality prior to shift correction. While better predictors give tighter estimates, BBSE works even when predictors are biased, inaccurate, or uncalibrated, so long as their confusion matrices are invertible. We prove BBSE’s consistency, bound its error, and introduce a statistical test that uses BBSE to detect shift. We also leverage BBSE to correct classifiers. Experiments demonstrate accurate estimates and improved prediction, even on high-dimensional datasets of natural images.'
volume: 80
URL: https://proceedings.mlr.press/v80/lipton18a.html
PDF: http://proceedings.mlr.press/v80/lipton18a/lipton18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-lipton18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Zachary
family: Lipton
- given: Yu-Xiang
family: Wang
- given: Alexander
family: Smola
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3122-3130
id: lipton18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3122
lastpage: 3130
published: 2018-07-03 00:00:00 +0000
- title: 'Generalized Robust Bayesian Committee Machine for Large-scale Gaussian Process Regression'
abstract: 'In order to scale standard Gaussian process (GP) regression to large-scale datasets, aggregation models employ a factorized training process and then combine predictions from distributed experts. The state-of-the-art aggregation models, however, either provide inconsistent predictions or require a time-consuming aggregation process. We first prove the inconsistency of typical aggregations using disjoint or random data partition, and then present a consistent yet efficient aggregation model for large-scale GP. The proposed model inherits the advantages of aggregations, e.g., closed-form inference and aggregation, parallelization and distributed computing. Furthermore, theoretical and empirical analyses reveal that the new aggregation model performs better due to the consistent predictions that converge to the true underlying function when the training size approaches infinity.'
volume: 80
URL: https://proceedings.mlr.press/v80/liu18a.html
PDF: http://proceedings.mlr.press/v80/liu18a/liu18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-liu18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Haitao
family: Liu
- given: Jianfei
family: Cai
- given: Yi
family: Wang
- given: Yew Soon
family: Ong
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3131-3140
id: liu18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3131
lastpage: 3140
published: 2018-07-03 00:00:00 +0000
- title: 'Towards Black-box Iterative Machine Teaching'
abstract: 'In this paper, we make an important step towards the black-box machine teaching by considering the cross-space machine teaching, where the teacher and the learner use different feature representations and the teacher cannot fully observe the learner’s model. In such a scenario, we study how the teacher is still able to teach the learner to achieve a faster convergence rate than the traditional passive learning. We propose an active teacher model that can actively query the learner (i.e., make the learner take exams) for estimating the learner’s status and provably guide the learner to achieve faster convergence. The sample complexities for both teaching and query are provided. In the experiments, we compare the proposed active teacher with the omniscient teacher and verify the effectiveness of the active teacher model.'
volume: 80
URL: https://proceedings.mlr.press/v80/liu18b.html
PDF: http://proceedings.mlr.press/v80/liu18b/liu18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-liu18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Weiyang
family: Liu
- given: Bo
family: Dai
- given: Xingguo
family: Li
- given: Zhen
family: Liu
- given: James
family: Rehg
- given: Le
family: Song
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3141-3149
id: liu18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3141
lastpage: 3149
published: 2018-07-03 00:00:00 +0000
- title: 'Delayed Impact of Fair Machine Learning'
abstract: 'Fairness in machine learning has predominantly been studied in static classification settings without concern for how decisions change the underlying population over time. Conventional wisdom suggests that fairness criteria promote the long-term well-being of those groups they aim to protect. We study how static fairness criteria interact with temporal indicators of well-being, such as long-term improvement, stagnation, and decline in a variable of interest. We demonstrate that even in a one-step feedback model, common fairness criteria in general do not promote improvement over time, and may in fact cause harm in cases where an unconstrained objective would not. We completely characterize the delayed impact of three standard criteria, contrasting the regimes in which these exhibit qualitatively different behavior. In addition, we find that a natural form of measurement error broadens the regime in which fairness criteria perform favorably. Our results highlight the importance of measurement and temporal modeling in the evaluation of fairness criteria, suggesting a range of new challenges and trade-offs.'
volume: 80
URL: https://proceedings.mlr.press/v80/liu18c.html
PDF: http://proceedings.mlr.press/v80/liu18c/liu18c.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-liu18c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lydia T.
family: Liu
- given: Sarah
family: Dean
- given: Esther
family: Rolf
- given: Max
family: Simchowitz
- given: Moritz
family: Hardt
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3150-3158
id: liu18c
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3150
lastpage: 3158
published: 2018-07-03 00:00:00 +0000
- title: 'A Two-Step Computation of the Exact GAN Wasserstein Distance'
abstract: 'In this paper, we propose a two-step method to compute the Wasserstein distance in Wasserstein Generative Adversarial Networks (WGANs): 1) The convex part of our objective can be solved by linear programming; 2) The non-convex residual can be approximated by a deep neural network. We theoretically prove that the proposed formulation is equivalent to the discrete Monge-Kantorovich dual formulation. Furthermore, we give the approximation error bound of the Wasserstein distance and the error bound of generalizing the Wasserstein distance from discrete to continuous distributions. Our approach optimizes the exact Wasserstein distance, obviating the need for weight clipping previously used in WGANs. Results on synthetic data show that our method computes the Wasserstein distance more accurately. Qualitative and quantitative results on MNIST, LSUN and CIFAR-10 datasets show that the proposed method is more efficient than state-of-the-art WGAN methods, and still produces images of comparable quality.'
volume: 80
URL: https://proceedings.mlr.press/v80/liu18d.html
PDF: http://proceedings.mlr.press/v80/liu18d/liu18d.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-liu18d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Huidong
family: Liu
- given: Xianfeng
family: GU
- given: Dimitris
family: Samaras
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3159-3168
id: liu18d
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3159
lastpage: 3168
published: 2018-07-03 00:00:00 +0000
- title: 'Open Category Detection with PAC Guarantees'
abstract: 'Open category detection is the problem of detecting "alien" test instances that belong to categories or classes that were not present in the training data. In many applications, reliably detecting such aliens is central to ensuring the safety and accuracy of test set predictions. Unfortunately, there are no algorithms that provide theoretical guarantees on their ability to detect aliens under general assumptions. Further, while there are algorithms for open category detection, there are few empirical results that directly report alien detection rates. Thus, there are significant theoretical and empirical gaps in our understanding of open category detection. In this paper, we take a step toward addressing this gap by studying a simple, but practically-relevant variant of open category detection. In our setting, we are provided with a "clean" training set that contains only the target categories of interest and an unlabeled "contaminated" training set that contains a fraction alpha of alien examples. Under the assumption that we know an upper bound on alpha, we develop an algorithm with PAC-style guarantees on the alien detection rate, while aiming to minimize false alarms. Empirical results on synthetic and standard benchmark datasets demonstrate the regimes in which the algorithm can be effective and provide a baseline for further advancements.'
volume: 80
URL: https://proceedings.mlr.press/v80/liu18e.html
PDF: http://proceedings.mlr.press/v80/liu18e/liu18e.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-liu18e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Si
family: Liu
- given: Risheek
family: Garrepalli
- given: Thomas
family: Dietterich
- given: Alan
family: Fern
- given: Dan
family: Hendrycks
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3169-3178
id: liu18e
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3169
lastpage: 3178
published: 2018-07-03 00:00:00 +0000
- title: 'Fast Variance Reduction Method with Stochastic Batch Size'
abstract: 'In this paper we study a family of variance reduction methods with randomized batch size—at each step, the algorithm first randomly chooses the batch size and then selects a batch of samples to conduct a variance-reduced stochastic update. We give the linear convergence rate for this framework for composite functions, and show that the optimal strategy to achieve the best convergence rate per data access is to always choose a batch size equal to 1, which is equivalent to the SAGA algorithm. However, due to the presence of cache/disk IO effects in computer architecture, the number of data accesses cannot reflect the running time because 1) random memory access is much slower than sequential access, and 2) when data is too big to fit into memory, disk seeking takes even longer. After taking these into account, choosing a batch size equal to 1 is no longer optimal, so we propose a new algorithm called SAGA++ and theoretically show how to calculate the optimal average batch size. Our algorithm outperforms SAGA and other existing batch and stochastic solvers on real datasets. In addition, we also conduct a precise analysis to compare different update rules for variance reduction methods, showing that SAGA++ converges faster than SVRG in theory.'
volume: 80
URL: https://proceedings.mlr.press/v80/liu18f.html
PDF: http://proceedings.mlr.press/v80/liu18f/liu18f.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-liu18f.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Xuanqing
family: Liu
- given: Cho-Jui
family: Hsieh
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3179-3188
id: liu18f
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3179
lastpage: 3188
published: 2018-07-03 00:00:00 +0000
- title: 'Fast Stochastic AUC Maximization with $O(1/n)$-Convergence Rate'
abstract: 'In this paper, we consider statistical learning with AUC (area under ROC curve) maximization in the classical stochastic setting where one random data point drawn from an unknown distribution is revealed at each iteration for updating the model. Although consistent convex surrogate losses for AUC maximization have been proposed to make the problem tractable, it remains a challenging problem to design fast optimization algorithms in the classical stochastic setting because the convex surrogate loss depends on random pairs of examples from positive and negative classes. Building on a saddle point formulation for a consistent square loss, this paper proposes a novel stochastic algorithm to improve the standard $O(1/\sqrt{n})$ convergence rate to $\widetilde O(1/n)$ convergence rate without strong convexity assumption or any favorable statistical assumptions (e.g., low noise), where $n$ is the number of random samples. To the best of our knowledge, this is the first stochastic algorithm for AUC maximization with a statistical convergence rate as fast as $O(1/n)$ up to a logarithmic factor. Extensive experiments on eight large-scale benchmark data sets demonstrate the superior performance of the proposed algorithm compared with existing stochastic or online algorithms for AUC maximization.'
volume: 80
URL: https://proceedings.mlr.press/v80/liu18g.html
PDF: http://proceedings.mlr.press/v80/liu18g/liu18g.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-liu18g.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mingrui
family: Liu
- given: Xiaoxuan
family: Zhang
- given: Zaiyi
family: Chen
- given: Xiaoyu
family: Wang
- given: Tianbao
family: Yang
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3189-3197
id: liu18g
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3189
lastpage: 3197
published: 2018-07-03 00:00:00 +0000
- title: 'On Matching Pursuit and Coordinate Descent'
abstract: 'Two popular examples of first-order optimization methods over linear spaces are coordinate descent and matching pursuit algorithms, with their randomized variants. While the former targets the optimization by moving along coordinates, the latter considers a generalized notion of directions. Exploiting the connection between the two algorithms, we present a unified analysis of both, providing affine invariant sublinear $O(1/t)$ rates on smooth objectives and linear convergence on strongly convex objectives. As a byproduct of our affine invariant analysis of matching pursuit, our rates for steepest coordinate descent are the tightest known. Furthermore, we show the first accelerated convergence rate $O(1/t^2)$ for matching pursuit and steepest coordinate descent on convex objectives.'
volume: 80
URL: https://proceedings.mlr.press/v80/locatello18a.html
PDF: http://proceedings.mlr.press/v80/locatello18a/locatello18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-locatello18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Francesco
family: Locatello
- given: Anant
family: Raj
- given: Sai Praneeth
family: Karimireddy
- given: Gunnar
family: Raetsch
- given: Bernhard
family: Schölkopf
- given: Sebastian
family: Stich
- given: Martin
family: Jaggi
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3198-3207
id: locatello18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3198
lastpage: 3207
published: 2018-07-03 00:00:00 +0000
- title: 'PDE-Net: Learning PDEs from Data'
abstract: 'Partial differential equations (PDEs) play a prominent role in many disciplines of science and engineering. PDEs are commonly derived based on empirical observations. However, with the rapid development of sensors, computational power, and data storage in the past decade, huge quantities of data can be easily collected and efficiently stored. Such a vast quantity of data offers new opportunities for data-driven discovery of physical laws. Inspired by the latest development of neural network designs in deep learning, we propose a new feed-forward deep network, called PDE-Net, to fulfill two objectives at the same time: to accurately predict dynamics of complex systems and to uncover the underlying hidden PDE models. Compared with existing approaches, our approach has the most flexibility by learning both differential operators and the nonlinear response function of the underlying PDE model. A special feature of the proposed PDE-Net is that all filters are properly constrained, which enables us to easily identify the governing PDE models while still maintaining the expressive and predictive power of the network. These constraints are carefully designed by fully exploiting the relation between the orders of differential operators and the orders of sum rules of filters (an important concept originating from wavelet theory). Numerical experiments show that the PDE-Net has the potential to uncover the hidden PDE of the observed dynamics, and predict the dynamical behavior for a relatively long time, even in a noisy environment.'
volume: 80
URL: https://proceedings.mlr.press/v80/long18a.html
PDF: http://proceedings.mlr.press/v80/long18a/long18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-long18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Zichao
family: Long
- given: Yiping
family: Lu
- given: Xianzhong
family: Ma
- given: Bin
family: Dong
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3208-3216
id: long18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3208
lastpage: 3216
published: 2018-07-03 00:00:00 +0000
- title: 'Error Estimation for Randomized Least-Squares Algorithms via the Bootstrap'
abstract: 'Over the course of the past decade, a variety of randomized algorithms have been proposed for computing approximate least-squares (LS) solutions in large-scale settings. A longstanding practical issue is that, for any given input, the user rarely knows the actual error of an approximate solution (relative to the exact solution). Likewise, it is difficult for the user to know precisely how much computation is needed to achieve the desired error tolerance. Consequently, the user often appeals to worst-case error bounds that tend to offer only qualitative guidance. As a more practical alternative, we propose a bootstrap method to compute a posteriori error estimates for randomized LS algorithms. These estimates permit the user to numerically assess the error of a given solution, and to predict how much work is needed to improve a "preliminary" solution. In addition, we provide theoretical consistency results for the method, which are the first such results in this context (to the best of our knowledge). From a practical standpoint, the method also has considerable flexibility, insofar as it can be applied to several popular sketching algorithms, as well as a variety of error metrics. Moreover, the extra step of error estimation does not add much cost to an underlying sketching algorithm. Finally, we demonstrate the effectiveness of the method with empirical results.'
volume: 80
URL: https://proceedings.mlr.press/v80/lopes18a.html
PDF: http://proceedings.mlr.press/v80/lopes18a/lopes18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-lopes18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Miles
family: Lopes
- given: Shusen
family: Wang
- given: Michael
family: Mahoney
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3217-3226
id: lopes18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3217
lastpage: 3226
published: 2018-07-03 00:00:00 +0000
- title: 'Constraining the Dynamics of Deep Probabilistic Models'
abstract: 'We introduce a novel generative formulation of deep probabilistic models implementing "soft" constraints on their function dynamics. In particular, we develop a flexible methodological framework where the modeled functions and derivatives of a given order are subject to inequality or equality constraints. We then characterize the posterior distribution over model and constraint parameters through stochastic variational inference. As a result, the proposed approach allows for accurate and scalable uncertainty quantification on the predictions and on all parameters. We demonstrate the application of equality constraints in the challenging problem of parameter inference in ordinary differential equation models, while we showcase the application of inequality constraints on the problem of monotonic regression of count data. The proposed approach is extensively tested in several experimental settings, leading to highly competitive results in challenging modeling applications, while offering high expressiveness, flexibility and scalability.'
volume: 80
URL: https://proceedings.mlr.press/v80/lorenzi18a.html
PDF: http://proceedings.mlr.press/v80/lorenzi18a/lorenzi18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-lorenzi18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Marco
family: Lorenzi
- given: Maurizio
family: Filippone
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3227-3236
id: lorenzi18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3227
lastpage: 3236
published: 2018-07-03 00:00:00 +0000
- title: 'Spectrally Approximating Large Graphs with Smaller Graphs'
abstract: 'How does coarsening affect the spectrum of a general graph? We provide conditions such that the principal eigenvalues and eigenspaces of the coarsened and original graph Laplacian matrices are close. The achieved approximation is shown to depend on standard graph-theoretic properties, such as the degree and eigenvalue distributions, as well as on the ratio between the coarsened and actual graph sizes. Our results carry implications for learning methods that utilize coarsening. For the particular case of spectral clustering, they imply that coarse eigenvectors can be used to derive good quality assignments even without refinement. This phenomenon was previously observed, but lacked formal justification.'
volume: 80
URL: https://proceedings.mlr.press/v80/loukas18a.html
PDF: http://proceedings.mlr.press/v80/loukas18a/loukas18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-loukas18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Andreas
family: Loukas
- given: Pierre
family: Vandergheynst
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3237-3246
id: loukas18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3237
lastpage: 3246
published: 2018-07-03 00:00:00 +0000
- title: 'The Edge Density Barrier: Computational-Statistical Tradeoffs in Combinatorial Inference'
abstract: 'We study the hypothesis testing problem of inferring the existence of combinatorial structures in undirected graphical models. Although there exist extensive studies on the information-theoretic limits of this problem, it remains largely unexplored whether such limits can be attained by efficient algorithms. In this paper, we quantify the minimum computational complexity required to attain the information-theoretic limits based on an oracle computational model. We prove that, for testing common combinatorial structures, such as clique, nearest neighbor graph and perfect matching, against an empty graph, or large clique against small clique, the information-theoretic limits are provably unachievable by tractable algorithms in general. More importantly, we define structural quantities called the weak and strong edge densities, which offer deep insight into the existence of such computational-statistical tradeoffs. To the best of our knowledge, our characterization is the first to identify and explain the fundamental tradeoffs between statistics and computation for combinatorial inference problems in undirected graphical models.'
volume: 80
URL: https://proceedings.mlr.press/v80/lu18a.html
PDF: http://proceedings.mlr.press/v80/lu18a/lu18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-lu18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hao
family: Lu
- given: Yuan
family: Cao
- given: Zhuoran
family: Yang
- given: Junwei
family: Lu
- given: Han
family: Liu
- given: Zhaoran
family: Wang
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3247-3256
id: lu18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3247
lastpage: 3256
published: 2018-07-03 00:00:00 +0000
- title: 'Accelerating Greedy Coordinate Descent Methods'
abstract: 'We introduce and study two algorithms to accelerate greedy coordinate descent in theory and in practice: Accelerated Semi-Greedy Coordinate Descent (ASCD) and Accelerated Greedy Coordinate Descent (AGCD). On the theory side, our main results are for ASCD: we show that ASCD achieves $O(1/k^2)$ convergence, and it also achieves accelerated linear convergence for strongly convex functions. On the empirical side, while both AGCD and ASCD outperform Accelerated Randomized Coordinate Descent on most instances in our numerical experiments, we note that AGCD significantly outperforms the other two methods in our experiments, in spite of a lack of theoretical guarantees for this method. To complement this empirical finding for AGCD, we present an explanation why standard proof techniques for acceleration cannot work for AGCD, and we further introduce a technical condition under which AGCD is guaranteed to have accelerated convergence. Finally, we confirm that this technical condition holds in our numerical experiments.'
volume: 80
URL: https://proceedings.mlr.press/v80/lu18b.html
PDF: http://proceedings.mlr.press/v80/lu18b/lu18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-lu18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Haihao
family: Lu
- given: Robert
family: Freund
- given: Vahab
family: Mirrokni
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3257-3266
id: lu18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3257
lastpage: 3266
published: 2018-07-03 00:00:00 +0000
- title: 'Structured Variationally Auto-encoded Optimization'
abstract: 'We tackle the problem of optimizing a black-box objective function defined over a highly-structured input space. This problem is ubiquitous in science and engineering. In machine learning, inferring the structure of a neural network or the Automatic Statistician (AS), where the optimal kernel combination for a Gaussian process is selected, are two important examples. We use the AS as a case study to describe our approach, which can be easily generalized to other domains. We propose a Structure Generating Variational Auto-encoder (SG-VAE) to embed the original space of kernel combinations into some low-dimensional continuous manifold where Bayesian optimization (BO) ideas are used. This is possible when structural knowledge of the problem is available, which can be given via a simulator or any other form of generating potentially good solutions. The right exploration-exploitation balance is imposed by propagating into the search the uncertainty of the latent space of the SG-VAE, which is computed using variational inference. The key aspect of our approach is that the SG-VAE can be used to bias the search towards relevant regions, making it suitable for transfer learning tasks. Several experiments in various application domains are used to illustrate the utility and generality of the approach described in this work.'
volume: 80
URL: https://proceedings.mlr.press/v80/lu18c.html
PDF: http://proceedings.mlr.press/v80/lu18c/lu18c.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-lu18c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Xiaoyu
family: Lu
- given: Javier
family: Gonzalez
- given: Zhenwen
family: Dai
- given: Neil D.
family: Lawrence
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3267-3275
id: lu18c
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3267
lastpage: 3275
published: 2018-07-03 00:00:00 +0000
- title: 'Beyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equations'
abstract: 'Deep neural networks have become the state-of-the-art models in numerous machine learning tasks. However, general guidance to network architecture design is still missing. In our work, we bridge deep neural network design with numerical differential equations. We show that many effective networks, such as ResNet, PolyNet, FractalNet and RevNet, can be interpreted as different numerical discretizations of differential equations. This finding brings us a brand new perspective on the design of effective deep architectures. We can take advantage of the rich knowledge in numerical analysis to guide us in designing new and potentially more effective deep networks. As an example, we propose a linear multi-step architecture (LM-architecture) which is inspired by the linear multi-step method for solving ordinary differential equations. The LM-architecture is an effective structure that can be used on any ResNet-like networks. In particular, we demonstrate that LM-ResNet and LM-ResNeXt (i.e. the networks obtained by applying the LM-architecture on ResNet and ResNeXt respectively) can achieve noticeably higher accuracy than ResNet and ResNeXt on both CIFAR and ImageNet with comparable numbers of trainable parameters. In particular, on both CIFAR and ImageNet, LM-ResNet/LM-ResNeXt can significantly compress (>50%) the original networks while maintaining a similar performance. This can be explained mathematically using the concept of modified equation from numerical analysis. Last but not least, we also establish a connection between stochastic control and noise injection in the training process which helps to improve generalization of the networks. Furthermore, by relating the stochastic training strategy to a stochastic dynamic system, we can easily apply stochastic training to the networks with the LM-architecture. As an example, we introduce stochastic depth to LM-ResNet and achieve significant improvement over the original LM-ResNet on CIFAR10.'
volume: 80
URL: https://proceedings.mlr.press/v80/lu18d.html
PDF: http://proceedings.mlr.press/v80/lu18d/lu18d.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-lu18d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yiping
family: Lu
- given: Aoxiao
family: Zhong
- given: Quanzheng
family: Li
- given: Bin
family: Dong
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3276-3285
id: lu18d
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3276
lastpage: 3285
published: 2018-07-03 00:00:00 +0000
- title: 'End-to-end Active Object Tracking via Reinforcement Learning'
abstract: 'We study active object tracking, where a tracker takes as input the visual observation (i.e. frame sequence) and produces the camera control signal (e.g., move forward, turn left, etc). Conventional methods tackle the tracking and the camera control separately, which is challenging to tune jointly. It also incurs considerable human effort for labeling and much expensive trial and error in the real world. To address these issues, we propose, in this paper, an end-to-end solution via deep reinforcement learning, where a ConvNet-LSTM function approximator is adopted for the direct frame-to-action prediction. We further propose an environment augmentation technique and a customized reward function, which are crucial for successful training. The tracker trained in simulators (ViZDoom, Unreal Engine) shows good generalization to unseen object moving paths, unseen object appearances, unseen backgrounds, and distracting objects. It can restore tracking when occasionally losing the target. With the experiments over the VOT dataset, we also find that the tracking ability, obtained solely from simulators, can potentially transfer to real-world scenarios.'
volume: 80
URL: https://proceedings.mlr.press/v80/luo18a.html
PDF: http://proceedings.mlr.press/v80/luo18a/luo18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-luo18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Wenhan
family: Luo
- given: Peng
family: Sun
- given: Fangwei
family: Zhong
- given: Wei
family: Liu
- given: Tong
family: Zhang
- given: Yizhou
family: Wang
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3286-3295
id: luo18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3286
lastpage: 3295
published: 2018-07-03 00:00:00 +0000
- title: 'Competitive Caching with Machine Learned Advice'
abstract: 'We develop a framework for augmenting online algorithms with a machine learned oracle to achieve competitive ratios that provably improve upon unconditional worst case lower bounds when the oracle has low error. Our approach treats the oracle as a complete black box, and is not dependent on its inner workings, or the exact distribution of its errors. We apply this framework to the traditional caching problem: creating an eviction strategy for a cache of size k. We demonstrate that naively following the oracle’s recommendations may lead to very poor performance, even when the average error is quite low. Instead we show how to modify the Marker algorithm to take into account the oracle’s predictions, and prove that this combined approach achieves a competitive ratio that both (i) decreases as the oracle’s error decreases, and (ii) is always capped by O(log k), which can be achieved without any oracle input. We complement our results with an empirical evaluation of our algorithm on real world datasets, and show that it performs well empirically even using simple off the shelf predictions.'
volume: 80
URL: https://proceedings.mlr.press/v80/lykouris18a.html
PDF: http://proceedings.mlr.press/v80/lykouris18a/lykouris18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-lykouris18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Thodoris
family: Lykouris
- given: Sergei
family: Vassilvitskii
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3296-3305
id: lykouris18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3296
lastpage: 3305
published: 2018-07-03 00:00:00 +0000
- title: 'Batch Bayesian Optimization via Multi-objective Acquisition Ensemble for Automated Analog Circuit Design'
abstract: 'Bayesian optimization methods are promising for the optimization of black-box functions that are expensive to evaluate. In this paper, a novel batch Bayesian optimization approach is proposed. The parallelization is realized via a multi-objective ensemble of multiple acquisition functions. In each iteration, the multi-objective optimization of the multiple acquisition functions is performed to search for the Pareto front of the acquisition functions. The batch of inputs is then selected from the Pareto front. The Pareto front represents the best trade-off between the multiple acquisition functions. Such a policy for batch Bayesian optimization can significantly improve the efficiency of optimization. The proposed method is compared with several state-of-the-art batch Bayesian optimization algorithms using analytical benchmark functions and real-world analog integrated circuits. The experimental results show that the proposed method is competitive compared with the state-of-the-art algorithms.'
volume: 80
URL: https://proceedings.mlr.press/v80/lyu18a.html
PDF: http://proceedings.mlr.press/v80/lyu18a/lyu18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-lyu18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Wenlong
family: Lyu
- given: Fan
family: Yang
- given: Changhao
family: Yan
- given: Dian
family: Zhou
- given: Xuan
family: Zeng
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3306-3314
id: lyu18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3306
lastpage: 3314
published: 2018-07-03 00:00:00 +0000
- title: 'Celer: a Fast Solver for the Lasso with Dual Extrapolation'
abstract: 'Convex sparsity-inducing regularizations are ubiquitous in high-dimensional machine learning, but solving the resulting optimization problems can be slow. To accelerate solvers, state-of-the-art approaches reduce the size of the optimization problem at hand. In the context of regression, this can be achieved either by discarding irrelevant features (screening techniques) or by prioritizing features likely to be included in the support of the solution (working set techniques). Duality comes into play at several steps in these techniques. Here, we propose an extrapolation technique starting from a sequence of iterates in the dual that leads to the construction of improved dual points. This enables a tighter control of optimality as used in the stopping criterion, as well as better screening performance of Gap Safe rules. Finally, we propose a working set strategy based on an aggressive use of Gap Safe screening rules. Thanks to our new dual point construction, we show significant computational speedups on multiple real-world problems.'
volume: 80
URL: https://proceedings.mlr.press/v80/massias18a.html
PDF: http://proceedings.mlr.press/v80/massias18a/massias18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-massias18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mathurin
family: Massias
- given: Alexandre
family: Gramfort
- given: Joseph
family: Salmon
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3315-3324
id: massias18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3315
lastpage: 3324
published: 2018-07-03 00:00:00 +0000
- title: 'The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning'
abstract: 'In this paper we aim to formally explain the phenomenon of fast convergence of Stochastic Gradient Descent (SGD) observed in modern machine learning. The key observation is that most modern learning architectures are over-parametrized and are trained to interpolate the data by driving the empirical loss (classification and regression) close to zero. While it is still unclear why these interpolated solutions perform well on test data, we show that these regimes allow for fast convergence of SGD, comparable in number of iterations to full gradient descent. For convex loss functions we obtain an exponential convergence bound for *mini-batch* SGD parallel to that for full gradient descent. We show that there is a critical batch size $m^*$ such that: (a) SGD iteration with mini-batch size $m\leq m^*$ is nearly equivalent to $m$ iterations of mini-batch size $1$ (*linear scaling regime*). (b) SGD iteration with mini-batch $m> m^*$ is nearly equivalent to a full gradient descent iteration (*saturation regime*). Moreover, for the quadratic loss, we derive explicit expressions for the optimal mini-batch and step size and explicitly characterize the two regimes above. The critical mini-batch size can be viewed as the limit for effective mini-batch parallelization. It is also nearly independent of the data size, implying $O(n)$ acceleration over GD per unit of computation. We give experimental evidence on real data which closely follows our theoretical analyses. Finally, we show how our results fit in the recent developments in training deep neural networks and discuss connections to adaptive rates for SGD and variance reduction.'
volume: 80
URL: https://proceedings.mlr.press/v80/ma18a.html
PDF: http://proceedings.mlr.press/v80/ma18a/ma18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-ma18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Siyuan
family: Ma
- given: Raef
family: Bassily
- given: Mikhail
family: Belkin
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3325-3334
id: ma18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3325
lastpage: 3334
published: 2018-07-03 00:00:00 +0000
- title: 'Gradient Descent for Sparse Rank-One Matrix Completion for Crowd-Sourced Aggregation of Sparsely Interacting Workers'
abstract: 'We consider worker skill estimation for the single coin Dawid-Skene crowdsourcing model. In practice, skill estimation is challenging because worker assignments are sparse and irregular due to the arbitrary and uncontrolled availability of workers. We formulate skill estimation as a rank-one correlation-matrix completion problem, where the observed components correspond to observed label correlation between workers. We show that the correlation matrix can be successfully recovered and skills are identifiable if and only if the sampling matrix (observed components) is irreducible and aperiodic. We then propose an efficient gradient descent scheme and show that skill estimates converge to the desired global optima for such sampling matrices. Our proof is original and the results are surprising in light of the fact that even the weighted rank-one matrix factorization problem is NP-hard in general. Next we derive sample complexity bounds for the noisy case in terms of spectral properties of the signless Laplacian of the sampling matrix. Our proposed scheme achieves state-of-the-art performance on a number of real-world datasets.'
volume: 80
URL: https://proceedings.mlr.press/v80/ma18b.html
PDF: http://proceedings.mlr.press/v80/ma18b/ma18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-ma18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yao
family: Ma
- given: Alexander
family: Olshevsky
- given: Csaba
family: Szepesvari
- given: Venkatesh
family: Saligrama
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3335-3344
id: ma18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3335
lastpage: 3344
published: 2018-07-03 00:00:00 +0000
- title: 'Implicit Regularization in Nonconvex Statistical Estimation: Gradient Descent Converges Linearly for Phase Retrieval and Matrix Completion'
abstract: 'Recent years have seen a flurry of activities in designing provably efficient nonconvex optimization procedures for solving statistical estimation problems. For various problems like phase retrieval or low-rank matrix completion, state-of-the-art nonconvex procedures require proper regularization (e.g. trimming, regularized cost, projection) in order to guarantee fast convergence. When it comes to vanilla procedures such as gradient descent, however, prior theory either recommends highly conservative learning rates to avoid overshooting, or completely lacks performance guarantees. This paper uncovers a striking phenomenon in several nonconvex problems: even in the absence of explicit regularization, gradient descent follows a trajectory staying within a basin that enjoys nice geometry, consisting of points incoherent with the sampling mechanism. This “implicit regularization” feature allows gradient descent to proceed in a far more aggressive fashion without overshooting, which in turn results in substantial computational savings. Focusing on two statistical estimation problems, i.e. solving random quadratic systems of equations and low-rank matrix completion, we establish that gradient descent achieves near-optimal statistical and computational guarantees without explicit regularization. As a byproduct, for noisy matrix completion, we demonstrate that gradient descent enables optimal control of both entrywise and spectral-norm errors.'
volume: 80
URL: https://proceedings.mlr.press/v80/ma18c.html
PDF: http://proceedings.mlr.press/v80/ma18c/ma18c.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-ma18c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Cong
family: Ma
- given: Kaizheng
family: Wang
- given: Yuejie
family: Chi
- given: Yuxin
family: Chen
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3345-3354
id: ma18c
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3345
lastpage: 3354
published: 2018-07-03 00:00:00 +0000
- title: 'Dimensionality-Driven Learning with Noisy Labels'
abstract: 'Datasets with significant proportions of noisy (incorrect) class labels present challenges for training accurate Deep Neural Networks (DNNs). We propose a new perspective for understanding DNN generalization for such datasets, by investigating the dimensionality of the deep representation subspace of training samples. We show that from a dimensionality perspective, DNNs exhibit quite distinctive learning styles when trained with clean labels versus when trained with a proportion of noisy labels. Based on this finding, we develop a new dimensionality-driven learning strategy, which monitors the dimensionality of subspaces during training and adapts the loss function accordingly. We empirically demonstrate that our approach is highly tolerant to significant proportions of noisy labels, and can effectively learn low-dimensional local subspaces that capture the data distribution.'
volume: 80
URL: https://proceedings.mlr.press/v80/ma18d.html
PDF: http://proceedings.mlr.press/v80/ma18d/ma18d.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-ma18d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Xingjun
family: Ma
- given: Yisen
family: Wang
- given: Michael E.
family: Houle
- given: Shuo
family: Zhou
- given: Sarah
family: Erfani
- given: Shutao
family: Xia
- given: Sudanthi
family: Wijewickrema
- given: James
family: Bailey
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3355-3364
id: ma18d
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3355
lastpage: 3364
published: 2018-07-03 00:00:00 +0000
- title: 'Approximate message passing for amplitude based optimization'
abstract: 'We consider an $\ell_2$-regularized non-convex optimization problem for recovering signals from their noisy phaseless observations. We design and study the performance of a message passing algorithm that aims to solve this optimization problem. We consider the asymptotic setting $m,n \rightarrow \infty$, $m/n \rightarrow \delta$ and obtain sharp performance bounds, where $m$ is the number of measurements and $n$ is the signal dimension. We show that for complex signals the algorithm can perform accurate recovery with only $m=\left ( \frac{64}{\pi^2}-4\right)n\approx 2.5n$ measurements. Also, we provide sharp analysis on the sensitivity of the algorithm to noise. We highlight the following facts about our message passing algorithm: (i) Adding $\ell_2$ regularization to the non-convex loss function can be beneficial even in the noiseless setting; (ii) spectral initialization has marginal impact on the performance of the algorithm.'
volume: 80
URL: https://proceedings.mlr.press/v80/ma18e.html
PDF: http://proceedings.mlr.press/v80/ma18e/ma18e.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-ma18e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Junjie
family: Ma
- given: Ji
family: Xu
- given: Arian
family: Maleki
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3365-3374
id: ma18e
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3365
lastpage: 3374
published: 2018-07-03 00:00:00 +0000
- title: 'Orthogonal Machine Learning: Power and Limitations'
abstract: 'Double machine learning provides $n^{1/2}$-consistent estimates of parameters of interest even when high-dimensional or nonparametric nuisance parameters are estimated at an $n^{-1/4}$ rate. The key is to employ Neyman-orthogonal moment equations which are first-order insensitive to perturbations in the nuisance parameters. We show that the $n^{-1/4}$ requirement can be improved to $n^{-1/(2k+2)}$ by employing a $k$-th order notion of orthogonality that grants robustness to more complex or higher-dimensional nuisance parameters. In the partially linear regression setting popular in causal inference, we show that we can construct second-order orthogonal moments if and only if the treatment residual is not normally distributed. Our proof relies on Stein’s lemma and may be of independent interest. We conclude by demonstrating the robustness benefits of an explicit doubly-orthogonal estimation procedure for treatment effect.'
volume: 80
URL: https://proceedings.mlr.press/v80/mackey18a.html
PDF: http://proceedings.mlr.press/v80/mackey18a/mackey18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-mackey18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lester
family: Mackey
- given: Vasilis
family: Syrgkanis
- given: Ilias
family: Zadik
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3375-3383
id: mackey18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3375
lastpage: 3383
published: 2018-07-03 00:00:00 +0000
- title: 'Learning Adversarially Fair and Transferable Representations'
abstract: 'In this paper, we advocate for representation learning as the key to mitigating unfair prediction outcomes downstream. Motivated by a scenario where learned representations are used by third parties with unknown objectives, we propose and explore adversarial representation learning as a natural method of ensuring those parties act fairly. We connect group fairness (demographic parity, equalized odds, and equal opportunity) to different adversarial objectives. Through worst-case theoretical guarantees and experimental validation, we show that the choice of this objective is crucial to fair prediction. Furthermore, we present the first in-depth experimental demonstration of fair transfer learning and demonstrate empirically that our learned representations admit fair predictions on new tasks while maintaining utility, an essential goal of fair representation learning.'
volume: 80
URL: https://proceedings.mlr.press/v80/madras18a.html
PDF: http://proceedings.mlr.press/v80/madras18a/madras18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-madras18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: David
family: Madras
- given: Elliot
family: Creager
- given: Toniann
family: Pitassi
- given: Richard
family: Zemel
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3384-3393
id: madras18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3384
lastpage: 3393
published: 2018-07-03 00:00:00 +0000
- title: 'An Efficient, Generalized Bellman Update For Cooperative Inverse Reinforcement Learning'
abstract: 'Our goal is for AI systems to correctly identify and act according to their human user’s objectives. Cooperative Inverse Reinforcement Learning (CIRL) formalizes this value alignment problem as a two-player game between a human and robot, in which only the human knows the parameters of the reward function: the robot needs to learn them as the interaction unfolds. Previous work showed that CIRL can be solved as a POMDP, but with an action space size exponential in the size of the reward parameter space. In this work, we exploit a specific property of CIRL: the human is a full information agent. This enables us to derive an optimality-preserving modification to the standard Bellman update, which reduces the complexity of the problem by an exponential factor. Additionally, we show that our modified Bellman update allows us to relax CIRL’s assumption of human rationality. We apply this update to a variety of POMDP solvers, including exact methods, point-based methods, and Monte Carlo Tree Search methods. We find that it enables us to scale CIRL to non-trivial problems, with larger reward parameter spaces, and larger action spaces for both robot and human. In solutions to these larger problems, the human exhibits pedagogical (teaching) behavior, while the robot interprets it as such and attains higher value for the human.'
volume: 80
URL: https://proceedings.mlr.press/v80/malik18a.html
PDF: http://proceedings.mlr.press/v80/malik18a/malik18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-malik18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Dhruv
family: Malik
- given: Malayandi
family: Palaniappan
- given: Jaime
family: Fisac
- given: Dylan
family: Hadfield-Menell
- given: Stuart
family: Russell
- given: Anca
family: Dragan
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3394-3402
id: malik18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3394
lastpage: 3402
published: 2018-07-03 00:00:00 +0000
- title: 'Iterative Amortized Inference'
abstract: 'Inference models are a key component in scaling variational inference to deep latent variable models, most notably as encoder networks in variational auto-encoders (VAEs). By replacing conventional optimization-based inference with a learned model, inference is amortized over data examples and therefore more computationally efficient. However, standard inference models are restricted to direct mappings from data to approximate posterior estimates. The failure of these models to reach fully optimized approximate posterior estimates results in an amortization gap. We aim toward closing this gap by proposing iterative inference models, which learn to perform inference optimization through repeatedly encoding gradients. Our approach generalizes standard inference models in VAEs and provides insight into several empirical findings, including top-down inference techniques. We demonstrate the inference optimization capabilities of iterative inference models and show that they outperform standard inference models on several benchmark data sets of images and text.'
volume: 80
URL: https://proceedings.mlr.press/v80/marino18a.html
PDF: http://proceedings.mlr.press/v80/marino18a/marino18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-marino18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Joe
family: Marino
- given: Yisong
family: Yue
- given: Stephan
family: Mandt
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3403-3412
id: marino18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3403
lastpage: 3412
published: 2018-07-03 00:00:00 +0000
- title: 'Streaming Principal Component Analysis in Noisy Setting'
abstract: 'We study streaming algorithms for principal component analysis (PCA) in noisy settings. We present computationally efficient algorithms with sub-linear regret bounds for PCA in the presence of noise, missing data, and gross outliers.'
volume: 80
URL: https://proceedings.mlr.press/v80/marinov18a.html
PDF: http://proceedings.mlr.press/v80/marinov18a/marinov18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-marinov18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Teodor Vanislavov
family: Marinov
- given: Poorya
family: Mianjy
- given: Raman
family: Arora
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3413-3422
id: marinov18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3413
lastpage: 3422
published: 2018-07-03 00:00:00 +0000
- title: 'Fast Approximate Spectral Clustering for Dynamic Networks'
abstract: 'Spectral clustering is a widely studied problem, yet its complexity is prohibitive for dynamic graphs of even modest size. We claim that it is possible to reuse information of past cluster assignments to expedite computation. Our approach builds on a recent idea of sidestepping the main bottleneck of spectral clustering, i.e., computing the graph eigenvectors, by a polynomial-based randomized sketching technique. We show that the proposed algorithm achieves clustering assignments with quality approximating that of spectral clustering and that it can yield significant complexity benefits when the graph dynamics are appropriately bounded. In our experiments, our method clusters 30k-node graphs 3.9$\times$ faster on average and deviates from the correct assignment by less than 0.1%.'
volume: 80
URL: https://proceedings.mlr.press/v80/martin18a.html
PDF: http://proceedings.mlr.press/v80/martin18a/martin18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-martin18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lionel
family: Martin
- given: Andreas
family: Loukas
- given: Pierre
family: Vandergheynst
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3423-3432
id: martin18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3423
lastpage: 3432
published: 2018-07-03 00:00:00 +0000
- title: 'Bayesian Model Selection for Change Point Detection and Clustering'
abstract: 'We address a generalization of change point detection with the purpose of detecting the change locations and the levels of clusters of a piecewise constant signal. Our approach is to model it as a nonparametric penalized least-squares model selection on a family of models indexed over the collection of partitions of the design points and propose a computationally efficient algorithm to approximately solve it. Statistically, minimizing such a penalized criterion yields an approximation to the maximum a-posteriori probability (MAP) estimator. The criterion is then analyzed and an oracle inequality is derived using a Gaussian concentration inequality. The oracle inequality is used to derive, on the one hand, conditions for consistency and, on the other hand, an adaptive upper bound on the expected squared risk of the estimator, which statistically motivates our approximation. Finally, we apply our algorithm to simulated data to experimentally validate the statistical guarantees and illustrate its behavior.'
volume: 80
URL: https://proceedings.mlr.press/v80/mazhar18a.html
PDF: http://proceedings.mlr.press/v80/mazhar18a/mazhar18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-mazhar18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Othmane
family: Mazhar
- given: Cristian
family: Rojas
- given: Carlo
family: Fischione
- given: Mohammad Reza
family: Hesamzadeh
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3433-3442
id: mazhar18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3433
lastpage: 3442
published: 2018-07-03 00:00:00 +0000
- title: 'Optimization, fast and slow: optimally switching between local and Bayesian optimization'
abstract: 'We develop the first Bayesian Optimization algorithm, BLOSSOM, which selects between multiple alternative acquisition functions and traditional local optimization at each step. This is combined with a novel stopping condition based on expected regret. This pairing allows us to obtain the best characteristics of both local and Bayesian optimization, making efficient use of function evaluations while yielding superior convergence to the global minimum on a selection of optimization problems, and also halting optimization once a principled and intuitive stopping condition has been fulfilled.'
volume: 80
URL: https://proceedings.mlr.press/v80/mcleod18a.html
PDF: http://proceedings.mlr.press/v80/mcleod18a/mcleod18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-mcleod18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mark
family: McLeod
- given: Stephen
family: Roberts
- given: Michael A.
family: Osborne
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3443-3452
id: mcleod18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3443
lastpage: 3452
published: 2018-07-03 00:00:00 +0000
- title: 'Bounds on the Approximation Power of Feedforward Neural Networks'
abstract: 'The approximation power of general feedforward neural networks with piecewise linear activation functions is investigated. First, lower bounds on the size of a network are established in terms of the approximation error and network depth and width. These bounds improve upon state-of-the-art bounds for certain classes of functions, such as strongly convex functions. Second, an upper bound is established on the difference of two neural networks with identical weights but different activation functions.'
volume: 80
URL: https://proceedings.mlr.press/v80/mehrabi18a.html
PDF: http://proceedings.mlr.press/v80/mehrabi18a/mehrabi18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-mehrabi18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mohammad
family: Mehrabi
- given: Aslan
family: Tchamkerten
- given: Mansoor
family: Yousefi
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3453-3461
id: mehrabi18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3453
lastpage: 3461
published: 2018-07-03 00:00:00 +0000
- title: 'Differentiable Dynamic Programming for Structured Prediction and Attention'
abstract: 'Dynamic programming (DP) solves a variety of structured combinatorial problems by iteratively breaking them down into smaller subproblems. In spite of their versatility, many DP algorithms are non-differentiable, which hampers their use as a layer in neural networks trained by backpropagation. To address this issue, we propose to smooth the max operator in the dynamic programming recursion, using a strongly convex regularizer. This allows us to relax both the optimal value and solution of the original combinatorial problem, and turns a broad class of DP algorithms into differentiable operators. Theoretically, we provide a new probabilistic perspective on backpropagating through these DP operators, and relate them to inference in graphical models. We derive two particular instantiations of our framework, a smoothed Viterbi algorithm for sequence prediction and a smoothed DTW algorithm for time-series alignment. We showcase these instantiations on structured prediction (audio-to-score alignment, NER) and on structured and sparse attention for translation.'
volume: 80
URL: https://proceedings.mlr.press/v80/mensch18a.html
PDF: http://proceedings.mlr.press/v80/mensch18a/mensch18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-mensch18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Arthur
family: Mensch
- given: Mathieu
family: Blondel
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3462-3471
id: mensch18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3462
lastpage: 3471
published: 2018-07-03 00:00:00 +0000
- title: 'Ranking Distributions based on Noisy Sorting'
abstract: 'We propose a new statistical model for ranking data, i.e., a new family of probability distributions on permutations. Our model is inspired by the idea of a data-generating process in the form of a noisy sorting procedure, in which deterministic comparisons between pairs of items are replaced by Bernoulli trials. The probability of producing a certain ranking as a result then essentially depends on the Bernoulli parameters, which can be interpreted as pairwise preferences. We show that our model can be written in closed form if insertion or quick sort are used as sorting algorithms, and propose a maximum likelihood approach for parameter estimation. We also introduce a generalization of the model, in which the constraints on pairwise preferences are relaxed, and for which maximum likelihood estimation can be carried out based on a variation of the generalized iterative scaling algorithm. Experimentally, we show that the models perform very well in terms of goodness of fit, compared to existing models for ranking data.'
volume: 80
URL: https://proceedings.mlr.press/v80/mesaoudi-paul18a.html
PDF: http://proceedings.mlr.press/v80/mesaoudi-paul18a/mesaoudi-paul18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-mesaoudi-paul18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Adil El
family: Mesaoudi-Paul
- given: Eyke
family: Hüllermeier
- given: Robert
family: Busa-Fekete
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3472-3480
id: mesaoudi-paul18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3472
lastpage: 3480
published: 2018-07-03 00:00:00 +0000
- title: 'Which Training Methods for GANs do actually Converge?'
abstract: 'Recent work has shown local convergence of GAN training for absolutely continuous data and generator distributions. In this paper, we show that the requirement of absolute continuity is necessary: we describe a simple yet prototypical counterexample showing that in the more realistic case of distributions that are not absolutely continuous, unregularized GAN training is not always convergent. Furthermore, we discuss regularization strategies that were recently proposed to stabilize GAN training. Our analysis shows that GAN training with instance noise or zero-centered gradient penalties converges. On the other hand, we show that Wasserstein-GANs and WGAN-GP with a finite number of discriminator updates per generator update do not always converge to the equilibrium point. We discuss these results, leading us to a new explanation for the stability problems of GAN training. Based on our analysis, we extend our convergence results to more general GANs and prove local convergence for simplified gradient penalties even if the generator and data distributions lie on lower dimensional manifolds. We find these penalties to work well in practice and use them to learn high-resolution generative image models for a variety of datasets with little hyperparameter tuning.'
volume: 80
URL: https://proceedings.mlr.press/v80/mescheder18a.html
PDF: http://proceedings.mlr.press/v80/mescheder18a/mescheder18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-mescheder18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lars
family: Mescheder
- given: Andreas
family: Geiger
- given: Sebastian
family: Nowozin
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3481-3490
id: mescheder18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3481
lastpage: 3490
published: 2018-07-03 00:00:00 +0000
- title: 'Configurable Markov Decision Processes'
abstract: 'In many real-world problems, it is possible to configure, to a limited extent, some environmental parameters to improve the performance of a learning agent. In this paper, we propose a novel framework, Configurable Markov Decision Processes (Conf-MDPs), to model this new type of interaction with the environment. Furthermore, we provide a new learning algorithm, Safe Policy-Model Iteration (SPMI), to jointly and adaptively optimize the policy and the environment configuration. After having introduced our approach and derived some theoretical results, we present the experimental evaluation in two illustrative problems to show the benefits of the environment configurability on the performance of the learned policy.'
volume: 80
URL: https://proceedings.mlr.press/v80/metelli18a.html
PDF: http://proceedings.mlr.press/v80/metelli18a/metelli18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-metelli18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Alberto Maria
family: Metelli
- given: Mirco
family: Mutti
- given: Marcello
family: Restelli
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3491-3500
id: metelli18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3491
lastpage: 3500
published: 2018-07-03 00:00:00 +0000
- title: 'prDeep: Robust Phase Retrieval with a Flexible Deep Network'
abstract: 'Phase retrieval algorithms have become an important component in many modern computational imaging systems. For instance, in the context of ptychography and speckle correlation imaging, they enable imaging past the diffraction limit and through scattering media, respectively. Unfortunately, traditional phase retrieval algorithms struggle in the presence of noise. Progress has been made recently on developing more robust algorithms using signal priors, but at the expense of limiting the range of supported measurement models (e.g., to Gaussian or coded diffraction patterns). In this work we leverage the regularization-by-denoising framework and a convolutional neural network denoiser to create prDeep, a new phase retrieval algorithm that is both robust and broadly applicable. We test and validate prDeep in simulation to demonstrate that it is robust to noise and can handle a variety of system models.'
volume: 80
URL: https://proceedings.mlr.press/v80/metzler18a.html
PDF: http://proceedings.mlr.press/v80/metzler18a/metzler18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-metzler18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Christopher
family: Metzler
- given: Phillip
family: Schniter
- given: Ashok
family: Veeraraghavan
- given: Richard
family: Baraniuk
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3501-3510
id: metzler18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3501
lastpage: 3510
published: 2018-07-03 00:00:00 +0000
- title: 'Pseudo-task Augmentation: From Deep Multitask Learning to Intratask Sharing—and Back'
abstract: 'Deep multitask learning boosts performance by sharing learned structure across related tasks. This paper adapts ideas from deep multitask learning to the setting where only a single task is available. The method is formalized as pseudo-task augmentation, in which models are trained with multiple decoders for each task. Pseudo-tasks simulate the effect of training towards closely-related tasks drawn from the same universe. In a suite of experiments, pseudo-task augmentation is shown to improve performance on single-task learning problems. When combined with multitask learning, further improvements are achieved, including state-of-the-art performance on the CelebA dataset, showing that pseudo-task augmentation and multitask learning have complementary value. All in all, pseudo-task augmentation is a broadly applicable and efficient way to boost performance in deep learning systems.'
volume: 80
URL: https://proceedings.mlr.press/v80/meyerson18a.html
PDF: http://proceedings.mlr.press/v80/meyerson18a/meyerson18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-meyerson18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Elliot
family: Meyerson
- given: Risto
family: Miikkulainen
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3511-3520
id: meyerson18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3511
lastpage: 3520
published: 2018-07-03 00:00:00 +0000
- title: 'The Hidden Vulnerability of Distributed Learning in Byzantium'
abstract: 'While machine learning is going through an era of celebrated success, concerns have been raised about the vulnerability of its backbone: stochastic gradient descent (SGD). Recent approaches have been proposed to ensure the robustness of distributed SGD against adversarial (Byzantine) workers sending *poisoned* gradients during the training phase. Some of these approaches have been proven *Byzantine-resilient*: they ensure the *convergence* of SGD despite the presence of a minority of adversarial workers. We show in this paper that *convergence is not enough*. In high dimension $d \gg 1$, an adversary can build on the loss function’s non-convexity to make SGD converge to *ineffective* models. More precisely, we bring to light that existing Byzantine-resilient schemes leave a *margin of poisoning* of $\Omega\left(f(d)\right)$, where $f(d)$ increases at least like $\sqrt[p]{d}$. Based on this *leeway*, we build a simple attack, and experimentally show its strong to utmost effectiveness on CIFAR-10 and MNIST. We introduce *Bulyan*, and prove it significantly reduces the attacker’s leeway to a narrow $O(1/\sqrt{d})$ bound. We empirically show that Bulyan does not suffer the fragility of existing aggregation rules and, at a reasonable cost in terms of required batch size, achieves convergence *as if* only non-Byzantine gradients had been used to update the model.'
volume: 80
URL: https://proceedings.mlr.press/v80/mhamdi18a.html
PDF: http://proceedings.mlr.press/v80/mhamdi18a/mhamdi18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-mhamdi18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: El Mahdi
family: El Mhamdi
- given: Rachid
family: Guerraoui
- given: Sébastien
family: Rouault
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3521-3530
id: mhamdi18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3521
lastpage: 3530
published: 2018-07-03 00:00:00 +0000
- title: 'Stochastic PCA with $\ell_2$ and $\ell_1$ Regularization'
abstract: 'We revisit convex relaxation based methods for stochastic optimization of principal component analysis (PCA). While methods that directly solve the nonconvex problem have been shown to be optimal in terms of statistical and computational efficiency, the methods based on convex relaxation have been shown to enjoy comparable, or even superior, empirical performance; this motivates the need for a deeper formal understanding of the latter. Therefore, in this paper, we study variants of stochastic gradient descent for a convex relaxation of PCA with (a) $\ell_2$, (b) $\ell_1$, and (c) elastic net ($\ell_1+\ell_2$) regularization in the hope that these variants yield (a) better iteration complexity, (b) better control on the rank of the intermediate iterates, and (c) both, respectively. We show, theoretically and empirically, that compared to previous work on convex relaxation based methods, the proposed variants yield faster convergence and improve overall runtime to achieve a certain user-specified $\epsilon$-suboptimality on the PCA objective. Furthermore, the proposed methods are shown to converge both in terms of the PCA objective as well as the distance between subspaces. However, there still remains a gap in computational requirements for the proposed methods when compared with existing nonconvex approaches.'
volume: 80
URL: https://proceedings.mlr.press/v80/mianjy18a.html
PDF: http://proceedings.mlr.press/v80/mianjy18a/mianjy18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-mianjy18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Poorya
family: Mianjy
- given: Raman
family: Arora
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3531-3539
id: mianjy18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3531
lastpage: 3539
published: 2018-07-03 00:00:00 +0000
- title: 'On the Implicit Bias of Dropout'
abstract: 'Algorithmic approaches endow deep learning systems with implicit bias that helps them generalize even in over-parametrized settings. In this paper, we focus on understanding such a bias induced in learning through dropout, a popular technique to avoid overfitting in deep learning. For single hidden-layer linear neural networks, we show that dropout tends to make the norm of incoming/outgoing weight vectors of all the hidden nodes equal. In addition, we provide a complete characterization of the optimization landscape induced by dropout.'
volume: 80
URL: https://proceedings.mlr.press/v80/mianjy18b.html
PDF: http://proceedings.mlr.press/v80/mianjy18b/mianjy18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-mianjy18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Poorya
family: Mianjy
- given: Raman
family: Arora
- given: Rene
family: Vidal
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3540-3548
id: mianjy18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3540
lastpage: 3548
published: 2018-07-03 00:00:00 +0000
- title: 'One-Shot Segmentation in Clutter'
abstract: 'We tackle the problem of one-shot segmentation: finding and segmenting a previously unseen object in a cluttered scene based on a single instruction example. We propose a novel dataset, which we call *cluttered Omniglot*. Using a baseline architecture combining a Siamese embedding for detection with a U-net for segmentation we show that increasing levels of clutter make the task progressively harder. Using oracle models with access to various amounts of ground-truth information, we evaluate different aspects of the problem and show that in this kind of visual search task, detection and segmentation are two intertwined problems, the solution to each of which helps solve the other. We therefore introduce *MaskNet*, an improved model that attends to multiple candidate locations, generates segmentation proposals to mask out background clutter and selects among the segmented objects. Our findings suggest that such image recognition models based on an iterative refinement of object detection and foreground segmentation may provide a way to deal with highly cluttered scenes.'
volume: 80
URL: https://proceedings.mlr.press/v80/michaelis18a.html
PDF: http://proceedings.mlr.press/v80/michaelis18a/michaelis18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-michaelis18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Claudio
family: Michaelis
- given: Matthias
family: Bethge
- given: Alexander
family: Ecker
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3549-3558
id: michaelis18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3549
lastpage: 3558
published: 2018-07-03 00:00:00 +0000
- title: 'Differentiable plasticity: training plastic neural networks with backpropagation'
abstract: 'How can we build agents that keep learning from experience, quickly and efficiently, after their initial training? Here we take inspiration from the main mechanism of learning in biological brains: synaptic plasticity, carefully tuned by evolution to produce efficient lifelong learning. We show that plasticity, just like connection weights, can be optimized by gradient descent in large (millions of parameters) recurrent networks with Hebbian plastic connections. First, recurrent plastic networks with more than two million parameters can be trained to memorize and reconstruct sets of novel, high-dimensional (1000+ pixels) natural images not seen during training. Crucially, traditional non-plastic recurrent networks fail to solve this task. Furthermore, trained plastic networks can also solve generic meta-learning tasks such as the Omniglot task, with competitive results and little parameter overhead. Finally, in reinforcement learning settings, plastic networks outperform their non-plastic equivalents in a maze exploration task. We conclude that differentiable plasticity may provide a powerful novel approach to the learning-to-learn problem.'
volume: 80
URL: https://proceedings.mlr.press/v80/miconi18a.html
PDF: http://proceedings.mlr.press/v80/miconi18a/miconi18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-miconi18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Thomas
family: Miconi
- given: Kenneth
family: Stanley
- given: Jeff
family: Clune
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3559-3568
id: miconi18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3559
lastpage: 3568
published: 2018-07-03 00:00:00 +0000
- title: 'Training Neural Machines with Trace-Based Supervision'
abstract: 'We investigate the effectiveness of trace-based supervision methods for training existing neural abstract machines. To define the class of neural machines amenable to trace-based supervision, we introduce the concept of a differential neural computational machine (dNCM) and show that several existing architectures (NTMs, NRAMs) can be described as dNCMs. We performed a detailed experimental evaluation with NTM and NRAM machines, showing that additional supervision on the interpretable portions of these architectures leads to better convergence and generalization capabilities of the learning phase than standard training, in both noise-free and noisy scenarios.'
volume: 80
URL: https://proceedings.mlr.press/v80/mirman18a.html
PDF: http://proceedings.mlr.press/v80/mirman18a/mirman18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-mirman18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Matthew
family: Mirman
- given: Dimitar
family: Dimitrov
- given: Pavle
family: Djordjevic
- given: Timon
family: Gehr
- given: Martin
family: Vechev
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3569-3577
id: mirman18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3569
lastpage: 3577
published: 2018-07-03 00:00:00 +0000
- title: 'Differentiable Abstract Interpretation for Provably Robust Neural Networks'
abstract: 'We introduce a scalable method for training robust neural networks based on abstract interpretation. We present several abstract transformers which balance efficiency with precision and show these can be used to train large neural networks that are certifiably robust to adversarial perturbations.'
volume: 80
URL: https://proceedings.mlr.press/v80/mirman18b.html
PDF: http://proceedings.mlr.press/v80/mirman18b/mirman18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-mirman18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Matthew
family: Mirman
- given: Timon
family: Gehr
- given: Martin
family: Vechev
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3578-3586
id: mirman18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3578
lastpage: 3586
published: 2018-07-03 00:00:00 +0000
- title: 'A Delay-tolerant Proximal-Gradient Algorithm for Distributed Learning'
abstract: 'Distributed learning aims at computing high-quality models by training over scattered data. This covers a diversity of scenarios, including computer clusters or mobile agents. One of the main challenges is then to deal with heterogeneous machines and unreliable communications. In this setting, we propose and analyze a flexible asynchronous optimization algorithm for solving nonsmooth learning problems. Unlike most existing methods, our algorithm is adjustable to various levels of communication cost, machine computational power, and evenness of data distribution. We prove that the algorithm converges linearly with a fixed learning rate that depends neither on communication delays nor on the number of machines. Although long delays in communication may slow down performance, no delay can break convergence.'
volume: 80
URL: https://proceedings.mlr.press/v80/mishchenko18a.html
PDF: http://proceedings.mlr.press/v80/mishchenko18a/mishchenko18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-mishchenko18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Konstantin
family: Mishchenko
- given: Franck
family: Iutzeler
- given: Jérôme
family: Malick
- given: Massih-Reza
family: Amini
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3587-3595
id: mishchenko18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3587
lastpage: 3595
published: 2018-07-03 00:00:00 +0000
- title: 'Data Summarization at Scale: A Two-Stage Submodular Approach'
abstract: 'The sheer scale of modern datasets has resulted in a dire need for summarization techniques that can identify representative elements in a dataset. Fortunately, the vast majority of data summarization tasks satisfy an intuitive diminishing returns condition known as submodularity, which allows us to find nearly-optimal solutions in linear time. We focus on a two-stage submodular framework where the goal is to use some given training functions to reduce the ground set so that optimizing new functions (drawn from the same distribution) over the reduced set provides almost as much value as optimizing them over the entire ground set. In this paper, we develop the first streaming and distributed solutions to this problem. In addition to providing strong theoretical guarantees, we demonstrate both the utility and efficiency of our algorithms on real-world tasks including image summarization and ride-share optimization.'
volume: 80
URL: https://proceedings.mlr.press/v80/mitrovic18a.html
PDF: http://proceedings.mlr.press/v80/mitrovic18a/mitrovic18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-mitrovic18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Marko
family: Mitrovic
- given: Ehsan
family: Kazemi
- given: Morteza
family: Zadimoghaddam
- given: Amin
family: Karbasi
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3596-3605
id: mitrovic18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3596
lastpage: 3605
published: 2018-07-03 00:00:00 +0000
- title: 'The Hierarchical Adaptive Forgetting Variational Filter'
abstract: 'A common problem in Machine Learning and statistics consists in detecting whether the current sample in a stream of data belongs to the same distribution as previous ones, is an isolated outlier or inaugurates a new distribution of data. We present a hierarchical Bayesian algorithm that aims at learning a time-specific approximate posterior distribution of the parameters describing the distribution of the data observed. We derive the update equations of the variational parameters of the approximate posterior at each time step for models from the exponential family, and show that these updates find interesting correspondents in Reinforcement Learning (RL). In this perspective, our model can be seen as a hierarchical RL algorithm that learns a posterior distribution according to a certain stability confidence that is, in turn, learned according to its own stability confidence. Finally, we show some applications of our generic model, first in an RL context, next with an adaptive Bayesian Autoregressive model, and finally in the context of Stochastic Gradient Descent optimization.'
volume: 80
URL: https://proceedings.mlr.press/v80/moens18a.html
PDF: http://proceedings.mlr.press/v80/moens18a/moens18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-moens18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Vincent
family: Moens
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3606-3615
id: moens18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3606
lastpage: 3615
published: 2018-07-03 00:00:00 +0000
- title: 'Decentralized Submodular Maximization: Bridging Discrete and Continuous Settings'
abstract: 'In this paper, we showcase the interplay between discrete and continuous optimization in network-structured settings. We propose the first fully decentralized optimization method for a wide class of non-convex objective functions that possess a diminishing returns property. More specifically, given an arbitrary connected network and a global continuous submodular function, formed by a sum of local functions, we develop Decentralized Continuous Greedy (DCG), a message passing algorithm that converges to the tight $(1-1/e)$ approximation factor of the optimum global solution using only local computation and communication. We also provide strong convergence bounds as a function of network size and spectral characteristics of the underlying topology. Interestingly, DCG readily provides a simple recipe for decentralized discrete submodular maximization through the means of continuous relaxations. Formally, we demonstrate that by lifting the local discrete functions to continuous domains and using DCG as an interface we can develop a consensus algorithm that also achieves the tight $(1-1/e)$ approximation guarantee of the global discrete solution once a proper rounding scheme is applied.'
volume: 80
URL: https://proceedings.mlr.press/v80/mokhtari18a.html
PDF: http://proceedings.mlr.press/v80/mokhtari18a/mokhtari18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-mokhtari18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Aryan
family: Mokhtari
- given: Hamed
family: Hassani
- given: Amin
family: Karbasi
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3616-3625
id: mokhtari18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3616
lastpage: 3625
published: 2018-07-03 00:00:00 +0000
- title: 'DICOD: Distributed Convolutional Coordinate Descent for Convolutional Sparse Coding'
abstract: 'In this paper, we introduce DICOD, a convolutional sparse coding algorithm which builds shift invariant representations for long signals. This algorithm is designed to run in a distributed setting, with local message passing, making it communication efficient. It is based on coordinate descent and uses locally greedy updates which accelerate the resolution compared to greedy coordinate selection. We prove the convergence of this algorithm and highlight its computational speed-up which is super-linear in the number of cores used. We also provide empirical evidence for the acceleration properties of our algorithm compared to state-of-the-art methods.'
volume: 80
URL: https://proceedings.mlr.press/v80/moreau18a.html
PDF: http://proceedings.mlr.press/v80/moreau18a/moreau18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-moreau18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Thomas
family: Moreau
- given: Laurent
family: Oudre
- given: Nicolas
family: Vayatis
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3626-3634
id: moreau18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3626
lastpage: 3634
published: 2018-07-03 00:00:00 +0000
- title: 'WHInter: A Working set algorithm for High-dimensional sparse second order Interaction models'
abstract: 'Learning sparse linear models with two-way interactions is desirable in many application domains such as genomics. $\ell_1$-regularised linear models are popular to estimate sparse models, yet standard implementations fail to address specifically the quadratic explosion of candidate two-way interactions in high dimensions, and typically do not scale to genetic data with hundreds of thousands of features. Here we present WHInter, a working set algorithm to solve large $\ell_1$-regularised problems with two-way interactions for binary design matrices. The novelty of WHInter stems from a new bound to efficiently identify working sets without scanning all features, and from fast computations inspired by solutions to the maximum inner product search problem. We apply WHInter to simulated and real genetic data and show that it is more scalable and two orders of magnitude faster than the state of the art.'
volume: 80
URL: https://proceedings.mlr.press/v80/morvan18a.html
PDF: http://proceedings.mlr.press/v80/morvan18a/morvan18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-morvan18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Marine Le
family: Morvan
- given: Jean-Philippe
family: Vert
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3635-3644
id: morvan18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3635
lastpage: 3644
published: 2018-07-03 00:00:00 +0000
- title: 'Dropout Training, Data-dependent Regularization, and Generalization Bounds'
abstract: 'We study the problem of generalization guarantees for dropout training. A general framework is first proposed for learning procedures with random perturbation on model parameters. The generalization error is bounded by the sum of two offset Rademacher complexities: the main term is the Rademacher complexity of the hypothesis class with a negative offset induced by the perturbation variance, which characterizes data-dependent regularization by the random perturbation; the auxiliary term is the offset Rademacher complexity for the variance class, controlling the degree to which this regularization effect can be weakened. For neural networks, we estimate upper and lower bounds for the variance induced by truthful dropout, a variant of dropout that we propose to ensure unbiased output and fit into our framework, and the variance bounds exhibit a connection to adaptive regularization methods. By applying our framework to ReLU networks with one hidden layer, a generalization upper bound is derived with no assumptions on the parameter norms or data distribution, with an $O(1/n)$ fast rate and adaptivity to the geometry of data points being achieved at the same time.'
volume: 80
URL: https://proceedings.mlr.press/v80/mou18a.html
PDF: http://proceedings.mlr.press/v80/mou18a/mou18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-mou18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Wenlong
family: Mou
- given: Yuchen
family: Zhou
- given: Jun
family: Gao
- given: Liwei
family: Wang
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3645-3653
id: mou18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3645
lastpage: 3653
published: 2018-07-03 00:00:00 +0000
- title: 'Kernelized Synaptic Weight Matrices'
abstract: 'In this paper we introduce a novel neural network architecture, in which weight matrices are re-parametrized in terms of low-dimensional vectors, interacting through kernel functions. A layer of our network can be interpreted as introducing a (potentially infinitely wide) linear layer between input and output. We describe the theory underpinning this model and validate it with concrete examples, exploring how it can be used to impose structure on neural networks in diverse applications ranging from data visualization to recommender systems. We achieve state-of-the-art performance in a collaborative filtering task (MovieLens).'
volume: 80
URL: https://proceedings.mlr.press/v80/muller18a.html
PDF: http://proceedings.mlr.press/v80/muller18a/muller18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-muller18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lorenz
family: Muller
- given: Julien
family: Martel
- given: Giacomo
family: Indiveri
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3654-3663
id: muller18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3654
lastpage: 3663
published: 2018-07-03 00:00:00 +0000
- title: 'Rapid Adaptation with Conditionally Shifted Neurons'
abstract: 'We describe a mechanism by which artificial neural networks can learn rapid adaptation - the ability to adapt on the fly, with little data, to new tasks - that we call conditionally shifted neurons. We apply this mechanism in the framework of metalearning, where the aim is to replicate some of the flexibility of human learning in machines. Conditionally shifted neurons modify their activation values with task-specific shifts retrieved from a memory module, which is populated rapidly based on limited task experience. On metalearning benchmarks from the vision and language domains, models augmented with conditionally shifted neurons achieve state-of-the-art results.'
volume: 80
URL: https://proceedings.mlr.press/v80/munkhdalai18a.html
PDF: http://proceedings.mlr.press/v80/munkhdalai18a/munkhdalai18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-munkhdalai18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Tsendsuren
family: Munkhdalai
- given: Xingdi
family: Yuan
- given: Soroush
family: Mehri
- given: Adam
family: Trischler
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3664-3673
id: munkhdalai18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3664
lastpage: 3673
published: 2018-07-03 00:00:00 +0000
- title: 'On the Relationship between Data Efficiency and Error for Uncertainty Sampling'
abstract: 'While active learning offers potential cost savings, the actual data efficiency—the reduction in amount of labeled data needed to obtain the same error rate—observed in practice is mixed. This paper poses a basic question: when is active learning actually helpful? We provide an answer for logistic regression with the popular active learning algorithm, uncertainty sampling. Empirically, on 21 datasets from OpenML, we find a strong inverse correlation between data efficiency and the error rate of the final classifier. Theoretically, we show that for a variant of uncertainty sampling, the asymptotic data efficiency is within a constant factor of the inverse error rate of the limiting classifier.'
volume: 80
URL: https://proceedings.mlr.press/v80/mussmann18a.html
PDF: http://proceedings.mlr.press/v80/mussmann18a/mussmann18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-mussmann18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Stephen
family: Mussmann
- given: Percy
family: Liang
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3674-3682
id: mussmann18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3674
lastpage: 3682
published: 2018-07-03 00:00:00 +0000
- title: 'Fitting New Speakers Based on a Short Untranscribed Sample'
abstract: 'Learning-based Text To Speech systems have the potential to generalize from one speaker to the next and thus require a relatively short sample of any new voice. However, this promise is currently largely unrealized. We present a method that is designed to capture a new speaker from a short untranscribed audio sample. This is done by employing an additional network that, given an audio sample, places the speaker in the embedding space. This network is trained as part of the speech synthesis system using various consistency losses. Our results demonstrate greatly improved performance both on the dataset speakers and, more importantly, when fitting new voices, even from very short samples.'
volume: 80
URL: https://proceedings.mlr.press/v80/nachmani18a.html
PDF: http://proceedings.mlr.press/v80/nachmani18a/nachmani18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-nachmani18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Eliya
family: Nachmani
- given: Adam
family: Polyak
- given: Yaniv
family: Taigman
- given: Lior
family: Wolf
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3683-3691
id: nachmani18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3683
lastpage: 3691
published: 2018-07-03 00:00:00 +0000
- title: 'Smoothed Action Value Functions for Learning Gaussian Policies'
abstract: 'State-action value functions (i.e., Q-values) are ubiquitous in reinforcement learning (RL), giving rise to popular algorithms such as SARSA and Q-learning. We propose a new notion of action value defined by a Gaussian smoothed version of the expected Q-value. We show that such smoothed Q-values still satisfy a Bellman equation, making them learnable from experience sampled from an environment. Moreover, the gradients of expected reward with respect to the mean and covariance of a parameterized Gaussian policy can be recovered from the gradient and Hessian of the smoothed Q-value function. Based on these relationships we develop new algorithms for training a Gaussian policy directly from a learned smoothed Q-value approximator. The approach is additionally amenable to proximal optimization by augmenting the objective with a penalty on KL-divergence from a previous policy. We find that the ability to learn both a mean and covariance during training leads to significantly improved results on standard continuous control benchmarks.'
volume: 80
URL: https://proceedings.mlr.press/v80/nachum18a.html
PDF: http://proceedings.mlr.press/v80/nachum18a/nachum18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-nachum18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ofir
family: Nachum
- given: Mohammad
family: Norouzi
- given: George
family: Tucker
- given: Dale
family: Schuurmans
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3692-3700
id: nachum18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3692
lastpage: 3700
published: 2018-07-03 00:00:00 +0000
- title: 'Nearly Optimal Robust Subspace Tracking'
abstract: 'Robust subspace tracking (RST) can be simply understood as a dynamic (time-varying) extension of robust PCA. More precisely, it is the problem of tracking data lying in a fixed or slowly-changing low-dimensional subspace while being robust to sparse outliers. This work develops a recursive projected compressive sensing algorithm called “Nearly Optimal RST (NORST)”, and obtains one of the first guarantees for it. We show that NORST provably solves RST under weakened standard RPCA assumptions, slow subspace change, and a lower bound on (most) outlier magnitudes. Our guarantee shows that (i) NORST is online (after initialization) and enjoys near-optimal values of tracking delay, lower bound on required delay between subspace change times, and of memory complexity; and (ii) it has a significantly improved worst-case outlier tolerance compared with all previous robust PCA or RST methods without requiring any model on how the outlier support is generated.'
volume: 80
URL: https://proceedings.mlr.press/v80/narayanamurthy18a.html
PDF: http://proceedings.mlr.press/v80/narayanamurthy18a/narayanamurthy18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-narayanamurthy18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Praneeth
family: Narayanamurthy
- given: Namrata
family: Vaswani
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3701-3709
id: narayanamurthy18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3701
lastpage: 3709
published: 2018-07-03 00:00:00 +0000
- title: 'Stochastic Proximal Algorithms for AUC Maximization'
abstract: 'Stochastic optimization algorithms such as SGDs update the model sequentially with cheap per-iteration costs, making them amenable for large-scale data analysis. However, most of the existing studies focus on classification accuracy and cannot be directly applied to the important problems of maximizing the Area under the ROC curve (AUC) in imbalanced classification and bipartite ranking. In this paper, we develop a novel stochastic proximal algorithm for AUC maximization which is referred to as SPAM. Compared with the previous literature, our algorithm SPAM applies to a non-smooth penalty function, and achieves a convergence rate of $O(\log t/t)$ for strongly convex functions while both space and per-iteration costs are of one datum.'
volume: 80
URL: https://proceedings.mlr.press/v80/natole18a.html
PDF: http://proceedings.mlr.press/v80/natole18a/natole18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-natole18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Michael
family: Natole
- given: Yiming
family: Ying
- given: Siwei
family: Lyu
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3710-3719
id: natole18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3710
lastpage: 3719
published: 2018-07-03 00:00:00 +0000
- title: 'Mitigating Bias in Adaptive Data Gathering via Differential Privacy'
abstract: 'Data that is gathered adaptively — via bandit algorithms, for example — exhibits bias. This is true both when gathering simple numeric valued data — the empirical means kept track of by stochastic bandit algorithms are biased downwards — and when gathering more complicated data — running hypothesis tests on complex data gathered via contextual bandit algorithms leads to false discovery. In this paper, we show that this problem is mitigated if the data collection procedure is differentially private. This lets us both bound the bias of simple numeric valued quantities (like the empirical means of stochastic bandit algorithms), and correct the p-values of hypothesis tests run on the adaptively gathered data. Moreover, there exist differentially private bandit algorithms with near optimal regret bounds: we apply existing theorems in the simple stochastic case, and give a new analysis for linear contextual bandits. We complement our theoretical results with experiments validating our theory.'
volume: 80
URL: https://proceedings.mlr.press/v80/neel18a.html
PDF: http://proceedings.mlr.press/v80/neel18a/neel18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-neel18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Seth
family: Neel
- given: Aaron
family: Roth
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3720-3729
id: neel18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3720
lastpage: 3729
published: 2018-07-03 00:00:00 +0000
- title: 'Optimization Landscape and Expressivity of Deep CNNs'
abstract: 'We analyze the loss landscape and expressiveness of practical deep convolutional neural networks (CNNs) with shared weights and max pooling layers. We show that such CNNs produce linearly independent features at a “wide” layer which has more neurons than the number of training samples. This condition holds e.g. for the VGG network. Furthermore, we provide for such wide CNNs necessary and sufficient conditions for global minima with zero training error. For the case where the wide layer is followed by a fully connected layer we show that almost every critical point of the empirical loss is a global minimum with zero training error. Our analysis suggests that both depth and width are very important in deep learning. While depth brings more representational power and allows the network to learn high level features, width smoothes the optimization landscape of the loss function in the sense that a sufficiently wide network has a well-behaved loss surface with almost no bad local minima.'
volume: 80
URL: https://proceedings.mlr.press/v80/nguyen18a.html
PDF: http://proceedings.mlr.press/v80/nguyen18a/nguyen18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-nguyen18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Quynh
family: Nguyen
- given: Matthias
family: Hein
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3730-3739
id: nguyen18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3730
lastpage: 3739
published: 2018-07-03 00:00:00 +0000
- title: 'Neural Networks Should Be Wide Enough to Learn Disconnected Decision Regions'
abstract: 'In the recent literature the important role of depth in deep learning has been emphasized. In this paper we argue that sufficient width of a feedforward network is equally important by answering the simple question under which conditions the decision regions of a neural network are connected. It turns out that for a class of activation functions including leaky ReLU, neural networks having a pyramidal structure, that is no layer has more hidden units than the input dimension, produce necessarily connected decision regions. This implies that a sufficiently wide hidden layer is necessary to guarantee that the network can produce disconnected decision regions. We discuss the implications of this result for the construction of neural networks, in particular the relation to the problem of adversarial manipulation of classifiers.'
volume: 80
URL: https://proceedings.mlr.press/v80/nguyen18b.html
PDF: http://proceedings.mlr.press/v80/nguyen18b/nguyen18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-nguyen18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Quynh
family: Nguyen
- given: Mahesh Chandra
family: Mukkamala
- given: Matthias
family: Hein
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3740-3749
id: nguyen18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3740
lastpage: 3749
published: 2018-07-03 00:00:00 +0000
- title: 'SGD and Hogwild! Convergence Without the Bounded Gradients Assumption'
abstract: 'Stochastic gradient descent (SGD) is the optimization algorithm of choice in many machine learning applications such as regularized empirical risk minimization and training deep neural networks. The classical convergence analysis of SGD is carried out under the assumption that the norm of the stochastic gradient is uniformly bounded. While this might hold for some loss functions, it is always violated for cases where the objective function is strongly convex. In (Bottou et al., 2016), a new analysis of convergence of SGD is performed under the assumption that stochastic gradients are bounded with respect to the true gradient norm. Here we show that for stochastic problems arising in machine learning such a bound always holds; and we also propose an alternative convergence analysis of SGD with a diminishing learning rate regime, which results in more relaxed conditions than those in (Bottou et al., 2016). We then move on to the asynchronous parallel setting, and prove convergence of the Hogwild! algorithm in the same regime, obtaining the first convergence results for this method in the case of a diminishing learning rate.'
volume: 80
URL: https://proceedings.mlr.press/v80/nguyen18c.html
PDF: http://proceedings.mlr.press/v80/nguyen18c/nguyen18c.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-nguyen18c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lam
family: Nguyen
- given: Phuong Ha
family: Nguyen
- given: Marten
family: Dijk
- given: Peter
family: Richtarik
- given: Katya
family: Scheinberg
- given: Martin
family: Takac
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3750-3758
id: nguyen18c
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3750
lastpage: 3758
published: 2018-07-03 00:00:00 +0000
- title: 'Active Testing: An Efficient and Robust Framework for Estimating Accuracy'
abstract: 'Much recent work on large-scale visual recognition aims to scale up learning to massive, noisily-annotated datasets. We address the problem of scaling up the evaluation of such models to large-scale datasets with noisy labels. Current protocols for doing so require a human user to either vet (re-annotate) a small fraction of the test set and ignore the rest, or else correct errors in annotation as they are found through manual inspection of results. In this work, we re-formulate the problem as one of active testing, and examine strategies for efficiently querying a user so as to obtain an accurate performance estimate with minimal vetting. We demonstrate the effectiveness of our proposed active testing framework on estimating two performance metrics, Precision@K and mean Average Precision, for two popular Computer Vision tasks, multilabel classification and instance segmentation, respectively. We further show that our approach is able to significantly save human annotation effort and is more robust than alternative evaluation protocols.'
volume: 80
URL: https://proceedings.mlr.press/v80/nguyen18d.html
PDF: http://proceedings.mlr.press/v80/nguyen18d/nguyen18d.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-nguyen18d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Phuc
family: Nguyen
- given: Deva
family: Ramanan
- given: Charless
family: Fowlkes
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3759-3768
id: nguyen18d
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3759
lastpage: 3768
published: 2018-07-03 00:00:00 +0000
- title: 'On Learning Sparsely Used Dictionaries from Incomplete Samples'
abstract: 'Existing algorithms for dictionary learning assume that the entries of the (high-dimensional) input data are fully observed. However, in several practical applications, only an incomplete fraction of the data entries may be available. For incomplete settings, no provably correct and polynomial-time algorithm has been reported in the dictionary learning literature. In this paper, we provide provable approaches for learning – from incomplete samples – a family of dictionaries whose atoms have sufficiently “spread-out” mass. First, we propose a descent-style iterative algorithm that linearly converges to the true dictionary when provided a sufficiently coarse initial estimate. Second, we propose an initialization algorithm that utilizes a small number of extra fully observed samples to produce such a coarse initial estimate. Finally, we theoretically analyze their performance and provide asymptotic statistical and computational guarantees.'
volume: 80
URL: https://proceedings.mlr.press/v80/nguyen18e.html
PDF: http://proceedings.mlr.press/v80/nguyen18e/nguyen18e.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-nguyen18e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Thanh
family: Nguyen
- given: Akshay
family: Soni
- given: Chinmay
family: Hegde
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3769-3778
id: nguyen18e
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3769
lastpage: 3778
published: 2018-07-03 00:00:00 +0000
- title: 'Learning Continuous Hierarchies in the Lorentz Model of Hyperbolic Geometry'
abstract: 'We are concerned with the discovery of hierarchical relationships from large-scale unstructured similarity scores. For this purpose, we study different models of hyperbolic space and find that learning embeddings in the Lorentz model is substantially more efficient than in the Poincaré-ball model. We show that the proposed approach allows us to learn high-quality embeddings of large taxonomies which yield improvements over Poincaré embeddings, especially in low dimensions. Lastly, we apply our model to discover hierarchies in two real-world datasets: we show that an embedding in hyperbolic space can reveal important aspects of a company’s organizational structure as well as reveal historical relationships between language families.'
volume: 80
URL: https://proceedings.mlr.press/v80/nickel18a.html
PDF: http://proceedings.mlr.press/v80/nickel18a/nickel18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-nickel18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Maximillian
family: Nickel
- given: Douwe
family: Kiela
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3779-3788
id: nickel18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3779
lastpage: 3788
published: 2018-07-03 00:00:00 +0000
- title: 'State Space Gaussian Processes with Non-Gaussian Likelihood'
abstract: 'We provide a comprehensive overview and tooling for GP modelling with non-Gaussian likelihoods using state space methods. The state space formulation allows for solving one-dimensional GP models in O(n) time and memory complexity. While existing literature has focused on the connection between GP regression and state space methods, the computational primitives allowing for inference using general likelihoods in combination with the Laplace approximation (LA), variational Bayes (VB), and assumed density filtering (ADF) / expectation propagation (EP) schemes have been largely overlooked. We present means of combining the efficient O(n) state space methodology with existing inference methods. We also further extend existing methods, and provide unifying code implementing all approaches.'
volume: 80
URL: https://proceedings.mlr.press/v80/nickisch18a.html
PDF: http://proceedings.mlr.press/v80/nickisch18a/nickisch18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-nickisch18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hannes
family: Nickisch
- given: Arno
family: Solin
- given: Alexander
family: Grigorevskiy
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3789-3798
id: nickisch18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3789
lastpage: 3798
published: 2018-07-03 00:00:00 +0000
- title: 'SparseMAP: Differentiable Sparse Structured Inference'
abstract: 'Structured prediction requires searching over a combinatorial number of structures. To tackle it, we introduce SparseMAP, a new method for sparse structured inference, together with corresponding loss functions. SparseMAP inference is able to automatically select only a few global structures: it is situated between MAP inference, which picks a single structure, and marginal inference, which assigns probability mass to all structures, including implausible ones. Importantly, SparseMAP can be computed using only calls to a MAP oracle, hence it is applicable even to problems where marginal inference is intractable, such as linear assignment. Moreover, thanks to the solution sparsity, gradient backpropagation is efficient regardless of the structure. SparseMAP thus enables us to augment deep neural networks with generic and sparse structured hidden layers. Experiments in dependency parsing and natural language inference reveal competitive accuracy, improved interpretability, and the ability to capture natural language ambiguities, which is attractive for pipeline systems.'
volume: 80
URL: https://proceedings.mlr.press/v80/niculae18a.html
PDF: http://proceedings.mlr.press/v80/niculae18a/niculae18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-niculae18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Vlad
family: Niculae
- given: Andre
family: Martins
- given: Mathieu
family: Blondel
- given: Claire
family: Cardie
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3799-3808
id: niculae18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3799
lastpage: 3808
published: 2018-07-03 00:00:00 +0000
- title: 'A Theoretical Explanation for Perplexing Behaviors of Backpropagation-based Visualizations'
abstract: 'Backpropagation-based visualizations have been proposed to interpret convolutional neural networks (CNNs); however, a theory is missing to justify their behaviors: Guided backpropagation (GBP) and deconvolutional network (DeconvNet) generate more human-interpretable but less class-sensitive visualizations than the saliency map. Motivated by this, we develop a theoretical explanation revealing that GBP and DeconvNet are essentially doing (partial) image recovery which is unrelated to the network decisions. Specifically, our analysis shows that the backward ReLU introduced by GBP and DeconvNet, and the local connections in CNNs are the two main causes of compelling visualizations. Extensive experiments are provided that support the theoretical analysis.'
volume: 80
URL: https://proceedings.mlr.press/v80/nie18a.html
PDF: http://proceedings.mlr.press/v80/nie18a/nie18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-nie18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Weili
family: Nie
- given: Yang
family: Zhang
- given: Ankit
family: Patel
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3809-3818
id: nie18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3809
lastpage: 3818
published: 2018-07-03 00:00:00 +0000
- title: 'Functional Gradient Boosting based on Residual Network Perception'
abstract: 'Residual Networks (ResNets) have become state-of-the-art models in deep learning and several theoretical studies have been devoted to understanding why ResNet works so well. One attractive viewpoint on ResNet is that it is optimizing the risk in a functional space by consisting of an ensemble of effective features. In this paper, we adopt this viewpoint to construct a new gradient boosting method, which is known to be very powerful in data analysis. To do so, we formalize the boosting perspective of ResNet mathematically using the notion of functional gradients and propose a new method called ResFGB for classification tasks by leveraging ResNet perception. Two types of generalization guarantees are provided from the optimization perspective: one is the margin bound and the other is the expected risk bound by the sample-splitting technique. Experimental results show superior performance of the proposed method over state-of-the-art methods such as LightGBM.'
volume: 80
URL: https://proceedings.mlr.press/v80/nitanda18a.html
PDF: http://proceedings.mlr.press/v80/nitanda18a/nitanda18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-nitanda18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Atsushi
family: Nitanda
- given: Taiji
family: Suzuki
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3819-3828
id: nitanda18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3819
lastpage: 3828
published: 2018-07-03 00:00:00 +0000
- title: 'Beyond 1/2-Approximation for Submodular Maximization on Massive Data Streams'
abstract: 'Many tasks in machine learning and data mining, such as data diversification, non-parametric learning, kernel machines, clustering etc., require extracting a small but representative summary from a massive dataset. Often, such problems can be posed as maximizing a submodular set function subject to a cardinality constraint. We consider this question in the streaming setting, where elements arrive over time at a fast pace and thus we need to design an efficient, low-memory algorithm. One such method, proposed by Badanidiyuru et al. (2014), always finds a 0.5-approximate solution. Can this approximation factor be improved? We answer this question affirmatively by designing a new algorithm Salsa for streaming submodular maximization. It is the first low-memory, single-pass algorithm that improves on the factor 0.5, under the natural assumption that elements arrive in a random order. We also show that this assumption is necessary, i.e., that there is no such algorithm with better than 0.5-approximation when elements arrive in arbitrary order. Our experiments demonstrate that Salsa significantly outperforms the state of the art in applications related to exemplar-based clustering, social graph analysis, and recommender systems.'
volume: 80
URL: https://proceedings.mlr.press/v80/norouzi-fard18a.html
PDF: http://proceedings.mlr.press/v80/norouzi-fard18a/norouzi-fard18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-norouzi-fard18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ashkan
family: Norouzi-Fard
- given: Jakub
family: Tarnawski
- given: Slobodan
family: Mitrovic
- given: Amir
family: Zandieh
- given: Aidasadat
family: Mousavifar
- given: Ola
family: Svensson
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3829-3838
id: norouzi-fard18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3829
lastpage: 3838
published: 2018-07-03 00:00:00 +0000
- title: 'The Uncertainty Bellman Equation and Exploration'
abstract: 'We consider the exploration/exploitation problem in reinforcement learning. For exploitation, it is well known that the Bellman equation connects the value at any time-step to the expected value at subsequent time-steps. In this paper we consider a similar uncertainty Bellman equation (UBE), which connects the uncertainty at any time-step to the expected uncertainties at subsequent time-steps, thereby extending the potential exploratory benefit of a policy beyond individual time-steps. We prove that the unique fixed point of the UBE yields an upper bound on the variance of the posterior distribution of the Q-values induced by any policy. This bound can be much tighter than traditional count-based bonuses that compound standard deviation rather than variance. Importantly, and unlike several existing approaches to optimism, this method scales naturally to large systems with complex generalization. Substituting our UBE-exploration strategy for $\epsilon$-greedy improves DQN performance on 51 out of 57 games in the Atari suite.'
volume: 80
URL: https://proceedings.mlr.press/v80/odonoghue18a.html
PDF: http://proceedings.mlr.press/v80/odonoghue18a/odonoghue18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-odonoghue18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Brendan
family: O’Donoghue
- given: Ian
family: Osband
- given: Remi
family: Munos
- given: Vlad
family: Mnih
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3839-3848
id: odonoghue18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3839
lastpage: 3848
published: 2018-07-03 00:00:00 +0000
- title: 'Is Generator Conditioning Causally Related to GAN Performance?'
abstract: 'Recent work suggests that controlling the entire distribution of Jacobian singular values is an important design consideration in deep learning. Motivated by this, we study the distribution of singular values of the Jacobian of the generator in Generative Adversarial Networks. We find that this Jacobian generally becomes ill-conditioned at the beginning of training. Moreover, we find that the average (across the latent space) conditioning of the generator is highly predictive of two other ad-hoc metrics for measuring the “quality” of trained GANs: the Inception Score and the Frechet Inception Distance. We then test the hypothesis that this relationship is causal by proposing a “regularization” technique (called Jacobian Clamping) that softly penalizes the condition number of the generator Jacobian. Jacobian Clamping improves the mean score for nearly all datasets on which we tested it. It also greatly reduces inter-run variance of the aforementioned scores, addressing (at least partially) one of the main criticisms of GANs.'
volume: 80
URL: https://proceedings.mlr.press/v80/odena18a.html
PDF: http://proceedings.mlr.press/v80/odena18a/odena18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-odena18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Augustus
family: Odena
- given: Jacob
family: Buckman
- given: Catherine
family: Olsson
- given: Tom
family: Brown
- given: Christopher
family: Olah
- given: Colin
family: Raffel
- given: Ian
family: Goodfellow
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3849-3858
id: odena18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3849
lastpage: 3858
published: 2018-07-03 00:00:00 +0000
- title: 'Learning in Reproducing Kernel Kreı̆n Spaces'
abstract: 'We formulate a novel regularized risk minimization problem for learning in reproducing kernel Kreı̆n spaces and show that the strong representer theorem applies to it. As a result of the latter, the learning problem can be expressed as the minimization of a quadratic form over a hypersphere of constant radius. We present an algorithm that can find a globally optimal solution to this non-convex optimization problem in time cubic in the number of instances. Moreover, we derive the gradient of the solution with respect to its hyperparameters and, in this way, provide means for efficient hyperparameter tuning. The approach comes with a generalization bound expressed in terms of the Rademacher complexity of the corresponding hypothesis space. The major advantage over standard kernel methods is the ability to learn with various domain specific similarity measures for which positive definiteness does not hold or is difficult to establish. The approach is evaluated empirically using indefinite kernels defined on structured as well as vectorial data. The empirical results demonstrate a superior performance of our approach over the state-of-the-art baselines.'
volume: 80
URL: https://proceedings.mlr.press/v80/oglic18a.html
PDF: http://proceedings.mlr.press/v80/oglic18a/oglic18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-oglic18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Dino
family: Oglic
- given: Thomas
family: Gaertner
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3859-3867
id: oglic18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3859
lastpage: 3867
published: 2018-07-03 00:00:00 +0000
- title: 'BOCK: Bayesian Optimization with Cylindrical Kernels'
abstract: 'A major challenge in Bayesian Optimization is the boundary issue where an algorithm spends too many evaluations near the boundary of its search space. In this paper, we propose BOCK, Bayesian Optimization with Cylindrical Kernels, whose basic idea is to transform the ball geometry of the search space using a cylindrical transformation. Because of the transformed geometry, the Gaussian Process-based surrogate model spends less budget searching near the boundary, while concentrating its efforts relatively more near the center of the search region, where we expect the solution to be located. We evaluate BOCK extensively, showing that it is not only more accurate and efficient, but it also scales successfully to problems with a dimensionality as high as 500. We show that the better accuracy and scalability of BOCK even allows optimizing modestly sized neural network layers, as well as neural network hyperparameters.'
volume: 80
URL: https://proceedings.mlr.press/v80/oh18a.html
PDF: http://proceedings.mlr.press/v80/oh18a/oh18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-oh18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: ChangYong
family: Oh
- given: Efstratios
family: Gavves
- given: Max
family: Welling
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3868-3877
id: oh18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3868
lastpage: 3877
published: 2018-07-03 00:00:00 +0000
- title: 'Self-Imitation Learning'
abstract: 'This paper proposes Self-Imitation Learning (SIL), a simple off-policy actor-critic algorithm that learns to reproduce the agent’s past good decisions. This algorithm is designed to verify our hypothesis that exploiting past good experiences can indirectly drive deep exploration. Our empirical results show that SIL significantly improves advantage actor-critic (A2C) on several hard exploration Atari games and is competitive with the state-of-the-art count-based exploration methods. We also show that SIL improves proximal policy optimization (PPO) on MuJoCo tasks.'
volume: 80
URL: https://proceedings.mlr.press/v80/oh18b.html
PDF: http://proceedings.mlr.press/v80/oh18b/oh18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-oh18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Junhyuk
family: Oh
- given: Yijie
family: Guo
- given: Satinder
family: Singh
- given: Honglak
family: Lee
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3878-3887
id: oh18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3878
lastpage: 3887
published: 2018-07-03 00:00:00 +0000
- title: 'A probabilistic framework for multi-view feature learning with many-to-many associations via neural networks'
abstract: 'A simple framework, Probabilistic Multi-view Graph Embedding (PMvGE), is proposed for multi-view feature learning with many-to-many associations so that it generalizes various existing multi-view methods. PMvGE is a probabilistic model for predicting new associations via graph embedding of the nodes of data vectors with links of their associations. Multi-view data vectors with many-to-many associations are transformed by neural networks to feature vectors in a shared space, and the probability of a new association between two data vectors is modeled by the inner product of their feature vectors. While existing multi-view feature learning techniques can treat only one of many-to-many associations or non-linear transformations, PMvGE can treat both simultaneously. By combining Mercer’s theorem and the universal approximation theorem, we prove that PMvGE learns a wide class of similarity measures across views. Our likelihood-based estimator enables efficient computation of non-linear transformations of data vectors in large-scale datasets by minibatch SGD, and numerical experiments illustrate that PMvGE outperforms existing multi-view methods.'
volume: 80
URL: https://proceedings.mlr.press/v80/okuno18a.html
PDF: http://proceedings.mlr.press/v80/okuno18a/okuno18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-okuno18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Akifumi
family: Okuno
- given: Tetsuya
family: Hada
- given: Hidetoshi
family: Shimodaira
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3888-3897
id: okuno18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3888
lastpage: 3897
published: 2018-07-03 00:00:00 +0000
- title: 'Transformation Autoregressive Networks'
abstract: 'The fundamental task of general density estimation $p(x)$ has been of keen interest to machine learning. In this work, we attempt to systematically characterize methods for density estimation. Broadly speaking, most of the existing methods can be categorized as using either: *a*) autoregressive models to estimate the conditional factors of the chain rule, $p(x_{i}\, |\, x_{i-1}, \ldots)$; or *b*) non-linear transformations of variables of a simple base distribution. Based on the study of the characteristics of these categories, we propose multiple novel methods for each category. For example, we propose RNN based transformations to model non-Markovian dependencies. Further, through a comprehensive study over both real world and synthetic data, we show that jointly leveraging transformations of variables and autoregressive conditional models results in a considerable improvement in performance. We illustrate the use of our models in outlier detection and image modeling. Finally, we introduce a novel data driven framework for learning a family of distributions.'
volume: 80
URL: https://proceedings.mlr.press/v80/oliva18a.html
PDF: http://proceedings.mlr.press/v80/oliva18a/oliva18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-oliva18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Junier
family: Oliva
- given: Avinava
family: Dubey
- given: Manzil
family: Zaheer
- given: Barnabas
family: Poczos
- given: Ruslan
family: Salakhutdinov
- given: Eric
family: Xing
- given: Jeff
family: Schneider
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3898-3907
id: oliva18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3898
lastpage: 3907
published: 2018-07-03 00:00:00 +0000
- title: 'Design of Experiments for Model Discrimination Hybridising Analytical and Data-Driven Approaches'
abstract: 'Healthcare companies must submit pharmaceutical drugs or medical devices to regulatory bodies before marketing new technology. Regulatory bodies frequently require transparent and interpretable computational modelling to justify a new healthcare technology, but researchers may have several competing models for a biological system and too little data to discriminate between the models. In design of experiments for model discrimination, where the goal is to design maximally informative physical experiments in order to discriminate between rival predictive models, research has focused either on analytical approaches, which cannot manage all functions, or on data-driven approaches, which may have computational difficulties or lack interpretable marginal predictive distributions. We develop a methodology for introducing Gaussian process surrogates in lieu of the original mechanistic models. This allows us to extend existing design and model discrimination methods developed for analytical models to cases of non-analytical models.'
volume: 80
URL: https://proceedings.mlr.press/v80/olofsson18a.html
PDF: http://proceedings.mlr.press/v80/olofsson18a/olofsson18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-olofsson18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Simon
family: Olofsson
- given: Marc
family: Deisenroth
- given: Ruth
family: Misener
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3908-3917
id: olofsson18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3908
lastpage: 3917
published: 2018-07-03 00:00:00 +0000
- title: 'Parallel WaveNet: Fast High-Fidelity Speech Synthesis'
abstract: 'The recently developed WaveNet architecture is the current state of the art in realistic speech synthesis, consistently rated as more natural sounding for many different languages than any previous system. However, because WaveNet relies on sequential generation of one audio sample at a time, it is poorly suited to today’s massively parallel computers, and therefore hard to deploy in a real-time production setting. This paper introduces Probability Density Distillation, a new method for training a parallel feed-forward network from a trained WaveNet with no significant difference in quality. The resulting system is capable of generating high-fidelity speech samples more than 20 times faster than real time, a 1000x speed up relative to the original WaveNet, and capable of serving multiple English and Japanese voices in a production setting.'
volume: 80
URL: https://proceedings.mlr.press/v80/oord18a.html
PDF: http://proceedings.mlr.press/v80/oord18a/oord18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-oord18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Aaron
family: Oord
- given: Yazhe
family: Li
- given: Igor
family: Babuschkin
- given: Karen
family: Simonyan
- given: Oriol
family: Vinyals
- given: Koray
family: Kavukcuoglu
- given: George
family: Driessche
- given: Edward
family: Lockhart
- given: Luis
family: Cobo
- given: Florian
family: Stimberg
- given: Norman
family: Casagrande
- given: Dominik
family: Grewe
- given: Seb
family: Noury
- given: Sander
family: Dieleman
- given: Erich
family: Elsen
- given: Nal
family: Kalchbrenner
- given: Heiga
family: Zen
- given: Alex
family: Graves
- given: Helen
family: King
- given: Tom
family: Walters
- given: Dan
family: Belov
- given: Demis
family: Hassabis
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3918-3926
id: oord18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3918
lastpage: 3926
published: 2018-07-03 00:00:00 +0000
- title: 'Learning Localized Spatio-Temporal Models From Streaming Data'
abstract: 'We address the problem of predicting spatio-temporal processes with temporal patterns that vary across spatial regions when data is obtained as a stream, that is, when the training dataset is augmented sequentially. Specifically, we develop a localized spatio-temporal covariance model of the process that can capture spatially varying temporal periodicities in the data. We then apply a covariance-fitting methodology to learn the model parameters, which yields a predictor that can be updated sequentially with each new data point. The proposed method is evaluated using both synthetic and real climate data, which demonstrate its ability to accurately predict data missing in spatial regions over time.'
volume: 80
URL: https://proceedings.mlr.press/v80/osama18a.html
PDF: http://proceedings.mlr.press/v80/osama18a/osama18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-osama18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Muhammad
family: Osama
- given: Dave
family: Zachariah
- given: Thomas
family: Schön
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3927-3935
id: osama18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3927
lastpage: 3935
published: 2018-07-03 00:00:00 +0000
- title: 'Autoregressive Quantile Networks for Generative Modeling'
abstract: 'We introduce autoregressive implicit quantile networks (AIQN), a fundamentally different approach to generative modeling than those commonly used, that implicitly captures the distribution using quantile regression. AIQN is able to achieve superior perceptual quality and improvements in evaluation metrics, without incurring a loss of sample diversity. The method can be applied to many existing models and architectures. In this work we extend the PixelCNN model with AIQN and demonstrate results on CIFAR-10 and ImageNet using Inception scores, FID, non-cherry-picked samples, and inpainting results. We consistently observe that AIQN yields a highly stable algorithm that improves perceptual quality while maintaining a highly diverse distribution.'
volume: 80
URL: https://proceedings.mlr.press/v80/ostrovski18a.html
PDF: http://proceedings.mlr.press/v80/ostrovski18a/ostrovski18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-ostrovski18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Georg
family: Ostrovski
- given: Will
family: Dabney
- given: Remi
family: Munos
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3936-3945
id: ostrovski18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3936
lastpage: 3945
published: 2018-07-03 00:00:00 +0000
- title: 'Efficient First-Order Algorithms for Adaptive Signal Denoising'
abstract: 'We consider the problem of discrete-time signal denoising, focusing on a specific family of non-linear convolution-type estimators. Each such estimator is associated with a time-invariant filter which is obtained adaptively, by solving a certain convex optimization problem. Adaptive convolution-type estimators were demonstrated to have favorable statistical properties, see (Juditsky & Nemirovski, 2009; 2010; Harchaoui et al., 2015b; Ostrovsky et al., 2016). Our first contribution is an efficient implementation of these estimators via the known first-order proximal algorithms. Our second contribution is a computational complexity analysis of the proposed procedures, which takes into account their statistical nature and the related notion of statistical accuracy. The proposed procedures and their analysis are illustrated on a simulated data benchmark.'
volume: 80
URL: https://proceedings.mlr.press/v80/ostrovskii18a.html
PDF: http://proceedings.mlr.press/v80/ostrovskii18a/ostrovskii18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-ostrovskii18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Dmitrii
family: Ostrovskii
- given: Zaid
family: Harchaoui
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3946-3955
id: ostrovskii18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3946
lastpage: 3955
published: 2018-07-03 00:00:00 +0000
- title: 'Analyzing Uncertainty in Neural Machine Translation'
abstract: 'Machine translation is a popular test bed for research in neural sequence-to-sequence models, but despite much recent research, there is still a lack of understanding of these models. Practitioners report performance degradation with large beams, the under-estimation of rare words and a lack of diversity in the final translations. Our study relates some of these issues to the inherent uncertainty of the task, due to the existence of multiple valid translations for a single source sentence, and to the extrinsic uncertainty caused by noisy training data. We propose tools and metrics to assess how uncertainty in the data is captured by the model distribution and how it affects search strategies that generate translations. Our results show that search works remarkably well but that the models tend to spread too much probability mass over the hypothesis space. Next, we propose tools to assess model calibration and show how to easily fix some shortcomings of current models. We release both code and multiple human reference translations for two popular benchmarks.'
volume: 80
URL: https://proceedings.mlr.press/v80/ott18a.html
PDF: http://proceedings.mlr.press/v80/ott18a/ott18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-ott18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Myle
family: Ott
- given: Michael
family: Auli
- given: David
family: Grangier
- given: Marc’Aurelio
family: Ranzato
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3956-3965
id: ott18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3956
lastpage: 3965
published: 2018-07-03 00:00:00 +0000
- title: 'Learning Compact Neural Networks with Regularization'
abstract: 'Proper regularization is critical for speeding up training, improving generalization performance, and learning compact models that are cost efficient. We propose and analyze regularized gradient descent algorithms for learning shallow neural networks. Our framework is general and covers weight-sharing (convolutional networks), sparsity (network pruning), and low-rank constraints among others. We first introduce the covering dimension to quantify the complexity of the constraint set and provide insights on the generalization properties. Then, we show that the proposed algorithms become well-behaved and local linear convergence occurs once the amount of data exceeds the covering dimension. Overall, our results demonstrate that near-optimal sample complexity is sufficient for efficient learning and illustrate how regularization can be beneficial to learn over-parameterized networks.'
volume: 80
URL: https://proceedings.mlr.press/v80/oymak18a.html
PDF: http://proceedings.mlr.press/v80/oymak18a/oymak18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-oymak18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Samet
family: Oymak
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3966-3975
id: oymak18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3966
lastpage: 3975
published: 2018-07-03 00:00:00 +0000
- title: 'Tree Edit Distance Learning via Adaptive Symbol Embeddings'
abstract: 'Metric learning aims to improve classification accuracy by learning a distance measure which brings data points from the same class closer together and pushes data points from different classes further apart. Recent research has demonstrated that metric learning approaches can also be applied to trees, such as molecular structures, abstract syntax trees of computer programs, or syntax trees of natural language, by learning the cost function of an edit distance, i.e. the costs of replacing, deleting, or inserting nodes in a tree. However, learning such costs directly may yield an edit distance which violates metric axioms, is challenging to interpret, and may not generalize well. In this contribution, we propose a novel metric learning approach for trees which we call embedding edit distance learning (BEDL) and which learns an edit distance indirectly by embedding the tree nodes as vectors, such that the Euclidean distance between those vectors supports class discrimination. We learn such embeddings by reducing the distance to prototypical trees from the same class and increasing the distance to prototypical trees from different classes. In our experiments, we show that BEDL improves upon the state-of-the-art in metric learning for trees on six benchmark data sets, ranging from computer science over biomedical data to a natural-language processing data set containing over 300,000 nodes.'
volume: 80
URL: https://proceedings.mlr.press/v80/paassen18a.html
PDF: http://proceedings.mlr.press/v80/paassen18a/paassen18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-paassen18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Benjamin
family: Paaßen
- given: Claudio
family: Gallicchio
- given: Alessio
family: Micheli
- given: Barbara
family: Hammer
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3976-3985
id: paassen18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3976
lastpage: 3985
published: 2018-07-03 00:00:00 +0000
- title: 'Reinforcement Learning with Function-Valued Action Spaces for Partial Differential Equation Control'
abstract: 'Recent work has shown that reinforcement learning (RL) is a promising approach to control dynamical systems described by partial differential equations (PDE). This paper shows how to use RL to tackle more general PDE control problems that have continuous high-dimensional action spaces with spatial relationships among action dimensions. In particular, we propose the concept of action descriptors, which encode regularities among spatially-extended action dimensions and enable the agent to control high-dimensional action PDEs. We provide theoretical evidence suggesting that this approach can be more sample efficient compared to a conventional approach that treats each action dimension separately and does not explicitly exploit the spatial regularity of the action space. The action descriptor approach is then used within the deep deterministic policy gradient algorithm. Experiments on two PDE control problems, with up to 256-dimensional continuous actions, show the advantage of the proposed approach over the conventional one.'
volume: 80
URL: https://proceedings.mlr.press/v80/pan18a.html
PDF: http://proceedings.mlr.press/v80/pan18a/pan18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-pan18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yangchen
family: Pan
- given: Amir-massoud
family: Farahmand
- given: Martha
family: White
- given: Saleh
family: Nabi
- given: Piyush
family: Grover
- given: Daniel
family: Nikovski
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3986-3995
id: pan18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3986
lastpage: 3995
published: 2018-07-03 00:00:00 +0000
- title: 'Learning to Speed Up Structured Output Prediction'
abstract: 'Predicting structured outputs can be computationally onerous due to the combinatorially large output spaces. In this paper, we focus on reducing the prediction time of a trained black-box structured classifier without losing accuracy. To do so, we train a speedup classifier that learns to mimic a black-box classifier under the learning-to-search approach. As the structured classifier predicts more examples, the speedup classifier will operate as a learned heuristic to guide search to favorable regions of the output space. We present a mistake bound for the speedup classifier and identify inference situations where it can independently make correct judgments without input features. We evaluate our method on the task of entity and relation extraction and show that the speedup classifier outperforms even greedy search in terms of speed without loss of accuracy.'
volume: 80
URL: https://proceedings.mlr.press/v80/pan18b.html
PDF: http://proceedings.mlr.press/v80/pan18b/pan18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-pan18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Xingyuan
family: Pan
- given: Vivek
family: Srikumar
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 3996-4005
id: pan18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 3996
lastpage: 4005
published: 2018-07-03 00:00:00 +0000
- title: 'Theoretical Analysis of Image-to-Image Translation with Adversarial Learning'
abstract: 'Recently, a unified model for image-to-image translation tasks within an adversarial learning framework has aroused widespread research interest among computer vision practitioners. Its reported empirical success, however, lacks a solid theoretical interpretation of its inherent mechanism. In this paper, we reformulate the model from a new geometrical perspective and reach a full interpretation of some interesting but unclear empirical phenomena from the original experiments. Furthermore, by extending the definition of generalization for generative adversarial nets to a broader sense, we derive a condition to control the generalization capability of the model. Based on this condition, we also propose several practical suggestions on model design and dataset construction as guidance for further empirical research.'
volume: 80
URL: https://proceedings.mlr.press/v80/pan18c.html
PDF: http://proceedings.mlr.press/v80/pan18c/pan18c.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-pan18c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Xudong
family: Pan
- given: Mi
family: Zhang
- given: Daizong
family: Ding
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4006-4015
id: pan18c
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4006
lastpage: 4015
published: 2018-07-03 00:00:00 +0000
- title: 'Max-Mahalanobis Linear Discriminant Analysis Networks'
abstract: 'A deep neural network (DNN) consists of a nonlinear transformation from an input to a feature representation, followed by a common softmax linear classifier. Though many efforts have been devoted to designing a proper architecture for nonlinear transformation, little investigation has been done on the classifier part. In this paper, we show that a properly designed classifier can improve robustness to adversarial attacks and lead to better prediction results. Specifically, we define a Max-Mahalanobis distribution (MMD) and theoretically show that if the input is distributed as an MMD, the linear discriminant analysis (LDA) classifier will have the best robustness to adversarial examples. We further propose a novel Max-Mahalanobis linear discriminant analysis (MM-LDA) network, which explicitly maps a complicated data distribution in the input space to an MMD in the latent feature space and then applies LDA to make predictions. Our results demonstrate that the MM-LDA networks are significantly more robust to adversarial attacks, and have better performance in class-biased classification.'
volume: 80
URL: https://proceedings.mlr.press/v80/pang18a.html
PDF: http://proceedings.mlr.press/v80/pang18a/pang18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-pang18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Tianyu
family: Pang
- given: Chao
family: Du
- given: Jun
family: Zhu
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4016-4025
id: pang18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4016
lastpage: 4025
published: 2018-07-03 00:00:00 +0000
- title: 'Stochastic Variance-Reduced Policy Gradient'
abstract: 'In this paper, we propose a novel reinforcement-learning algorithm consisting of a stochastic variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs). Stochastic variance-reduced gradient (SVRG) methods have proven to be very successful in supervised learning. However, their adaptation to policy gradient is not straightforward and needs to account for I) a non-concave objective function; II) approximations in the full gradient computation; and III) a non-stationary sampling process. The result is SVRPG, a stochastic variance-reduced policy gradient algorithm that leverages importance weights to preserve the unbiasedness of the gradient estimate. Under standard assumptions on the MDP, we provide convergence guarantees for SVRPG with a convergence rate that is linear under increasing batch sizes. Finally, we suggest practical variants of SVRPG, and we empirically evaluate them on continuous MDPs.'
volume: 80
URL: https://proceedings.mlr.press/v80/papini18a.html
PDF: http://proceedings.mlr.press/v80/papini18a/papini18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-papini18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Matteo
family: Papini
- given: Damiano
family: Binaghi
- given: Giuseppe
family: Canonaco
- given: Matteo
family: Pirotta
- given: Marcello
family: Restelli
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4026-4035
id: papini18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4026
lastpage: 4035
published: 2018-07-03 00:00:00 +0000
- title: 'Learning Independent Causal Mechanisms'
abstract: 'Statistical learning relies upon data sampled from a distribution, and we usually do not care what actually generated it in the first place. From the point of view of causal modeling, the structure of each distribution is induced by physical mechanisms that give rise to dependences between observables. Mechanisms, however, can be meaningful autonomous modules of generative models that make sense beyond a particular entailed data distribution, lending themselves to transfer between problems. We develop an algorithm to recover a set of independent (inverse) mechanisms from a set of transformed data points. The approach is unsupervised and based on a set of experts that compete for data generated by the mechanisms, driving specialization. We analyze the proposed method in a series of experiments on image data. Each expert learns to map a subset of the transformed data back to a reference distribution. The learned mechanisms generalize to novel domains. We discuss implications for transfer learning and links to recent trends in generative modeling.'
volume: 80
URL: https://proceedings.mlr.press/v80/parascandolo18a.html
PDF: http://proceedings.mlr.press/v80/parascandolo18a/parascandolo18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-parascandolo18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Giambattista
family: Parascandolo
- given: Niki
family: Kilbertus
- given: Mateo
family: Rojas-Carulla
- given: Bernhard
family: Schölkopf
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4036-4044
id: parascandolo18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4036
lastpage: 4044
published: 2018-07-03 00:00:00 +0000
- title: 'Time Limits in Reinforcement Learning'
abstract: 'In reinforcement learning, it is common to let an agent interact for a fixed amount of time with its environment before resetting it and repeating the process in a series of episodes. The task that the agent has to learn can either be to maximize its performance over (i) that fixed period, or (ii) an indefinite period where time limits are only used during training to diversify experience. In this paper, we provide a formal account of how time limits could effectively be handled in each of the two cases and explain why not doing so can cause state-aliasing and invalidation of experience replay, leading to suboptimal policies and training instability. In case (i), we argue that the terminations due to time limits are in fact part of the environment, and thus a notion of the remaining time should be included as part of the agent’s input to avoid violation of the Markov property. In case (ii), the time limits are not part of the environment and are only used to facilitate learning. We argue that this insight should be incorporated by bootstrapping from the value of the state at the end of each partial episode. For both cases, we illustrate empirically the significance of our considerations in improving the performance and stability of existing reinforcement learning algorithms, showing state-of-the-art results on several control tasks.'
volume: 80
URL: https://proceedings.mlr.press/v80/pardo18a.html
PDF: http://proceedings.mlr.press/v80/pardo18a/pardo18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-pardo18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Fabio
family: Pardo
- given: Arash
family: Tavakoli
- given: Vitaly
family: Levdik
- given: Petar
family: Kormushev
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4045-4054
id: pardo18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4045
lastpage: 4054
published: 2018-07-03 00:00:00 +0000
- title: 'Image Transformer'
abstract: 'Image generation has been successfully cast as an autoregressive sequence generation or transformation problem. Recent work has shown that self-attention is an effective way of modeling textual sequences. In this work, we generalize a recently proposed model architecture based on self-attention, the Transformer, to a sequence modeling formulation of image generation with a tractable likelihood. By restricting the self-attention mechanism to attend to local neighborhoods we significantly increase the size of images the model can process in practice, despite maintaining significantly larger receptive fields per layer than typical convolutional neural networks. While conceptually simple, our generative models significantly outperform the current state of the art in image generation on ImageNet, improving the best published negative log-likelihood on ImageNet from 3.83 to 3.77. We also present results on image super-resolution with a large magnification ratio, applying an encoder-decoder configuration of our architecture. In a human evaluation study, we find that images generated by our super-resolution model fool human observers three times more often than the previous state of the art.'
volume: 80
URL: https://proceedings.mlr.press/v80/parmar18a.html
PDF: http://proceedings.mlr.press/v80/parmar18a/parmar18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-parmar18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Niki
family: Parmar
- given: Ashish
family: Vaswani
- given: Jakob
family: Uszkoreit
- given: Lukasz
family: Kaiser
- given: Noam
family: Shazeer
- given: Alexander
family: Ku
- given: Dustin
family: Tran
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4055-4064
id: parmar18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4055
lastpage: 4064
published: 2018-07-03 00:00:00 +0000
- title: 'PIPPS: Flexible Model-Based Policy Search Robust to the Curse of Chaos'
abstract: 'Previously, the exploding gradient problem has been explained to be central in deep learning and model-based reinforcement learning, because it causes numerical issues and instability in optimization. Our experiments in model-based reinforcement learning imply that the problem is not just a numerical issue, but it may be caused by a fundamental chaos-like nature of long chains of nonlinear computations. Not only do the magnitudes of the gradients become large, the direction of the gradients becomes essentially random. We show that reparameterization gradients suffer from the problem, while likelihood ratio gradients are robust. Using our insights, we develop a model-based policy search framework, Probabilistic Inference for Particle-Based Policy Search (PIPPS), which is easily extensible, and allows for almost arbitrary models and policies, while simultaneously matching the performance of previous data-efficient learning algorithms. Finally, we invent the total propagation algorithm, which efficiently computes a union over all pathwise derivative depths during a single backwards pass, automatically giving greater weight to estimators with lower variance, sometimes improving over reparameterization gradients by $10^6$ times.'
volume: 80
URL: https://proceedings.mlr.press/v80/parmas18a.html
PDF: http://proceedings.mlr.press/v80/parmas18a/parmas18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-parmas18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Paavo
family: Parmas
- given: Carl Edward
family: Rasmussen
- given: Jan
family: Peters
- given: Kenji
family: Doya
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4065-4074
id: parmas18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4065
lastpage: 4074
published: 2018-07-03 00:00:00 +0000
- title: 'High-Quality Prediction Intervals for Deep Learning: A Distribution-Free, Ensembled Approach'
abstract: 'This paper considers the generation of prediction intervals (PIs) by neural networks for quantifying uncertainty in regression tasks. It is axiomatic that high-quality PIs should be as narrow as possible, whilst capturing a specified portion of data. We derive a loss function directly from this axiom that requires no distributional assumption. We show how its form derives from a likelihood principle, that it can be used with gradient descent, and that model uncertainty is accounted for in ensembled form. Benchmark experiments show the method outperforms current state-of-the-art uncertainty quantification methods, reducing average PI width by over 10%.'
volume: 80
URL: https://proceedings.mlr.press/v80/pearce18a.html
PDF: http://proceedings.mlr.press/v80/pearce18a/pearce18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-pearce18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Tim
family: Pearce
- given: Alexandra
family: Brintrup
- given: Mohamed
family: Zaki
- given: Andy
family: Neely
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4075-4084
id: pearce18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4075
lastpage: 4084
published: 2018-07-03 00:00:00 +0000
- title: 'Adaptive Three Operator Splitting'
abstract: 'We propose and analyze a novel adaptive step size variant of the Davis-Yin three operator splitting, a method that can solve optimization problems composed of a sum of a smooth term for which we have access to its gradient and an arbitrary number of potentially non-smooth terms for which we have access to their proximal operator. The proposed method leverages local information of the objective function, allowing for larger step sizes while preserving the convergence properties of the original method. It only requires two extra function evaluations per iteration and does not depend on any step size hyperparameter besides an initial estimate. We provide a convergence rate analysis of this method, showing sublinear convergence rate for general convex functions and linear convergence under stronger assumptions, matching the best known rates of its non-adaptive variant. Finally, an empirical comparison with related methods on 6 different problems illustrates the computational advantage of the adaptive step size strategy.'
volume: 80
URL: https://proceedings.mlr.press/v80/pedregosa18a.html
PDF: http://proceedings.mlr.press/v80/pedregosa18a/pedregosa18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-pedregosa18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Fabian
family: Pedregosa
- given: Gauthier
family: Gidel
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4085-4094
id: pedregosa18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4085
lastpage: 4094
published: 2018-07-03 00:00:00 +0000
- title: 'Efficient Neural Architecture Search via Parameters Sharing'
abstract: 'We propose Efficient Neural Architecture Search (ENAS), a fast and inexpensive approach for automatic model design. ENAS constructs a large computational graph, where each subgraph represents a neural network architecture, hence forcing all architectures to share their parameters. A controller is trained with policy gradient to search for a subgraph that maximizes the expected reward on a validation set. Meanwhile a model corresponding to the selected subgraph is trained to minimize a canonical cross entropy loss. Sharing parameters among child models allows ENAS to deliver strong empirical performances, whilst using much fewer GPU-hours than existing automatic model design approaches, and notably, 1000x less expensive than standard Neural Architecture Search. On Penn Treebank, ENAS discovers a novel architecture that achieves a test perplexity of 56.3, on par with the existing state-of-the-art among all methods without post-training processing. On CIFAR-10, ENAS finds a novel architecture that achieves 2.89% test error, which is on par with the 2.65% test error of NASNet (Zoph et al., 2018).'
volume: 80
URL: https://proceedings.mlr.press/v80/pham18a.html
PDF: http://proceedings.mlr.press/v80/pham18a/pham18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-pham18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hieu
family: Pham
- given: Melody
family: Guan
- given: Barret
family: Zoph
- given: Quoc
family: Le
- given: Jeff
family: Dean
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4095-4104
id: pham18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4095
lastpage: 4104
published: 2018-07-03 00:00:00 +0000
- title: 'Bandits with Delayed, Aggregated Anonymous Feedback'
abstract: 'We study a variant of the stochastic $K$-armed bandit problem, which we call "bandits with delayed, aggregated anonymous feedback". In this problem, when the player pulls an arm, a reward is generated; however, it is not immediately observed. Instead, at the end of each round the player observes only the sum of a number of previously generated rewards which happen to arrive in the given round. The rewards are stochastically delayed and due to the aggregated nature of the observations, the information of which arm led to a particular reward is lost. The question is what is the cost of the information loss due to this delayed, aggregated anonymous feedback? Previous works have studied bandits with stochastic, non-anonymous delays and found that the regret increases only by an additive factor relating to the expected delay. In this paper, we show that this additive regret increase can be maintained in the harder delayed, aggregated anonymous feedback setting when the expected delay (or a bound on it) is known. We provide an algorithm that matches the worst case regret of the non-anonymous problem exactly when the delays are bounded, and up to logarithmic factors or an additive variance term for unbounded delays.'
volume: 80
URL: https://proceedings.mlr.press/v80/pike-burke18a.html
PDF: http://proceedings.mlr.press/v80/pike-burke18a/pike-burke18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-pike-burke18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ciara
family: Pike-Burke
- given: Shipra
family: Agrawal
- given: Csaba
family: Szepesvari
- given: Steffen
family: Grunewalder
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4105-4113
id: pike-burke18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4105
lastpage: 4113
published: 2018-07-03 00:00:00 +0000
- title: 'Constant-Time Predictive Distributions for Gaussian Processes'
abstract: 'One of the most compelling features of Gaussian process (GP) regression is its ability to provide well-calibrated posterior distributions. Recent advances in inducing point methods have sped up GP marginal likelihood and posterior mean computations, leaving posterior covariance estimation and sampling as the remaining computational bottlenecks. In this paper we address these shortcomings by using the Lanczos algorithm to rapidly approximate the predictive covariance matrix. Our approach, which we refer to as LOVE (LanczOs Variance Estimates), substantially improves time and space complexity. In our experiments, LOVE computes covariances up to 2,000 times faster and draws samples 18,000 times faster than existing methods, all without sacrificing accuracy.'
volume: 80
URL: https://proceedings.mlr.press/v80/pleiss18a.html
PDF: http://proceedings.mlr.press/v80/pleiss18a/pleiss18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-pleiss18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Geoff
family: Pleiss
- given: Jacob
family: Gardner
- given: Kilian
family: Weinberger
- given: Andrew Gordon
family: Wilson
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4114-4123
id: pleiss18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4114
lastpage: 4123
published: 2018-07-03 00:00:00 +0000
- title: 'Local Convergence Properties of SAGA/Prox-SVRG and Acceleration'
abstract: 'In this paper, we present a local convergence analysis for a class of stochastic optimisation methods: the proximal variance reduced stochastic gradient methods, and mainly focus on SAGA (Defazio et al., 2014) and Prox-SVRG (Xiao & Zhang, 2014). Under the assumption that the non-smooth component of the optimisation problem is partly smooth relative to a smooth manifold, we present a unified framework for the local convergence analysis of SAGA/Prox-SVRG: (i) the sequences generated by the methods are able to identify the smooth manifold in a finite number of iterations; (ii) then the sequence enters a local linear convergence regime. Furthermore, we discuss various possibilities for accelerating these algorithms, including adapting to better local parameters, and applying higher-order deterministic/stochastic optimisation methods which can achieve super-linear convergence. Several concrete examples arising from machine learning are considered to demonstrate the obtained result.'
volume: 80
URL: https://proceedings.mlr.press/v80/poon18a.html
PDF: http://proceedings.mlr.press/v80/poon18a/poon18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-poon18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Clarice
family: Poon
- given: Jingwei
family: Liang
- given: Carola
family: Schoenlieb
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4124-4132
id: poon18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4124
lastpage: 4132
published: 2018-07-03 00:00:00 +0000
- title: 'Equivalence of Multicategory SVM and Simplex Cone SVM: Fast Computations and Statistical Theory'
abstract: 'The multicategory SVM (MSVM) of Lee et al. (2004) is a natural generalization of the classical, binary support vector machines (SVM). However, its use has been limited by computational difficulties. The simplex-cone SVM (SCSVM) of Mroueh et al. (2012) is a computationally efficient multicategory classifier, but its use has been limited by a seemingly opaque interpretation. We show that MSVM and SCSVM are in fact exactly equivalent, and provide a bijection between their tuning parameters. MSVM may then be entertained as both a natural and computationally efficient multicategory extension of SVM. We further provide a Donsker theorem for finite-dimensional kernel MSVM and partially answer the open question pertaining to the very competitive performance of One-vs-Rest methods against MSVM. Furthermore, we use the derived asymptotic covariance formula to develop an inverse-variance weighted classification rule which improves on the One-vs-Rest approach.'
volume: 80
URL: https://proceedings.mlr.press/v80/pouliot18a.html
PDF: http://proceedings.mlr.press/v80/pouliot18a/pouliot18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-pouliot18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Guillaume
family: Pouliot
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4133-4140
id: pouliot18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4133
lastpage: 4140
published: 2018-07-03 00:00:00 +0000
- title: 'Learning Dynamics of Linear Denoising Autoencoders'
abstract: 'Denoising autoencoders (DAEs) have proven useful for unsupervised representation learning, but a thorough theoretical understanding is still lacking of how the input noise influences learning. Here we develop theory for how noise influences learning in DAEs. By focusing on linear DAEs, we are able to derive analytic expressions that exactly describe their learning dynamics. We verify our theoretical predictions with simulations as well as experiments on MNIST and CIFAR-10. The theory illustrates how, when tuned correctly, noise allows DAEs to ignore low variance directions in the inputs while learning to reconstruct them. Furthermore, in a comparison of the learning dynamics of DAEs to standard regularised autoencoders, we show that noise has a similar regularisation effect to weight decay, but with faster training dynamics. We also show that our theoretical predictions approximate learning dynamics on real-world data and qualitatively match observed dynamics in nonlinear DAEs.'
volume: 80
URL: https://proceedings.mlr.press/v80/pretorius18a.html
PDF: http://proceedings.mlr.press/v80/pretorius18a/pretorius18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-pretorius18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Arnu
family: Pretorius
- given: Steve
family: Kroon
- given: Herman
family: Kamper
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4141-4150
id: pretorius18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4141
lastpage: 4150
published: 2018-07-03 00:00:00 +0000
- title: 'JointGAN: Multi-Domain Joint Distribution Learning with Generative Adversarial Nets'
abstract: 'A new generative adversarial network is developed for joint distribution matching. Distinct from most existing approaches, that only learn conditional distributions, the proposed model aims to learn a joint distribution of multiple random variables (domains). This is achieved by learning to sample from conditional distributions between the domains, while simultaneously learning to sample from the marginals of each individual domain. The proposed framework consists of multiple generators and a single softmax-based critic, all jointly trained via adversarial learning. From a simple noise source, the proposed framework allows synthesis of draws from the marginals, conditional draws given observations from a subset of random variables, or complete draws from the full joint distribution. Most examples considered are for joint analysis of two domains, with examples for three domains also presented.'
volume: 80
URL: https://proceedings.mlr.press/v80/pu18a.html
PDF: http://proceedings.mlr.press/v80/pu18a/pu18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-pu18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yunchen
family: Pu
- given: Shuyang
family: Dai
- given: Zhe
family: Gan
- given: Weiyao
family: Wang
- given: Guoyin
family: Wang
- given: Yizhe
family: Zhang
- given: Ricardo
family: Henao
- given: Lawrence
family: Carin
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4151-4160
id: pu18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4151
lastpage: 4160
published: 2018-07-03 00:00:00 +0000
- title: 'Selecting Representative Examples for Program Synthesis'
abstract: 'Program synthesis is a class of regression problems where one seeks a solution, in the form of a source-code program, mapping the inputs to their corresponding outputs exactly. Due to its precise and combinatorial nature, program synthesis is commonly formulated as a constraint satisfaction problem, where input-output examples are encoded as constraints and solved with a constraint solver. A key challenge of this formulation is scalability: while constraint solvers work well with a few well-chosen examples, a large set of examples can incur significant overhead in both time and memory. We describe a method to discover a subset of examples that is both small and representative: the subset is constructed iteratively, using a neural network to predict the probability of unchosen examples conditioned on the chosen examples in the subset, and greedily adding the least probable example. We empirically evaluate the representativeness of the subsets constructed by our method, and demonstrate such subsets can significantly improve synthesis time and stability.'
volume: 80
URL: https://proceedings.mlr.press/v80/pu18b.html
PDF: http://proceedings.mlr.press/v80/pu18b/pu18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-pu18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yewen
family: Pu
- given: Zachery
family: Miranda
- given: Armando
family: Solar-Lezama
- given: Leslie
family: Kaelbling
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4161-4170
id: pu18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4161
lastpage: 4170
published: 2018-07-03 00:00:00 +0000
- title: 'Generalized Earley Parser: Bridging Symbolic Grammars and Sequence Data for Future Prediction'
abstract: 'Future predictions on sequence data (e.g., videos or audios) require the algorithms to capture non-Markovian and compositional properties of high-level semantics. Context-free grammars are natural choices to capture such properties, but traditional grammar parsers (e.g., Earley parser) only take symbolic sentences as inputs. In this paper, we generalize the Earley parser to parse sequence data which is neither segmented nor labeled. This generalized Earley parser integrates a grammar parser with a classifier to find the optimal segmentation and labels, and makes top-down future predictions. Experiments show that our method significantly outperforms other approaches for future human activity prediction.'
volume: 80
URL: https://proceedings.mlr.press/v80/qi18a.html
PDF: http://proceedings.mlr.press/v80/qi18a/qi18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-qi18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Siyuan
family: Qi
- given: Baoxiong
family: Jia
- given: Song-Chun
family: Zhu
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4171-4179
id: qi18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4171
lastpage: 4179
published: 2018-07-03 00:00:00 +0000
- title: 'Do Outliers Ruin Collaboration?'
abstract: 'We consider the problem of learning a binary classifier from $n$ different data sources, among which at most an $\eta$ fraction are adversarial. The overhead is defined as the ratio between the sample complexity of learning in this setting and that of learning the same hypothesis class on a single data distribution. We present an algorithm that achieves an $O(\eta n + \ln n)$ overhead, which is proved to be worst-case optimal. We also discuss the potential challenges to the design of a computationally efficient learning algorithm with a small overhead.'
volume: 80
URL: https://proceedings.mlr.press/v80/qiao18a.html
PDF: http://proceedings.mlr.press/v80/qiao18a/qiao18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-qiao18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mingda
family: Qiao
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4180-4187
id: qiao18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4180
lastpage: 4187
published: 2018-07-03 00:00:00 +0000
- title: 'Gradually Updated Neural Networks for Large-Scale Image Recognition'
abstract: 'Depth is one of the keys that make neural networks succeed in the task of large-scale image recognition. The state-of-the-art network architectures usually increase the depths by cascading convolutional layers or building blocks. In this paper, we present an alternative method to increase the depth. Our method is by introducing computation orderings to the channels within convolutional layers or blocks, based on which we gradually compute the outputs in a channel-wise manner. The added orderings not only increase the depths and the learning capacities of the networks without any additional computation costs, but also eliminate the overlap singularities so that the networks are able to converge faster and perform better. Experiments show that the networks based on our method achieve the state-of-the-art performances on CIFAR and ImageNet datasets.'
volume: 80
URL: https://proceedings.mlr.press/v80/qiao18b.html
PDF: http://proceedings.mlr.press/v80/qiao18b/qiao18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-qiao18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Siyuan
family: Qiao
- given: Zhishuai
family: Zhang
- given: Wei
family: Shen
- given: Bo
family: Wang
- given: Alan
family: Yuille
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4188-4197
id: qiao18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4188
lastpage: 4197
published: 2018-07-03 00:00:00 +0000
- title: 'DCFNet: Deep Neural Network with Decomposed Convolutional Filters'
abstract: 'Filters in a Convolutional Neural Network (CNN) contain model parameters learned from enormous amounts of data. In this paper, we suggest to decompose convolutional filters in CNN as a truncated expansion with pre-fixed bases, namely the Decomposed Convolutional Filters network (DCFNet), where the expansion coefficients remain learned from data. Such a structure not only reduces the number of trainable parameters and computation, but also imposes filter regularity by bases truncation. Through extensive experiments, we consistently observe that DCFNet maintains accuracy for image classification tasks with a significant reduction of model parameters, particularly with Fourier-Bessel (FB) bases, and even with random bases. Theoretically, we analyze the representation stability of DCFNet with respect to input variations, and prove representation stability under generic assumptions on the expansion coefficients. The analysis is consistent with the empirical observations.'
volume: 80
URL: https://proceedings.mlr.press/v80/qiu18a.html
PDF: http://proceedings.mlr.press/v80/qiu18a/qiu18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-qiu18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Qiang
family: Qiu
- given: Xiuyuan
family: Cheng
- given:
family: Calderbank
- given: Guillermo
family: Sapiro
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4198-4207
id: qiu18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4198
lastpage: 4207
published: 2018-07-03 00:00:00 +0000
- title: 'Non-convex Conditional Gradient Sliding'
abstract: 'We investigate a projection free optimization method, namely non-convex conditional gradient sliding (NCGS) for non-convex optimization problems on the batch, stochastic and finite-sum settings. Conditional gradient sliding (CGS) method, by integrating Nesterov’s accelerated gradient method with Frank-Wolfe (FW) method in a smart way, outperforms FW for convex optimization, by reducing the amount of gradient computations. However, the study of CGS in the non-convex setting is limited. In this paper, we propose the non-convex conditional gradient sliding (NCGS) methods and analyze their convergence properties. We also leverage the idea of variance reduction from the recent progress in convex optimization to obtain a new algorithm termed *variance reduced NCGS* (NCGS-VR), and obtain faster convergence rate than the batch NCGS in the finite-sum setting. We show that NCGS algorithms outperform their Frank-Wolfe counterparts both in theory and in practice, for all three settings, namely the batch, stochastic and finite-sum setting. This significantly improves our understanding of optimizing non-convex functions with complicated feasible sets (where projection is prohibitively expensive).'
volume: 80
URL: https://proceedings.mlr.press/v80/qu18a.html
PDF: http://proceedings.mlr.press/v80/qu18a/qu18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-qu18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Chao
family: Qu
- given: Yan
family: Li
- given: Huan
family: Xu
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4208-4217
id: qu18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4208
lastpage: 4217
published: 2018-07-03 00:00:00 +0000
- title: 'Machine Theory of Mind'
abstract: 'Theory of mind (ToM) broadly refers to humans’ ability to represent the mental states of others, including their desires, beliefs, and intentions. We design a Theory of Mind neural network – a ToMnet – which uses meta-learning to build such models of the agents it encounters. The ToMnet learns a strong prior model for agents’ future behaviour, and, using only a small number of behavioural observations, can bootstrap to richer predictions about agents’ characteristics and mental states. We apply the ToMnet to agents behaving in simple gridworld environments, showing that it learns to model random, algorithmic, and deep RL agents from varied populations, and that it passes classic ToM tasks such as the "Sally-Anne" test of recognising that others can hold false beliefs about the world.'
volume: 80
URL: https://proceedings.mlr.press/v80/rabinowitz18a.html
PDF: http://proceedings.mlr.press/v80/rabinowitz18a/rabinowitz18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-rabinowitz18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Neil
family: Rabinowitz
- given: Frank
family: Perbet
- given: Francis
family: Song
- given: Chiyuan
family: Zhang
- given: S. M. Ali
family: Eslami
- given: Matthew
family: Botvinick
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4218-4227
id: rabinowitz18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4218
lastpage: 4227
published: 2018-07-03 00:00:00 +0000
- title: 'Fast Parametric Learning with Activation Memorization'
abstract: 'Neural networks trained with backpropagation often struggle to identify classes that have been observed a small number of times. In applications where most class labels are rare, such as language modelling, this can become a performance bottleneck. One potential remedy is to augment the network with a fast-learning non-parametric model which stores recent activations and class labels into an external memory. We explore a simplified architecture where we treat a subset of the model parameters as fast memory stores. This can help retain information over longer time intervals than a traditional memory, and does not require additional space or compute. In the case of image classification, we display faster binding of novel classes on an Omniglot image curriculum task. We also show improved performance for word-based language models on news reports (GigaWord), books (Project Gutenberg) and Wikipedia articles (WikiText-103) - the latter achieving a state-of-the-art perplexity of 29.2.'
volume: 80
URL: https://proceedings.mlr.press/v80/rae18a.html
PDF: http://proceedings.mlr.press/v80/rae18a/rae18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-rae18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jack
family: Rae
- given: Chris
family: Dyer
- given: Peter
family: Dayan
- given: Timothy
family: Lillicrap
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4228-4237
id: rae18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4228
lastpage: 4237
published: 2018-07-03 00:00:00 +0000
- title: 'Can Deep Reinforcement Learning Solve Erdos-Selfridge-Spencer Games?'
abstract: 'Deep reinforcement learning has achieved many recent successes, but our understanding of its strengths and limitations is hampered by the lack of rich environments in which we can fully characterize optimal behavior, and correspondingly diagnose individual actions against such a characterization. Here we consider a family of combinatorial games, arising from work of Erdos, Selfridge, and Spencer, and we propose their use as environments for evaluating and comparing different approaches to reinforcement learning. These games have a number of appealing features: they are challenging for current learning approaches, but they form (i) a low-dimensional, simply parametrized environment where (ii) there is a linear closed form solution for optimal behavior from any state, and (iii) the difficulty of the game can be tuned by changing environment parameters in an interpretable way. We use these Erdos-Selfridge-Spencer games not only to compare different algorithms, but test for generalization, make comparisons to supervised learning, analyse multiagent play, and even develop a self play algorithm.'
volume: 80
URL: https://proceedings.mlr.press/v80/raghu18a.html
PDF: http://proceedings.mlr.press/v80/raghu18a/raghu18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-raghu18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Maithra
family: Raghu
- given: Alex
family: Irpan
- given: Jacob
family: Andreas
- given: Bobby
family: Kleinberg
- given: Quoc
family: Le
- given: Jon
family: Kleinberg
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4238-4246
id: raghu18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4238
lastpage: 4246
published: 2018-07-03 00:00:00 +0000
- title: 'Cut-Pursuit Algorithm for Regularizing Nonsmooth Functionals with Graph Total Variation'
abstract: 'We present an extension of the cut-pursuit algorithm, introduced by Landrieu and Obozinski (2017), to the graph total-variation regularization of functions with a separable nondifferentiable part. We propose a modified algorithmic scheme as well as adapted proofs of convergence. We also present a heuristic approach for handling the cases in which the values associated to each vertex of the graph are multidimensional. The performance of our algorithm, which we demonstrate on difficult, ill-conditioned large-scale inverse and learning problems, is such that it may in practice extend the scope of application of the total-variation regularization.'
volume: 80
URL: https://proceedings.mlr.press/v80/raguet18a.html
PDF: http://proceedings.mlr.press/v80/raguet18a/raguet18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-raguet18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hugo
family: Raguet
- given: Loic
family: Landrieu
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4247-4256
id: raguet18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4247
lastpage: 4256
published: 2018-07-03 00:00:00 +0000
- title: 'Modeling Others using Oneself in Multi-Agent Reinforcement Learning'
abstract: 'We consider the multi-agent reinforcement learning setting with imperfect information. The reward function depends on the hidden goals of both agents, so the agents must infer the other players’ goals from their observed behavior in order to maximize their returns. We propose a new approach for learning in these domains: Self Other-Modeling (SOM), in which an agent uses its own policy to predict the other agent’s actions and update its belief of their hidden goal in an online manner. We evaluate this approach on three different tasks and show that the agents are able to learn better policies using their estimate of the other players’ goals, in both cooperative and competitive settings.'
volume: 80
URL: https://proceedings.mlr.press/v80/raileanu18a.html
PDF: http://proceedings.mlr.press/v80/raileanu18a/raileanu18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-raileanu18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Roberta
family: Raileanu
- given: Emily
family: Denton
- given: Arthur
family: Szlam
- given: Rob
family: Fergus
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4257-4266
id: raileanu18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4257
lastpage: 4266
published: 2018-07-03 00:00:00 +0000
- title: 'On Nesting Monte Carlo Estimators'
abstract: 'Many problems in machine learning and statistics involve nested expectations and thus do not permit conventional Monte Carlo (MC) estimation. For such problems, one must nest estimators, such that terms in an outer estimator themselves involve calculation of a separate, nested, estimation. We investigate the statistical implications of nesting MC estimators, including cases of multiple levels of nesting, and establish the conditions under which they converge. We derive corresponding rates of convergence and provide empirical evidence that these rates are observed in practice. We further establish a number of pitfalls that can arise from naive nesting of MC estimators, provide guidelines about how these can be avoided, and lay out novel methods for reformulating certain classes of nested expectation problems into single expectations, leading to improved convergence rates. We demonstrate the applicability of our work by using our results to develop a new estimator for discrete Bayesian experimental design problems and derive error bounds for a class of variational objectives.'
volume: 80
URL: https://proceedings.mlr.press/v80/rainforth18a.html
PDF: http://proceedings.mlr.press/v80/rainforth18a/rainforth18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-rainforth18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Tom
family: Rainforth
- given: Rob
family: Cornish
- given: Hongseok
family: Yang
- given: Andrew
family: Warrington
- given: Frank
family: Wood
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4267-4276
id: rainforth18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4267
lastpage: 4276
published: 2018-07-03 00:00:00 +0000
- title: 'Tighter Variational Bounds are Not Necessarily Better'
abstract: 'We provide theoretical and empirical evidence that using tighter evidence lower bounds (ELBOs) can be detrimental to the process of learning an inference network by reducing the signal-to-noise ratio of the gradient estimator. Our results call into question common implicit assumptions that tighter ELBOs are better variational objectives for simultaneous model learning and inference amortization schemes. Based on our insights, we introduce three new algorithms: the partially importance weighted auto-encoder (PIWAE), the multiply importance weighted auto-encoder (MIWAE), and the combination importance weighted auto-encoder (CIWAE), each of which includes the standard importance weighted auto-encoder (IWAE) as a special case. We show that each can deliver improvements over IWAE, even when performance is measured by the IWAE target itself. Furthermore, our results suggest that PIWAE may be able to deliver simultaneous improvements in the training of both the inference and generative networks.'
volume: 80
URL: https://proceedings.mlr.press/v80/rainforth18b.html
PDF: http://proceedings.mlr.press/v80/rainforth18b/rainforth18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-rainforth18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Tom
family: Rainforth
- given: Adam
family: Kosiorek
- given: Tuan Anh
family: Le
- given: Chris
family: Maddison
- given: Maximilian
family: Igl
- given: Frank
family: Wood
- given: Yee Whye
family: Teh
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4277-4285
id: rainforth18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4277
lastpage: 4285
published: 2018-07-03 00:00:00 +0000
- title: 'SAFFRON: an Adaptive Algorithm for Online Control of the False Discovery Rate'
abstract: 'In the online false discovery rate (FDR) problem, one observes a possibly infinite sequence of $p$-values $P_1,P_2,\dots$, each testing a different null hypothesis, and an algorithm must pick a sequence of rejection thresholds $\alpha_1,\alpha_2,\dots$ in an online fashion, effectively rejecting the $k$-th null hypothesis whenever $P_k \leq \alpha_k$. Importantly, $\alpha_k$ must be a function of the past, and cannot depend on $P_k$ or any of the later unseen $p$-values, and must be chosen to guarantee that for any time $t$, the FDR up to time $t$ is less than some pre-determined quantity $\alpha \in (0,1)$. In this work, we present a powerful new framework for online FDR control that we refer to as “SAFFRON”. Like older alpha-investing algorithms, SAFFRON starts off with an error budget (called alpha-wealth) that it intelligently allocates to different tests over time, earning back some alpha-wealth whenever it makes a new discovery. However, unlike older methods, SAFFRON’s threshold sequence is based on a novel estimate of the alpha fraction that it allocates to true null hypotheses. In the offline setting, algorithms that employ an estimate of the proportion of true nulls are called “adaptive”, hence SAFFRON can be seen as an online analogue of the offline Storey-BH adaptive procedure. Just as Storey-BH is typically more powerful than the Benjamini-Hochberg (BH) procedure under independence, we demonstrate that SAFFRON is also more powerful than its non-adaptive counterparts such as LORD.'
volume: 80
URL: https://proceedings.mlr.press/v80/ramdas18a.html
PDF: http://proceedings.mlr.press/v80/ramdas18a/ramdas18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-ramdas18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Aaditya
family: Ramdas
- given: Tijana
family: Zrnic
- given: Martin
family: Wainwright
- given: Michael
family: Jordan
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4286-4294
id: ramdas18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4286
lastpage: 4294
published: 2018-07-03 00:00:00 +0000
- title: 'QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning'
abstract: 'In many real-world settings, a team of agents must coordinate their behaviour while acting in a decentralised way. At the same time, it is often possible to train the agents in a centralised fashion in a simulated or laboratory setting, where global state information is available and communication constraints are lifted. Learning joint action-values conditioned on extra state information is an attractive way to exploit centralised learning, but the best strategy for then extracting decentralised policies is unclear. Our solution is QMIX, a novel value-based method that can train decentralised policies in a centralised end-to-end fashion. QMIX employs a network that estimates joint action-values as a complex non-linear combination of per-agent values that condition only on local observations. We structurally enforce that the joint-action value is monotonic in the per-agent values, which allows tractable maximisation of the joint action-value in off-policy learning, and guarantees consistency between the centralised and decentralised policies. We evaluate QMIX on a challenging set of StarCraft II micromanagement tasks, and show that QMIX significantly outperforms existing value-based multi-agent reinforcement learning methods.'
volume: 80
URL: https://proceedings.mlr.press/v80/rashid18a.html
PDF: http://proceedings.mlr.press/v80/rashid18a/rashid18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-rashid18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Tabish
family: Rashid
- given: Mikayel
family: Samvelyan
- given: Christian
family: Schroeder
- given: Gregory
family: Farquhar
- given: Jakob
family: Foerster
- given: Shimon
family: Whiteson
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4295-4304
id: rashid18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4295
lastpage: 4304
published: 2018-07-03 00:00:00 +0000
- title: 'Gradient Coding from Cyclic MDS Codes and Expander Graphs'
abstract: 'Gradient coding is a technique for straggler mitigation in distributed learning. In this paper we design novel gradient codes using tools from classical coding theory, namely, cyclic MDS codes, which compare favourably with existing solutions, both in the applicable range of parameters and in the complexity of the involved algorithms. Second, we introduce an approximate variant of the gradient coding problem, in which we settle for approximate gradient computation instead of the exact one. This approach enables graceful degradation, i.e., the $\ell_2$ error of the approximate gradient is a decreasing function of the number of stragglers. Our main result is that the normalized adjacency matrix of an expander graph can yield excellent approximate gradient codes, and that this approach allows us to perform significantly less computation compared to exact gradient coding. We experimentally test our approach on Amazon EC2, and show that the generalization error of approximate gradient coding is very close to the full gradient while requiring significantly less computation from the workers.'
volume: 80
URL: https://proceedings.mlr.press/v80/raviv18a.html
PDF: http://proceedings.mlr.press/v80/raviv18a/raviv18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-raviv18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Netanel
family: Raviv
- given: Rashish
family: Tandon
- given: Alex
family: Dimakis
- given: Itzhak
family: Tamo
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4305-4313
id: raviv18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4305
lastpage: 4313
published: 2018-07-03 00:00:00 +0000
- title: 'Learning Implicit Generative Models with the Method of Learned Moments'
abstract: 'We propose a method of moments (MoM) algorithm for training large-scale implicit generative models. Moment estimation in this setting encounters two problems: it is often difficult to define the millions of moments needed to learn the model parameters, and it is hard to determine which properties are useful when specifying moments. To address the first issue, we introduce a moment network, and define the moments as the network’s hidden units and the gradient of the network’s output with respect to its parameters. To tackle the second problem, we use asymptotic theory to highlight desiderata for moments – namely they should minimize the asymptotic variance of estimated model parameters – and introduce an objective to learn better moments. The sequence of objectives created by this Method of Learned Moments (MoLM) can train high-quality neural image samplers. On CIFAR-10, we demonstrate that MoLM-trained generators achieve significantly higher Inception Scores and lower Fréchet Inception Distances than those trained with gradient penalty-regularized and spectrally-normalized adversarial objectives. These generators also achieve nearly perfect Multi-Scale Structural Similarity Scores on CelebA, and can create high-quality samples of 128x128 images.'
volume: 80
URL: https://proceedings.mlr.press/v80/ravuri18a.html
PDF: http://proceedings.mlr.press/v80/ravuri18a/ravuri18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-ravuri18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Suman
family: Ravuri
- given: Shakir
family: Mohamed
- given: Mihaela
family: Rosca
- given: Oriol
family: Vinyals
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4314-4323
id: ravuri18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4314
lastpage: 4323
published: 2018-07-03 00:00:00 +0000
- title: 'Weightless: Lossy weight encoding for deep neural network compression'
abstract: 'The large memory requirements of deep neural networks limit their deployment and adoption on many devices. Model compression methods effectively reduce the memory requirements of these models, usually through applying transformations such as weight pruning or quantization. In this paper, we present a novel scheme for lossy weight encoding co-designed with weight simplification techniques. The encoding is based on the Bloomier filter, a probabilistic data structure that can save space at the cost of introducing random errors. Leveraging the ability of neural networks to tolerate these imperfections and by re-training around the errors, the proposed technique, named Weightless, can compress weights by up to 496x without loss of model accuracy. This results in up to a 1.51x improvement over the state-of-the-art.'
volume: 80
URL: https://proceedings.mlr.press/v80/reagan18a.html
PDF: http://proceedings.mlr.press/v80/reagan18a/reagan18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-reagan18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Brandon
family: Reagan
- given: Udit
family: Gupta
- given: Bob
family: Adolf
- given: Michael
family: Mitzenmacher
- given: Alexander
family: Rush
- given: Gu-Yeon
family: Wei
- given: David
family: Brooks
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4324-4333
id: reagan18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4324
lastpage: 4333
published: 2018-07-03 00:00:00 +0000
- title: 'Learning to Reweight Examples for Robust Deep Learning'
abstract: 'Deep neural networks have been shown to be very powerful modeling tools for many supervised learning tasks involving complex input patterns. However, they can also easily overfit to training set biases and label noise. In addition to various regularizers, example reweighting algorithms are popular solutions to these problems, but they require careful tuning of additional hyperparameters, such as example mining schedules and regularization hyperparameters. In contrast to past reweighting methods, which typically consist of functions of the cost value of each example, in this work we propose a novel meta-learning algorithm that learns to assign weights to training examples based on their gradient directions. To determine the example weights, our method performs a meta gradient descent step on the current mini-batch example weights (which are initialized from zero) to minimize the loss on a clean unbiased validation set. Our proposed method can be easily implemented on any type of deep network, does not require any additional hyperparameter tuning, and achieves impressive performance on class imbalance and corrupted label problems where only a small amount of clean validation data is available.'
volume: 80
URL: https://proceedings.mlr.press/v80/ren18a.html
PDF: http://proceedings.mlr.press/v80/ren18a/ren18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-ren18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mengye
family: Ren
- given: Wenyuan
family: Zeng
- given: Bin
family: Yang
- given: Raquel
family: Urtasun
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4334-4343
id: ren18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4334
lastpage: 4343
published: 2018-07-03 00:00:00 +0000
- title: 'Learning by Playing - Solving Sparse Reward Tasks from Scratch'
abstract: 'We propose Scheduled Auxiliary Control (SAC-X), a new learning paradigm in the context of Reinforcement Learning (RL). SAC-X enables learning of complex behaviors - from scratch - in the presence of multiple sparse reward signals. To this end, the agent is equipped with a set of general auxiliary tasks, that it attempts to learn simultaneously via off-policy RL. The key idea behind our method is that active (learned) scheduling and execution of auxiliary policies allows the agent to efficiently explore its environment - enabling it to excel at sparse reward RL. Our experiments in several challenging robotic manipulation settings demonstrate the power of our approach.'
volume: 80
URL: https://proceedings.mlr.press/v80/riedmiller18a.html
PDF: http://proceedings.mlr.press/v80/riedmiller18a/riedmiller18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-riedmiller18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Martin
family: Riedmiller
- given: Roland
family: Hafner
- given: Thomas
family: Lampe
- given: Michael
family: Neunert
- given: Jonas
family: Degrave
- given: Tom
family: Wiele
- given: Vlad
family: Mnih
- given: Nicolas
family: Heess
- given: Jost Tobias
family: Springenberg
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4344-4353
id: riedmiller18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4344
lastpage: 4353
published: 2018-07-03 00:00:00 +0000
- title: 'Been There, Done That: Meta-Learning with Episodic Recall'
abstract: 'Meta-learning agents excel at rapidly learning new tasks from open-ended task distributions; yet, they forget what they learn about each task as soon as the next begins. When tasks reoccur – as they do in natural environments – meta-learning agents must explore again instead of immediately exploiting previously discovered solutions. We propose a formalism for generating open-ended yet repetitious environments, then develop a meta-learning architecture for solving these environments. This architecture melds the standard LSTM working memory with a differentiable neural episodic memory. We explore the capabilities of agents with this episodic LSTM in five meta-learning environments with reoccurring tasks, ranging from bandits to navigation and stochastic sequential decision problems.'
volume: 80
URL: https://proceedings.mlr.press/v80/ritter18a.html
PDF: http://proceedings.mlr.press/v80/ritter18a/ritter18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-ritter18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Samuel
family: Ritter
- given: Jane
family: Wang
- given: Zeb
family: Kurth-Nelson
- given: Siddhant
family: Jayakumar
- given: Charles
family: Blundell
- given: Razvan
family: Pascanu
- given: Matthew
family: Botvinick
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4354-4363
id: ritter18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4354
lastpage: 4363
published: 2018-07-03 00:00:00 +0000
- title: 'A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music'
abstract: 'The Variational Autoencoder (VAE) has proven to be an effective model for producing semantically meaningful latent representations for natural data. However, it has thus far seen limited application to sequential data, and, as we demonstrate, existing recurrent VAE models have difficulty modeling sequences with long-term structure. To address this issue, we propose the use of a hierarchical decoder, which first outputs embeddings for subsequences of the input and then uses these embeddings to generate each subsequence independently. This structure encourages the model to utilize its latent code, thereby avoiding the "posterior collapse" problem which remains an issue for recurrent VAEs. We apply this architecture to modeling sequences of musical notes and find that it exhibits dramatically better sampling, interpolation, and reconstruction performance than a "flat" baseline model. An implementation of our "MusicVAE" is available online at https://goo.gl/magenta/musicvae-code.'
volume: 80
URL: https://proceedings.mlr.press/v80/roberts18a.html
PDF: http://proceedings.mlr.press/v80/roberts18a/roberts18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-roberts18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Adam
family: Roberts
- given: Jesse
family: Engel
- given: Colin
family: Raffel
- given: Curtis
family: Hawthorne
- given: Douglas
family: Eck
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4364-4373
id: roberts18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4364
lastpage: 4373
published: 2018-07-03 00:00:00 +0000
- title: 'Learning to Optimize Combinatorial Functions'
abstract: 'Submodular functions have become a ubiquitous tool in machine learning. They are learnable from data, and can be optimized efficiently and with guarantees. Nonetheless, recent negative results show that optimizing learned surrogates of submodular functions can result in arbitrarily bad approximations of the true optimum. Our goal in this paper is to highlight the source of this hardness, and propose an alternative criterion for optimizing general combinatorial functions from sampled data. We prove a tight equivalence showing that a class of functions is optimizable if and only if it can be learned. We provide efficient and scalable optimization algorithms for several function classes of interest, and demonstrate their utility on the task of optimally choosing trending social media items.'
volume: 80
URL: https://proceedings.mlr.press/v80/rosenfeld18a.html
PDF: http://proceedings.mlr.press/v80/rosenfeld18a/rosenfeld18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-rosenfeld18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Nir
family: Rosenfeld
- given: Eric
family: Balkanski
- given: Amir
family: Globerson
- given: Yaron
family: Singer
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4374-4383
id: rosenfeld18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4374
lastpage: 4383
published: 2018-07-03 00:00:00 +0000
- title: 'Fast Information-theoretic Bayesian Optimisation'
abstract: 'Information-theoretic Bayesian optimisation techniques have demonstrated state-of-the-art performance in tackling important global optimisation problems. However, current information-theoretic approaches require many approximations in implementation, introduce often-prohibitive computational overhead and limit the choice of kernels available to model the objective. We develop a fast information-theoretic Bayesian Optimisation method, FITBO, that avoids the need for sampling the global minimiser, thus significantly reducing computational overhead. Moreover, in comparison with existing approaches, our method faces fewer constraints on kernel choice and enjoys the merits of dealing with the output space. We demonstrate empirically that FITBO inherits the performance associated with information-theoretic Bayesian optimisation, while being even faster than simpler Bayesian optimisation approaches, such as Expected Improvement.'
volume: 80
URL: https://proceedings.mlr.press/v80/ru18a.html
PDF: http://proceedings.mlr.press/v80/ru18a/ru18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-ru18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Binxin
family: Ru
- given: Michael A.
family: Osborne
- given: Mark
family: Mcleod
- given: Diego
family: Granziol
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4384-4392
id: ru18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4384
lastpage: 4392
published: 2018-07-03 00:00:00 +0000
- title: 'Deep One-Class Classification'
abstract: 'Despite the great advances made by deep learning in many machine learning problems, there is a relative dearth of deep learning approaches for anomaly detection. Those approaches which do exist involve networks trained to perform a task other than anomaly detection, namely generative models or compression, which are in turn adapted for use in anomaly detection; they are not trained on an anomaly detection based objective. In this paper we introduce a new anomaly detection method, Deep Support Vector Data Description, which is trained on an anomaly detection based objective. The adaptation to the deep regime necessitates that our neural network and training procedure satisfy certain properties, which we demonstrate theoretically. We show the effectiveness of our method on MNIST and CIFAR-10 image benchmark datasets as well as on the detection of adversarial examples of GTSRB stop signs.'
volume: 80
URL: https://proceedings.mlr.press/v80/ruff18a.html
PDF: http://proceedings.mlr.press/v80/ruff18a/ruff18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-ruff18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lukas
family: Ruff
- given: Robert
family: Vandermeulen
- given: Nico
family: Goernitz
- given: Lucas
family: Deecke
- given: Shoaib Ahmed
family: Siddiqui
- given: Alexander
family: Binder
- given: Emmanuel
family: Müller
- given: Marius
family: Kloft
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4393-4402
id: ruff18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4393
lastpage: 4402
published: 2018-07-03 00:00:00 +0000
- title: 'Augment and Reduce: Stochastic Inference for Large Categorical Distributions'
abstract: 'Categorical distributions are ubiquitous in machine learning, e.g., in classification, language models, and recommendation systems. However, when the number of possible outcomes is very large, using categorical distributions becomes computationally expensive, as the complexity scales linearly with the number of outcomes. To address this problem, we propose augment and reduce (A&R), a method to alleviate the computational complexity. A&R uses two ideas: latent variable augmentation and stochastic variational inference. It maximizes a lower bound on the marginal likelihood of the data. Unlike existing methods which are specific to softmax, A&R is more general and is amenable to other categorical models, such as multinomial probit. On several large-scale classification problems, we show that A&R provides a tighter bound on the marginal likelihood and has better predictive performance than existing approaches.'
volume: 80
URL: https://proceedings.mlr.press/v80/ruiz18a.html
PDF: http://proceedings.mlr.press/v80/ruiz18a/ruiz18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-ruiz18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Francisco
family: Ruiz
- given: Michalis
family: Titsias
- given: Adji Bousso
family: Dieng
- given: David
family: Blei
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4403-4412
id: ruiz18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4403
lastpage: 4412
published: 2018-07-03 00:00:00 +0000
- title: 'Probabilistic Boolean Tensor Decomposition'
abstract: 'Boolean tensor decomposition approximates data of multi-way binary relationships as a product of interpretable low-rank binary factors, following the rules of Boolean algebra. Here, we present its first probabilistic treatment. We facilitate scalable sampling-based posterior inference by exploitation of the combinatorial structure of the factor conditionals. Maximum a posteriori estimates consistently outperform existing non-probabilistic approaches. We show that our performance gains can partially be explained by convergence to solutions that occupy relatively large regions of the parameter space, as well as by implicit model averaging. Moreover, the Bayesian treatment facilitates model selection with much greater accuracy than the previously suggested minimum description length based approach. We investigate three real-world data sets. First, temporal interaction networks and behavioural data of university students demonstrate the inference of instructive latent patterns. Next, we decompose a tensor with more than 10 billion data points, indicating relations of gene expression in cancer patients. Not only does this demonstrate scalability, it also provides an entirely novel perspective on relational properties of continuous data and, in the present example, on the molecular heterogeneity of cancer. Our implementation is available on GitHub: https://github.com/TammoR/LogicalFactorisationMachines'
volume: 80
URL: https://proceedings.mlr.press/v80/rukat18a.html
PDF: http://proceedings.mlr.press/v80/rukat18a/rukat18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-rukat18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Tammo
family: Rukat
- given: Chris
family: Holmes
- given: Christopher
family: Yau
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4413-4422
id: rukat18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4413
lastpage: 4422
published: 2018-07-03 00:00:00 +0000
- title: 'Black-Box Variational Inference for Stochastic Differential Equations'
abstract: 'Parameter inference for stochastic differential equations is challenging due to the presence of a latent diffusion process. Working with an Euler-Maruyama discretisation for the diffusion, we use variational inference to jointly learn the parameters and the diffusion paths. We use a standard mean-field variational approximation of the parameter posterior, and introduce a recurrent neural network to approximate the posterior for the diffusion paths conditional on the parameters. This neural network learns how to provide Gaussian state transitions which bridge between observations in a very similar way to the conditioned diffusion process. The resulting black-box inference method can be applied to any SDE system with light tuning requirements. We illustrate the method on a Lotka-Volterra system and an epidemic model, producing accurate parameter estimates in a few hours.'
volume: 80
URL: https://proceedings.mlr.press/v80/ryder18a.html
PDF: http://proceedings.mlr.press/v80/ryder18a/ryder18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-ryder18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Tom
family: Ryder
- given: Andrew
family: Golightly
- given: A. Stephen
family: McGough
- given: Dennis
family: Prangle
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4423-4432
id: ryder18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4423
lastpage: 4432
published: 2018-07-03 00:00:00 +0000
- title: 'Spurious Local Minima are Common in Two-Layer ReLU Neural Networks'
abstract: 'We consider the optimization problem associated with training simple ReLU neural networks of the form $\mathbf{x}\mapsto \sum_{i=1}^{k}\max\{0,\mathbf{w}_i^\top \mathbf{x}\}$ with respect to the squared loss. We provide a computer-assisted proof that even if the input distribution is standard Gaussian, even if the dimension is arbitrarily large, and even if the target values are generated by such a network, with orthonormal parameter vectors, the problem can still have spurious local minima once $6\le k\le 20$. By a concentration of measure argument, this implies that in high input dimensions, *nearly all* target networks of the relevant sizes lead to spurious local minima. Moreover, we conduct experiments which show that the probability of hitting such local minima is quite high, and increasing with the network size. On the positive side, mild over-parameterization appears to drastically reduce such local minima, indicating that an over-parameterization assumption is necessary to get a positive result in this setting.'
volume: 80
URL: https://proceedings.mlr.press/v80/safran18a.html
PDF: http://proceedings.mlr.press/v80/safran18a/safran18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-safran18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Itay
family: Safran
- given: Ohad
family: Shamir
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4433-4441
id: safran18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4433
lastpage: 4441
published: 2018-07-03 00:00:00 +0000
- title: 'Learning Equations for Extrapolation and Control'
abstract: 'We present an approach to identify concise equations from data using a shallow neural network. In contrast to ordinary black-box regression, this approach allows understanding functional relations and generalizing them from observed data to unseen parts of the parameter space. We show how to extend the class of learnable equations for a recently proposed equation learning network to include divisions, and we improve the learning and model selection strategy to be useful for challenging real-world data. For systems governed by analytical expressions, our method can in many cases identify the true underlying equation and extrapolate to unseen domains. We demonstrate its effectiveness by experiments on a cart-pendulum system, where only 2 random rollouts are required to learn the forward dynamics and successfully achieve the swing-up task.'
volume: 80
URL: https://proceedings.mlr.press/v80/sahoo18a.html
PDF: http://proceedings.mlr.press/v80/sahoo18a/sahoo18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-sahoo18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Subham
family: Sahoo
- given: Christoph
family: Lampert
- given: Georg
family: Martius
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4442-4450
id: sahoo18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4442
lastpage: 4450
published: 2018-07-03 00:00:00 +0000
- title: 'Tempered Adversarial Networks'
abstract: 'Generative adversarial networks (GANs) have been shown to produce realistic samples from high-dimensional distributions, but training them is considered hard. A possible explanation for training instabilities is the inherent imbalance between the networks: While the discriminator is trained directly on both real and fake samples, the generator only has control over the fake samples it produces since the real data distribution is fixed by the choice of a given dataset. We propose a simple modification that gives the generator control over the real samples which leads to a tempered learning process for both generator and discriminator. The real data distribution passes through a lens before being revealed to the discriminator, balancing the generator and discriminator by gradually revealing more detailed features necessary to produce high-quality results. The proposed module automatically adjusts the learning process to the current strength of the networks, yet is generic and easy to add to any GAN variant. In a number of experiments, we show that this can improve quality, stability and/or convergence speed across a range of different GAN architectures (DCGAN, LSGAN, WGAN-GP).'
volume: 80
URL: https://proceedings.mlr.press/v80/sajjadi18a.html
PDF: http://proceedings.mlr.press/v80/sajjadi18a/sajjadi18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-sajjadi18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mehdi S. M.
family: Sajjadi
- given: Giambattista
family: Parascandolo
- given: Arash
family: Mehrjou
- given: Bernhard
family: Schölkopf
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4451-4459
id: sajjadi18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4451
lastpage: 4459
published: 2018-07-03 00:00:00 +0000
- title: 'Representation Tradeoffs for Hyperbolic Embeddings'
abstract: 'Hyperbolic embeddings offer excellent quality with few dimensions when embedding hierarchical data structures. We give a combinatorial construction that embeds trees into hyperbolic space with arbitrarily low distortion without optimization. On WordNet, this algorithm obtains a mean-average-precision of 0.989 with only two dimensions, outperforming existing work by 0.11 points. We provide bounds characterizing the precision-dimensionality tradeoff inherent in any hyperbolic embedding. To embed general metric spaces, we propose a hyperbolic generalization of multidimensional scaling (h-MDS). We show how to perform exact recovery of hyperbolic points from distances, provide a perturbation analysis, and give a recovery result that enables us to reduce dimensionality. Finally, we extract lessons from the algorithms and theory above to design a scalable PyTorch-based implementation that can handle incomplete information.'
volume: 80
URL: https://proceedings.mlr.press/v80/sala18a.html
PDF: http://proceedings.mlr.press/v80/sala18a/sala18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-sala18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Frederic
family: Sala
- given: Chris
family: De Sa
- given: Albert
family: Gu
- given: Christopher
family: Re
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4460-4469
id: sala18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4460
lastpage: 4469
published: 2018-07-03 00:00:00 +0000
- title: 'Graph Networks as Learnable Physics Engines for Inference and Control'
abstract: 'Understanding and interacting with everyday physical scenes requires rich knowledge about the structure of the world, represented either implicitly in a value or policy function, or explicitly in a transition model. Here we introduce a new class of learnable models, based on graph networks, which implement an inductive bias for object- and relation-centric representations of complex, dynamical systems. Our results show that as a forward model, our approach supports accurate predictions from real and simulated data, and surprisingly strong and efficient generalization, across eight distinct physical systems which we varied parametrically and structurally. We also found that our inference model can perform system identification. Our models are also differentiable, and support online planning via gradient-based trajectory optimization, as well as offline policy optimization. Our framework offers new opportunities for harnessing and exploiting rich knowledge about the world, and takes a key step toward building machines with more human-like representations of the world.'
volume: 80
URL: https://proceedings.mlr.press/v80/sanchez-gonzalez18a.html
PDF: http://proceedings.mlr.press/v80/sanchez-gonzalez18a/sanchez-gonzalez18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-sanchez-gonzalez18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Alvaro
family: Sanchez-Gonzalez
- given: Nicolas
family: Heess
- given: Jost Tobias
family: Springenberg
- given: Josh
family: Merel
- given: Martin
family: Riedmiller
- given: Raia
family: Hadsell
- given: Peter
family: Battaglia
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4470-4479
id: sanchez-gonzalez18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4470
lastpage: 4479
published: 2018-07-03 00:00:00 +0000
- title: 'A Classification-Based Study of Covariate Shift in GAN Distributions'
abstract: 'A basic, and still largely unanswered, question in the context of Generative Adversarial Networks (GANs) is whether they are truly able to capture all the fundamental characteristics of the distributions they are trained on. In particular, evaluating the diversity of GAN distributions is challenging and existing methods provide only a partial understanding of this issue. In this paper, we develop quantitative and scalable tools for assessing the diversity of GAN distributions. Specifically, we take a classification-based perspective and view loss of diversity as a form of covariate shift introduced by GANs. We examine two specific forms of such shift: mode collapse and boundary distortion. In contrast to prior work, our methods need only minimal human supervision and can be readily applied to state-of-the-art GANs on large, canonical datasets. Examining popular GANs using our tools indicates that these GANs have significant problems in reproducing the more distributional properties of their training dataset.'
volume: 80
URL: https://proceedings.mlr.press/v80/santurkar18a.html
PDF: http://proceedings.mlr.press/v80/santurkar18a/santurkar18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-santurkar18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Shibani
family: Santurkar
- given: Ludwig
family: Schmidt
- given: Aleksander
family: Madry
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4480-4489
id: santurkar18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4480
lastpage: 4489
published: 2018-07-03 00:00:00 +0000
- title: 'TAPAS: Tricks to Accelerate (encrypted) Prediction As a Service'
abstract: 'Machine learning methods are widely used for a variety of prediction problems. Prediction as a service is a paradigm in which service providers with technological expertise and computational resources may perform predictions for clients. However, data privacy severely restricts the applicability of such services, unless measures to keep client data private (even from the service provider) are designed. Equally important is to minimize the nature of computation and amount of communication required between client and server. Fully homomorphic encryption offers a way out, whereby clients may encrypt their data, and on which the server may perform arithmetic computations. The one drawback of using fully homomorphic encryption is the amount of time required to evaluate large machine learning models on encrypted data. We combine several ideas from the machine learning literature, particularly work on quantization and sparsification of neural networks, together with algorithmic tools to speed up and parallelize computation using encrypted data.'
volume: 80
URL: https://proceedings.mlr.press/v80/sanyal18a.html
PDF: http://proceedings.mlr.press/v80/sanyal18a/sanyal18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-sanyal18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Amartya
family: Sanyal
- given: Matt
family: Kusner
- given: Adria
family: Gascon
- given: Varun
family: Kanade
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4490-4499
id: sanyal18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4490
lastpage: 4499
published: 2018-07-03 00:00:00 +0000
- title: 'Tight Regret Bounds for Bayesian Optimization in One Dimension'
abstract: 'We consider the problem of Bayesian optimization (BO) in one dimension, under a Gaussian process prior and Gaussian sampling noise. We provide a theoretical analysis showing that, under fairly mild technical assumptions on the kernel, the best possible cumulative regret up to time $T$ behaves as $\Omega(\sqrt{T})$ and $O(\sqrt{T\log T})$. This gives a tight characterization up to a $\sqrt{\log T}$ factor, and includes the first non-trivial lower bound for noisy BO. Our assumptions are satisfied, for example, by the squared exponential and Matérn-$\nu$ kernels, with the latter requiring $\nu > 2$. Our results certify the near-optimality of existing bounds (Srinivas *et al.*, 2009) for the SE kernel, while proving them to be strictly suboptimal for the Matérn kernel with $\nu > 2$.'
volume: 80
URL: https://proceedings.mlr.press/v80/scarlett18a.html
PDF: http://proceedings.mlr.press/v80/scarlett18a/scarlett18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-scarlett18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jonathan
family: Scarlett
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4500-4508
id: scarlett18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4500
lastpage: 4508
published: 2018-07-03 00:00:00 +0000
- title: 'Learning with Abandonment'
abstract: 'Consider a platform that wants to learn a personalized policy for each user, but the platform faces the risk of a user abandoning the platform if they are dissatisfied with the actions of the platform. For example, a platform is interested in personalizing the number of newsletters it sends, but faces the risk that the user unsubscribes forever. We propose a general thresholded learning model for scenarios like this, and discuss the structure of optimal policies. We describe salient features of optimal personalization algorithms and how feedback the platform receives impacts the results. Furthermore, we investigate how the platform can efficiently learn the heterogeneity across users by interacting with a population and provide performance guarantees.'
volume: 80
URL: https://proceedings.mlr.press/v80/schmit18a.html
PDF: http://proceedings.mlr.press/v80/schmit18a/schmit18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-schmit18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Sven
family: Schmit
- given: Ramesh
family: Johari
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4509-4517
id: schmit18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4509
lastpage: 4517
published: 2018-07-03 00:00:00 +0000
- title: 'Not to Cry Wolf: Distantly Supervised Multitask Learning in Critical Care'
abstract: 'Patients in the intensive care unit (ICU) require constant and close supervision. To assist clinical staff in this task, hospitals use monitoring systems that trigger audiovisual alarms if their algorithms indicate that a patient’s condition may be worsening. However, current monitoring systems are extremely sensitive to movement artefacts and technical errors. As a result, they typically trigger hundreds to thousands of false alarms per patient per day, drowning the important alarms in noise and adding to the exhaustion of clinical staff. In this setting, data is abundantly available, but obtaining trustworthy annotations by experts is laborious and expensive. We frame the problem of false alarm reduction from multivariate time series as a machine-learning task and address it with a novel multitask network architecture that utilises distant supervision through multiple related auxiliary tasks in order to reduce the number of expensive labels required for training. We show that our approach leads to significant improvements over several state-of-the-art baselines on real-world ICU data and provide new insights on the importance of task selection and architectural choices in distantly supervised multitask learning.'
volume: 80
URL: https://proceedings.mlr.press/v80/schwab18a.html
PDF: http://proceedings.mlr.press/v80/schwab18a/schwab18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-schwab18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Patrick
family: Schwab
- given: Emanuela
family: Keller
- given: Carl
family: Muroi
- given: David J.
family: Mack
- given: Christian
family: Strässle
- given: Walter
family: Karlen
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4518-4527
id: schwab18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4518
lastpage: 4527
published: 2018-07-03 00:00:00 +0000
- title: 'Progress & Compress: A scalable framework for continual learning'
abstract: 'We introduce a conceptually simple and scalable framework for continual learning domains where tasks are learned sequentially. Our method is constant in the number of parameters and is designed to preserve performance on previously encountered tasks while accelerating learning progress on subsequent problems. This is achieved by training a network with two components: A knowledge base, capable of solving previously encountered problems, which is connected to an active column that is employed to efficiently learn the current task. After learning a new task, the active column is distilled into the knowledge base, taking care to protect any previously acquired skills. This cycle of active learning (progression) followed by consolidation (compression) requires no architecture growth, no access to or storing of previous data or tasks, and no task-specific parameters. We demonstrate the progress & compress approach on sequential classification of handwritten alphabets as well as two reinforcement learning domains: Atari games and 3D maze navigation.'
volume: 80
URL: https://proceedings.mlr.press/v80/schwarz18a.html
PDF: http://proceedings.mlr.press/v80/schwarz18a/schwarz18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-schwarz18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jonathan
family: Schwarz
- given: Wojciech
family: Czarnecki
- given: Jelena
family: Luketina
- given: Agnieszka
family: Grabska-Barwinska
- given: Yee Whye
family: Teh
- given: Razvan
family: Pascanu
- given: Raia
family: Hadsell
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4528-4537
id: schwarz18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4528
lastpage: 4537
published: 2018-07-03 00:00:00 +0000
- title: 'Multi-Fidelity Black-Box Optimization with Hierarchical Partitions'
abstract: 'Motivated by settings such as hyper-parameter tuning and physical simulations, we consider the problem of black-box optimization of a function. Multi-fidelity techniques have become popular for applications where exact function evaluations are expensive, but coarse (biased) approximations are available at much lower cost. A canonical example is that of hyper-parameter selection in a learning algorithm. The learning algorithm can be trained for fewer iterations – this results in a lower cost, but its validation error is only coarsely indicative of the same if the algorithm had been trained till completion. We incorporate the multi-fidelity setup into the powerful framework of black-box optimization through hierarchical partitioning. We develop tree-search based multi-fidelity algorithms with theoretical guarantees on simple regret. We finally demonstrate the performance gains of our algorithms on both real and synthetic datasets.'
volume: 80
URL: https://proceedings.mlr.press/v80/sen18a.html
PDF: http://proceedings.mlr.press/v80/sen18a/sen18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-sen18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Rajat
family: Sen
- given: Kirthevasan
family: Kandasamy
- given: Sanjay
family: Shakkottai
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4538-4547
id: sen18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4538
lastpage: 4547
published: 2018-07-03 00:00:00 +0000
- title: 'Overcoming Catastrophic Forgetting with Hard Attention to the Task'
abstract: 'Catastrophic forgetting occurs when a neural network loses the information learned in a previous task after training on subsequent tasks. This problem remains a hurdle for artificial intelligence systems with sequential learning capabilities. In this paper, we propose a task-based hard attention mechanism that preserves previous tasks’ information without affecting the current task’s learning. A hard attention mask is learned concurrently to every task, through stochastic gradient descent, and previous masks are exploited to condition such learning. We show that the proposed mechanism is effective for reducing catastrophic forgetting, cutting current rates by 45 to 80%. We also show that it is robust to different hyperparameter choices, and that it offers a number of monitoring capabilities. The approach features the possibility to control both the stability and compactness of the learned knowledge, which we believe makes it also attractive for online learning or network compression applications.'
volume: 80
URL: https://proceedings.mlr.press/v80/serra18a.html
PDF: http://proceedings.mlr.press/v80/serra18a/serra18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-serra18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Joan
family: Serra
- given: Didac
family: Suris
- given: Marius
family: Miron
- given: Alexandros
family: Karatzoglou
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4548-4557
id: serra18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4548
lastpage: 4557
published: 2018-07-03 00:00:00 +0000
- title: 'Bounding and Counting Linear Regions of Deep Neural Networks'
abstract: 'We investigate the complexity of deep neural networks (DNN) that represent piecewise linear (PWL) functions. In particular, we study the number of linear regions, i.e. pieces, that a PWL function represented by a DNN can attain, both theoretically and empirically. We present (i) tighter upper and lower bounds for the maximum number of linear regions on rectifier networks, which are exact for inputs of dimension one; (ii) a first upper bound for multi-layer maxout networks; and (iii) a first method to perform exact enumeration or counting of the number of regions by modeling the DNN with a mixed-integer linear formulation. These bounds come from leveraging the dimension of the space defining each linear region. The results also indicate that a deep rectifier network can only have more linear regions than every shallow counterpart with the same number of neurons if that number exceeds the dimension of the input.'
volume: 80
URL: https://proceedings.mlr.press/v80/serra18b.html
PDF: http://proceedings.mlr.press/v80/serra18b/serra18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-serra18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Thiago
family: Serra
- given: Christian
family: Tjandraatmadja
- given: Srikumar
family: Ramalingam
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4558-4566
id: serra18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4558
lastpage: 4566
published: 2018-07-03 00:00:00 +0000
- title: 'First Order Generative Adversarial Networks'
abstract: 'GANs excel at learning high dimensional distributions, but they can update generator parameters in directions that do not correspond to the steepest descent direction of the objective. Prominent examples of problematic update directions include those used in both Goodfellow’s original GAN and the WGAN-GP. To formally describe an optimal update direction, we introduce a theoretical framework which allows the derivation of requirements on both the divergence and corresponding method for determining an update direction, with these requirements guaranteeing unbiased mini-batch updates in the direction of steepest descent. We propose a novel divergence which approximates the Wasserstein distance while regularizing the critic’s first order information. Together with an accompanying update direction, this divergence fulfills the requirements for unbiased steepest descent updates. We verify our method, the First Order GAN, with image generation on CelebA, LSUN and CIFAR-10 and set a new state of the art on the One Billion Word language generation task.'
volume: 80
URL: https://proceedings.mlr.press/v80/seward18a.html
PDF: http://proceedings.mlr.press/v80/seward18a/seward18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-seward18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Calvin
family: Seward
- given: Thomas
family: Unterthiner
- given: Urs
family: Bergmann
- given: Nikolay
family: Jetchev
- given: Sepp
family: Hochreiter
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4567-4576
id: seward18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4567
lastpage: 4576
published: 2018-07-03 00:00:00 +0000
- title: 'Finding Influential Training Samples for Gradient Boosted Decision Trees'
abstract: 'We address the problem of finding influential training samples for a particular case of tree ensemble-based models, e.g., Random Forest (RF) or Gradient Boosted Decision Trees (GBDT). A natural way of formalizing this problem is studying how the model’s predictions change upon leave-one-out retraining, leaving out each individual training sample. Recent work has shown that, for parametric models, this analysis can be conducted in a computationally efficient way. We propose several ways of extending this framework to non-parametric GBDT ensembles under the assumption that tree structures remain fixed. Furthermore, we introduce a general scheme of obtaining further approximations to our method that balance the trade-off between performance and computational complexity. We evaluate our approaches on various experimental setups and use-case scenarios and demonstrate both the quality of our approach to finding influential training samples in comparison to the baselines and its computational efficiency.'
volume: 80
URL: https://proceedings.mlr.press/v80/sharchilev18a.html
PDF: http://proceedings.mlr.press/v80/sharchilev18a/sharchilev18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-sharchilev18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Boris
family: Sharchilev
- given: Yury
family: Ustinovskiy
- given: Pavel
family: Serdyukov
- given: Maarten
family: Rijke
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4577-4585
id: sharchilev18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4577
lastpage: 4585
published: 2018-07-03 00:00:00 +0000
- title: 'Solving Partial Assignment Problems using Random Clique Complexes'
abstract: 'We present an alternate formulation of the partial assignment problem as matching random clique complexes, which are higher-order analogues of random graphs, designed to provide a set of invariants that better detect higher-order structure. The proposed method creates random clique adjacency matrices for each k-skeleton of the random clique complexes and matches them, taking into account each point as the affine combination of its geometric neighborhood. We justify our solution theoretically, by analyzing the runtime and storage complexity of our algorithm along with the asymptotic behavior of the quadratic assignment problem (QAP) that is associated with the underlying random clique adjacency matrices. Experiments on both synthetic and real-world datasets, containing severe occlusions and distortions, provide insight into the accuracy, efficiency, and robustness of our approach. We outperform diverse matching algorithms by a significant margin.'
volume: 80
URL: https://proceedings.mlr.press/v80/sharma18a.html
PDF: http://proceedings.mlr.press/v80/sharma18a/sharma18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-sharma18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Charu
family: Sharma
- given: Deepak
family: Nathani
- given: Manohar
family: Kaul
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4586-4595
id: sharma18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4586
lastpage: 4595
published: 2018-07-03 00:00:00 +0000
- title: 'Adafactor: Adaptive Learning Rates with Sublinear Memory Cost'
abstract: 'In several recently proposed stochastic optimization methods (e.g. RMSProp, Adam, Adadelta), parameter updates are scaled by the inverse square roots of exponential moving averages of squared past gradients. Maintaining these per-parameter second-moment estimators requires memory equal to the number of parameters. For the case of neural network weight matrices, we propose maintaining only the per-row and per-column sums of these moving averages, and estimating the per-parameter second moments based on these sums. We demonstrate empirically that this method produces similar results to the baseline. Secondly, we show that adaptive methods can produce larger-than-desired updates when the decay rate of the second moment accumulator is too slow. We propose update clipping and a gradually increasing decay rate scheme as remedies. Combining these methods and dropping momentum, we achieve comparable results to the published Adam regime in training the Transformer model on the WMT 2014 English-German machine translation task, while using very little auxiliary storage in the optimizer. Finally, we propose scaling the parameter updates based on the scale of the parameters themselves.'
volume: 80
URL: https://proceedings.mlr.press/v80/shazeer18a.html
PDF: http://proceedings.mlr.press/v80/shazeer18a/shazeer18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-shazeer18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Noam
family: Shazeer
- given: Mitchell
family: Stern
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4596-4604
id: shazeer18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4596
lastpage: 4604
published: 2018-07-03 00:00:00 +0000
- title: 'Locally Private Hypothesis Testing'
abstract: 'We initiate the study of differentially private hypothesis testing in the local-model, under both the standard (symmetric) randomized-response mechanism (Warner 1965, Kasiviswanathan et al, 2008) and the newer (non-symmetric) mechanisms (Bassily & Smith, 2015, Bassily et al, 2017). First, we study the general framework of mapping each user’s type into a signal and show that the problem of finding the maximum-likelihood distribution over the signals is feasible. Then we discuss the randomized-response mechanism and show that, in essence, it maps the null- and alternative-hypotheses onto new sets, an affine translation of the original sets. We then give sample complexity bounds for identity and independence testing under randomized-response. We then move to the newer non-symmetric mechanisms and show that there too the problem of finding the maximum-likelihood distribution is feasible. Under the mechanism of Bassily et al we give identity and independence testers with better sample complexity than the testers in the symmetric case, and we also propose a $\chi^2$-based identity tester which we investigate empirically.'
volume: 80
URL: https://proceedings.mlr.press/v80/sheffet18a.html
PDF: http://proceedings.mlr.press/v80/sheffet18a/sheffet18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-sheffet18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Or
family: Sheffet
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4605-4614
id: sheffet18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4605
lastpage: 4614
published: 2018-07-03 00:00:00 +0000
- title: 'Learning in Integer Latent Variable Models with Nested Automatic Differentiation'
abstract: 'We develop nested automatic differentiation (AD) algorithms for exact inference and learning in integer latent variable models. Recently, Winner, Sujono, and Sheldon showed how to reduce marginalization in a class of integer latent variable models to evaluating a probability generating function which contains many levels of nested high-order derivatives. We contribute faster and more stable AD algorithms for this challenging problem and a novel algorithm to compute exact gradients for learning. These contributions lead to significantly faster and more accurate learning algorithms, and are the first AD algorithms whose running time is polynomial in the number of levels of nesting.'
volume: 80
URL: https://proceedings.mlr.press/v80/sheldon18a.html
PDF: http://proceedings.mlr.press/v80/sheldon18a/sheldon18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-sheldon18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Daniel
family: Sheldon
- given: Kevin
family: Winner
- given: Debora
family: Sujono
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4615-4623
id: sheldon18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4615
lastpage: 4623
published: 2018-07-03 00:00:00 +0000
- title: 'Towards More Efficient Stochastic Decentralized Learning: Faster Convergence and Sparse Communication'
abstract: 'Recently, the decentralized optimization problem has attracted growing attention. Most existing methods are deterministic with high per-iteration cost and have a convergence rate quadratically depending on the problem condition number. Besides, dense communication is necessary to ensure convergence even if the dataset is sparse. In this paper, we generalize the decentralized optimization problem to a monotone operator root finding problem, and propose a stochastic algorithm named DSBA that (1) converges geometrically with a rate linearly depending on the problem condition number, and (2) can be implemented using sparse communication only. Additionally, DSBA handles important learning problems like AUC-maximization which cannot be tackled efficiently in the previous problem setting. Experiments on convex minimization and AUC-maximization validate the efficiency of our method.'
volume: 80
URL: https://proceedings.mlr.press/v80/shen18a.html
PDF: http://proceedings.mlr.press/v80/shen18a/shen18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-shen18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Zebang
family: Shen
- given: Aryan
family: Mokhtari
- given: Tengfei
family: Zhou
- given: Peilin
family: Zhao
- given: Hui
family: Qian
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4624-4633
id: shen18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4624
lastpage: 4633
published: 2018-07-03 00:00:00 +0000
- title: 'An Algorithmic Framework of Variable Metric Over-Relaxed Hybrid Proximal Extra-Gradient Method'
abstract: 'We propose a novel algorithmic framework of Variable Metric Over-Relaxed Hybrid Proximal Extra-gradient (VMOR-HPE) method with a global convergence guarantee for the maximal monotone operator inclusion problem. Its iteration complexities and local linear convergence rate are provided, which theoretically demonstrate that a large over-relaxed step-size contributes to accelerating the proposed VMOR-HPE as a byproduct. Specifically, we find that a large class of primal and primal-dual operator splitting algorithms are all special cases of VMOR-HPE. Hence, the proposed framework offers a new insight into these operator splitting algorithms. In addition, we apply VMOR-HPE to the Karush-Kuhn-Tucker (KKT) generalized equation of linear equality constrained multi-block composite convex optimization, yielding a new algorithm, namely nonsymmetric Proximal Alternating Direction Method of Multipliers with a preconditioned Extra-gradient step in which the preconditioned metric is generated by a blockwise Barzilai-Borwein line search technique (PADMM-EBB). We also establish iteration complexities of PADMM-EBB in terms of the KKT residual. Finally, we apply PADMM-EBB to handle the nonnegative dual graph regularized low-rank representation problem. Promising results on synthetic and real datasets corroborate the efficacy of PADMM-EBB.'
volume: 80
URL: https://proceedings.mlr.press/v80/shen18b.html
PDF: http://proceedings.mlr.press/v80/shen18b/shen18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-shen18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Li
family: Shen
- given: Peng
family: Sun
- given: Yitong
family: Wang
- given: Wei
family: Liu
- given: Tong
family: Zhang
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4634-4643
id: shen18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4634
lastpage: 4643
published: 2018-07-03 00:00:00 +0000
- title: 'A Spectral Approach to Gradient Estimation for Implicit Distributions'
abstract: 'Recently there have been increasing interests in learning and inference with implicit distributions (i.e., distributions without tractable densities). To this end, we develop a gradient estimator for implicit distributions based on Stein’s identity and a spectral decomposition of kernel operators, where the eigenfunctions are approximated by the Nyström method. Unlike the previous works that only provide estimates at the sample points, our approach directly estimates the gradient function, thus allows for a simple and principled out-of-sample extension. We provide theoretical results on the error bound of the estimator and discuss the bias-variance tradeoff in practice. The effectiveness of our method is demonstrated by applications to gradient-free Hamiltonian Monte Carlo and variational inference with implicit distributions. Finally, we discuss the intuition behind the estimator by drawing connections between the Nyström method and kernel PCA, which indicates that the estimator can automatically adapt to the geometry of the underlying distribution.'
volume: 80
URL: https://proceedings.mlr.press/v80/shi18a.html
PDF: http://proceedings.mlr.press/v80/shi18a/shi18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-shi18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jiaxin
family: Shi
- given: Shengyang
family: Sun
- given: Jun
family: Zhu
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4644-4653
id: shi18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4644
lastpage: 4653
published: 2018-07-03 00:00:00 +0000
- title: 'TACO: Learning Task Decomposition via Temporal Alignment for Control'
abstract: 'Many advanced Learning from Demonstration (LfD) methods consider the decomposition of complex, real-world tasks into simpler sub-tasks. By reusing the corresponding sub-policies within and between tasks, we can provide training data for each policy from different high-level tasks and compose them to perform novel ones. Existing approaches to modular LfD focus either on learning a single high-level task or depend on domain knowledge and temporal segmentation. In contrast, we propose a weakly supervised, domain-agnostic approach based on task sketches, which include only the sequence of sub-tasks performed in each demonstration. Our approach simultaneously aligns the sketches with the observed demonstrations and learns the required sub-policies. This improves generalisation in comparison to separate optimisation procedures. We evaluate the approach on multiple domains, including a simulated 3D robot arm control task using purely image-based observations. The results show that our approach performs commensurately with fully supervised approaches, while requiring significantly less annotation effort.'
volume: 80
URL: https://proceedings.mlr.press/v80/shiarlis18a.html
PDF: http://proceedings.mlr.press/v80/shiarlis18a/shiarlis18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-shiarlis18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Kyriacos
family: Shiarlis
- given: Markus
family: Wulfmeier
- given: Sasha
family: Salter
- given: Shimon
family: Whiteson
- given: Ingmar
family: Posner
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4654-4663
id: shiarlis18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4654
lastpage: 4663
published: 2018-07-03 00:00:00 +0000
- title: 'CRAFTML, an Efficient Clustering-based Random Forest for Extreme Multi-label Learning'
abstract: 'Extreme Multi-label Learning (XML) considers large sets of items described by a number of labels that can exceed one million. Tree-based methods, which hierarchically partition the problem into small scale sub-problems, are particularly promising in this context to reduce the learning/prediction complexity and to open the way to parallelization. However, the current best approaches do not exploit tree randomization which has shown its efficiency in random forests and they resort to complex partitioning strategies. To overcome these limits, we here introduce a new random forest based algorithm with a very fast partitioning approach called CRAFTML. Experimental comparisons on nine datasets from the XML literature show that it outperforms the other tree-based approaches. Moreover with a parallelized implementation reduced to five cores, it is competitive with the best state-of-the-art methods which run on one hundred-core machines.'
volume: 80
URL: https://proceedings.mlr.press/v80/siblini18a.html
PDF: http://proceedings.mlr.press/v80/siblini18a/siblini18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-siblini18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Wissam
family: Siblini
- given: Pascale
family: Kuntz
- given: Frank
family: Meyer
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4664-4673
id: siblini18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4664
lastpage: 4673
published: 2018-07-03 00:00:00 +0000
- title: 'Asynchronous Stochastic Quasi-Newton MCMC for Non-Convex Optimization'
abstract: 'Recent studies have illustrated that stochastic gradient Markov Chain Monte Carlo techniques have a strong potential in non-convex optimization, where local and global convergence guarantees can be shown under certain conditions. By building up on this recent theory, in this study, we develop an asynchronous-parallel stochastic L-BFGS algorithm for non-convex optimization. The proposed algorithm is suitable for both distributed and shared-memory settings. We provide formal theoretical analysis and show that the proposed method achieves an ergodic convergence rate of ${\cal O}(1/\sqrt{N})$ ($N$ being the total number of iterations) and it can achieve a linear speedup under certain conditions. We perform several experiments on both synthetic and real datasets. The results support our theory and show that the proposed algorithm provides a significant speedup over the recently proposed synchronous distributed L-BFGS algorithm.'
volume: 80
URL: https://proceedings.mlr.press/v80/simsekli18a.html
PDF: http://proceedings.mlr.press/v80/simsekli18a/simsekli18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-simsekli18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Umut
family: Simsekli
- given: Cagatay
family: Yildiz
- given: Than Huy
family: Nguyen
- given: Taylan
family: Cemgil
- given: Gael
family: Richard
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4674-4683
id: simsekli18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4674
lastpage: 4683
published: 2018-07-03 00:00:00 +0000
- title: 'K-means clustering using random matrix sparsification'
abstract: 'K-means clustering algorithm using Lloyd’s heuristic is one of the most commonly used tools in data mining and machine learning that shows promising performance. However, it suffers from a high computational cost resulting from pairwise Euclidean distance computations between data points and cluster centers in each iteration of Lloyd’s heuristic. The main contributing factor of this computational bottleneck is a matrix-vector multiplication step, where the matrix contains all the data points and the vector is a cluster center. In this paper we show that we can randomly sparsify the original data matrix resulting in a sparse data matrix which can significantly speed up the above-mentioned matrix-vector multiplication step without significantly affecting cluster quality. In particular, we show that optimal k-means clustering solution of the sparse data matrix, obtained by applying random matrix sparsification, results in an approximately optimal k-means clustering objective of the original data matrix. Our empirical studies on three real world datasets corroborate our theoretical findings and demonstrate that our proposed sparsification method can indeed achieve satisfactory clustering performance.'
volume: 80
URL: https://proceedings.mlr.press/v80/sinha18a.html
PDF: http://proceedings.mlr.press/v80/sinha18a/sinha18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-sinha18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Kaushik
family: Sinha
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4684-4692
id: sinha18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4684
lastpage: 4692
published: 2018-07-03 00:00:00 +0000
- title: 'Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron'
abstract: 'We present an extension to the Tacotron speech synthesis architecture that learns a latent embedding space of prosody, derived from a reference acoustic representation containing the desired prosody. We show that conditioning Tacotron on this learned embedding space results in synthesized audio that matches the prosody of the reference signal with fine time detail even when the reference and synthesis speakers are different. Additionally, we show that a reference prosody embedding can be used to synthesize text that is different from that of the reference utterance. We define several quantitative and subjective metrics for evaluating prosody transfer, and report results with accompanying audio samples from single-speaker and 44-speaker Tacotron models on a prosody transfer task.'
volume: 80
URL: https://proceedings.mlr.press/v80/skerry-ryan18a.html
PDF: http://proceedings.mlr.press/v80/skerry-ryan18a/skerry-ryan18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-skerry-ryan18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: RJ
family: Skerry-Ryan
- given: Eric
family: Battenberg
- given: Ying
family: Xiao
- given: Yuxuan
family: Wang
- given: Daisy
family: Stanton
- given: Joel
family: Shor
- given: Ron
family: Weiss
- given: Rob
family: Clark
- given: Rif A.
family: Saurous
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4693-4702
id: skerry-ryan18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4693
lastpage: 4702
published: 2018-07-03 00:00:00 +0000
- title: 'An Inference-Based Policy Gradient Method for Learning Options'
abstract: 'In the pursuit of increasingly intelligent learning systems, abstraction plays a vital role in enabling sophisticated decisions to be made in complex environments. The options framework provides formalism for such abstraction over sequences of decisions. However most models require that options be given a priori, presumably specified by hand, which is neither efficient, nor scalable. Indeed, it is preferable to learn options directly from interaction with the environment. Despite several efforts, this remains a difficult problem. In this work we develop a novel policy gradient method for the automatic learning of policies with options. This algorithm uses inference methods to simultaneously improve all of the options available to an agent, and thus can be employed in an off-policy manner, without observing option labels. The differentiable inference procedure employed yields options that can be easily interpreted. Empirical results confirm these attributes, and indicate that our algorithm has an improved sample efficiency relative to state-of-the-art in learning options end-to-end.'
volume: 80
URL: https://proceedings.mlr.press/v80/smith18a.html
PDF: http://proceedings.mlr.press/v80/smith18a/smith18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-smith18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Matthew
family: Smith
- given: Herke
family: Hoof
- given: Joelle
family: Pineau
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4703-4712
id: smith18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4703
lastpage: 4712
published: 2018-07-03 00:00:00 +0000
- title: 'Accelerating Natural Gradient with Higher-Order Invariance'
abstract: 'An appealing property of the natural gradient is that it is invariant to arbitrary differentiable reparameterizations of the model. However, this invariance property requires infinitesimal steps and is lost in practical implementations with small but finite step sizes. In this paper, we study invariance properties from a combined perspective of Riemannian geometry and numerical differential equation solving. We define the order of invariance of a numerical method to be its convergence order to an invariant solution. We propose to use higher-order integrators and geodesic corrections to obtain more invariant optimization trajectories. We prove the numerical convergence properties of geodesic corrected updates and show that they can be as computationally efficient as plain natural gradient. Experimentally, we demonstrate that invariance leads to faster optimization and our techniques improve on traditional natural gradient in deep neural network training and natural policy gradient for reinforcement learning.'
volume: 80
URL: https://proceedings.mlr.press/v80/song18a.html
PDF: http://proceedings.mlr.press/v80/song18a/song18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-song18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yang
family: Song
- given: Jiaming
family: Song
- given: Stefano
family: Ermon
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4713-4722
id: song18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4713
lastpage: 4722
published: 2018-07-03 00:00:00 +0000
- title: 'Knowledge Transfer with Jacobian Matching'
abstract: 'Classical distillation methods transfer representations from a “teacher” neural network to a “student” network by matching their output activations. Recent methods also match the Jacobians, or the gradient of output activations with the input. However, this involves making some ad hoc decisions, in particular, the choice of the loss function. In this paper, we first establish an equivalence between Jacobian matching and distillation with input noise, from which we derive appropriate loss functions for Jacobian matching. We then rely on this analysis to apply Jacobian matching to transfer learning by establishing equivalence of a recent transfer learning procedure to distillation. We then show experimentally on standard image datasets that Jacobian-based penalties improve distillation, robustness to noisy inputs, and transfer learning.'
volume: 80
URL: https://proceedings.mlr.press/v80/srinivas18a.html
PDF: http://proceedings.mlr.press/v80/srinivas18a/srinivas18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-srinivas18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Suraj
family: Srinivas
- given: Francois
family: Fleuret
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4723-4731
id: srinivas18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4723
lastpage: 4731
published: 2018-07-03 00:00:00 +0000
- title: 'Universal Planning Networks: Learning Generalizable Representations for Visuomotor Control'
abstract: 'A key challenge in complex visuomotor control is learning abstract representations that are effective for specifying goals, planning, and generalization. To this end, we introduce universal planning networks (UPN). UPNs embed differentiable planning within a goal-directed policy. This planning computation unrolls a forward model in a latent space and infers an optimal action plan through gradient descent trajectory optimization. The plan-by-gradient-descent process and its underlying representations are learned end-to-end to directly optimize a supervised imitation learning objective. We find that the representations learned are not only effective for goal-directed visual imitation via gradient-based trajectory optimization, but can also provide a metric for specifying goals using images. The learned representations can be leveraged to specify distance-based rewards to reach new target states for model-free reinforcement learning, resulting in substantially more effective learning when solving new tasks described via image based goals. We were able to achieve successful transfer of visuomotor planning strategies across robots with significantly different morphologies and actuation capabilities. Visit https://sites.google.com/view/upn-public/home for video highlights.'
volume: 80
URL: https://proceedings.mlr.press/v80/srinivas18b.html
PDF: http://proceedings.mlr.press/v80/srinivas18b/srinivas18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-srinivas18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Aravind
family: Srinivas
- given: Allan
family: Jabri
- given: Pieter
family: Abbeel
- given: Sergey
family: Levine
- given: Chelsea
family: Finn
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4732-4741
id: srinivas18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4732
lastpage: 4741
published: 2018-07-03 00:00:00 +0000
- title: 'Structured Control Nets for Deep Reinforcement Learning'
abstract: 'In recent years, Deep Reinforcement Learning has made impressive advances in solving several important benchmark problems for sequential decision making. Many control applications use a generic multilayer perceptron (MLP) for non-vision parts of the policy network. In this work, we propose a new neural network architecture for the policy network representation that is simple yet effective. The proposed Structured Control Net (SCN) splits the generic MLP into two separate sub-modules: a nonlinear control module and a linear control module. Intuitively, the nonlinear control is for forward-looking and global control, while the linear control stabilizes the local dynamics around the residual of global control. We hypothesize that this will bring together the benefits of both linear and nonlinear policies: improve training sample efficiency, final episodic reward, and generalization of learned policy, while requiring a smaller network and being generally applicable to different training methods. We validated our hypothesis with competitive results on simulations from OpenAI MuJoCo, Roboschool, Atari, and a custom urban driving environment, with various ablation and generalization tests, trained with multiple black-box and policy gradient training methods. The proposed architecture has the potential to improve upon broader control tasks by incorporating problem specific priors into the architecture. As a case study, we demonstrate much improved performance for locomotion tasks by emulating the biological central pattern generators (CPGs) as the nonlinear part of the architecture.'
volume: 80
URL: https://proceedings.mlr.press/v80/srouji18a.html
PDF: http://proceedings.mlr.press/v80/srouji18a/srouji18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-srouji18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mario
family: Srouji
- given: Jian
family: Zhang
- given: Ruslan
family: Salakhutdinov
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4742-4751
id: srouji18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4742
lastpage: 4751
published: 2018-07-03 00:00:00 +0000
- title: 'Approximation Algorithms for Cascading Prediction Models'
abstract: 'We present an approximation algorithm that takes a pool of pre-trained models as input and produces from it a cascaded model with similar accuracy but lower average-case cost. Applied to state-of-the-art ImageNet classification models, this yields up to a 2x reduction in floating point multiplications, and up to a 6x reduction in average-case memory I/O. The auto-generated cascades exhibit intuitive properties, such as using lower-resolution input for easier images and requiring higher prediction confidence when using a computationally cheaper model.'
volume: 80
URL: https://proceedings.mlr.press/v80/streeter18a.html
PDF: http://proceedings.mlr.press/v80/streeter18a/streeter18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-streeter18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Matthew
family: Streeter
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4752-4760
id: streeter18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4752
lastpage: 4760
published: 2018-07-03 00:00:00 +0000
- title: 'Learning Low-Dimensional Temporal Representations'
abstract: 'Low-dimensional discriminative representations enhance machine learning methods in both performance and complexity, motivating supervised dimensionality reduction (DR) that transforms high-dimensional data to a discriminative subspace. Most DR methods require data to be i.i.d., however, in some domains, data naturally come in sequences, where the observations are temporally correlated. We propose a DR method called LT-LDA to learn low-dimensional temporal representations. We construct the separability among sequence classes by lifting the holistic temporal structures, which are established based on temporal alignments and may change in different subspaces. We jointly learn the subspace and the associated alignments by optimizing an objective which favors easily-separable temporal structures, and show that this objective is connected to the inference of alignments, thus allows an iterative solution. We provide both theoretical insight and empirical evaluation on real-world sequence datasets to show the interest of our method.'
volume: 80
URL: https://proceedings.mlr.press/v80/su18a.html
PDF: http://proceedings.mlr.press/v80/su18a/su18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-su18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Bing
family: Su
- given: Ying
family: Wu
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4761-4770
id: su18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4761
lastpage: 4770
published: 2018-07-03 00:00:00 +0000
- title: 'Exploiting the Potential of Standard Convolutional Autoencoders for Image Restoration by Evolutionary Search'
abstract: 'Researchers have applied deep neural networks to image restoration tasks, in which they proposed various network architectures, loss functions, and training methods. In particular, adversarial training, which is employed in recent studies, seems to be a key ingredient to success. In this paper, we show that simple convolutional autoencoders (CAEs) built upon only standard network components, i.e., convolutional layers and skip connections, can outperform the state-of-the-art methods which employ adversarial training and sophisticated loss functions. The secret is to search for good architectures using an evolutionary algorithm. All we did was to train the optimized CAEs by minimizing the l2 loss between reconstructed images and their ground truths using the ADAM optimizer. Our experimental results show that this approach achieves 27.8 dB peak signal to noise ratio (PSNR) on the CelebA dataset and 33.3 dB on the SVHN dataset, compared to 22.8 dB and 19.0 dB provided by the former state-of-the-art methods, respectively.'
volume: 80
URL: https://proceedings.mlr.press/v80/suganuma18a.html
PDF: http://proceedings.mlr.press/v80/suganuma18a/suganuma18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-suganuma18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Masanori
family: Suganuma
- given: Mete
family: Ozay
- given: Takayuki
family: Okatani
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4771-4780
id: suganuma18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4771
lastpage: 4780
published: 2018-07-03 00:00:00 +0000
- title: 'Stagewise Safe Bayesian Optimization with Gaussian Processes'
abstract: 'Enforcing safety is a key aspect of many problems pertaining to sequential decision making under uncertainty, which require the decisions made at every step to be both informative of the optimal decision and also safe. For example, we value both efficacy and comfort in medical therapy, and efficiency and safety in robotic control. We consider this problem of optimizing an unknown utility function with absolute feedback or preference feedback subject to unknown safety constraints. We develop an efficient safe Bayesian optimization algorithm, StageOpt, that separates safe region expansion and utility function maximization into two distinct stages. Compared to existing approaches which interleave between expansion and optimization, we show that StageOpt is more efficient and naturally applicable to a broader class of problems. We provide theoretical guarantees for both the satisfaction of safety constraints and convergence to the optimal utility value. We evaluate StageOpt on a variety of synthetic experiments as well as in clinical practice. We demonstrate that StageOpt is more effective than existing safe optimization approaches, and is able to safely and effectively optimize spinal cord stimulation therapy in our clinical experiments.'
volume: 80
URL: https://proceedings.mlr.press/v80/sui18a.html
PDF: http://proceedings.mlr.press/v80/sui18a/sui18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-sui18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yanan
family: Sui
- given: Vincent
family: Zhuang
- given: Joel
family: Burdick
- given: Yisong
family: Yue
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4781-4789
id: sui18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4781
lastpage: 4789
published: 2018-07-03 00:00:00 +0000
- title: 'Neural Program Synthesis from Diverse Demonstration Videos'
abstract: 'Interpreting decision making logic in demonstration videos is key to collaborating with and mimicking humans. To empower machines with this ability, we propose a neural program synthesizer that is able to explicitly synthesize underlying programs from behaviorally diverse and visually complicated demonstration videos. We introduce a summarizer module as part of our model to improve the network’s ability to integrate multiple demonstrations varying in behavior. We also employ a multi-task objective to encourage the model to learn meaningful intermediate representations for end-to-end training. We show that our model is able to reliably synthesize underlying programs as well as capture diverse behaviors exhibited in demonstrations. The code is available at https://shaohua0116.github.io/demo2program.'
volume: 80
URL: https://proceedings.mlr.press/v80/sun18a.html
PDF: http://proceedings.mlr.press/v80/sun18a/sun18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-sun18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Shao-Hua
family: Sun
- given: Hyeonwoo
family: Noh
- given: Sriram
family: Somasundaram
- given: Joseph
family: Lim
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4790-4799
id: sun18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4790
lastpage: 4799
published: 2018-07-03 00:00:00 +0000
- title: 'Scalable approximate Bayesian inference for particle tracking data'
abstract: 'Many important datasets in physics, chemistry, and biology consist of noisy sequences of images of multiple moving overlapping particles. In many cases, the observed particles are indistinguishable, leading to unavoidable uncertainty about nearby particles’ identities. Exact Bayesian inference is intractable in this setting, and previous approximate Bayesian methods scale poorly. Non-Bayesian approaches that output a single “best” estimate of the particle tracks (thus discarding important uncertainty information) are therefore dominant in practice. Here we propose a flexible and scalable amortized approach for Bayesian inference on this task. We introduce a novel neural network method to approximate the (intractable) filter-backward-sample-forward algorithm for Bayesian inference in this setting. By varying the simulated training data for the network, we can perform inference on a wide variety of data types. This approach is therefore highly flexible and improves on the state of the art in terms of accuracy; provides uncertainty estimates about the particle locations and identities; and has a test run-time that scales linearly as a function of the data length and number of particles, thus enabling Bayesian inference in arbitrarily large particle tracking datasets.'
volume: 80
URL: https://proceedings.mlr.press/v80/sun18b.html
PDF: http://proceedings.mlr.press/v80/sun18b/sun18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-sun18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ruoxi
family: Sun
- given: Liam
family: Paninski
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4800-4809
id: sun18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4800
lastpage: 4809
published: 2018-07-03 00:00:00 +0000
- title: 'Graphical Nonconvex Optimization via an Adaptive Convex Relaxation'
abstract: 'We consider the problem of learning high-dimensional Gaussian graphical models. The graphical lasso is one of the most popular methods for estimating Gaussian graphical models. However, it does not achieve the oracle rate of convergence. In this paper, we propose the graphical nonconvex optimization for optimal estimation in Gaussian graphical models, which is then approximated by a sequence of convex programs. Our proposal is computationally tractable and produces an estimator that achieves the oracle rate of convergence. The statistical error introduced by the sequential approximation using a sequence of convex programs is clearly demonstrated via a contraction property. The proposed methodology is then extended to modeling semiparametric graphical models. We show via numerical studies that the proposed estimator outperforms other popular methods for estimating Gaussian graphical models.'
volume: 80
URL: https://proceedings.mlr.press/v80/sun18c.html
PDF: http://proceedings.mlr.press/v80/sun18c/sun18c.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-sun18c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Qiang
family: Sun
- given: Kean Ming
family: Tan
- given: Han
family: Liu
- given: Tong
family: Zhang
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4810-4817
id: sun18c
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4810
lastpage: 4817
published: 2018-07-03 00:00:00 +0000
- title: 'Convolutional Imputation of Matrix Networks'
abstract: 'A matrix network is a family of matrices, with their relations modeled as a weighted graph. We consider the task of completing a partially observed matrix network. The observation comes from a novel sampling scheme where a fraction of matrices might be completely unobserved. How can we recover the entire matrix network from incomplete observations? This mathematical problem arises in many applications including medical imaging and social networks. To recover the matrix network, we propose a structural assumption that the matrices are low-rank after the graph Fourier transform on the network. We formulate a convex optimization problem and prove an exact recovery guarantee for the optimization problem. Furthermore, we numerically characterize the exact recovery regime for varying rank and sampling rate and discover a new phase transition phenomenon. Then we give an iterative imputation algorithm to efficiently solve the optimization problem and complete large-scale matrix networks. We demonstrate the algorithm with a variety of applications such as MRI and a Facebook user network.'
volume: 80
URL: https://proceedings.mlr.press/v80/sun18d.html
PDF: http://proceedings.mlr.press/v80/sun18d/sun18d.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-sun18d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Qingyun
family: Sun
- given: Mengyuan
family: Yan
- given: David
family: Donoho
- given: Stephen
family: Boyd
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4818-4827
id: sun18d
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4818
lastpage: 4827
published: 2018-07-03 00:00:00 +0000
- title: 'Differentiable Compositional Kernel Learning for Gaussian Processes'
abstract: 'The generalization properties of Gaussian processes depend heavily on the choice of kernel, and this choice remains a dark art. We present the Neural Kernel Network (NKN), a flexible family of kernels represented by a neural network. The NKN’s architecture is based on the composition rules for kernels, so that each unit of the network corresponds to a valid kernel. It can compactly approximate compositional kernel structures such as those used by the Automatic Statistician (Lloyd et al., 2014), but because the architecture is differentiable, it is end-to-end trainable with gradient-based optimization. We show that the NKN is universal for the class of stationary kernels. Empirically we demonstrate NKN’s pattern discovery and extrapolation abilities on several tasks that depend crucially on identifying the underlying structure, including time series and texture extrapolation, as well as Bayesian optimization.'
volume: 80
URL: https://proceedings.mlr.press/v80/sun18e.html
PDF: http://proceedings.mlr.press/v80/sun18e/sun18e.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-sun18e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Shengyang
family: Sun
- given: Guodong
family: Zhang
- given: Chaoqi
family: Wang
- given: Wenyuan
family: Zeng
- given: Jiaman
family: Li
- given: Roger
family: Grosse
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4828-4837
id: sun18e
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4828
lastpage: 4837
published: 2018-07-03 00:00:00 +0000
- title: 'Learning the Reward Function for a Misspecified Model'
abstract: 'In model-based reinforcement learning it is typical to decouple the problems of learning the dynamics model and learning the reward function. However, when the dynamics model is flawed, it may generate erroneous states that would never occur in the true environment. It is not clear a priori what value the reward function should assign to such states. This paper presents a novel error bound that accounts for the reward model’s behavior in states sampled from the model. This bound is used to extend the existing Hallucinated DAgger-MC algorithm, which offers theoretical performance guarantees in deterministic MDPs that do not assume a perfect model can be learned. Empirically, this approach to reward learning can yield dramatic improvements in control performance when the dynamics model is flawed.'
volume: 80
URL: https://proceedings.mlr.press/v80/talvitie18a.html
PDF: http://proceedings.mlr.press/v80/talvitie18a/talvitie18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-talvitie18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Erik
family: Talvitie
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4838-4847
id: talvitie18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4838
lastpage: 4847
published: 2018-07-03 00:00:00 +0000
- title: '$D^2$: Decentralized Training over Decentralized Data'
abstract: 'While training a machine learning model using multiple workers, each of which collects data from its own data source, it would be useful when the data collected from different workers are *unique* and *different*. Ironically, recent analysis of decentralized parallel stochastic gradient descent (D-PSGD) relies on the assumption that the data hosted on different workers are *not too different*. In this paper, we ask the question: *Can we design a decentralized parallel stochastic gradient descent algorithm that is less sensitive to the data variance across workers?* In this paper, we present D$^2$, a novel decentralized parallel stochastic gradient descent algorithm designed for large data variance among workers (imprecisely, “decentralized” data). The core of D$^2$ is a variance reduction extension of D-PSGD. It improves the convergence rate from $O\left({\sigma \over \sqrt{nT}} + {(n\zeta^2)^{\frac{1}{3}} \over T^{2/3}}\right)$ to $O\left({\sigma \over \sqrt{nT}}\right)$ where $\zeta^{2}$ denotes the variance among data on different workers. As a result, D$^2$ is robust to data variance among workers. We empirically evaluated D$^2$ on image classification tasks, where each worker has access to only the data of a limited set of labels, and find that D$^2$ significantly outperforms D-PSGD.'
volume: 80
URL: https://proceedings.mlr.press/v80/tang18a.html
PDF: http://proceedings.mlr.press/v80/tang18a/tang18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-tang18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hanlin
family: Tang
- given: Xiangru
family: Lian
- given: Ming
family: Yan
- given: Ce
family: Zhang
- given: Ji
family: Liu
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4848-4856
id: tang18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4848
lastpage: 4856
published: 2018-07-03 00:00:00 +0000
- title: 'Neural Inverse Rendering for General Reflectance Photometric Stereo'
abstract: 'We present a novel convolutional neural network architecture for photometric stereo (Woodham, 1980), a problem of recovering 3D object surface normals from multiple images observed under varying illuminations. Despite its long history in computer vision, the problem still shows fundamental challenges for surfaces with unknown general reflectance properties (BRDFs). Leveraging deep neural networks to learn complicated reflectance models is promising, but studies in this direction are very limited due to difficulties in acquiring accurate ground truth for training and also in designing networks invariant to permutation of input images. In order to address these challenges, we propose a physics based unsupervised learning framework where surface normals and BRDFs are predicted by the network and fed into the rendering equation to synthesize observed images. The network weights are optimized during testing by minimizing reconstruction loss between observed and synthesized images. Thus, our learning process does not require ground truth normals or even pre-training on external images. Our method is shown to achieve the state-of-the-art performance on a challenging real-world scene benchmark.'
volume: 80
URL: https://proceedings.mlr.press/v80/taniai18a.html
PDF: http://proceedings.mlr.press/v80/taniai18a/taniai18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-taniai18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Tatsunori
family: Taniai
- given: Takanori
family: Maehara
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4857-4866
id: taniai18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4857
lastpage: 4866
published: 2018-07-03 00:00:00 +0000
- title: 'Black Box FDR'
abstract: 'Analyzing large-scale, multi-experiment studies requires scientists to test each experimental outcome for statistical significance and then assess the results as a whole. We present Black Box FDR (BB-FDR), an empirical-Bayes method for analyzing multi-experiment studies when many covariates are gathered per experiment. BB-FDR learns a series of black box predictive models to boost power and control the false discovery rate (FDR) at two stages of study analysis. In Stage 1, it uses a deep neural network prior to report which experiments yielded significant outcomes. In Stage 2, a separate black box model of each covariate is used to select features that have significant predictive power across all experiments. In benchmarks, BB-FDR outperforms competing state-of-the-art methods in both stages of analysis. We apply BB-FDR to two real studies on cancer drug efficacy. For both studies, BB-FDR increases the proportion of significant outcomes discovered and selects variables that reveal key genomic drivers of drug sensitivity and resistance in cancer.'
volume: 80
URL: https://proceedings.mlr.press/v80/tansey18a.html
PDF: http://proceedings.mlr.press/v80/tansey18a/tansey18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-tansey18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Wesley
family: Tansey
- given: Yixin
family: Wang
- given: David
family: Blei
- given: Raul
family: Rabadan
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4867-4876
id: tansey18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4867
lastpage: 4876
published: 2018-07-03 00:00:00 +0000
- title: 'Best Arm Identification in Linear Bandits with Linear Dimension Dependency'
abstract: 'We study the best arm identification problem in linear bandits, where the mean reward of each arm depends linearly on an unknown $d$-dimensional parameter vector $\theta$, and the goal is to identify the arm with the largest expected reward. We first design and analyze a novel randomized $\theta$ estimator based on the solution to the convex relaxation of an optimal $G$-allocation experiment design problem. Using this estimator, we describe an algorithm whose sample complexity depends linearly on the dimension $d$, as well as an algorithm with sample complexity dependent on the reward gaps of the best $d$ arms, matching the lower bound arising from the ordinary top-arm identification problem. We finally compare the empirical performance of our algorithms with other state-of-the-art algorithms in terms of both sample complexity and computational time.'
volume: 80
URL: https://proceedings.mlr.press/v80/tao18a.html
PDF: http://proceedings.mlr.press/v80/tao18a/tao18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-tao18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Chao
family: Tao
- given: Saúl
family: Blanco
- given: Yuan
family: Zhou
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4877-4886
id: tao18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4877
lastpage: 4886
published: 2018-07-03 00:00:00 +0000
- title: 'Chi-square Generative Adversarial Network'
abstract: 'To assess the difference between real and synthetic data, Generative Adversarial Networks (GANs) are trained using a distribution discrepancy measure. Three widely employed measures are information-theoretic divergences, integral probability metrics, and Hilbert space discrepancy metrics. We elucidate the theoretical connections between these three popular GAN training criteria and propose a novel procedure, called $\chi^2$ (Chi-square) GAN, that is conceptually simple, stable at training and resistant to mode collapse. Our procedure naturally generalizes to address the problem of simultaneous matching of multiple distributions. Further, we propose a resampling strategy that significantly improves sample quality, by repurposing the trained critic function via an importance weighting mechanism. Experiments show that the proposed procedure improves stability and convergence, and yields state-of-the-art results on a wide range of generative modeling tasks.'
volume: 80
URL: https://proceedings.mlr.press/v80/tao18b.html
PDF: http://proceedings.mlr.press/v80/tao18b/tao18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-tao18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Chenyang
family: Tao
- given: Liqun
family: Chen
- given: Ricardo
family: Henao
- given: Jianfeng
family: Feng
- given: Lawrence
family: Carin
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4887-4896
id: tao18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4887
lastpage: 4896
published: 2018-07-03 00:00:00 +0000
- title: 'Lyapunov Functions for First-Order Methods: Tight Automated Convergence Guarantees'
abstract: 'We present a novel way of generating Lyapunov functions for proving linear convergence rates of first-order optimization methods. Our approach provably obtains the fastest linear convergence rate that can be verified by a quadratic Lyapunov function (with given states), and only relies on solving a small-sized semidefinite program. Our approach combines the advantages of performance estimation problems (PEP, due to Drori and Teboulle (2014)) and integral quadratic constraints (IQC, due to Lessard et al. (2016)), and relies on convex interpolation (due to Taylor et al. (2017c;b)).'
volume: 80
URL: https://proceedings.mlr.press/v80/taylor18a.html
PDF: http://proceedings.mlr.press/v80/taylor18a/taylor18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-taylor18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Adrien
family: Taylor
- given: Bryan
family: Van Scoy
- given: Laurent
family: Lessard
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4897-4906
id: taylor18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4897
lastpage: 4906
published: 2018-07-03 00:00:00 +0000
- title: 'Bayesian Uncertainty Estimation for Batch Normalized Deep Networks'
abstract: 'We show that training a deep network using batch normalization is equivalent to approximate inference in Bayesian models. We further demonstrate that this finding allows us to make meaningful estimates of the model uncertainty using conventional architectures, without modifications to the network or the training procedure. Our approach is thoroughly validated by measuring the quality of uncertainty in a series of empirical experiments on different tasks. It outperforms baselines with strong statistical significance, and displays competitive performance with recent Bayesian approaches.'
volume: 80
URL: https://proceedings.mlr.press/v80/teye18a.html
PDF: http://proceedings.mlr.press/v80/teye18a/teye18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-teye18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mattias
family: Teye
- given: Hossein
family: Azizpour
- given: Kevin
family: Smith
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4907-4916
id: teye18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4907
lastpage: 4916
published: 2018-07-03 00:00:00 +0000
- title: 'Decoupling Gradient-Like Learning Rules from Representations'
abstract: 'In machine learning, learning often corresponds to changing the parameters of a parameterized function. A learning rule is an algorithm or mathematical expression that specifies precisely how the parameters should be changed. When creating a machine learning system, we must make two decisions: what representation should be used (i.e., what parameterized function should be used) and what learning rule should be used to search through the resulting set of representable functions. In this paper we focus on gradient-like learning rules, wherein these two decisions are coupled in a subtle (and often unintentional) way. That is, using the same learning rule with two different representations that can represent the same sets of functions can result in two different outcomes. After arguing that this coupling is undesirable, particularly when using neural networks, we present a method for partially decoupling these two decisions for a broad class of gradient-like learning rules that span unsupervised learning, reinforcement learning, and supervised learning.'
volume: 80
URL: https://proceedings.mlr.press/v80/thomas18a.html
PDF: http://proceedings.mlr.press/v80/thomas18a/thomas18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-thomas18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Philip
family: Thomas
- given: Christoph
family: Dann
- given: Emma
family: Brunskill
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4917-4925
id: thomas18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4917
lastpage: 4925
published: 2018-07-03 00:00:00 +0000
- title: 'CoVeR: Learning Covariate-Specific Vector Representations with Tensor Decompositions'
abstract: 'Word embedding is a useful approach to capture co-occurrence structures in large text corpora. However, in addition to the text data itself, we often have additional covariates associated with individual corpus documents—e.g. the demographic of the author, time and venue of publication—and we would like the embedding to naturally capture this information. We propose CoVeR, a new tensor decomposition model for vector embeddings with covariates. CoVeR jointly learns a *base* embedding for all the words as well as a weighted diagonal matrix to model how each covariate affects the base embedding. To obtain author or venue-specific embedding, for example, we can then simply multiply the base embedding by the associated transformation matrix. The main advantages of our approach are data efficiency and interpretability of the covariate transformation. Our experiments demonstrate that our joint model learns substantially better covariate-specific embeddings compared to the standard approach of learning a separate embedding for each covariate using only the relevant subset of data, as well as other related methods. Furthermore, CoVeR encourages the embeddings to be “topic-aligned” in that the dimensions have specific independent meanings. This allows our covariate-specific embeddings to be compared by topic, enabling downstream differential analysis. We empirically evaluate the benefits of our algorithm on datasets, and demonstrate how it can be used to address many natural questions about covariate effects.'
volume: 80
URL: https://proceedings.mlr.press/v80/tian18a.html
PDF: http://proceedings.mlr.press/v80/tian18a/tian18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-tian18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Kevin
family: Tian
- given: Teng
family: Zhang
- given: James
family: Zou
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4926-4935
id: tian18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4926
lastpage: 4935
published: 2018-07-03 00:00:00 +0000
- title: 'Importance Weighted Transfer of Samples in Reinforcement Learning'
abstract: 'We consider the transfer of experience samples (i.e., tuples < s, a, s’, r >) in reinforcement learning (RL), collected from a set of source tasks to improve the learning process in a given target task. Most of the related approaches focus on selecting the most relevant source samples for solving the target task, but then all the transferred samples are used without any further consideration of the discrepancies between the task models. In this paper, we propose a model-based technique that automatically estimates the relevance (importance weight) of each source sample for solving the target task. In the proposed approach, all the samples are transferred and used by a batch RL algorithm to solve the target task, but their contribution to the learning process is proportional to their importance weight. By extending the results for importance weighting provided in supervised learning literature, we develop a finite-sample analysis of the proposed batch RL algorithm. Furthermore, we empirically compare the proposed algorithm to state-of-the-art approaches, showing that it achieves better learning performance and is very robust to negative transfer, even when some source tasks are significantly different from the target task.'
volume: 80
URL: https://proceedings.mlr.press/v80/tirinzoni18a.html
PDF: http://proceedings.mlr.press/v80/tirinzoni18a/tirinzoni18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-tirinzoni18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Andrea
family: Tirinzoni
- given: Andrea
family: Sessa
- given: Matteo
family: Pirotta
- given: Marcello
family: Restelli
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4936-4945
id: tirinzoni18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4936
lastpage: 4945
published: 2018-07-03 00:00:00 +0000
- title: 'Adversarial Regression with Multiple Learners'
abstract: 'Despite the considerable success enjoyed by machine learning techniques in practice, numerous studies demonstrated that many approaches are vulnerable to attacks. An important class of such attacks involves adversaries changing features at test time to cause incorrect predictions. Previous investigations of this problem pit a single learner against an adversary. However, in many situations an adversary’s decision is aimed at a collection of learners, rather than specifically targeted at each independently. We study the problem of adversarial linear regression with multiple learners. We approximate the resulting game by exhibiting an upper bound on learner loss functions, and show that the resulting game has a unique symmetric equilibrium. We present an algorithm for computing this equilibrium, and show through extensive experiments that equilibrium models are significantly more robust than conventional regularized linear regression.'
volume: 80
URL: https://proceedings.mlr.press/v80/tong18a.html
PDF: http://proceedings.mlr.press/v80/tong18a/tong18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-tong18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Liang
family: Tong
- given: Sixie
family: Yu
- given: Scott
family: Alfeld
- given:
family: Vorobeychik
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4946-4954
id: tong18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4946
lastpage: 4954
published: 2018-07-03 00:00:00 +0000
- title: 'Convergent Tree Backup and Retrace with Function Approximation'
abstract: 'Off-policy learning is key to scaling up reinforcement learning as it allows learning about a target policy from the experience generated by a different behavior policy. Unfortunately, it has been challenging to combine off-policy learning with function approximation and multi-step bootstrapping in a way that leads to both stable and efficient algorithms. In this work, we show that the Tree Backup and Retrace algorithms are unstable with linear function approximation, both in theory and in practice with specific examples. Based on our analysis, we then derive stable and efficient gradient-based algorithms using a quadratic convex-concave saddle-point formulation. By exploiting the problem structure proper to these algorithms, we are able to provide convergence guarantees and finite-sample bounds. The applicability of our new analysis also goes beyond Tree Backup and Retrace and allows us to provide new convergence rates for the GTD and GTD2 algorithms without having recourse to projections or Polyak averaging.'
volume: 80
URL: https://proceedings.mlr.press/v80/touati18a.html
PDF: http://proceedings.mlr.press/v80/touati18a/touati18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-touati18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ahmed
family: Touati
- given: Pierre-Luc
family: Bacon
- given: Doina
family: Precup
- given: Pascal
family: Vincent
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4955-4964
id: touati18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4955
lastpage: 4964
published: 2018-07-03 00:00:00 +0000
- title: 'Learning Longer-term Dependencies in RNNs with Auxiliary Losses'
abstract: 'Despite recent advances in training recurrent neural networks (RNNs), capturing long-term dependencies in sequences remains a fundamental challenge. Most approaches use backpropagation through time (BPTT), which is difficult to scale to very long sequences. This paper proposes a simple method that improves the ability to capture long-term dependencies in RNNs by adding an unsupervised auxiliary loss to the original objective. This auxiliary loss forces RNNs to either reconstruct previous events or predict next events in a sequence, making truncated backpropagation feasible for long sequences and also improving full BPTT. We evaluate our method on a variety of settings, including pixel-by-pixel image classification with sequence lengths up to 16000, and a real document classification benchmark. Our results highlight good performance and resource efficiency of this approach over competitive baselines, including other recurrent models and a comparably sized Transformer. Further analyses reveal beneficial effects of the auxiliary loss on optimization and regularization, as well as extreme cases where there is little to no backpropagation.'
volume: 80
URL: https://proceedings.mlr.press/v80/trinh18a.html
PDF: http://proceedings.mlr.press/v80/trinh18a/trinh18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-trinh18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Trieu
family: Trinh
- given: Andrew
family: Dai
- given: Thang
family: Luong
- given: Quoc
family: Le
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4965-4974
id: trinh18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4965
lastpage: 4974
published: 2018-07-03 00:00:00 +0000
- title: 'Theoretical Analysis of Sparse Subspace Clustering with Missing Entries'
abstract: 'Sparse Subspace Clustering (SSC) is a popular unsupervised machine learning method for clustering data lying close to an unknown union of low-dimensional linear subspaces; a problem with numerous applications in pattern recognition and computer vision. Even though the behavior of SSC for complete data is by now well-understood, little is known about its theoretical properties when applied to data with missing entries. In this paper we give theoretical guarantees for SSC with incomplete data, and provide theoretical evidence that projecting the zero-filled data onto the observation pattern of the point being expressed can lead to substantial improvement in performance; a phenomenon already known experimentally. The main insight of our analysis is that even though this projection induces additional missing entries, this is counterbalanced by the fact that the projected and zero-filled data are in effect incomplete points associated with the union of the corresponding projected subspaces, with respect to which the point being expressed is complete. The significance of this phenomenon potentially extends to the entire class of self-expressive methods.'
volume: 80
URL: https://proceedings.mlr.press/v80/tsakiris18a.html
PDF: http://proceedings.mlr.press/v80/tsakiris18a/tsakiris18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-tsakiris18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Manolis
family: Tsakiris
- given: Rene
family: Vidal
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4975-4984
id: tsakiris18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4975
lastpage: 4984
published: 2018-07-03 00:00:00 +0000
- title: 'StrassenNets: Deep Learning with a Multiplication Budget'
abstract: 'A large fraction of the arithmetic operations required to evaluate deep neural networks (DNNs) consists of matrix multiplications, in both convolution and fully connected layers. We perform end-to-end learning of low-cost approximations of matrix multiplications in DNN layers by casting matrix multiplications as 2-layer sum-product networks (SPNs) (arithmetic circuits) and learning their (ternary) edge weights from data. The SPNs disentangle multiplication and addition operations and enable us to impose a budget on the number of multiplication operations. Combining our method with knowledge distillation and applying it to image classification DNNs (trained on ImageNet) and language modeling DNNs (using LSTMs), we obtain a first-of-a-kind reduction in number of multiplications (over 99.5%) while maintaining the predictive performance of the full-precision models. Finally, we demonstrate that the proposed framework is able to rediscover Strassen’s matrix multiplication algorithm, learning to multiply $2 \times 2$ matrices using only 7 multiplications instead of 8.'
volume: 80
URL: https://proceedings.mlr.press/v80/tschannen18a.html
PDF: http://proceedings.mlr.press/v80/tschannen18a/tschannen18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-tschannen18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Michael
family: Tschannen
- given: Aran
family: Khanna
- given: Animashree
family: Anandkumar
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4985-4994
id: tschannen18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4985
lastpage: 4994
published: 2018-07-03 00:00:00 +0000
- title: 'Invariance of Weight Distributions in Rectified MLPs'
abstract: 'An interesting approach to analyzing neural networks that has received renewed attention is to examine the equivalent kernel of the neural network. This is based on the fact that a fully connected feedforward network with one hidden layer, a certain weight distribution, an activation function, and an infinite number of neurons can be viewed as a mapping into a Hilbert space. We derive the equivalent kernels of MLPs with ReLU or Leaky ReLU activations for all rotationally-invariant weight distributions, generalizing a previous result that required Gaussian weight distributions. Additionally, the Central Limit Theorem is used to show that for certain activation functions, kernels corresponding to layers with weight distributions having $0$ mean and finite absolute third moment are asymptotically universal, and are well approximated by the kernel corresponding to layers with spherical Gaussian weights. In deep networks, as depth increases the equivalent kernel approaches a pathological fixed point, which can be used to argue why training randomly initialized networks can be difficult. Our results also have implications for weight initialization.'
volume: 80
URL: https://proceedings.mlr.press/v80/tsuchida18a.html
PDF: http://proceedings.mlr.press/v80/tsuchida18a/tsuchida18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-tsuchida18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Russell
family: Tsuchida
- given: Fred
family: Roosta
- given: Marcus
family: Gallagher
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 4995-5004
id: tsuchida18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 4995
lastpage: 5004
published: 2018-07-03 00:00:00 +0000
- title: 'Least-Squares Temporal Difference Learning for the Linear Quadratic Regulator'
abstract: 'Reinforcement learning (RL) has been successfully used to solve many continuous control tasks. Despite its impressive results, however, fundamental questions regarding the sample complexity of RL on continuous problems remain open. We study the performance of RL in this setting by considering the behavior of the Least-Squares Temporal Difference (LSTD) estimator on the classic Linear Quadratic Regulator (LQR) problem from optimal control. We give the first finite-time analysis of the number of samples needed to estimate the value function for a fixed static state-feedback policy to within epsilon-relative error. In the process of deriving our result, we give a general characterization for when the minimum eigenvalue of the empirical covariance matrix formed along the sample path of a fast-mixing stochastic process concentrates above zero, extending a result by Koltchinskii and Mendelson in the independent covariates setting. Finally, we provide experimental evidence indicating that our analysis correctly captures the qualitative behavior of LSTD on several LQR instances.'
volume: 80
URL: https://proceedings.mlr.press/v80/tu18a.html
PDF: http://proceedings.mlr.press/v80/tu18a/tu18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-tu18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Stephen
family: Tu
- given: Benjamin
family: Recht
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5005-5014
id: tu18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5005
lastpage: 5014
published: 2018-07-03 00:00:00 +0000
- title: 'The Mirage of Action-Dependent Baselines in Reinforcement Learning'
abstract: 'Policy gradient methods are a widely used class of model-free reinforcement learning algorithms where a state-dependent baseline is used to reduce gradient estimator variance. Several recent papers extend the baseline to depend on both the state and action and suggest that this significantly reduces variance and improves sample efficiency without introducing bias into the gradient estimates. To better understand this development, we decompose the variance of the policy gradient estimator and numerically show that learned state-action-dependent baselines do not in fact reduce variance over a state-dependent baseline in commonly tested benchmark domains. We confirm this unexpected result by reviewing the open-source code accompanying these prior papers, and show that subtle implementation decisions cause deviations from the methods presented in the papers and explain the source of the previously observed empirical gains. Furthermore, the variance decomposition highlights areas for improvement, which we demonstrate by illustrating a simple change to the typical value function parameterization that can significantly improve performance.'
volume: 80
URL: https://proceedings.mlr.press/v80/tucker18a.html
PDF: http://proceedings.mlr.press/v80/tucker18a/tucker18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-tucker18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: George
family: Tucker
- given: Surya
family: Bhupatiraju
- given: Shixiang
family: Gu
- given: Richard
family: Turner
- given: Zoubin
family: Ghahramani
- given: Sergey
family: Levine
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5015-5024
id: tucker18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5015
lastpage: 5024
published: 2018-07-03 00:00:00 +0000
- title: 'Adversarial Risk and the Dangers of Evaluating Against Weak Attacks'
abstract: 'This paper investigates recently proposed approaches for defending against adversarial examples and evaluating adversarial robustness. We motivate *adversarial risk* as an objective for achieving models robust to worst-case inputs. We then frame commonly used attacks and evaluation metrics as defining a tractable surrogate objective to the true adversarial risk. This suggests that models may optimize this surrogate rather than the true adversarial risk. We formalize this notion as *obscurity to an adversary*, and develop tools and heuristics for identifying obscured models and designing transparent models. We demonstrate that this is a significant problem in practice by repurposing gradient-free optimization techniques into adversarial attacks, which we use to decrease the accuracy of several recently proposed defenses to near zero. Our hope is that our formulations and results will help researchers to develop more powerful defenses.'
volume: 80
URL: https://proceedings.mlr.press/v80/uesato18a.html
PDF: http://proceedings.mlr.press/v80/uesato18a/uesato18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-uesato18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jonathan
family: Uesato
- given: Brendan
family: O’Donoghue
- given: Pushmeet
family: Kohli
- given: Aaron
family: Oord
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5025-5034
id: uesato18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5025
lastpage: 5034
published: 2018-07-03 00:00:00 +0000
- title: 'DVAE++: Discrete Variational Autoencoders with Overlapping Transformations'
abstract: 'Training of discrete latent variable models remains challenging because passing gradient information through discrete units is difficult. We propose a new class of smoothing transformations based on a mixture of two overlapping distributions, and show that the proposed transformation can be used for training binary latent models with either directed or undirected priors. We derive a new variational bound to efficiently train with Boltzmann machine priors. Using this bound, we develop DVAE++, a generative model with a global discrete prior and a hierarchy of convolutional continuous variables. Experiments on several benchmarks show that overlapping transformations outperform other recent continuous relaxations of discrete latent variables including Gumbel-Softmax (Maddison et al., 2016; Jang et al., 2016), and discrete variational autoencoders (Rolfe, 2016).'
volume: 80
URL: https://proceedings.mlr.press/v80/vahdat18a.html
PDF: http://proceedings.mlr.press/v80/vahdat18a/vahdat18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-vahdat18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Arash
family: Vahdat
- given: William
family: Macready
- given: Zhengbing
family: Bian
- given: Amir
family: Khoshaman
- given: Evgeny
family: Andriyash
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5035-5044
id: vahdat18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5035
lastpage: 5044
published: 2018-07-03 00:00:00 +0000
- title: 'Programmatically Interpretable Reinforcement Learning'
abstract: 'We present a reinforcement learning framework, called Programmatically Interpretable Reinforcement Learning (PIRL), that is designed to generate interpretable and verifiable agent policies. Unlike the popular Deep Reinforcement Learning (DRL) paradigm, which represents policies by neural networks, PIRL represents policies using a high-level, domain-specific programming language. Such programmatic policies have the benefits of being more easily interpreted than neural networks, and being amenable to verification by symbolic methods. We propose a new method, called Neurally Directed Program Search (NDPS), for solving the challenging nonsmooth optimization problem of finding a programmatic policy with maximal reward. NDPS works by first learning a neural policy network using DRL, and then performing a local search over programmatic policies that seeks to minimize a distance from this neural “oracle”. We evaluate NDPS on the task of learning to drive a simulated car in the TORCS car-racing environment. We demonstrate that NDPS is able to discover human-readable policies that pass some significant performance bars. We also show that PIRL policies can have smoother trajectories, and can be more easily transferred to environments not encountered during training, than corresponding policies discovered by DRL.'
volume: 80
URL: https://proceedings.mlr.press/v80/verma18a.html
PDF: http://proceedings.mlr.press/v80/verma18a/verma18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-verma18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Abhinav
family: Verma
- given: Vijayaraghavan
family: Murali
- given: Rishabh
family: Singh
- given: Pushmeet
family: Kohli
- given: Swarat
family: Chaudhuri
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5045-5054
id: verma18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5045
lastpage: 5054
published: 2018-07-03 00:00:00 +0000
- title: 'Clustering Semi-Random Mixtures of Gaussians'
abstract: 'Gaussian mixture models (GMMs) are the most widely used statistical models for the k-means clustering problem and form a popular framework for clustering in machine learning and data analysis. In this paper, we propose a natural robust model for k-means clustering that generalizes the Gaussian mixture model, and that we believe will be useful in identifying robust algorithms. Our first contribution is a polynomial time algorithm that provably recovers the ground-truth up to small classification error w.h.p., assuming certain separation between the components. Perhaps surprisingly, the algorithm we analyze is the popular Lloyd’s algorithm for k-means clustering that is the method-of-choice in practice. Our second result complements the upper bound by giving a nearly matching lower bound on the number of misclassified points incurred by any k-means clustering algorithm on the semi-random model.'
volume: 80
URL: https://proceedings.mlr.press/v80/vijayaraghavan18a.html
PDF: http://proceedings.mlr.press/v80/vijayaraghavan18a/vijayaraghavan18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-vijayaraghavan18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Aravindan
family: Vijayaraghavan
- given: Pranjal
family: Awasthi
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5055-5064
id: vijayaraghavan18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5055
lastpage: 5064
published: 2018-07-03 00:00:00 +0000
- title: 'A Probabilistic Theory of Supervised Similarity Learning for Pointwise ROC Curve Optimization'
abstract: 'The performance of many machine learning techniques depends on the choice of an appropriate similarity or distance measure on the input space. Similarity learning (or metric learning) aims at building such a measure from training data so that observations with the same (resp. different) label are as close (resp. far) as possible. In this paper, similarity learning is investigated from the perspective of pairwise bipartite ranking, where the goal is to rank the elements of a database by decreasing order of the probability that they share the same label with some query data point, based on the similarity scores. A natural performance criterion in this setting is pointwise ROC optimization: maximize the true positive rate under a fixed false positive rate. We study this novel perspective on similarity learning through a rigorous probabilistic framework. The empirical version of the problem gives rise to a constrained optimization formulation involving U-statistics, for which we derive universal learning rates as well as faster rates under a noise assumption on the data distribution. We also address the large-scale setting by analyzing the effect of sampling-based approximations. Our theoretical results are supported by illustrative numerical experiments.'
volume: 80
URL: https://proceedings.mlr.press/v80/vogel18a.html
PDF: http://proceedings.mlr.press/v80/vogel18a/vogel18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-vogel18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Robin
family: Vogel
- given: Aurélien
family: Bellet
- given: Stéphan
family: Clémençon
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5065-5074
id: vogel18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5065
lastpage: 5074
published: 2018-07-03 00:00:00 +0000
- title: 'Hierarchical Multi-Label Classification Networks'
abstract: 'One of the most challenging machine learning problems is a particular case of data classification in which classes are hierarchically structured and objects can be assigned to multiple paths of the class hierarchy at the same time. This task is known as hierarchical multi-label classification (HMC), with applications in text classification, image annotation, and in bioinformatics problems such as protein function prediction. In this paper, we propose novel neural network architectures for HMC called HMCN, capable of simultaneously optimizing local and global loss functions for discovering local hierarchical class-relationships and global information from the entire class hierarchy while penalizing hierarchical violations. We evaluate its performance on 21 datasets from four distinct domains, and we compare it against the current HMC state-of-the-art approaches. Results show that HMCN substantially outperforms all baselines with statistical significance, emerging as the new state of the art for HMC.'
volume: 80
URL: https://proceedings.mlr.press/v80/wehrmann18a.html
PDF: http://proceedings.mlr.press/v80/wehrmann18a/wehrmann18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-wehrmann18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jonatas
family: Wehrmann
- given: Ricardo
family: Cerri
- given: Rodrigo
family: Barros
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5075-5084
id: wehrmann18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5075
lastpage: 5084
published: 2018-07-03 00:00:00 +0000
- title: 'Transfer Learning via Learning to Transfer'
abstract: 'In transfer learning, what and how to transfer are two primary issues to be addressed, as different transfer learning algorithms applied between a source and a target domain result in different knowledge being transferred and thereby different performance improvements in the target domain. Determining the optimal one that maximizes the performance improvement requires either exhaustive exploration or considerable expertise. Meanwhile, it is widely accepted in educational psychology that human beings improve transfer learning skills of deciding what to transfer through meta-cognitive reflection on inductive transfer learning practices. Motivated by this, we propose a novel transfer learning framework known as Learning to Transfer (L2T) to automatically determine what is best to transfer, and how, by leveraging previous transfer learning experiences. We establish the L2T framework in two stages: 1) we learn a reflection function encrypting transfer learning skills from experiences; and 2) we infer what is best to transfer, and how, for a future pair of domains by optimizing the reflection function. We also theoretically analyse the algorithmic stability and generalization bound of L2T, and empirically demonstrate its superiority over several state-of-the-art transfer learning algorithms.'
volume: 80
URL: https://proceedings.mlr.press/v80/wei18a.html
PDF: http://proceedings.mlr.press/v80/wei18a/wei18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-wei18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ying
family: Wei
- given: Yu
family: Zhang
- given: Junzhou
family: Huang
- given: Qiang
family: Yang
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5085-5094
id: wei18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5085
lastpage: 5094
published: 2018-07-03 00:00:00 +0000
- title: 'Semi-Supervised Learning on Data Streams via Temporal Label Propagation'
abstract: 'We consider the problem of labeling points on a fast-moving data stream when only a small number of labeled examples are available. In our setting, incoming points must be processed efficiently and the stream is too large to store in its entirety. We present a semi-supervised learning algorithm for this task. The algorithm maintains a small synopsis of the stream which can be quickly updated as new points arrive, and labels every incoming point by provably learning from the full history of the stream. Experiments on real datasets validate that the algorithm can quickly and accurately classify points on a stream with a small quantity of labeled examples.'
volume: 80
URL: https://proceedings.mlr.press/v80/wagner18a.html
PDF: http://proceedings.mlr.press/v80/wagner18a/wagner18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-wagner18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Tal
family: Wagner
- given: Sudipto
family: Guha
- given: Shiva
family: Kasiviswanathan
- given: Nina
family: Mishra
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5095-5104
id: wagner18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5095
lastpage: 5104
published: 2018-07-03 00:00:00 +0000
- title: 'Neural Dynamic Programming for Musical Self Similarity'
abstract: 'We present a neural sequence model designed specifically for symbolic music. The model is based on a learned edit distance mechanism which generalises a classic recursion from computer science, leading to a neural dynamic program. Repeated motifs are detected by learning the transformations between them. We represent the arising computational dependencies using a novel data structure, the edit tree; this perspective suggests natural approximations which afford the scaling up of our otherwise cubic time algorithm. We demonstrate our model on real and synthetic data; in all cases it out-performs a strong stacked long short-term memory benchmark.'
volume: 80
URL: https://proceedings.mlr.press/v80/walder18a.html
PDF: http://proceedings.mlr.press/v80/walder18a/walder18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-walder18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Christian
family: Walder
- given: Dongwoo
family: Kim
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5105-5113
id: walder18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5105
lastpage: 5113
published: 2018-07-03 00:00:00 +0000
- title: 'Thompson Sampling for Combinatorial Semi-Bandits'
abstract: 'We study the application of the Thompson sampling (TS) methodology to the stochastic combinatorial multi-armed bandit (CMAB) framework. We analyze the standard TS algorithm for the general CMAB, and obtain the first distribution-dependent regret bound of $O(m\log T / \Delta_{\min})$ for TS under general CMAB, where $m$ is the number of arms, $T$ is the time horizon, and $\Delta_{\min}$ is the minimum gap between the expected reward of the optimal solution and any non-optimal solution. We also show that one cannot use an approximate oracle in the TS algorithm, even for MAB problems. We then extend the analysis to the matroid bandit, a special case of CMAB for which we can remove the independence assumption across arms and achieve a better regret bound. Finally, we report experiments comparing the regret of the CUCB and CTS algorithms.'
volume: 80
URL: https://proceedings.mlr.press/v80/wang18a.html
PDF: http://proceedings.mlr.press/v80/wang18a/wang18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-wang18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Siwei
family: Wang
- given: Wei
family: Chen
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5114-5122
id: wang18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5114
lastpage: 5122
published: 2018-07-03 00:00:00 +0000
- title: 'PredRNN++: Towards A Resolution of the Deep-in-Time Dilemma in Spatiotemporal Predictive Learning'
abstract: 'We present PredRNN++, a recurrent network for spatiotemporal predictive learning. In pursuit of a great modeling capability for short-term video dynamics, we make our network deeper in time by leveraging a new recurrent structure named Causal LSTM with cascaded dual memories. To alleviate the gradient propagation difficulties in deep predictive models, we propose a Gradient Highway Unit, which provides alternative quick routes for the gradient flows from outputs back to long-range previous inputs. The gradient highway units work seamlessly with the causal LSTMs, enabling our model to capture the short-term and the long-term video dependencies adaptively. Our model achieves state-of-the-art prediction results on both synthetic and real video datasets, showing its power in modeling entangled motions.'
volume: 80
URL: https://proceedings.mlr.press/v80/wang18b.html
PDF: http://proceedings.mlr.press/v80/wang18b/wang18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-wang18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yunbo
family: Wang
- given: Zhifeng
family: Gao
- given: Mingsheng
family: Long
- given: Jianmin
family: Wang
- given: Philip S
family: Yu
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5123-5132
id: wang18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5123
lastpage: 5132
published: 2018-07-03 00:00:00 +0000
- title: 'Analyzing the Robustness of Nearest Neighbors to Adversarial Examples'
abstract: 'Motivated by safety-critical applications, test-time attacks on classifiers via adversarial examples have recently received a great deal of attention. However, there is a general lack of understanding of why adversarial examples arise; whether they originate from inherent properties of the data or from a lack of training samples remains ill-understood. In this work, we introduce a theoretical framework analogous to bias-variance theory for understanding these effects. We use our framework to analyze the robustness of a canonical non-parametric classifier, the k-nearest neighbors. Our analysis shows that its robustness properties depend critically on the value of k: the classifier may be inherently non-robust for small k, but its robustness approaches that of the Bayes Optimal classifier for fast-growing k. We propose a novel modified 1-nearest neighbor classifier, and guarantee its robustness in the large sample limit. Our experiments suggest that this classifier may have good robustness properties even for reasonable data set sizes.'
volume: 80
URL: https://proceedings.mlr.press/v80/wang18c.html
PDF: http://proceedings.mlr.press/v80/wang18c/wang18c.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-wang18c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yizhen
family: Wang
- given: Somesh
family: Jha
- given: Kamalika
family: Chaudhuri
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5133-5142
id: wang18c
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5133
lastpage: 5142
published: 2018-07-03 00:00:00 +0000
- title: 'Competitive Multi-agent Inverse Reinforcement Learning with Sub-optimal Demonstrations'
abstract: 'This paper considers the problem of inverse reinforcement learning in zero-sum stochastic games when expert demonstrations are known to be suboptimal. Compared to previous works that decouple agents in the game by assuming optimality in expert policies, we introduce a new objective function that directly pits experts against Nash Equilibrium policies, and we design an algorithm to solve for the reward function in the context of inverse reinforcement learning with deep neural networks as model approximations. To find Nash Equilibrium in large-scale games, we also propose an adversarial training algorithm for zero-sum stochastic games, and show the theoretical appeal of non-existence of local optima in its objective function. In numerical experiments, we demonstrate that our Nash Equilibrium and inverse reinforcement learning algorithms address games that are not amenable to existing benchmark algorithms. Moreover, our algorithm successfully recovers reward and policy functions regardless of the quality of the sub-optimal expert demonstration set.'
volume: 80
URL: https://proceedings.mlr.press/v80/wang18d.html
PDF: http://proceedings.mlr.press/v80/wang18d/wang18d.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-wang18d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Xingyu
family: Wang
- given: Diego
family: Klabjan
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5143-5151
id: wang18d
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5143
lastpage: 5151
published: 2018-07-03 00:00:00 +0000
- title: 'Coded Sparse Matrix Multiplication'
abstract: 'In a large-scale and distributed matrix multiplication problem $C=A^{\intercal}B$, where $C\in\mathbb{R}^{r\times t}$, coded computation plays an important role in effectively dealing with “stragglers” (distributed computations that may get delayed due to a few slow or faulty processors). However, existing coded schemes could destroy the significant sparsity that exists in large-scale machine learning problems, and could result in much higher computation overhead, i.e., $O(rt)$ decoding time. In this paper, we develop a new coded computation strategy, which we call *sparse code*, that achieves a near *optimal recovery threshold*, *low computation overhead*, and *linear decoding time* $O(nnz(C))$. We implement our scheme and demonstrate the advantage of the approach over both uncoded and current fastest coded strategies.'
volume: 80
URL: https://proceedings.mlr.press/v80/wang18e.html
PDF: http://proceedings.mlr.press/v80/wang18e/wang18e.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-wang18e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Sinong
family: Wang
- given: Jiashang
family: Liu
- given: Ness
family: Shroff
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5152-5160
id: wang18e
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5152
lastpage: 5160
published: 2018-07-03 00:00:00 +0000
- title: 'A Fast and Scalable Joint Estimator for Integrating Additional Knowledge in Learning Multiple Related Sparse Gaussian Graphical Models'
abstract: 'We consider the problem of including additional knowledge in estimating sparse Gaussian graphical models (sGGMs) from aggregated samples, arising often in bioinformatics and neuroimaging applications. Previous joint sGGM estimators either fail to use existing knowledge or cannot scale up to many tasks (large $K$) in a high-dimensional (large $p$) setting. In this paper, we propose a novel \underline{J}oint \underline{E}lementary \underline{E}stimator incorporating additional \underline{K}nowledge (JEEK) to infer multiple related sparse Gaussian graphical models from large-scale heterogeneous data. Using domain knowledge as weights, we design a novel hybrid norm as the minimization objective to enforce the superposition of two weighted sparsity constraints, one on the shared interactions and the other on the task-specific structural patterns. This enables JEEK to elegantly consider various forms of existing knowledge based on the domain at hand and avoid the need to design knowledge-specific optimization. JEEK is solved through a fast and entry-wise parallelizable solution that improves the computational cost from the state-of-the-art $O(p^5K^4)$ to $O(p^2K^4)$. We conduct a rigorous statistical analysis showing that JEEK achieves the same convergence rate $O(\log(Kp)/n_{tot})$ as the state-of-the-art estimators that are much harder to compute. Empirically, on multiple synthetic datasets and one real-world dataset from neuroscience, JEEK is significantly faster than the state-of-the-art methods while achieving the same level of prediction accuracy.'
volume: 80
URL: https://proceedings.mlr.press/v80/wang18f.html
PDF: http://proceedings.mlr.press/v80/wang18f/wang18f.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-wang18f.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Beilun
family: Wang
- given: Arshdeep
family: Sekhon
- given: Yanjun
family: Qi
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5161-5170
id: wang18f
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5161
lastpage: 5170
published: 2018-07-03 00:00:00 +0000
- title: 'Provable Variable Selection for Streaming Features'
abstract: 'In large-scale machine learning applications and high-dimensional statistics, it is common to face a considerable number of features, many of which are redundant. As a remedy, online feature selection has attracted increasing attention in recent years. It sequentially reveals features and evaluates their importance. Though online feature selection has proven to be an elegant methodology, it is usually challenging to carry out a rigorous theoretical characterization. In this work, we propose a provable online feature selection algorithm that utilizes the online leverage score. The selected features are then fed to $k$-means clustering, making the clustering step memory and computationally efficient. We prove that with high probability, performing $k$-means clustering based on the selected feature space does not deviate far from the optimal clustering using the original data. The empirical results on real-world data sets demonstrate the effectiveness of our algorithm.'
volume: 80
URL: https://proceedings.mlr.press/v80/wang18g.html
PDF: http://proceedings.mlr.press/v80/wang18g/wang18g.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-wang18g.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jing
family: Wang
- given: Jie
family: Shen
- given: Ping
family: Li
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5171-5179
id: wang18g
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5171
lastpage: 5179
published: 2018-07-03 00:00:00 +0000
- title: 'Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis'
abstract: 'In this work, we propose “global style tokens” (GSTs), a bank of embeddings that are jointly trained within Tacotron, a state-of-the-art end-to-end speech synthesis system. The embeddings are trained with no explicit labels, yet learn to model a large range of acoustic expressiveness. GSTs lead to a rich set of significant results. The soft interpretable “labels” they generate can be used to control synthesis in novel ways, such as varying speed and speaking style – independently of the text content. They can also be used for style transfer, replicating the speaking style of a single audio clip across an entire long-form text corpus. When trained on noisy, unlabeled found data, GSTs learn to factorize noise and speaker identity, providing a path towards highly scalable but robust speech synthesis.'
volume: 80
URL: https://proceedings.mlr.press/v80/wang18h.html
PDF: http://proceedings.mlr.press/v80/wang18h/wang18h.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-wang18h.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yuxuan
family: Wang
- given: Daisy
family: Stanton
- given: Yu
family: Zhang
- given: RJ
family: Skerry-Ryan
- given: Eric
family: Battenberg
- given: Joel
family: Shor
- given: Ying
family: Xiao
- given: Ye
family: Jia
- given: Fei
family: Ren
- given: Rif A.
family: Saurous
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5180-5189
id: wang18h
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5180
lastpage: 5189
published: 2018-07-03 00:00:00 +0000
- title: 'Adversarial Distillation of Bayesian Neural Network Posteriors'
abstract: 'Bayesian neural networks (BNNs) allow us to reason about uncertainty in a principled way. Stochastic Gradient Langevin Dynamics (SGLD) enables efficient BNN learning by drawing samples from the BNN posterior using mini-batches. However, SGLD and its extensions require storage of many copies of the model parameters, a potentially prohibitive cost, especially for large neural networks. We propose a framework, Adversarial Posterior Distillation, to distill the SGLD samples using a Generative Adversarial Network (GAN). At test-time, samples are generated by the GAN. We show that this distillation framework incurs no loss in performance on recent BNN applications including anomaly detection, active learning, and defense against adversarial attacks. By construction, our framework distills not only the Bayesian predictive distribution, but the posterior itself. This allows one to compute quantities such as the approximate model variance, which is useful in downstream tasks. To our knowledge, these are the first results applying MCMC-based BNNs to the aforementioned applications.'
volume: 80
URL: https://proceedings.mlr.press/v80/wang18i.html
PDF: http://proceedings.mlr.press/v80/wang18i/wang18i.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-wang18i.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Kuan-Chieh
family: Wang
- given: Paul
family: Vicol
- given: James
family: Lucas
- given: Li
family: Gu
- given: Roger
family: Grosse
- given: Richard
family: Zemel
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5190-5199
id: wang18i
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5190
lastpage: 5199
published: 2018-07-03 00:00:00 +0000
- title: 'Minimax Concave Penalized Multi-Armed Bandit Model with High-Dimensional Covariates'
abstract: 'In this paper, we propose a Minimax Concave Penalized Multi-Armed Bandit (MCP-Bandit) algorithm for a decision-maker facing high-dimensional data with latent sparse structure in an online learning and decision-making process. We demonstrate that the MCP-Bandit algorithm asymptotically achieves the optimal cumulative regret in sample size T, O(log T), and further attains a tighter bound in both covariates dimension d and the number of significant covariates s, O(s^2 (s + log d)). In addition, we develop a linear approximation method, the 2-step Weighted Lasso procedure, to identify the MCP estimator for the MCP-Bandit algorithm under non-i.i.d. samples. Using this procedure, the MCP estimator matches the oracle estimator with high probability. Finally, we present two experiments to benchmark our proposed MCP-Bandit algorithm against other bandit algorithms. Both experiments demonstrate that the MCP-Bandit algorithm performs favorably over other benchmark algorithms, especially when there is a high level of data sparsity or when the sample size is not too small.'
volume: 80
URL: https://proceedings.mlr.press/v80/wang18j.html
PDF: http://proceedings.mlr.press/v80/wang18j/wang18j.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-wang18j.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Xue
family: Wang
- given: Mingcheng
family: Wei
- given: Tao
family: Yao
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5200-5208
id: wang18j
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5200
lastpage: 5208
published: 2018-07-03 00:00:00 +0000
- title: 'Online Convolutional Sparse Coding with Sample-Dependent Dictionary'
abstract: 'Convolutional sparse coding (CSC) has been popularly used for the learning of shift-invariant dictionaries in image and signal processing. However, existing methods have limited scalability. In this paper, instead of convolving with a dictionary shared by all samples, we propose the use of a sample-dependent dictionary in which each filter is a linear combination of a small set of base filters learned from data. This added flexibility allows a large number of sample-dependent patterns to be captured, which is especially useful in the handling of large or high-dimensional data sets. Computationally, the resultant model can be efficiently learned by online learning. Extensive experimental results on a number of data sets show that the proposed method outperforms existing CSC algorithms with significantly reduced time and space complexities.'
volume: 80
URL: https://proceedings.mlr.press/v80/wang18k.html
PDF: http://proceedings.mlr.press/v80/wang18k/wang18k.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-wang18k.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yaqing
family: Wang
- given: Quanming
family: Yao
- given: James Tin-Yau
family: Kwok
- given: Lionel M.
family: Ni
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5209-5218
id: wang18k
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5209
lastpage: 5218
published: 2018-07-03 00:00:00 +0000
- title: 'Stein Variational Message Passing for Continuous Graphical Models'
abstract: 'We propose a novel distributed inference algorithm for continuous graphical models, by extending Stein variational gradient descent (SVGD) to leverage the Markov dependency structure of the distribution of interest. Our approach combines SVGD with a set of structured local kernel functions defined on the Markov blanket of each node, which alleviates the curse of high dimensionality and simultaneously yields a distributed algorithm for decentralized inference tasks. We justify our method with theoretical analysis and show that the use of local kernels can be viewed as a new type of localized approximation that matches the target distribution on the conditional distributions of each node over its Markov blanket. Our empirical results show that our method outperforms a variety of baselines including standard MCMC and particle message passing methods.'
volume: 80
URL: https://proceedings.mlr.press/v80/wang18l.html
PDF: http://proceedings.mlr.press/v80/wang18l/wang18l.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-wang18l.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Dilin
family: Wang
- given: Zhe
family: Zeng
- given: Qiang
family: Liu
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5219-5227
id: wang18l
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5219
lastpage: 5227
published: 2018-07-03 00:00:00 +0000
- title: 'Approximate Leave-One-Out for Fast Parameter Tuning in High Dimensions'
abstract: 'We study the parameter tuning problem for the penalized regression model. Finding the optimal choice of the regularization parameter is a challenging problem in high-dimensional regimes where both the number of observations n and the number of parameters p are large. We propose two frameworks to obtain a computationally efficient approximation ALO of the leave-one-out cross validation (LOOCV) risk for nonsmooth losses and regularizers. Our two frameworks are based on the primal and dual formulations of the penalized regression model. We prove the equivalence of the two approaches under smoothness conditions. This equivalence enables us to justify the accuracy of both methods under such conditions. We use our approaches to obtain a risk estimate for several standard problems, including generalized LASSO, nuclear norm regularization and support vector machines. We experimentally demonstrate the effectiveness of our results for non-differentiable cases.'
volume: 80
URL: https://proceedings.mlr.press/v80/wang18m.html
PDF: http://proceedings.mlr.press/v80/wang18m/wang18m.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-wang18m.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Shuaiwen
family: Wang
- given: Wenda
family: Zhou
- given: Haihao
family: Lu
- given: Arian
family: Maleki
- given: Vahab
family: Mirrokni
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5228-5237
id: wang18m
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5228
lastpage: 5237
published: 2018-07-03 00:00:00 +0000
- title: 'Curriculum Learning by Transfer Learning: Theory and Experiments with Deep Networks'
abstract: 'We provide a theoretical investigation of curriculum learning in the context of stochastic gradient descent when optimizing the convex linear regression loss. We prove that the rate of convergence of an ideal curriculum learning method is monotonically increasing with the difficulty of the examples. Moreover, among all equally difficult points, convergence is faster when using points which incur higher loss with respect to the current hypothesis. We then analyze curriculum learning in the context of training a CNN. We describe a method which infers the curriculum by way of transfer learning from another network, pre-trained on a different task. While this approach can only approximate the ideal curriculum, we observe empirically similar behavior to the one predicted by the theory, namely, a significant boost in convergence speed at the beginning of training. When the task is made more difficult, improvement in generalization performance is also observed. Finally, curriculum learning exhibits robustness against unfavorable conditions such as excessive regularization.'
volume: 80
URL: https://proceedings.mlr.press/v80/weinshall18a.html
PDF: http://proceedings.mlr.press/v80/weinshall18a/weinshall18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-weinshall18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Daphna
family: Weinshall
- given: Gad
family: Cohen
- given: Dan
family: Amir
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5238-5246
id: weinshall18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5238
lastpage: 5246
published: 2018-07-03 00:00:00 +0000
- title: 'Extracting Automata from Recurrent Neural Networks Using Queries and Counterexamples'
abstract: 'We present a novel algorithm that uses exact learning and abstraction to extract a deterministic finite automaton describing the state dynamics of a given trained RNN. We do this using Angluin’s $L^*$ algorithm as a learner and the trained RNN as an oracle. Our technique efficiently extracts accurate automata from trained RNNs, even when the state vectors are large and require fine differentiation.'
volume: 80
URL: https://proceedings.mlr.press/v80/weiss18a.html
PDF: http://proceedings.mlr.press/v80/weiss18a/weiss18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-weiss18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Gail
family: Weiss
- given: Yoav
family: Goldberg
- given: Eran
family: Yahav
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5247-5256
id: weiss18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5247
lastpage: 5256
published: 2018-07-03 00:00:00 +0000
- title: 'LeapsAndBounds: A Method for Approximately Optimal Algorithm Configuration'
abstract: 'We consider the problem of configuring general-purpose solvers to run efficiently on problem instances drawn from an unknown distribution. The goal of the configurator is to find a configuration that runs fast on average on most instances, and do so with the least amount of total work. It can run a chosen solver on a random instance until the solver finishes or a timeout is reached. We propose LeapsAndBounds, an algorithm that tests configurations on randomly selected problem instances for longer and longer time. We prove that the capped expected runtime of the configuration returned by LeapsAndBounds is close to the optimal expected runtime, while our algorithm’s running time is near-optimal. Our results show that LeapsAndBounds is more efficient than the recent algorithm of Kleinberg et al. (2017), which, to our knowledge, is the only other algorithm configuration method with non-trivial theoretical guarantees. Experimental results on configuring a public SAT solver on a new benchmark dataset also stand witness to the superiority of our method.'
volume: 80
URL: https://proceedings.mlr.press/v80/weisz18a.html
PDF: http://proceedings.mlr.press/v80/weisz18a/weisz18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-weisz18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Gellert
family: Weisz
- given: Andras
family: Gyorgy
- given: Csaba
family: Szepesvari
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5257-5265
id: weisz18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5257
lastpage: 5265
published: 2018-07-03 00:00:00 +0000
- title: 'Deep Predictive Coding Network for Object Recognition'
abstract: 'Based on the predictive coding theory in neuroscience, we designed a bi-directional and recurrent neural net, namely deep predictive coding networks (PCN), that has feedforward, feedback, and recurrent connections. Feedback connections from a higher layer carry the prediction of its lower-layer representation; feedforward connections carry the prediction errors to its higher-layer. Given image input, PCN runs recursive cycles of bottom-up and top-down computation to update its internal representations and reduce the difference between bottom-up input and top-down prediction at every layer. After multiple cycles of recursive updating, the representation is used for image classification. With benchmark datasets (CIFAR-10/100, SVHN, and MNIST), PCN was found to always outperform its feedforward-only counterpart: a model without any mechanism for recurrent dynamics, and its performance tended to improve given more cycles of computation over time. In short, PCN reuses a single architecture to recursively run bottom-up and top-down processes to refine its representation towards more accurate and definitive object recognition.'
volume: 80
URL: https://proceedings.mlr.press/v80/wen18a.html
PDF: http://proceedings.mlr.press/v80/wen18a/wen18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-wen18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Haiguang
family: Wen
- given: Kuan
family: Han
- given: Junxing
family: Shi
- given: Yizhen
family: Zhang
- given: Eugenio
family: Culurciello
- given: Zhongming
family: Liu
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5266-5275
id: wen18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5266
lastpage: 5275
published: 2018-07-03 00:00:00 +0000
- title: 'Towards Fast Computation of Certified Robustness for ReLU Networks'
abstract: 'Verifying the robustness property of a general Rectified Linear Unit (ReLU) network is an NP-complete problem. Although finding the exact minimum adversarial distortion is hard, giving a certified lower bound of the minimum distortion is possible. Currently available methods of computing such a bound are either time-consuming or deliver low-quality bounds that are too loose to be useful. In this paper, we exploit the special structure of ReLU networks and provide two computationally efficient algorithms (Fast-Lin, Fast-Lip) that are able to certify non-trivial lower bounds of minimum adversarial distortions. Experiments show that (1) our methods deliver bounds close to (the gap is 2-3X) exact minimum distortions found by Reluplex in small networks while our algorithms are more than 10,000 times faster; (2) our methods deliver similar quality of bounds (the gap is within 35% and usually around 10%; sometimes our bounds are even better) for larger networks compared to the methods based on solving linear programming problems but our algorithms are 33-14,000 times faster; (3) our method is capable of solving large MNIST and CIFAR networks up to 7 layers with more than 10,000 neurons within tens of seconds on a single CPU core. In addition, we show that there is no polynomial time algorithm that can approximately find the minimum $\ell_1$ adversarial distortion of a ReLU network with a $0.99\ln n$ approximation ratio unless NP=P, where $n$ is the number of neurons in the network.'
volume: 80
URL: https://proceedings.mlr.press/v80/weng18a.html
PDF: http://proceedings.mlr.press/v80/weng18a/weng18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-weng18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lily
family: Weng
- given: Huan
family: Zhang
- given: Hongge
family: Chen
- given: Zhao
family: Song
- given: Cho-Jui
family: Hsieh
- given: Luca
family: Daniel
- given: Duane
family: Boning
- given: Inderjit
family: Dhillon
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5276-5285
id: weng18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5276
lastpage: 5285
published: 2018-07-03 00:00:00 +0000
- title: 'Provable Defenses against Adversarial Examples via the Convex Outer Adversarial Polytope'
abstract: 'We propose a method to learn deep ReLU-based classifiers that are provably robust against norm-bounded adversarial perturbations on the training data. For previously unseen examples, the approach is guaranteed to detect all adversarial examples, though it may flag some non-adversarial examples as well. The basic idea is to consider a convex outer approximation of the set of activations reachable through a norm-bounded perturbation, and we develop a robust optimization procedure that minimizes the worst case loss over this outer region (via a linear program). Crucially, we show that the dual problem to this linear program can be represented itself as a deep network similar to the backpropagation network, leading to very efficient optimization approaches that produce guaranteed bounds on the robust loss. The end result is that by executing a few more forward and backward passes through a slightly modified version of the original network (though possibly with much larger batch sizes), we can learn a classifier that is provably robust to any norm-bounded adversarial attack. We illustrate the approach on a number of tasks to train classifiers with robust adversarial guarantees (e.g. for MNIST, we produce a convolutional classifier that provably has less than 5.8% test error for any adversarial attack with bounded $\ell_\infty$ norm less than $\epsilon = 0.1$).'
volume: 80
URL: https://proceedings.mlr.press/v80/wong18a.html
PDF: http://proceedings.mlr.press/v80/wong18a/wong18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-wong18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Eric
family: Wong
- given: Zico
family: Kolter
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5286-5295
id: wong18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5286
lastpage: 5295
published: 2018-07-03 00:00:00 +0000
- title: 'Local Density Estimation in High Dimensions'
abstract: 'An important question that arises in the study of high dimensional vector representations learned from data is: given a set D of vectors and a query q, estimate the number of points within a specified distance threshold of q. Our algorithm uses locality sensitive hashing to preprocess the data to accurately and efficiently estimate the answers to such questions via an unbiased estimator that uses importance sampling. A key innovation is the ability to maintain a small number of hash tables via preprocessing data structures and algorithms that sample from multiple buckets in each hash table. We give bounds on the space requirements and query complexity of our scheme, and demonstrate the effectiveness of our algorithm by experiments on a standard word embedding dataset.'
volume: 80
URL: https://proceedings.mlr.press/v80/wu18a.html
PDF: http://proceedings.mlr.press/v80/wu18a/wu18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-wu18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Xian
family: Wu
- given: Moses
family: Charikar
- given: Vishnu
family: Natchu
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5296-5305
id: wu18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5296
lastpage: 5305
published: 2018-07-03 00:00:00 +0000
- title: 'Adaptive Exploration-Exploitation Tradeoff for Opportunistic Bandits'
abstract: 'In this paper, we propose and study opportunistic bandits - a new variant of bandits where the regret of pulling a suboptimal arm varies under different environmental conditions, such as network load or produce price. When the load/price is low, so is the cost/regret of pulling a suboptimal arm (e.g., trying a suboptimal network configuration). Therefore, intuitively, we could explore more when the load/price is low and exploit more when the load/price is high. Inspired by this intuition, we propose an Adaptive Upper-Confidence-Bound (AdaUCB) algorithm to adaptively balance the exploration-exploitation tradeoff for opportunistic bandits. We prove that AdaUCB achieves O(log T) regret with a smaller coefficient than the traditional UCB algorithm. Furthermore, AdaUCB achieves O(1) regret with respect to T if the exploration cost is zero when the load level is below a certain threshold. Last, based on both synthetic data and real-world traces, experimental results show that AdaUCB significantly outperforms other bandit algorithms, such as UCB and TS (Thompson Sampling), under large load/price fluctuations.'
volume: 80
URL: https://proceedings.mlr.press/v80/wu18b.html
PDF: http://proceedings.mlr.press/v80/wu18b/wu18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-wu18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Huasen
family: Wu
- given: Xueying
family: Guo
- given: Xin
family: Liu
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5306-5314
id: wu18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5306
lastpage: 5314
published: 2018-07-03 00:00:00 +0000
- title: 'SQL-Rank: A Listwise Approach to Collaborative Ranking'
abstract: 'In this paper, we propose a listwise approach for constructing user-specific rankings in recommendation systems in a collaborative fashion. We contrast the listwise approach to previous pointwise and pairwise approaches, which are based on treating either each rating or each pairwise comparison as an independent instance respectively. By extending the work of ListNet (Cao et al., 2007), we cast listwise collaborative ranking as maximum likelihood under a permutation model which applies probability mass to permutations based on a low rank latent score matrix. We present a novel algorithm called SQL-Rank, which can accommodate ties and missing data and can run in linear time. We develop a theoretical framework for analyzing listwise ranking methods based on a novel representation theory for the permutation model. Applying this framework to collaborative ranking, we derive asymptotic statistical rates as the number of users and items grow together. We conclude by demonstrating that our SQL-Rank method often outperforms current state-of-the-art algorithms for implicit feedback such as Weighted-MF and BPR and achieves favorable results when compared to explicit feedback algorithms such as matrix factorization and collaborative ranking.'
volume: 80
URL: https://proceedings.mlr.press/v80/wu18c.html
PDF: http://proceedings.mlr.press/v80/wu18c/wu18c.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-wu18c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Liwei
family: Wu
- given: Cho-Jui
family: Hsieh
- given: James
family: Sharpnack
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5315-5324
id: wu18c
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5315
lastpage: 5324
published: 2018-07-03 00:00:00 +0000
- title: 'Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization'
abstract: 'Large-scale distributed optimization is of great importance in various applications. For data-parallel based distributed learning, the inter-node gradient communication often becomes the performance bottleneck. In this paper, we propose the error compensated quantized stochastic gradient descent algorithm to improve the training efficiency. Local gradients are quantized to reduce the communication overhead, and accumulated quantization error is utilized to speed up the convergence. Furthermore, we present theoretical analysis on the convergence behaviour, and demonstrate its advantage over competitors. Extensive experiments indicate that our algorithm can compress gradients by a factor of up to two orders of magnitude without performance degradation.'
volume: 80
URL: https://proceedings.mlr.press/v80/wu18d.html
PDF: http://proceedings.mlr.press/v80/wu18d/wu18d.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-wu18d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jiaxiang
family: Wu
- given: Weidong
family: Huang
- given: Junzhou
family: Huang
- given: Tong
family: Zhang
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5325-5333
id: wu18d
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5325
lastpage: 5333
published: 2018-07-03 00:00:00 +0000
- title: 'Reinforcing Adversarial Robustness using Model Confidence Induced by Adversarial Training'
abstract: 'In this paper we study leveraging *confidence information* induced by adversarial training to reinforce adversarial robustness of a given adversarially trained model. A natural measure of confidence is $\|F(x)\|_\infty$ (i.e., how confident $F$ is about its prediction). We start by analyzing an adversarial training formulation proposed by Madry et al. We demonstrate that, under a variety of instantiations, an only somewhat good solution to their objective induces confidence to be a discriminator, which can distinguish between right and wrong model predictions in a neighborhood of a point sampled from the underlying distribution. Based on this, we propose Highly Confident Near Neighbor (HCNN), a framework that combines confidence information and nearest neighbor search, to reinforce adversarial robustness of a base model. We give algorithms in this framework and perform a detailed empirical study. We report encouraging experimental results that support our analysis, and also discuss problems we observed with existing adversarial training.'
volume: 80
URL: https://proceedings.mlr.press/v80/wu18e.html
PDF: http://proceedings.mlr.press/v80/wu18e/wu18e.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-wu18e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Xi
family: Wu
- given: Uyeong
family: Jang
- given: Jiefeng
family: Chen
- given: Lingjiao
family: Chen
- given: Somesh
family: Jha
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5334-5342
id: wu18e
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5334
lastpage: 5342
published: 2018-07-03 00:00:00 +0000
- title: 'Discrete-Continuous Mixtures in Probabilistic Programming: Generalized Semantics and Inference Algorithms'
abstract: 'Despite the recent successes of probabilistic programming languages (PPLs) in AI applications, PPLs offer only limited support for random variables whose distributions combine discrete and continuous elements. We develop the notion of measure-theoretic Bayesian networks (MTBNs) and use it to provide more general semantics for PPLs with arbitrarily many random variables defined over arbitrary measure spaces. We develop two new general sampling algorithms that are provably correct under the MTBN framework: the lexicographic likelihood weighting (LLW) for general MTBNs and the lexicographic particle filter (LPF), a specialized algorithm for state-space models. We further integrate MTBNs into a widely used PPL system, BLOG, and verify the effectiveness of the new inference algorithms through representative examples.'
volume: 80
URL: https://proceedings.mlr.press/v80/wu18f.html
PDF: http://proceedings.mlr.press/v80/wu18f/wu18f.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-wu18f.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yi
family: Wu
- given: Siddharth
family: Srivastava
- given: Nicholas
family: Hay
- given: Simon
family: Du
- given: Stuart
family: Russell
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5343-5352
id: wu18f
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5343
lastpage: 5352
published: 2018-07-03 00:00:00 +0000
- title: 'Variance Regularized Counterfactual Risk Minimization via Variational Divergence Minimization'
abstract: 'Off-policy learning, the task of evaluating and improving policies using historic data collected from a logging policy, is important because on-policy evaluation is usually expensive and has adverse impacts. One of the major challenges of off-policy learning is to derive counterfactual estimators that also have low variance and thus low generalization error. In this work, inspired by learning bounds for importance sampling problems, we present a new counterfactual learning principle for off-policy learning with bandit feedback. Our method regularizes the generalization error by minimizing the distribution divergence between the logging policy and the new policy, and removes the need for iterating through all training samples to compute sample variance regularization in prior work. With neural network policies, our end-to-end training algorithms using variational divergence minimization show significant improvement over conventional baseline algorithms and are also consistent with our theoretical results.'
volume: 80
URL: https://proceedings.mlr.press/v80/wu18g.html
PDF: http://proceedings.mlr.press/v80/wu18g/wu18g.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-wu18g.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hang
family: Wu
- given: May
family: Wang
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5353-5362
id: wu18g
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5353
lastpage: 5362
published: 2018-07-03 00:00:00 +0000
- title: 'Deep k-Means: Re-Training and Parameter Sharing with Harder Cluster Assignments for Compressing Deep Convolutions'
abstract: 'The current trend of pushing CNNs deeper with convolutions has created a pressing demand to achieve higher compression gains on CNNs where convolutions dominate the computation and parameter amount (e.g., GoogLeNet, ResNet and Wide ResNet). Further, the high energy consumption of convolutions limits their deployment on mobile devices. To this end, we propose a simple yet effective scheme for compressing convolutions through applying k-means clustering on the weights; compression is achieved through weight-sharing, by only recording $K$ cluster centers and weight assignment indexes. We then introduce a novel spectrally relaxed $k$-means regularization, which tends to make hard assignments of convolutional layer weights to $K$ learned cluster centers during re-training. We additionally propose an improved set of metrics to estimate energy consumption of CNN hardware implementations, whose estimation results are verified to be consistent with a previously proposed energy estimation tool extrapolated from actual hardware measurements. We finally evaluate Deep $k$-Means across several CNN models in terms of both compression ratio and energy consumption reduction, observing promising results without incurring accuracy loss. The code is available at https://github.com/Sandbox3aster/Deep-K-Means'
volume: 80
URL: https://proceedings.mlr.press/v80/wu18h.html
PDF: http://proceedings.mlr.press/v80/wu18h/wu18h.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-wu18h.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Junru
family: Wu
- given: Yue
family: Wang
- given: Zhenyu
family: Wu
- given: Zhangyang
family: Wang
- given: Ashok
family: Veeraraghavan
- given: Yingyan
family: Lin
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5363-5372
id: wu18h
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5363
lastpage: 5372
published: 2018-07-03 00:00:00 +0000
- title: 'Bayesian Quadrature for Multiple Related Integrals'
abstract: 'Bayesian probabilistic numerical methods are a set of tools providing posterior distributions on the output of numerical methods. The use of these methods is usually motivated by the fact that they can represent our uncertainty due to incomplete/finite information about the continuous mathematical problem being approximated. In this paper, we demonstrate that this paradigm can provide additional advantages, such as the possibility of transferring information between several numerical methods. This allows users to represent uncertainty in a more faithful manner and, as a by-product, provide increased numerical efficiency. We propose the first such numerical method by extending the well-known Bayesian quadrature algorithm to the case where we are interested in computing the integral of several related functions. We then prove convergence rates for the method in the well-specified and misspecified cases, and demonstrate its efficiency in the context of multi-fidelity models for complex engineering systems and a problem of global illumination in computer graphics.'
volume: 80
URL: https://proceedings.mlr.press/v80/xi18a.html
PDF: http://proceedings.mlr.press/v80/xi18a/xi18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-xi18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Xiaoyue
family: Xi
- given: Francois-Xavier
family: Briol
- given: Mark
family: Girolami
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5373-5382
id: xi18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5373
lastpage: 5382
published: 2018-07-03 00:00:00 +0000
- title: 'Model-Level Dual Learning'
abstract: 'Many artificial intelligence tasks appear in dual forms like English$\leftrightarrow$French translation and speech$\leftrightarrow$text transformation. Existing dual learning schemes, which are proposed to solve a pair of such dual tasks, explore how to leverage such dualities at the data level. In this work, we propose a new learning framework, model-level dual learning, which takes duality of tasks into consideration while designing the architectures for the primal/dual models, and ties the model parameters that play similar roles in the two tasks. We study both symmetric and asymmetric model-level dual learning. Our algorithms achieve significant improvements on neural machine translation and sentiment analysis.'
volume: 80
URL: https://proceedings.mlr.press/v80/xia18a.html
PDF: http://proceedings.mlr.press/v80/xia18a/xia18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-xia18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yingce
family: Xia
- given: Xu
family: Tan
- given: Fei
family: Tian
- given: Tao
family: Qin
- given: Nenghai
family: Yu
- given: Tie-Yan
family: Liu
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5383-5392
id: xia18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5383
lastpage: 5392
published: 2018-07-03 00:00:00 +0000
- title: 'Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks'
abstract: 'In recent years, state-of-the-art methods in computer vision have utilized increasingly deep convolutional neural network architectures (CNNs), with some of the most successful models employing hundreds or even thousands of layers. A variety of pathologies such as vanishing/exploding gradients make training such deep networks challenging. While residual connections and batch normalization do enable training at these depths, it has remained unclear whether such specialized architecture designs are truly necessary to train deep CNNs. In this work, we demonstrate that it is possible to train vanilla CNNs with ten thousand layers or more simply by using an appropriate initialization scheme. We derive this initialization scheme theoretically by developing a mean field theory for signal propagation and by characterizing the conditions for dynamical isometry, the equilibration of singular values of the input-output Jacobian matrix. These conditions require that the convolution operator be an orthogonal transformation in the sense that it is norm-preserving. We present an algorithm for generating such random initial orthogonal convolution kernels and demonstrate empirically that they enable efficient training of extremely deep architectures.'
volume: 80
URL: https://proceedings.mlr.press/v80/xiao18a.html
PDF: http://proceedings.mlr.press/v80/xiao18a/xiao18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-xiao18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lechao
family: Xiao
- given: Yasaman
family: Bahri
- given: Jascha
family: Sohl-Dickstein
- given: Samuel
family: Schoenholz
- given: Jeffrey
family: Pennington
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5393-5402
id: xiao18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5393
lastpage: 5402
published: 2018-07-03 00:00:00 +0000
- title: 'Orthogonality-Promoting Distance Metric Learning: Convex Relaxation and Theoretical Analysis'
abstract: 'Distance metric learning (DML), which learns a distance metric from labeled "similar" and "dissimilar" data pairs, is widely utilized. Recently, several works have investigated orthogonality-promoting regularization (OPR), which encourages the projection vectors in DML to be close to being orthogonal, to achieve three effects: (1) high balancedness – achieving comparable performance on both frequent and infrequent classes; (2) high compactness – using a small number of projection vectors to achieve a "good" metric; (3) good generalizability – alleviating overfitting to training data. While showing promising results, these approaches suffer from three problems. First, they involve solving non-convex optimization problems where achieving the global optimum is NP-hard. Second, there is no theoretical understanding of why OPR can lead to balancedness. Third, the current generalization error analysis of OPR is not directly on the regularizer. In this paper, we address these three issues by (1) seeking convex relaxations of the original nonconvex problems so that the global optimum is guaranteed to be achievable; (2) providing a formal analysis on OPR’s capability of promoting balancedness; (3) providing a theoretical analysis that directly reveals the relationship between OPR and generalization performance. Experiments on various datasets demonstrate that our convex methods are more effective in promoting balancedness, compactness, and generalization, and are computationally more efficient, compared with the nonconvex methods.'
volume: 80
URL: https://proceedings.mlr.press/v80/xie18a.html
PDF: http://proceedings.mlr.press/v80/xie18a/xie18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-xie18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Pengtao
family: Xie
- given: Wei
family: Wu
- given: Yichen
family: Zhu
- given: Eric
family: Xing
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5403-5412
id: xie18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5403
lastpage: 5412
published: 2018-07-03 00:00:00 +0000
- title: 'Nonoverlap-Promoting Variable Selection'
abstract: 'Variable selection is a classic problem in machine learning (ML), widely used to find important explanatory factors, and improve generalization performance and interpretability of ML models. In this paper, we consider variable selection for models where multiple responses are to be predicted based on the same set of covariates. Since each response is relevant to a unique subset of covariates, we desire the selected variables for different responses to have small overlap. We propose a regularizer that simultaneously encourages orthogonality and sparsity, which jointly bring about an effect of reducing overlap. We apply this regularizer to four model instances and develop efficient algorithms to solve the regularized problems. We provide a formal analysis on why the proposed regularizer can reduce generalization error. Experiments on both simulation studies and real-world datasets demonstrate the effectiveness of the proposed regularizer in selecting less-overlapped variables and improving generalization performance.'
volume: 80
URL: https://proceedings.mlr.press/v80/xie18b.html
PDF: http://proceedings.mlr.press/v80/xie18b/xie18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-xie18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Pengtao
family: Xie
- given: Hongbao
family: Zhang
- given: Yichen
family: Zhu
- given: Eric
family: Xing
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5413-5422
id: xie18b
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5413
lastpage: 5422
published: 2018-07-03 00:00:00 +0000
- title: 'Learning Semantic Representations for Unsupervised Domain Adaptation'
abstract: 'It is important to transfer the knowledge from a label-rich source domain to an unlabeled target domain due to the expensive cost of manual labeling efforts. Prior domain adaptation methods address this problem through aligning the global distribution statistics between source domain and target domain, but a drawback of prior methods is that they ignore the semantic information contained in samples, e.g., features of backpacks in target domain might be mapped near features of cars in source domain. In this paper, we present the moving semantic transfer network, which learns semantic representations for unlabeled target samples by aligning labeled source centroid and pseudo-labeled target centroid. Features in the same class but different domains are expected to be mapped nearby, resulting in an improved target classification accuracy. Moving average centroid alignment is cautiously designed to compensate for the insufficient categorical information within each mini batch. Experiments testify that our model yields state-of-the-art results on standard datasets.'
volume: 80
URL: https://proceedings.mlr.press/v80/xie18c.html
PDF: http://proceedings.mlr.press/v80/xie18c/xie18c.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-xie18c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Shaoan
family: Xie
- given: Zibin
family: Zheng
- given: Liang
family: Chen
- given: Chuan
family: Chen
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5423-5432
id: xie18c
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5423
lastpage: 5432
published: 2018-07-03 00:00:00 +0000
- title: 'Rates of Convergence of Spectral Methods for Graphon Estimation'
abstract: 'This paper studies the problem of estimating the graphon function – a generative mechanism for a class of random graphs that are useful approximations to real networks. Specifically, a graph of $n$ vertices is generated such that each pair of vertices $i$ and $j$ is connected independently with probability $\rho_n \times f(x_i,x_j)$, where $x_i$ is the unknown $d$-dimensional label of vertex $i$, $f$ is an unknown symmetric function, and $\rho_n$, assumed to be $\Omega(\log n/n)$, is a scaling parameter characterizing the graph sparsity. The task is to estimate graphon $f$ given the graph. Recent studies have identified the minimax optimal estimation error rate for $d=1$. However, there exists a wide gap between the known error rates of polynomial-time estimators and the minimax optimal error rate. We improve on the previously known error rates of polynomial-time estimators, by analyzing a spectral method, namely the universal singular value thresholding (USVT) algorithm. When $f$ belongs to either Hölder or Sobolev space with smoothness index $\alpha$, we show the error rates of USVT are at most $(n\rho)^{ -2 \alpha / (2\alpha+d)}$. These error rates approach the minimax optimal error rate $\log (n\rho)/(n\rho)$ proved in prior work for $d=1$, as $\alpha$ increases, i.e., $f$ becomes smoother. Furthermore, when $f$ is analytic and infinitely differentiable, we show the error rate of USVT is at most $\log^d (n\rho)/(n\rho)$. When $f$ is a step function which corresponds to the stochastic block model with $k$ blocks for some $k$, the error rate of USVT is at most $k/(n\rho)$, which is larger than the minimax optimal error rate by at most a multiplicative factor $k/\log k$. This coincides with the computational gap observed in community detection. A key ingredient of our analysis is to derive the eigenvalue decay rate of the edge probability matrix using piecewise polynomial approximations of the graphon function $f$.'
volume: 80
URL: https://proceedings.mlr.press/v80/xu18a.html
PDF: http://proceedings.mlr.press/v80/xu18a/xu18a.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-xu18a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jiaming
family: Xu
editor:
- given: Jennifer
family: Dy
- given: Andreas
family: Krause
page: 5433-5442
id: xu18a
issued:
date-parts:
- 2018
- 7
- 3
firstpage: 5433
lastpage: 5442
published: 2018-07-03 00:00:00 +0000
- title: 'Learning Registered Point Processes from Idiosyncratic Observations'
abstract: 'A parametric point process model is developed, with modeling based on the assumption that sequential observations often share latent phenomena, while also possessing idiosyncratic effects. An alternating optimization method is proposed to learn a “registered” point process that accounts for shared structure, as well as “warping” functions that characterize idiosyncratic aspects of each observed sequence. Under reasonable constraints, in each iteration we update the sample-specific warping functions by solving a set of constrained nonlinear programming problems in parallel, and update the model by maximum likelihood estimation. The justifiability, complexity and robustness of the proposed method are investigated in detail, and the influence of sequence stitching on the learning results is examined empirically. Experiments on both synthetic and real-world data demonstrate that the method yields explainable point process models, achieving encouraging results compared to state-of-the-art methods.'
volume: 80
URL: https://proceedings.mlr.press/v80/xu18b.html
PDF: http://proceedings.mlr.press/v80/xu18b/xu18b.pdf
edit: https://github.com/mlresearch//v80/edit/gh-pages/_posts/2018-07-03-xu18b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 35th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hongteng
family: Xu
- given: Lawrence
family: Carin
- given: Hongyuan
family: Zha
editor:
- given: Jennifer
family: Dy
- given: Andr