- title: 'Data Structures for Density Estimation'
abstract: 'We study statistical/computational tradeoffs for the following density estimation problem: given $k$ distributions $v_1, \ldots, v_k$ over a discrete domain of size $n$, and sampling access to a distribution $p$, identify $v_i$ that is "close" to $p$. Our main result is the first data structure that, given a sublinear (in $n$) number of samples from $p$, identifies $v_i$ in time sublinear in $k$. We also give an improved version of the algorithm of Acharya et al. (2018) that reports $v_i$ in time linear in $k$. The experimental evaluation of the latter algorithm shows that it achieves a significant reduction in the number of operations needed to achieve a given accuracy compared to prior work.'
volume: 202
URL: https://proceedings.mlr.press/v202/aamand23a.html
PDF: https://proceedings.mlr.press/v202/aamand23a/aamand23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-aamand23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Anders
family: Aamand
- given: Alexandr
family: Andoni
- given: Justin Y.
family: Chen
- given: Piotr
family: Indyk
- given: Shyam
family: Narayanan
- given: Sandeep
family: Silwal
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1-18
id: aamand23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1
lastpage: 18
published: 2023-07-03 00:00:00 +0000
- title: 'ClusterFuG: Clustering Fully connected Graphs by Multicut'
abstract: 'We propose a graph clustering formulation based on multicut (a.k.a. weighted correlation clustering) on the complete graph. Our formulation does not need specification of the graph topology as in the original sparse formulation of multicut, making our approach simpler and potentially better performing. In contrast to unweighted correlation clustering we allow for a more expressive weighted cost structure. In dense multicut, the clustering objective is given in a factorized form as inner products of node feature vectors. This allows for an efficient formulation and inference in contrast to multicut/weighted correlation clustering, which has at least quadratic representation and computation complexity when working on the complete graph. We show how to rewrite classical greedy algorithms for multicut in our dense setting and how to modify them for greater efficiency and solution quality. In particular, our algorithms scale to graphs with tens of thousands of nodes. Empirical evidence on instance segmentation on Cityscapes and clustering of ImageNet datasets shows the merits of our approach.'
volume: 202
URL: https://proceedings.mlr.press/v202/abbas23a.html
PDF: https://proceedings.mlr.press/v202/abbas23a/abbas23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-abbas23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ahmed
family: Abbas
- given: Paul
family: Swoboda
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 19-30
id: abbas23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 19
lastpage: 30
published: 2023-07-03 00:00:00 +0000
- title: 'Generalization on the Unseen, Logic Reasoning and Degree Curriculum'
abstract: 'This paper considers the learning of logical (Boolean) functions with a focus on the generalization on the unseen (GOTU) setting, a strong case of out-of-distribution generalization. This is motivated by the fact that the rich combinatorial nature of data in certain reasoning tasks (e.g., arithmetic/logic) makes representative data sampling challenging, and learning successfully under GOTU gives a first vignette of an ‘extrapolating’ or ‘reasoning’ learner. We then study how different network architectures trained by (S)GD perform under GOTU and provide both theoretical and experimental evidence that for a class of network models including instances of Transformers, random features models, and diagonal linear networks, a min-degree-interpolator is learned on the unseen. We also provide evidence that other instances with larger learning rates or mean-field networks reach leaky min-degree solutions. These findings lead to two implications: (1) we provide an explanation of the length generalization problem (e.g., Anil et al. 2022); (2) we introduce a curriculum learning algorithm called Degree-Curriculum that learns monomials more efficiently by incrementing supports.'
volume: 202
URL: https://proceedings.mlr.press/v202/abbe23a.html
PDF: https://proceedings.mlr.press/v202/abbe23a/abbe23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-abbe23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Emmanuel
family: Abbe
- given: Samy
family: Bengio
- given: Aryo
family: Lotfi
- given: Kevin
family: Rizk
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 31-60
id: abbe23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 31
lastpage: 60
published: 2023-07-03 00:00:00 +0000
- title: 'Toward Large Kernel Models'
abstract: 'Recent studies indicate that kernel machines can often perform similarly to or better than deep neural networks (DNNs) on small datasets. The interest in kernel machines has been additionally bolstered by the discovery of their equivalence to wide neural networks in certain regimes. However, a key feature of DNNs is their ability to scale the model size and training data size independently, whereas in traditional kernel machines model size is tied to data size. Because of this coupling, scaling kernel machines to large data has been computationally challenging. In this paper, we provide a way forward for constructing large-scale general kernel models, which are a generalization of kernel machines that decouples the model and data, allowing training on large datasets. Specifically, we introduce EigenPro 3.0, an algorithm based on projected dual preconditioned SGD, and show scaling to model and data sizes which have not been possible with existing kernel methods. We provide a PyTorch-based implementation which can take advantage of multiple GPUs.'
volume: 202
URL: https://proceedings.mlr.press/v202/abedsoltan23a.html
PDF: https://proceedings.mlr.press/v202/abedsoltan23a/abedsoltan23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-abedsoltan23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Amirhesam
family: Abedsoltan
- given: Mikhail
family: Belkin
- given: Parthe
family: Pandit
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 61-78
id: abedsoltan23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 61
lastpage: 78
published: 2023-07-03 00:00:00 +0000
- title: 'Expertise Trees Resolve Knowledge Limitations in Collective Decision-Making'
abstract: 'Experts advising decision-makers are likely to display expertise which varies as a function of the problem instance. In practice, this may lead to sub-optimal or discriminatory decisions against minority cases. In this work, we model such changes in depth and breadth of knowledge as a partitioning of the problem space into regions of differing expertise. We provide here new algorithms that explicitly consider and adapt to the relationship between problem instances and experts’ knowledge. We first propose and highlight the drawbacks of a naive approach based on nearest neighbor queries. To address these drawbacks we then introduce a novel algorithm — expertise trees — that constructs decision trees enabling the learner to select appropriate models. We provide theoretical insights and empirically validate the improved performance of our novel approach on a range of problems for which existing methods proved to be inadequate.'
volume: 202
URL: https://proceedings.mlr.press/v202/abels23a.html
PDF: https://proceedings.mlr.press/v202/abels23a/abels23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-abels23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Axel
family: Abels
- given: Tom
family: Lenaerts
- given: Vito
family: Trianni
- given: Ann
family: Nowe
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 79-90
id: abels23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 79
lastpage: 90
published: 2023-07-03 00:00:00 +0000
- title: 'Comparison of meta-learners for estimating multi-valued treatment heterogeneous effects'
abstract: 'Conditional Average Treatment Effects (CATE) estimation is one of the main challenges in causal inference with observational data. In addition to Machine Learning-based models, nonparametric estimators called meta-learners have been developed to estimate the CATE with the main advantage of not restricting the estimation to a specific supervised learning method. This task becomes, however, more complicated when the treatment is not binary, as some limitations of the naive extensions emerge. This paper looks into meta-learners for estimating the heterogeneous effects of multi-valued treatments. We consider different meta-learners, and we carry out a theoretical analysis of their error upper bounds as functions of important parameters such as the number of treatment levels, showing that the naive extensions do not always provide satisfactory results. We introduce and discuss meta-learners that perform well as the number of treatments increases. We empirically confirm the strengths and weaknesses of those methods with synthetic and semi-synthetic datasets.'
volume: 202
URL: https://proceedings.mlr.press/v202/acharki23a.html
PDF: https://proceedings.mlr.press/v202/acharki23a/acharki23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-acharki23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Naoufal
family: Acharki
- given: Ramiro
family: Lugo
- given: Antoine
family: Bertoncello
- given: Josselin
family: Garnier
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 91-132
id: acharki23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 91
lastpage: 132
published: 2023-07-03 00:00:00 +0000
- title: 'BNN-DP: Robustness Certification of Bayesian Neural Networks via Dynamic Programming'
abstract: 'In this paper, we introduce BNN-DP, an efficient algorithmic framework for analysis of adversarial robustness of Bayesian Neural Networks (BNNs). Given a compact set of input points $T\subset \mathbb{R}^n$, BNN-DP computes lower and upper bounds on the BNN’s predictions for all the points in $T$. The framework is based on an interpretation of BNNs as stochastic dynamical systems, which enables the use of Dynamic Programming (DP) algorithms to bound the prediction range along the layers of the network. Specifically, the method uses bound propagation techniques and convex relaxations to derive a backward recursion procedure to over-approximate the prediction range of the BNN with piecewise affine functions. The algorithm is general and can handle both regression and classification tasks. On a set of experiments on various regression and classification tasks and BNN architectures, we show that BNN-DP outperforms state-of-the-art methods by up to four orders of magnitude in both tightness of the bounds and computational efficiency.'
volume: 202
URL: https://proceedings.mlr.press/v202/adams23a.html
PDF: https://proceedings.mlr.press/v202/adams23a/adams23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-adams23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Steven
family: Adams
- given: Andrea
family: Patane
- given: Morteza
family: Lahijanian
- given: Luca
family: Laurenti
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 133-151
id: adams23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 133
lastpage: 151
published: 2023-07-03 00:00:00 +0000
- title: 'SAM operates far from home: eigenvalue regularization as a dynamical phenomenon'
abstract: 'The Sharpness Aware Minimization (SAM) optimization algorithm has been shown to control large eigenvalues of the loss Hessian and provide generalization benefits in a variety of settings. The original motivation for SAM was a modified loss function which penalized sharp minima; subsequent analyses have also focused on the behavior near minima. However, our work reveals that SAM provides a strong regularization of the eigenvalues throughout the learning trajectory. We show that in a simplified setting, SAM dynamically induces a stabilization related to the edge of stability (EOS) phenomenon observed in large learning rate gradient descent. Our theory predicts the largest eigenvalue as a function of the learning rate and SAM radius parameters. Finally, we show that practical models can also exhibit this EOS stabilization, and that understanding SAM must account for these dynamics far away from any minima.'
volume: 202
URL: https://proceedings.mlr.press/v202/agarwala23a.html
PDF: https://proceedings.mlr.press/v202/agarwala23a/agarwala23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-agarwala23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Atish
family: Agarwala
- given: Yann
family: Dauphin
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 152-168
id: agarwala23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 152
lastpage: 168
published: 2023-07-03 00:00:00 +0000
- title: 'Second-order regression models exhibit progressive sharpening to the edge of stability'
abstract: 'Recent studies of gradient descent with large step sizes have shown that there is often a regime with an initial increase in the largest eigenvalue of the loss Hessian (progressive sharpening), followed by a stabilization of the eigenvalue near the maximum value which allows convergence (edge of stability). These phenomena are intrinsically non-linear and do not happen for models in the constant Neural Tangent Kernel (NTK) regime, for which the predictive function is approximately linear in the parameters. As such, we consider the next simplest class of predictive models, namely those that are quadratic in the parameters, which we call second-order regression models. For quadratic objectives in two dimensions, we prove that this second-order regression model exhibits progressive sharpening of the NTK eigenvalue towards a value that differs slightly from the edge of stability, which we explicitly compute. In higher dimensions, the model generically shows similar behavior, even without the specific structure of a neural network, suggesting that progressive sharpening and edge-of-stability behavior aren’t unique features of neural networks, and could be a more general property of discrete learning algorithms in high-dimensional non-linear models.'
volume: 202
URL: https://proceedings.mlr.press/v202/agarwala23b.html
PDF: https://proceedings.mlr.press/v202/agarwala23b/agarwala23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-agarwala23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Atish
family: Agarwala
- given: Fabian
family: Pedregosa
- given: Jeffrey
family: Pennington
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 169-195
id: agarwala23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 169
lastpage: 195
published: 2023-07-03 00:00:00 +0000
- title: 'Global optimality of Elman-type RNNs in the mean-field regime'
abstract: 'We analyze Elman-type recurrent neural networks (RNNs) and their training in the mean-field regime. Specifically, we show convergence of gradient descent training dynamics of the RNN to the corresponding mean-field formulation in the large width limit. We also show that the fixed points of the limiting infinite-width dynamics are globally optimal, under some assumptions on the initialization of the weights. Our results establish optimality for feature-learning with wide RNNs in the mean-field regime.'
volume: 202
URL: https://proceedings.mlr.press/v202/agazzi23a.html
PDF: https://proceedings.mlr.press/v202/agazzi23a/agazzi23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-agazzi23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Andrea
family: Agazzi
- given: Jianfeng
family: Lu
- given: Sayan
family: Mukherjee
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 196-227
id: agazzi23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 196
lastpage: 227
published: 2023-07-03 00:00:00 +0000
- title: 'SemSup-XC: Semantic Supervision for Zero and Few-shot Extreme Classification'
abstract: 'Extreme classification (XC) involves predicting over large numbers of classes (thousands to millions), with real-world applications like news article classification and e-commerce product tagging. The zero-shot version of this task requires generalization to novel classes without additional supervision. In this paper, we develop SemSup-XC, a model that achieves state-of-the-art zero-shot and few-shot performance on three XC datasets derived from legal, e-commerce, and Wikipedia data. To develop SemSup-XC, we use automatically collected semantic class descriptions to represent classes and facilitate generalization through a novel hybrid matching module that matches input instances to class descriptions using a combination of semantic and lexical similarity. Trained with contrastive learning, SemSup-XC significantly outperforms baselines and establishes state-of-the-art performance on all three datasets considered, gaining up to 12 precision points on zero-shot and more than 10 precision points on one-shot tests, with similar gains for recall@10. Our ablation studies highlight the relative importance of our hybrid matching module and automatically collected class descriptions.'
volume: 202
URL: https://proceedings.mlr.press/v202/aggarwal23a.html
PDF: https://proceedings.mlr.press/v202/aggarwal23a/aggarwal23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-aggarwal23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Pranjal
family: Aggarwal
- given: Ameet
family: Deshpande
- given: Karthik R
family: Narasimhan
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 228-247
id: aggarwal23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 228
lastpage: 247
published: 2023-07-03 00:00:00 +0000
- title: 'Adaptive IMLE for Few-shot Pretraining-free Generative Modelling'
abstract: 'Despite their success on large datasets, GANs have been difficult to apply in the few-shot setting, where only a limited number of training examples are provided. Due to mode collapse, GANs tend to ignore some training examples, causing overfitting to a subset of the training dataset, which is small in the first place. A recent method called Implicit Maximum Likelihood Estimation (IMLE) is an alternative to GANs that tries to address this issue. It uses the same kind of generators as GANs but trains them with a different objective that encourages mode coverage. However, the theoretical guarantees of IMLE hold under the restrictive condition that the optimal likelihood at all data points is the same. In this paper, we present a more generalized formulation of IMLE which includes the original formulation as a special case, and we prove that the theoretical guarantees hold under weaker conditions. Using this generalized formulation, we further derive a new algorithm, dubbed Adaptive IMLE, which can adapt to the varying difficulty of different training examples. We demonstrate on multiple few-shot image synthesis datasets that our method significantly outperforms existing methods. Our code is available at https://github.com/mehranagh20/AdaIMLE.'
volume: 202
URL: https://proceedings.mlr.press/v202/aghabozorgi23a.html
PDF: https://proceedings.mlr.press/v202/aghabozorgi23a/aghabozorgi23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-aghabozorgi23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mehran
family: Aghabozorgi
- given: Shichong
family: Peng
- given: Ke
family: Li
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 248-264
id: aghabozorgi23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 248
lastpage: 264
published: 2023-07-03 00:00:00 +0000
- title: 'Scaling Laws for Generative Mixed-Modal Language Models'
abstract: 'Generative language models define distributions over sequences of tokens that can represent essentially any combination of data modalities (e.g., any permutation of image tokens from VQ-VAEs, speech tokens from HuBERT, BPE tokens for language or code, and so on). To better understand the scaling properties of such mixed-modal models, we conducted over 250 experiments using seven different modalities and model sizes ranging from 8 million to 30 billion, trained on 5-100 billion tokens. We report new mixed-modal scaling laws that unify the contributions of individual modalities and the interactions between them. Specifically, we explicitly model the optimal synergy and competition due to data and model size as an additive term to previous uni-modal scaling laws. We also report four empirical phenomena observed during training, such as emergent coordinate-ascent style training that naturally alternates between modalities, guidelines for selecting critical hyper-parameters, and connections between mixed-modal competition and training stability. Finally, we test our scaling law by training a 30B speech-text model, which significantly outperforms the corresponding unimodal models. Overall, our research provides valuable insights into the design and training of mixed-modal generative models, an important new class of unified models that have unique distributional properties.'
volume: 202
URL: https://proceedings.mlr.press/v202/aghajanyan23a.html
PDF: https://proceedings.mlr.press/v202/aghajanyan23a/aghajanyan23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-aghajanyan23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Armen
family: Aghajanyan
- given: Lili
family: Yu
- given: Alexis
family: Conneau
- given: Wei-Ning
family: Hsu
- given: Karen
family: Hambardzumyan
- given: Susan
family: Zhang
- given: Stephen
family: Roller
- given: Naman
family: Goyal
- given: Omer
family: Levy
- given: Luke
family: Zettlemoyer
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 265-279
id: aghajanyan23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 265
lastpage: 279
published: 2023-07-03 00:00:00 +0000
- title: 'Hypothesis Transfer Learning with Surrogate Classification Losses: Generalization Bounds through Algorithmic Stability'
abstract: 'Hypothesis transfer learning (HTL) contrasts with domain adaptation by allowing a previously learnt task, named the source, to be leveraged in a new one, the target, without requiring access to the source data. Indeed, HTL relies only on a hypothesis learnt from such source data, relieving the hurdle of expensive data storage and providing great practical benefits. Hence, HTL is highly beneficial for real-world applications relying on big data. The analysis of such a method from a theoretical perspective faces multiple challenges, particularly in classification tasks. This paper deals with this problem by studying the learning theory of HTL through algorithmic stability, an attractive theoretical framework for the analysis of machine learning algorithms. In particular, we are interested in the statistical behavior of the regularized empirical risk minimizers in the case of binary classification. Our stability analysis provides learning guarantees under mild assumptions. Consequently, we derive several complexity-free generalization bounds for essential statistical quantities like the training error, the excess risk and cross-validation estimates. These refined bounds make it possible to understand the benefits of transfer learning and to compare the behavior of standard losses in different scenarios, leading to valuable insights for practitioners.'
volume: 202
URL: https://proceedings.mlr.press/v202/aghbalou23a.html
PDF: https://proceedings.mlr.press/v202/aghbalou23a/aghbalou23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-aghbalou23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Anass
family: Aghbalou
- given: Guillaume
family: Staerman
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 280-303
id: aghbalou23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 280
lastpage: 303
published: 2023-07-03 00:00:00 +0000
- title: 'Constrained Causal Bayesian Optimization'
abstract: 'We propose constrained causal Bayesian optimization (cCBO), an approach for finding interventions in a known causal graph that optimize a target variable under some constraints. cCBO first reduces the search space by exploiting the graph structure and, if available, an observational dataset; it then solves the restricted optimization problem by modelling target and constraint quantities using Gaussian processes and by sequentially selecting interventions via a constrained expected improvement acquisition function. We propose different surrogate models that enable integrating observational and interventional data while capturing correlation among effects with increasing levels of sophistication. We evaluate cCBO on artificial and real-world causal graphs, showing a successful trade-off between fast convergence and the percentage of feasible interventions.'
volume: 202
URL: https://proceedings.mlr.press/v202/aglietti23a.html
PDF: https://proceedings.mlr.press/v202/aglietti23a/aglietti23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-aglietti23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Virginia
family: Aglietti
- given: Alan
family: Malek
- given: Ira
family: Ktena
- given: Silvia
family: Chiappa
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 304-321
id: aglietti23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 304
lastpage: 321
published: 2023-07-03 00:00:00 +0000
- title: 'Explaining the effects of non-convergent MCMC in the training of Energy-Based Models'
abstract: 'In this paper, we quantify the impact of using non-convergent Markov chains to train Energy-Based models (EBMs). In particular, we show analytically that EBMs trained with non-persistent short runs to estimate the gradient can perfectly reproduce a set of empirical statistics of the data, not at the level of the equilibrium measure, but through a precise dynamical process. Our results provide a first-principles explanation for the observations of recent works proposing the strategy of using short runs starting from random initial conditions as an efficient way to generate high-quality samples in EBMs, and lay the groundwork for using EBMs as diffusion models. After explaining this effect in generic EBMs, we analyze two solvable models in which the effect of the non-convergent sampling in the trained parameters can be described in detail. Finally, we test these predictions numerically on a ConvNet EBM and a Boltzmann machine.'
volume: 202
URL: https://proceedings.mlr.press/v202/agoritsas23a.html
PDF: https://proceedings.mlr.press/v202/agoritsas23a/agoritsas23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-agoritsas23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Elisabeth
family: Agoritsas
- given: Giovanni
family: Catania
- given: Aurélien
family: Decelle
- given: Beatriz
family: Seoane
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 322-336
id: agoritsas23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 322
lastpage: 336
published: 2023-07-03 00:00:00 +0000
- title: 'Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies'
abstract: 'We introduce a new type of test, called a Turing Experiment (TE), for evaluating to what extent a given language model, such as GPT models, can simulate different aspects of human behavior. A TE can also reveal consistent distortions in a language model’s simulation of a specific human behavior. Unlike the Turing Test, which involves simulating a single arbitrary individual, a TE requires simulating a representative sample of participants in human subject research. We carry out TEs that attempt to replicate well-established findings from prior studies. We design a methodology for simulating TEs and illustrate its use to compare how well different language models are able to reproduce classic economic, psycholinguistic, and social psychology experiments: Ultimatum Game, Garden Path Sentences, Milgram Shock Experiment, and Wisdom of Crowds. In the first three TEs, the existing findings were replicated using recent models, while the last TE reveals a “hyper-accuracy distortion” present in some language models (including ChatGPT and GPT-4), which could affect downstream applications in education and the arts.'
volume: 202
URL: https://proceedings.mlr.press/v202/aher23a.html
PDF: https://proceedings.mlr.press/v202/aher23a/aher23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-aher23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Gati V
family: Aher
- given: Rosa I.
family: Arriaga
- given: Adam Tauman
family: Kalai
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 337-371
id: aher23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 337
lastpage: 371
published: 2023-07-03 00:00:00 +0000
- title: 'Interventional Causal Representation Learning'
abstract: 'Causal representation learning seeks to extract high-level latent factors from low-level sensory data. Most existing methods rely on observational data and structural assumptions (e.g., conditional independence) to identify the latent factors. However, interventional data is prevalent across applications. Can interventional data facilitate causal representation learning? We explore this question in this paper. The key observation is that interventional data often carries geometric signatures of the latent factors’ support (i.e. what values each latent can possibly take). For example, when the latent factors are causally connected, interventions can break the dependency between the intervened latents’ support and their ancestors’. Leveraging this fact, we prove that the latent causal factors can be identified up to permutation and scaling given data from perfect do interventions. Moreover, we can achieve block affine identification, namely the estimated latent factors are only entangled with a few other latents if we have access to data from imperfect interventions. These results highlight the unique power of interventional data in causal representation learning; they can enable provable identification of latent factors without any assumptions about their distributions or dependency structure.'
volume: 202
URL: https://proceedings.mlr.press/v202/ahuja23a.html
PDF: https://proceedings.mlr.press/v202/ahuja23a/ahuja23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-ahuja23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Kartik
family: Ahuja
- given: Divyat
family: Mahajan
- given: Yixin
family: Wang
- given: Yoshua
family: Bengio
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 372-407
id: ahuja23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 372
lastpage: 407
published: 2023-07-03 00:00:00 +0000
- title: 'Sequential Underspecified Instrument Selection for Cause-Effect Estimation'
abstract: 'Instrumental variable (IV) methods are used to estimate causal effects in settings with unobserved confounding, where we cannot directly experiment on the treatment variable. Instruments are variables which only affect the outcome indirectly via the treatment variable(s). Most IV applications focus on low-dimensional treatments and crucially require at least as many instruments as treatments. This assumption is restrictive: in the natural sciences we often seek to infer causal effects of high-dimensional treatments (e.g., the effect of gene expressions or microbiota on health and disease), but can only run few experiments with a limited number of instruments (e.g., drugs or antibiotics). In such under-specified problems, the full treatment effect is not identifiable in a single experiment even in the linear case. We show that one can still reliably recover the projection of the treatment effect onto the instrumented subspace and develop techniques to consistently combine such partial estimates from different sets of instruments. We then leverage our combined estimators in an algorithm that iteratively proposes the most informative instruments at each round of experimentation to maximize the overall information about the full causal effect.'
volume: 202
URL: https://proceedings.mlr.press/v202/ailer23a.html
PDF: https://proceedings.mlr.press/v202/ailer23a/ailer23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-ailer23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Elisabeth
family: Ailer
- given: Jason
family: Hartford
- given: Niki
family: Kilbertus
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 408-420
id: ailer23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 408
lastpage: 420
published: 2023-07-03 00:00:00 +0000
- title: 'Atari-5: Distilling the Arcade Learning Environment down to Five Games'
abstract: 'The Arcade Learning Environment (ALE) has become an essential benchmark for assessing the performance of reinforcement learning algorithms. However, the computational cost of generating results on the entire 57-game dataset limits ALE’s use and makes the reproducibility of many results infeasible. We propose a novel solution to this problem in the form of a principled methodology for selecting small but representative subsets of environments within a benchmark suite. We applied our method to identify a subset of five ALE games, which we call *Atari-5*, that produces 57-game median score estimates within 10% of their true values. Extending the subset to 10 games recovers 80% of the variance in log-scores for *all* games within the 57-game set. We show this level of compression is possible due to a high degree of correlation between many of the games in ALE.'
volume: 202
URL: https://proceedings.mlr.press/v202/aitchison23a.html
PDF: https://proceedings.mlr.press/v202/aitchison23a/aitchison23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-aitchison23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Matthew
family: Aitchison
- given: Penny
family: Sweetser
- given: Marcus
family: Hutter
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 421-438
id: aitchison23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 421
lastpage: 438
published: 2023-07-03 00:00:00 +0000
- title: 'Towards credible visual model interpretation with path attribution'
abstract: 'With its inspirational roots in game theory, the path attribution framework stands out among the post-hoc model interpretation techniques due to its axiomatic nature. However, recent developments show that despite being axiomatic, path attribution methods can compute counter-intuitive feature attributions. Not only that, for deep visual models, the methods may also not conform to the original game-theoretic intuitions that are the basis of their axiomatic nature. To address these issues, we perform a systematic investigation of the path attribution framework. We first pinpoint the conditions in which the counter-intuitive attributions of deep visual models can be avoided under this framework. Then, we identify a mechanism of integrating the attributions over the paths such that they computationally conform to the original insights of game theory. These insights are eventually combined into a method, which provides intuitive and reliable feature attributions. We also establish the findings empirically by evaluating the method on multiple datasets, models and evaluation metrics. Extensive experiments show a consistent quantitative and qualitative gain in the results over the baselines.'
volume: 202
URL: https://proceedings.mlr.press/v202/akhtar23a.html
PDF: https://proceedings.mlr.press/v202/akhtar23a/akhtar23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-akhtar23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Naveed
family: Akhtar
- given: Mohammad A. A. K.
family: Jalwana
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 439-457
id: akhtar23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 439
lastpage: 457
published: 2023-07-03 00:00:00 +0000
- title: 'Convergence of First-Order Methods for Constrained Nonconvex Optimization with Dependent Data'
abstract: 'We focus on analyzing the classical stochastic projected gradient methods under a general dependent data sampling scheme for constrained smooth nonconvex optimization. We show the worst-case rate of convergence $\tilde{O}(t^{-1/4})$ and complexity $\tilde{O}(\varepsilon^{-4})$ for achieving an $\varepsilon$-near stationary point in terms of the norm of the gradient of Moreau envelope and gradient mapping. While classical convergence guarantees require i.i.d. data sampling from the target distribution, we only require a mild mixing condition of the conditional distribution, which holds for a wide class of Markov chain sampling algorithms. This improves the existing complexity for the constrained smooth nonconvex optimization with dependent data from $\tilde{O}(\varepsilon^{-8})$ to $\tilde{O}(\varepsilon^{-4})$ with a significantly simpler analysis. We illustrate the generality of our approach by deriving convergence results with dependent data for stochastic proximal gradient methods, the adaptive stochastic gradient algorithm AdaGrad and the stochastic gradient algorithm with heavy ball momentum. As an application, we obtain the first online nonnegative matrix factorization algorithms for dependent data based on stochastic projected gradient methods with adaptive step sizes and optimal rate of convergence.'
volume: 202
URL: https://proceedings.mlr.press/v202/alacaoglu23a.html
PDF: https://proceedings.mlr.press/v202/alacaoglu23a/alacaoglu23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-alacaoglu23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ahmet
family: Alacaoglu
- given: Hanbaek
family: Lyu
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 458-489
id: alacaoglu23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 458
lastpage: 489
published: 2023-07-03 00:00:00 +0000
- title: 'Recasting Self-Attention with Holographic Reduced Representations'
abstract: 'In recent years, self-attention has become the dominant paradigm for sequence modeling in a variety of domains. However, in domains with very long sequence lengths the $\mathcal{O}(T^2)$ memory and $\mathcal{O}(T^2 H)$ compute costs can make using transformers infeasible. Motivated by problems in malware detection, where sequence lengths of $T \geq 100,000$ are a roadblock to deep learning, we re-cast self-attention using the neuro-symbolic approach of Holographic Reduced Representations (HRR). In doing so we perform the same high-level strategy of the standard self-attention: a set of queries matching against a set of keys, and returning a weighted response of the values for each key. Implemented as a “Hrrformer” we obtain several benefits including $\mathcal{O}(T H \log H)$ time complexity, $\mathcal{O}(T H)$ space complexity, and convergence in $10\times$ fewer epochs. Nevertheless, the Hrrformer achieves near state-of-the-art accuracy on LRA benchmarks and we are able to learn with just a single layer. Combined, these benefits make our Hrrformer the first viable Transformer for such long malware classification sequences and up to $280\times$ faster to train on the Long Range Arena benchmark.'
volume: 202
URL: https://proceedings.mlr.press/v202/alam23a.html
PDF: https://proceedings.mlr.press/v202/alam23a/alam23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-alam23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mohammad Mahmudul
family: Alam
- given: Edward
family: Raff
- given: Stella
family: Biderman
- given: Tim
family: Oates
- given: James
family: Holt
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 490-507
id: alam23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 490
lastpage: 507
published: 2023-07-03 00:00:00 +0000
- title: 'The Saddle-Point Method in Differential Privacy'
abstract: 'We characterize the differential privacy guarantees of privacy mechanisms in the large-composition regime, i.e., when a privacy mechanism is sequentially applied a large number of times to sensitive data. Via exponentially tilting the privacy loss random variable, we derive a new formula for the privacy curve expressing it as a contour integral over an integration path that runs parallel to the imaginary axis with a free real-axis intercept. Then, using the method of steepest descent from mathematical physics, we demonstrate that the choice of saddle-point as the real-axis intercept yields closed-form accurate approximations of the desired contour integral. This procedure—dubbed the saddle-point accountant (SPA)—yields a constant-time accurate approximation of the privacy curve. Theoretically, our results can be viewed as a refinement of both Gaussian Differential Privacy and the moments accountant method found in Rényi Differential Privacy. In practice, we demonstrate through numerical experiments that the SPA provides a precise approximation of privacy guarantees competitive with purely numerical-based methods (such as FFT-based accountants), while enjoying closed-form mathematical expressions.'
volume: 202
URL: https://proceedings.mlr.press/v202/alghamdi23a.html
PDF: https://proceedings.mlr.press/v202/alghamdi23a/alghamdi23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-alghamdi23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Wael
family: Alghamdi
- given: Juan Felipe
family: Gomez
- given: Shahab
family: Asoodeh
- given: Flavio
family: Calmon
- given: Oliver
family: Kosut
- given: Lalitha
family: Sankar
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 508-528
id: alghamdi23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 508
lastpage: 528
published: 2023-07-03 00:00:00 +0000
- title: 'Nonlinear Advantage: Trained Networks Might Not Be As Complex as You Think'
abstract: 'We perform an empirical study of the behaviour of deep networks when fully linearizing some of their feature channels through a sparsity prior on the overall number of nonlinear units in the network. In experiments on image classification and machine translation tasks, we investigate how much we can simplify the network function towards linearity before performance collapses. First, we observe a significant performance gap when reducing nonlinearity in the network function early on as opposed to late in training, in line with recent observations on the time-evolution of the data-dependent NTK. Second, we find that after training, we are able to linearize a significant number of nonlinear units while maintaining a high performance, indicating that much of a network’s expressivity remains unused but helps gradient descent in early stages of training. To characterize the depth of the resulting partially linearized network, we introduce a measure called average path length, representing the average number of active nonlinearities encountered along a path in the network graph. Under sparsity pressure, we find that the remaining nonlinear units organize into distinct structures, forming core-networks of near constant effective depth and width, which in turn depend on task difficulty.'
volume: 202
URL: https://proceedings.mlr.press/v202/ali-mehmeti-gopel23a.html
PDF: https://proceedings.mlr.press/v202/ali-mehmeti-gopel23a/ali-mehmeti-gopel23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-ali-mehmeti-gopel23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Christian H.X.
family: Ali Mehmeti-Göpel
- given: Jan
family: Disselhoff
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 529-546
id: ali-mehmeti-gopel23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 529
lastpage: 546
published: 2023-07-03 00:00:00 +0000
- title: 'A Simple Zero-shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models'
abstract: 'Contrastively trained text-image models have the remarkable ability to perform zero-shot classification, that is, classifying previously unseen images into categories that the model has never been explicitly trained to identify. However, these zero-shot classifiers need prompt engineering to achieve high accuracy. Prompt engineering typically requires hand-crafting a set of prompts for individual downstream tasks. In this work, we aim to automate this prompt engineering and improve zero-shot accuracy through prompt ensembling. In particular, we ask *“Given a large pool of prompts, can we automatically score the prompts and ensemble those that are most suitable for a particular downstream dataset, without needing access to labeled validation data?"*. We demonstrate that this is possible. In doing so, we identify several pathologies in a naive prompt scoring method where the score can be easily overconfident due to biases in pre-training and test data, and we propose a novel prompt scoring method that corrects for the biases. Using our proposed scoring method to create a weighted average prompt ensemble, our method overall outperforms equal average ensemble, as well as hand-crafted prompts, on ImageNet, 4 of its variants, and 11 fine-grained classification benchmarks, while being fully automatic, optimization-free, and not requiring access to labeled validation data.'
volume: 202
URL: https://proceedings.mlr.press/v202/allingham23a.html
PDF: https://proceedings.mlr.press/v202/allingham23a/allingham23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-allingham23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: James Urquhart
family: Allingham
- given: Jie
family: Ren
- given: Michael W
family: Dusenberry
- given: Xiuye
family: Gu
- given: Yin
family: Cui
- given: Dustin
family: Tran
- given: Jeremiah Zhe
family: Liu
- given: Balaji
family: Lakshminarayanan
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 547-568
id: allingham23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 547
lastpage: 568
published: 2023-07-03 00:00:00 +0000
- title: 'On the Privacy-Robustness-Utility Trilemma in Distributed Learning'
abstract: 'The ubiquity of distributed machine learning (ML) in sensitive public domain applications calls for algorithms that protect data privacy, while being robust to faults and adversarial behaviors. Although privacy and robustness have been extensively studied independently in distributed ML, their synthesis remains poorly understood. We present the first tight analysis of the error incurred by any algorithm ensuring robustness against a fraction of adversarial machines, as well as differential privacy (DP) for honest machines’ data against any other curious entity. Our analysis exhibits a fundamental trade-off between privacy, robustness, and utility. To prove our lower bound, we consider the case of mean estimation, subject to distributed DP and robustness constraints, and devise reductions to centralized estimation of one-way marginals. We prove our matching upper bound by presenting a new distributed ML algorithm using a high-dimensional robust aggregation rule. The latter amortizes the dependence on the dimension in the error (caused by adversarial workers and DP), while being agnostic to the statistical properties of the data.'
volume: 202
URL: https://proceedings.mlr.press/v202/allouah23a.html
PDF: https://proceedings.mlr.press/v202/allouah23a/allouah23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-allouah23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Youssef
family: Allouah
- given: Rachid
family: Guerraoui
- given: Nirupam
family: Gupta
- given: Rafael
family: Pinot
- given: John
family: Stephan
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 569-626
id: allouah23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 569
lastpage: 626
published: 2023-07-03 00:00:00 +0000
- title: 'Differentially Private Distributed Bayesian Linear Regression with MCMC'
abstract: 'We propose a novel Bayesian inference framework for distributed differentially private linear regression. We consider a distributed setting where multiple parties hold parts of the data and share certain summary statistics of their portions in privacy-preserving noise. We develop a novel generative statistical model for privately shared statistics, which exploits a useful distributional relation between the summary statistics of linear regression. We propose Bayesian estimation of the regression coefficients, mainly using Markov chain Monte Carlo algorithms, while we also provide a fast version that performs approximate Bayesian estimation in one iteration. The proposed methods have computational advantages over their competitors. We provide numerical results on both real and simulated data, which demonstrate that the proposed algorithms provide well-rounded estimation and prediction.'
volume: 202
URL: https://proceedings.mlr.press/v202/alparslan23a.html
PDF: https://proceedings.mlr.press/v202/alparslan23a/alparslan23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-alparslan23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Baris
family: Alparslan
- given: Sinan
family: Yıldırım
- given: Ilker
family: Birbil
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 627-641
id: alparslan23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 627
lastpage: 641
published: 2023-07-03 00:00:00 +0000
- title: 'Robust and Scalable Bayesian Online Changepoint Detection'
abstract: 'This paper proposes an online, provably robust, and scalable Bayesian approach for changepoint detection. The resulting algorithm has key advantages over previous work: it provides provable robustness by leveraging the generalised Bayesian perspective, and also addresses the scalability issues of previous attempts. Specifically, the proposed generalised Bayesian formalism leads to conjugate posteriors whose parameters are available in closed form by leveraging diffusion score matching. The resulting algorithm is exact, can be updated through simple algebra, and is more than 10 times faster than its closest competitor.'
volume: 202
URL: https://proceedings.mlr.press/v202/altamirano23a.html
PDF: https://proceedings.mlr.press/v202/altamirano23a/altamirano23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-altamirano23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Matias
family: Altamirano
- given: Francois-Xavier
family: Briol
- given: Jeremias
family: Knoblauch
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 642-663
id: altamirano23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 642
lastpage: 663
published: 2023-07-03 00:00:00 +0000
- title: 'Neural Wasserstein Gradient Flows for Discrepancies with Riesz Kernels'
abstract: 'Wasserstein gradient flows of maximum mean discrepancy (MMD) functionals with non-smooth Riesz kernels show a rich structure as singular measures can become absolutely continuous ones and conversely. In this paper we contribute to the understanding of such flows. We propose to approximate the backward scheme of Jordan, Kinderlehrer and Otto for computing such Wasserstein gradient flows as well as a forward scheme for so-called Wasserstein steepest descent flows by neural networks (NNs). Since we cannot restrict ourselves to absolutely continuous measures, we have to deal with transport plans and velocity plans instead of usual transport maps and velocity fields. Indeed, we approximate the disintegration of both plans by generative NNs which are learned with respect to appropriate loss functions. In order to evaluate the quality of both neural schemes, we benchmark them on the interaction energy. Here we provide analytic formulas for Wasserstein schemes starting at a Dirac measure and show their convergence as the time step size tends to zero. Finally, we illustrate our neural MMD flows by numerical examples.'
volume: 202
URL: https://proceedings.mlr.press/v202/altekruger23a.html
PDF: https://proceedings.mlr.press/v202/altekruger23a/altekruger23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-altekruger23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Fabian
family: Altekrüger
- given: Johannes
family: Hertrich
- given: Gabriele
family: Steidl
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 664-690
id: altekruger23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 664
lastpage: 690
published: 2023-07-03 00:00:00 +0000
- title: 'Distributed Contextual Linear Bandits with Minimax Optimal Communication Cost'
abstract: 'We study distributed contextual linear bandits with stochastic contexts, where $N$ agents/learners act cooperatively to solve a linear bandit-optimization problem with $d$-dimensional features over the course of $T$ rounds. For this problem, we derive the first ever information-theoretic lower bound $\Omega(dN)$ on the communication cost of any algorithm that performs optimally in a regret minimization setup. We then propose a distributed batch elimination version of the LinUCB algorithm, DisBE-LUCB, where the agents share information among each other through a central server. We prove that the communication cost of DisBE-LUCB, matches our lower bound up to logarithmic factors. In particular, for scenarios with known context distribution, the communication cost of DisBE-LUCB is only $\tilde{\mathcal{O}}(dN)$ and its regret is $\tilde{\mathcal{O}}(\sqrt{dNT})$, which is of the same order as that incurred by an optimal single-agent algorithm for $NT$ rounds. We also provide similar bounds for practical settings where the context distribution can only be estimated. Therefore, our proposed algorithm is nearly minimax optimal in terms of *both regret and communication cost*. Finally, we propose DecBE-LUCB, a fully decentralized version of DisBE-LUCB, which operates without a central server, where agents share information with their *immediate neighbors* through a carefully designed consensus procedure.'
volume: 202
URL: https://proceedings.mlr.press/v202/amani23a.html
PDF: https://proceedings.mlr.press/v202/amani23a/amani23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-amani23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Sanae
family: Amani
- given: Tor
family: Lattimore
- given: András
family: György
- given: Lin
family: Yang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 691-717
id: amani23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 691
lastpage: 717
published: 2023-07-03 00:00:00 +0000
- title: 'A Kernelized Stein Discrepancy for Biological Sequences'
abstract: 'Generative models of biological sequences are a powerful tool for learning from complex sequence data, predicting the effects of mutations, and designing novel biomolecules with desired properties. To evaluate generative models it is important to accurately measure differences between high-dimensional distributions. In this paper we propose the “KSD-B”, a novel divergence measure for distributions over biological sequences that is based on the kernelized Stein discrepancy (KSD). The KSD-B can be evaluated even when the normalizing constant of the model is unknown; it allows for variable length sequences and can take into account biological notions of sequence distance. Unlike previous KSDs over discrete spaces, the KSD-B (a) is theoretically guaranteed to detect convergence and non-convergence of distributions over sequence space and (b) can be efficiently estimated in practice. We demonstrate the advantages of the KSD-B on problems with synthetic and real data, and apply it to measure the fit of state-of-the-art machine learning models. Overall, the KSD-B enables rigorous evaluation of generative biological sequence models, allowing the accuracy of models, sampling procedures, and library designs to be checked reliably.'
volume: 202
URL: https://proceedings.mlr.press/v202/amin23a.html
PDF: https://proceedings.mlr.press/v202/amin23a/amin23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-amin23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Alan Nawzad
family: Amin
- given: Eli N
family: Weinstein
- given: Debora Susan
family: Marks
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 718-767
id: amin23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 718
lastpage: 767
published: 2023-07-03 00:00:00 +0000
- title: 'The Optimal Approximation Factors in Misspecified Off-Policy Value Function Estimation'
abstract: 'Theoretical guarantees in reinforcement learning (RL) are known to suffer multiplicative blow-up factors with respect to the misspecification error of function approximation. Yet, the nature of such *approximation factors*—especially their optimal form in a given learning problem—is poorly understood. In this paper we study this question in linear off-policy value function estimation, where many open questions remain. We study the approximation factor in a broad spectrum of settings, such as presence vs. absence of state aliasing and full vs. partial coverage of the state space. Our core results include instance-dependent upper bounds on the approximation factors with respect to both the weighted $L_2$-norm (where the weighting is the offline state distribution) and the $L_\infty$ norm. We show that these approximation factors are optimal (in an instance-dependent sense) for a number of these settings. In other cases, we show that the instance-dependent parameters which appear in the upper bounds are necessary, and that the finiteness of either alone cannot guarantee a finite approximation factor even in the limit of infinite data.'
volume: 202
URL: https://proceedings.mlr.press/v202/amortila23a.html
PDF: https://proceedings.mlr.press/v202/amortila23a/amortila23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-amortila23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Philip
family: Amortila
- given: Nan
family: Jiang
- given: Csaba
family: Szepesvari
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 768-790
id: amortila23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 768
lastpage: 790
published: 2023-07-03 00:00:00 +0000
- title: 'Meta Optimal Transport'
abstract: 'We study the use of amortized optimization to predict optimal transport (OT) maps from the input measures, which we call Meta OT. This helps repeatedly solve similar OT problems between different measures by leveraging the knowledge and information present from past problems to rapidly predict and solve new problems. Otherwise, standard methods ignore the knowledge of the past solutions and suboptimally re-solve each problem from scratch. We instantiate Meta OT models in discrete and continuous settings between grayscale images, spherical data, classification labels, and color palettes and use them to improve the computational time of standard OT solvers. Our source code is available at http://github.com/facebookresearch/meta-ot'
volume: 202
URL: https://proceedings.mlr.press/v202/amos23a.html
PDF: https://proceedings.mlr.press/v202/amos23a/amos23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-amos23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Brandon
family: Amos
- given: Giulia
family: Luise
- given: Samuel
family: Cohen
- given: Ievgen
family: Redko
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 791-813
id: amos23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 791
lastpage: 813
published: 2023-07-03 00:00:00 +0000
- title: 'Near-Optimal $Φ$-Regret Learning in Extensive-Form Games'
abstract: 'In this paper, we establish efficient and uncoupled learning dynamics so that, when employed by all players in multiplayer perfect-recall imperfect-information extensive-form games, the trigger regret of each player grows as $O(\log T)$ after $T$ repetitions of play. This improves exponentially over the prior best known trigger-regret bound of $O(T^{1/4})$, and settles a recent open question by Bai et al. (2022). As an immediate consequence, we guarantee convergence to the set of extensive-form correlated equilibria and coarse correlated equilibria at a near-optimal rate of $\frac{\log T}{T}$. Building on prior work, at the heart of our construction lies a more general result regarding fixed points deriving from rational functions with polynomial degree, a property that we establish for the fixed points of (coarse) trigger deviation functions. Moreover, our construction leverages a refined regret circuit for the convex hull, which—unlike prior guarantees—preserves the RVU property introduced by Syrgkanis et al. (NIPS, 2015); this observation has an independent interest in establishing near-optimal regret under learning dynamics based on a CFR-type decomposition of the regret.'
volume: 202
URL: https://proceedings.mlr.press/v202/anagnostides23a.html
PDF: https://proceedings.mlr.press/v202/anagnostides23a/anagnostides23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-anagnostides23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ioannis
family: Anagnostides
- given: Gabriele
family: Farina
- given: Tuomas
family: Sandholm
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 814-839
id: anagnostides23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 814
lastpage: 839
published: 2023-07-03 00:00:00 +0000
- title: 'A Modern Look at the Relationship between Sharpness and Generalization'
abstract: 'Sharpness of minima is a promising quantity that can correlate with generalization in deep networks and, when optimized during training, can improve generalization. However, standard sharpness is not invariant under reparametrizations of neural networks, and, to fix this, reparametrization-invariant sharpness definitions have been proposed, most prominently adaptive sharpness (Kwon et al., 2021). But does it really capture generalization in modern practical settings? We comprehensively explore this question in a detailed study of various definitions of adaptive sharpness in settings ranging from training from scratch on ImageNet and CIFAR-10 to fine-tuning CLIP on ImageNet and BERT on MNLI. We focus mostly on transformers, for which little is known in terms of sharpness despite their widespread usage. Overall, we observe that sharpness does not correlate well with generalization but rather with some training parameters like the learning rate that can be positively or negatively correlated with generalization depending on the setup. Interestingly, in multiple cases, we observe a consistent negative correlation of sharpness with OOD generalization, implying that sharper minima can generalize better. Finally, we illustrate on a simple model that the right sharpness measure is highly data-dependent, and that we do not understand this aspect well for realistic data distributions.'
volume: 202
URL: https://proceedings.mlr.press/v202/andriushchenko23a.html
PDF: https://proceedings.mlr.press/v202/andriushchenko23a/andriushchenko23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-andriushchenko23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Maksym
family: Andriushchenko
- given: Francesco
family: Croce
- given: Maximilian
family: Müller
- given: Matthias
family: Hein
- given: Nicolas
family: Flammarion
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 840-902
id: andriushchenko23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 840
lastpage: 902
published: 2023-07-03 00:00:00 +0000
- title: 'SGD with Large Step Sizes Learns Sparse Features'
abstract: 'We showcase important features of the dynamics of the Stochastic Gradient Descent (SGD) in the training of neural networks. We present empirical observations that commonly used large step sizes (i) may lead the iterates to jump from one side of a valley to the other causing *loss stabilization*, and (ii) this stabilization induces a hidden stochastic dynamics that *biases it implicitly* toward simple predictors. Furthermore, we show empirically that the longer large step sizes keep SGD high in the loss landscape valleys, the better the implicit regularization can operate and find sparse representations. Notably, no explicit regularization is used: the regularization effect comes solely from the SGD dynamics influenced by the large step sizes schedule. Therefore, these observations unveil how, through the step size schedules, both gradient and noise drive together the SGD dynamics through the loss landscape of neural networks. We justify these findings theoretically through the study of simple neural network models as well as qualitative arguments inspired by stochastic processes. This analysis allows us to shed new light on some common practices and observed phenomena when training deep networks.'
volume: 202
URL: https://proceedings.mlr.press/v202/andriushchenko23b.html
PDF: https://proceedings.mlr.press/v202/andriushchenko23b/andriushchenko23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-andriushchenko23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Maksym
family: Andriushchenko
- given: Aditya Vardhan
family: Varre
- given: Loucas
family: Pillaud-Vivien
- given: Nicolas
family: Flammarion
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 903-925
id: andriushchenko23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 903
lastpage: 925
published: 2023-07-03 00:00:00 +0000
- title: 'Neural Continuous-Discrete State Space Models for Irregularly-Sampled Time Series'
abstract: 'Learning accurate predictive models of real-world dynamic phenomena (e.g., climate, biological) remains a challenging task. One key issue is that the data generated by both natural and artificial processes often comprise time series that are irregularly sampled and/or contain missing observations. In this work, we propose the Neural Continuous-Discrete State Space Model (NCDSSM) for continuous-time modeling of time series through discrete-time observations. NCDSSM employs auxiliary variables to disentangle recognition from dynamics, thus requiring amortized inference only for the auxiliary variables. Leveraging techniques from continuous-discrete filtering theory, we demonstrate how to perform accurate Bayesian inference for the dynamic states. We propose three flexible parameterizations of the latent dynamics and an efficient training objective that marginalizes the dynamic states during inference. Empirical results on multiple benchmark datasets across various domains show improved imputation and forecasting performance of NCDSSM over existing models.'
volume: 202
URL: https://proceedings.mlr.press/v202/ansari23a.html
PDF: https://proceedings.mlr.press/v202/ansari23a/ansari23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-ansari23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Abdul Fatir
family: Ansari
- given: Alvin
family: Heng
- given: Andre
family: Lim
- given: Harold
family: Soh
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 926-951
id: ansari23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 926
lastpage: 951
published: 2023-07-03 00:00:00 +0000
- title: 'Paging with Succinct Predictions'
abstract: 'Paging is a prototypical problem in the area of online algorithms. It has also played a central role in the development of learning-augmented algorithms. Previous work on learning-augmented paging has investigated predictions on (i) when the current page will be requested again (reoccurrence predictions), (ii) the current state of the cache in an optimal algorithm (state predictions), (iii) all requests until the current page gets requested again, and (iv) the relative order in which pages are requested. We study learning-augmented paging from the new perspective of requiring the least possible amount of predicted information. More specifically, the predictions obtained alongside each page request are limited to one bit only. We develop algorithms that satisfy all three desirable properties of learning-augmented algorithms – that is, they are consistent, robust and smooth – despite being limited to a one-bit prediction per request. We also present lower bounds establishing that our algorithms are essentially best possible.'
volume: 202
URL: https://proceedings.mlr.press/v202/antoniadis23a.html
PDF: https://proceedings.mlr.press/v202/antoniadis23a/antoniadis23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-antoniadis23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Antonios
family: Antoniadis
- given: Joan
family: Boyar
- given: Marek
family: Elias
- given: Lene Monrad
family: Favrholdt
- given: Ruben
family: Hoeksma
- given: Kim S.
family: Larsen
- given: Adam
family: Polak
- given: Bertrand
family: Simon
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 952-968
id: antoniadis23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 952
lastpage: 968
published: 2023-07-03 00:00:00 +0000
- title: 'Mixing Predictions for Online Metric Algorithms'
abstract: 'A major technique in learning-augmented online algorithms is combining multiple algorithms or predictors. Since the performance of each predictor may vary over time, it is desirable to use not the single best predictor as a benchmark, but rather a dynamic combination which follows different predictors at different times. We design algorithms that combine predictions and are competitive against such dynamic combinations for a wide class of online problems, namely, metrical task systems. Against the best (in hindsight) unconstrained combination of $\ell$ predictors, we obtain a competitive ratio of $O(\ell^2)$, and show that this is best possible. However, for a benchmark with a slightly constrained number of switches between different predictors, we can get a $(1+\epsilon)$-competitive algorithm. Moreover, our algorithms can be adapted to access predictors in a bandit-like fashion, querying only one predictor at a time. An unexpected implication of one of our lower bounds is a new structural insight about covering formulations for the $k$-server problem.'
volume: 202
URL: https://proceedings.mlr.press/v202/antoniadis23b.html
PDF: https://proceedings.mlr.press/v202/antoniadis23b/antoniadis23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-antoniadis23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Antonios
family: Antoniadis
- given: Christian
family: Coester
- given: Marek
family: Elias
- given: Adam
family: Polak
- given: Bertrand
family: Simon
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 969-983
id: antoniadis23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 969
lastpage: 983
published: 2023-07-03 00:00:00 +0000
- title: 'Exponential Smoothing for Off-Policy Learning'
abstract: 'Off-policy learning (OPL) aims at finding improved policies from logged bandit data, often by minimizing the inverse propensity scoring (IPS) estimator of the risk. In this work, we investigate a smooth regularization for IPS, for which we derive a two-sided PAC-Bayes generalization bound. The bound is tractable, scalable, interpretable and provides learning certificates. In particular, it is also valid for standard IPS without making the assumption that the importance weights are bounded. We demonstrate the relevance of our approach and its favorable performance through a set of learning tasks. Since our bound holds for standard IPS, we are able to provide insight into when regularizing IPS is useful. Namely, we identify cases where regularization might not be needed. This goes against the belief that, in practice, clipped IPS often enjoys more favorable performance than standard IPS in OPL.'
volume: 202
URL: https://proceedings.mlr.press/v202/aouali23a.html
PDF: https://proceedings.mlr.press/v202/aouali23a/aouali23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-aouali23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Imad
family: Aouali
- given: Victor-Emmanuel
family: Brunel
- given: David
family: Rohde
- given: Anna
family: Korba
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 984-1017
id: aouali23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 984
lastpage: 1017
published: 2023-07-03 00:00:00 +0000
- title: 'Polynomial Time and Private Learning of Unbounded Gaussian Mixture Models'
abstract: 'We study the problem of privately estimating the parameters of $d$-dimensional Gaussian Mixture Models (GMMs) with $k$ components. For this, we develop a technique to reduce the problem to its non-private counterpart. This allows us to privatize existing non-private algorithms in a blackbox manner, while incurring only a small overhead in the sample complexity and running time. As the main application of our framework, we develop an $(\varepsilon, \delta)$-differentially private algorithm to learn GMMs using the non-private algorithm of Moitra and Valiant (2010) as a blackbox. Consequently, this gives the first sample complexity upper bound and first polynomial time algorithm for privately learning GMMs without any boundedness assumptions on the parameters. As part of our analysis, we prove a tight (up to a constant factor) lower bound on the total variation distance of high-dimensional Gaussians which can be of independent interest.'
volume: 202
URL: https://proceedings.mlr.press/v202/arbas23a.html
PDF: https://proceedings.mlr.press/v202/arbas23a/arbas23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-arbas23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jamil
family: Arbas
- given: Hassan
family: Ashtiani
- given: Christopher
family: Liaw
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1018-1040
id: arbas23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1018
lastpage: 1040
published: 2023-07-03 00:00:00 +0000
- title: 'Principled Acceleration of Iterative Numerical Methods Using Machine Learning'
abstract: 'Iterative methods are ubiquitous in large-scale scientific computing applications, and a number of approaches based on meta-learning have been recently proposed to accelerate them. However, a systematic study of these approaches and how they differ from meta-learning is lacking. In this paper, we propose a framework to analyze such learning-based acceleration approaches, where one can immediately identify a departure from classical meta-learning. We theoretically show that this departure may lead to arbitrary deterioration of model performance, and at the same time, we identify a methodology to ameliorate it by modifying the loss objective, leading to a novel training method for learning-based acceleration of iterative algorithms. We demonstrate the significant advantage and versatility of the proposed approach through various numerical applications.'
volume: 202
URL: https://proceedings.mlr.press/v202/arisaka23a.html
PDF: https://proceedings.mlr.press/v202/arisaka23a/arisaka23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-arisaka23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Sohei
family: Arisaka
- given: Qianxiao
family: Li
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1041-1059
id: arisaka23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1041
lastpage: 1059
published: 2023-07-03 00:00:00 +0000
- title: 'Faster Rates of Convergence to Stationary Points in Differentially Private Optimization'
abstract: 'We study the problem of approximating stationary points of Lipschitz and smooth functions under $(\varepsilon,\delta)$-differential privacy (DP) in both the finite-sum and stochastic settings. A point $\widehat{w}$ is called an $\alpha$-stationary point of a function $F:\mathbb{R}^d\rightarrow\mathbb{R}$ if $\|\nabla F(\widehat{w})\|\leq \alpha$. We give a new construction that improves over the existing rates in the stochastic optimization setting, where the goal is to find approximate stationary points of the population risk given $n$ samples. Our construction finds a $\tilde{O}\big(\frac{1}{n^{1/3}} + \big[\frac{\sqrt{d}}{n\varepsilon}\big]^{1/2}\big)$-stationary point of the population risk in time linear in $n$. We also provide an efficient algorithm that finds an $\tilde{O}\big(\big[\frac{\sqrt{d}}{n\varepsilon}\big]^{2/3}\big)$-stationary point in the finite-sum setting. This improves on the previous best rate of $\tilde{O}\big(\big[\frac{\sqrt{d}}{n\varepsilon}\big]^{1/2}\big)$. Furthermore, under the additional assumption of convexity, we completely characterize the sample complexity of finding stationary points of the population risk (up to polylog factors) and show that the optimal rate on population stationarity is $\tilde \Theta\big(\frac{1}{\sqrt{n}}+\frac{\sqrt{d}}{n\varepsilon}\big)$. Finally, we show that our methods can be used to provide dimension-independent rates of $O\big(\frac{1}{\sqrt{n}}+\min\big(\big[\frac{\sqrt{rank}}{n\varepsilon}\big]^{2/3},\frac{1}{(n\varepsilon)^{2/5}}\big)\big)$ on population stationarity for Generalized Linear Models (GLM), where $rank$ is the rank of the design matrix, which improves upon the previous best known rate.'
volume: 202
URL: https://proceedings.mlr.press/v202/arora23a.html
PDF: https://proceedings.mlr.press/v202/arora23a/arora23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-arora23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Raman
family: Arora
- given: Raef
family: Bassily
- given: Tomás
family: González
- given: Cristóbal A
family: Guzmán
- given: Michael
family: Menart
- given: Enayat
family: Ullah
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1060-1092
id: arora23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1060
lastpage: 1092
published: 2023-07-03 00:00:00 +0000
- title: 'Prototype-Sample Relation Distillation: Towards Replay-Free Continual Learning'
abstract: 'In continual learning (CL), balancing effective adaptation while combating catastrophic forgetting is a central challenge. Many of the recent best-performing methods utilize various forms of prior task data, e.g. a replay buffer, to tackle the catastrophic forgetting problem. Having access to previous task data can be restrictive in many real-world scenarios, for example when task data is sensitive or proprietary. To overcome the necessity of using previous tasks’ data, in this work, we start with strong representation learning methods that have been shown to be less prone to forgetting. We propose a holistic approach to jointly learn the representation and class prototypes while maintaining the relevance of old class prototypes and their embedded similarities. Specifically, samples are mapped to an embedding space where the representations are learned using a supervised contrastive loss. Class prototypes are evolved continually in the same latent space, enabling learning and prediction at any point. To continually adapt the prototypes without keeping any prior task data, we propose a novel distillation loss that constrains class prototypes to maintain relative similarities as compared to new task data. This method yields state-of-the-art performance in the task-incremental setting, outperforming methods relying on large amounts of data, and provides strong performance in the class-incremental setting without using any stored data points.'
volume: 202
URL: https://proceedings.mlr.press/v202/asadi23a.html
PDF: https://proceedings.mlr.press/v202/asadi23a/asadi23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-asadi23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Nader
family: Asadi
- given: Mohammadreza
family: Davari
- given: Sudhir
family: Mudur
- given: Rahaf
family: Aljundi
- given: Eugene
family: Belilovsky
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1093-1106
id: asadi23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1093
lastpage: 1106
published: 2023-07-03 00:00:00 +0000
- title: 'Near-Optimal Algorithms for Private Online Optimization in the Realizable Regime'
abstract: 'We consider online learning problems in the realizable setting, where there is a zero-loss solution, and propose new Differentially Private (DP) algorithms that obtain near-optimal regret bounds. For the problem of online prediction from experts, we design new algorithms that obtain near-optimal regret $O \big( \varepsilon^{-1} \mathsf{poly}(\log{d}) \big)$ where $d$ is the number of experts. This significantly improves over the best existing regret bounds for the DP non-realizable setting which are $O \big( \varepsilon^{-1} \min\big\{d, \sqrt{T\log d}\big\} \big)$. We also develop an adaptive algorithm for the small-loss setting with regret $(L^\star+ \varepsilon^{-1}) \cdot O(\mathsf{poly}(\log{d}))$ where $L^\star$ is the total loss of the best expert. Additionally, we consider DP online convex optimization in the realizable setting and propose an algorithm with near-optimal regret $O \big(\varepsilon^{-1} \mathsf{poly}(d) \big)$, as well as an algorithm for the smooth case with regret $O \big( (\sqrt{Td}/\varepsilon)^{2/3} \big)$, both significantly improving over existing bounds in the non-realizable regime.'
volume: 202
URL: https://proceedings.mlr.press/v202/asi23a.html
PDF: https://proceedings.mlr.press/v202/asi23a/asi23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-asi23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hilal
family: Asi
- given: Vitaly
family: Feldman
- given: Tomer
family: Koren
- given: Kunal
family: Talwar
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1107-1120
id: asi23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1107
lastpage: 1120
published: 2023-07-03 00:00:00 +0000
- title: 'From Robustness to Privacy and Back'
abstract: 'We study the relationship between two desiderata of algorithms in statistical inference and machine learning—differential privacy and robustness to adversarial data corruptions. Their conceptual similarity was first observed by Dwork and Lei (STOC 2009), who showed that private algorithms satisfy robustness, and gave a general method for converting robust algorithms to private ones. However, all general methods for transforming robust algorithms into private ones lead to suboptimal error rates. Our work gives the first black-box transformation that converts any adversarially robust algorithm into one that satisfies pure differential privacy. Moreover, we show that for any low-dimensional estimation task, applying our transformation to an optimal robust estimator results in an optimal private estimator. Thus, we conclude that for any low-dimensional task, the optimal error rate for $\varepsilon$-differentially private estimators is essentially the same as the optimal error rate for estimators that are robust to adversarially corrupting $1/\varepsilon$ training samples. We apply our transformation to obtain new optimal private estimators for several high-dimensional statistical tasks, including Gaussian linear regression and PCA. Finally, we present an extension of our transformation that leads to approximately differentially private algorithms whose error does not depend on the range of the output space, which is impossible under pure differential privacy.'
volume: 202
URL: https://proceedings.mlr.press/v202/asi23b.html
PDF: https://proceedings.mlr.press/v202/asi23b/asi23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-asi23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hilal
family: Asi
- given: Jonathan
family: Ullman
- given: Lydia
family: Zakynthinou
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1121-1146
id: asi23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1121
lastpage: 1146
published: 2023-07-03 00:00:00 +0000
- title: 'SGD with AdaGrad Stepsizes: Full Adaptivity with High Probability to Unknown Parameters, Unbounded Gradients and Affine Variance'
abstract: 'We study Stochastic Gradient Descent with AdaGrad stepsizes: a popular adaptive (self-tuning) method for first-order stochastic optimization. Despite being well studied, existing analyses of this method suffer from various shortcomings: they either assume some knowledge of the problem parameters, impose strong global Lipschitz conditions, or fail to give bounds that hold with high probability. We provide a comprehensive analysis of this basic method without any of these limitations, in both the convex and non-convex (smooth) cases, that additionally supports a general “affine variance” noise model and provides sharp rates of convergence in both the low-noise and high-noise regimes.'
volume: 202
URL: https://proceedings.mlr.press/v202/attia23a.html
PDF: https://proceedings.mlr.press/v202/attia23a/attia23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-attia23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Amit
family: Attia
- given: Tomer
family: Koren
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1147-1171
id: attia23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1147
lastpage: 1171
published: 2023-07-03 00:00:00 +0000
- title: 'Adversarially Robust PAC Learnability of Real-Valued Functions'
abstract: 'We study robustness to test-time adversarial attacks in the regression setting with $\ell_p$ losses and arbitrary perturbation sets. We address the question of which function classes are PAC learnable in this setting. We show that classes of finite fat-shattering dimension are learnable in both the realizable and agnostic settings. Moreover, for convex function classes, they are even properly learnable. In contrast, some non-convex function classes provably require improper learning algorithms. Our main technique is based on a construction of an adversarially robust sample compression scheme of a size determined by the fat-shattering dimension. Along the way, we introduce a novel agnostic sample compression scheme for real-valued functions, which may be of independent interest.'
volume: 202
URL: https://proceedings.mlr.press/v202/attias23a.html
PDF: https://proceedings.mlr.press/v202/attias23a/attias23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-attias23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Idan
family: Attias
- given: Steve
family: Hanneke
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1172-1199
id: attias23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1172
lastpage: 1199
published: 2023-07-03 00:00:00 +0000
- title: 'Infusing Lattice Symmetry Priors in Attention Mechanisms for Sample-Efficient Abstract Geometric Reasoning'
abstract: 'The Abstraction and Reasoning Corpus (ARC) (Chollet, 2019) and its most recent language-complete instantiation (LARC) have been postulated as an important step towards general AI. Yet, even state-of-the-art machine learning models struggle to achieve meaningful performance on these problems, falling behind non-learning based approaches. We argue that solving these tasks requires extreme generalization that can only be achieved by proper accounting for core knowledge priors. As a step towards this goal, we focus on geometry priors and introduce LatFormer, a model that incorporates lattice symmetry priors in attention masks. We show that, for any transformation of the hypercubic lattice, there exists a binary attention mask that implements that group action. Hence, our study motivates a modification to the standard attention mechanism, where attention weights are scaled using soft masks generated by a convolutional network. Experiments on synthetic geometric reasoning show that LatFormer requires two orders of magnitude less data than standard attention and transformers. Moreover, our results on ARC and LARC tasks that incorporate geometric priors provide preliminary evidence that these complex datasets do not lie out of the reach of deep learning models.'
volume: 202
URL: https://proceedings.mlr.press/v202/atzeni23a.html
PDF: https://proceedings.mlr.press/v202/atzeni23a/atzeni23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-atzeni23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mattia
family: Atzeni
- given: Mrinmaya
family: Sachan
- given: Andreas
family: Loukas
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1200-1217
id: atzeni23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1200
lastpage: 1217
published: 2023-07-03 00:00:00 +0000
- title: 'Learning to Initiate and Reason in Event-Driven Cascading Processes'
abstract: 'Training agents to control a dynamic environment is a fundamental task in AI. In many environments, the dynamics can be summarized by a small set of events that capture the semantic behavior of the system. Typically, these events form chains or cascades. We often wish to change the system behavior using a single intervention that propagates through the cascade. For instance, one may trigger a biochemical cascade to switch the state of a cell or, in logistics, reroute a truck to meet an unexpected, urgent delivery. We introduce a new supervised learning setup called Cascade. An agent observes a system with known dynamics evolving from some initial state. The agent is given a structured semantic instruction and needs to make an intervention that triggers a cascade of events, such that the system reaches an alternative (counterfactual) behavior. We provide a test-bed for this problem, consisting of physical objects. We combine semantic tree search with an event-driven forward model and devise an algorithm that learns to efficiently search in exponentially large semantic trees. We demonstrate that our approach learns to follow instructions to intervene in new complex scenes. When provided with an observed cascade of events, it can also reason about alternative outcomes.'
volume: 202
URL: https://proceedings.mlr.press/v202/atzmon23a.html
PDF: https://proceedings.mlr.press/v202/atzmon23a/atzmon23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-atzmon23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yuval
family: Atzmon
- given: Eli
family: Meirom
- given: Shie
family: Mannor
- given: Gal
family: Chechik
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1218-1243
id: atzmon23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1218
lastpage: 1243
published: 2023-07-03 00:00:00 +0000
- title: 'On the convergence of the MLE as an estimator of the learning rate in the Exp3 algorithm'
abstract: 'When fitting the learning data of an individual to algorithm-like learning models, the observations are so dependent and non-stationary that one may wonder what the classical Maximum Likelihood Estimator (MLE) could do, even if it is the usual tool applied to experimental cognition. Our objective in this work is to show that the estimation of the learning rate cannot be efficient if the learning rate is constant in the classical Exp3 (Exponential weights for Exploration and Exploitation) algorithm. Secondly, we show that if the learning rate decreases polynomially with the sample size, then the prediction error and in some cases the estimation error of the MLE satisfy bounds in probability that decrease at a polynomial rate.'
volume: 202
URL: https://proceedings.mlr.press/v202/aubert23a.html
PDF: https://proceedings.mlr.press/v202/aubert23a/aubert23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-aubert23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Julien
family: Aubert
- given: Luc
family: Lehéricy
- given: Patricia
family: Reynaud-Bouret
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1244-1275
id: aubert23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1244
lastpage: 1275
published: 2023-07-03 00:00:00 +0000
- title: 'Dirichlet Diffusion Score Model for Biological Sequence Generation'
abstract: 'Designing biological sequences is an important challenge that requires satisfying complex constraints and thus is a natural problem to address with deep generative modeling. Diffusion generative models have achieved considerable success in many applications. Score-based generative stochastic differential equations (SDE) model is a continuous-time diffusion model framework that enjoys many benefits, but the originally proposed SDEs are not naturally designed for modeling discrete data. To develop generative SDE models for discrete data such as biological sequences, here we introduce a diffusion process defined in the probability simplex space with stationary distribution being the Dirichlet distribution. This makes diffusion in continuous space natural for modeling discrete data. We refer to this approach as the Dirichlet diffusion score model. We demonstrate that this technique can generate samples that satisfy hard constraints using a Sudoku generation task. This generative model can also solve Sudoku, including hard puzzles, without additional training. Finally, we applied this approach to develop the first human promoter DNA sequence design model and showed that designed sequences share similar properties with natural promoter sequences.'
volume: 202
URL: https://proceedings.mlr.press/v202/avdeyev23a.html
PDF: https://proceedings.mlr.press/v202/avdeyev23a/avdeyev23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-avdeyev23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Pavel
family: Avdeyev
- given: Chenlai
family: Shi
- given: Yuhao
family: Tan
- given: Kseniia
family: Dudnyk
- given: Jian
family: Zhou
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1276-1301
id: avdeyev23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1276
lastpage: 1301
published: 2023-07-03 00:00:00 +0000
- title: 'Gradient Descent Converges Linearly for Logistic Regression on Separable Data'
abstract: 'We show that running gradient descent with variable learning rate guarantees loss $f(x) \leq 1.1 \cdot f(x^*)+\epsilon$ for the logistic regression objective, where the error $\epsilon$ decays exponentially with the number of iterations and polynomially with the magnitude of the entries of an arbitrary fixed solution $x^*$. This is in contrast to the common intuition that the absence of strong convexity precludes linear convergence of first-order methods, and highlights the importance of variable learning rates for gradient descent. We also apply our ideas to sparse logistic regression, where they lead to an exponential improvement of the sparsity-error tradeoff.'
volume: 202
URL: https://proceedings.mlr.press/v202/axiotis23a.html
PDF: https://proceedings.mlr.press/v202/axiotis23a/axiotis23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-axiotis23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Kyriakos
family: Axiotis
- given: Maxim
family: Sviridenko
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1302-1319
id: axiotis23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1302
lastpage: 1319
published: 2023-07-03 00:00:00 +0000
- title: 'Naive imputation implicitly regularizes high-dimensional linear models'
abstract: 'Two different approaches exist to handle missing values for prediction: either imputation, prior to fitting any predictive algorithms, or dedicated methods able to natively incorporate missing values. While imputation is widely (and easily) used, it is unfortunately biased when low-capacity predictors (such as linear models) are applied afterward. However, in practice, naive imputation exhibits good predictive performance. In this paper, we study the impact of imputation in a high-dimensional linear model with MCAR missing data. We prove that zero imputation performs an implicit regularization closely related to the ridge method, often used in high-dimensional problems. Leveraging this connection, we establish that the imputation bias is controlled by a ridge bias, which vanishes in high dimension. As a predictor, we argue in favor of the averaged SGD strategy, applied to zero-imputed data. We establish an upper bound on its generalization error, highlighting that imputation is benign in the $d \gg \sqrt{n}$ regime. Experiments illustrate our findings.'
volume: 202
URL: https://proceedings.mlr.press/v202/ayme23a.html
PDF: https://proceedings.mlr.press/v202/ayme23a/ayme23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-ayme23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Alexis
family: Ayme
- given: Claire
family: Boyer
- given: Aymeric
family: Dieuleveut
- given: Erwan
family: Scornet
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1320-1340
id: ayme23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1320
lastpage: 1340
published: 2023-07-03 00:00:00 +0000
- title: 'Half-Hop: A graph upsampling approach for slowing down message passing'
abstract: 'Message passing neural networks have shown a lot of success on graph-structured data. However, there are many instances where message passing can lead to over-smoothing or fail when neighboring nodes belong to different classes. In this work, we introduce a simple yet general framework for improving learning in message passing neural networks. Our approach essentially upsamples edges in the original graph by adding "slow nodes" at each edge that can mediate communication between a source and a target node. Our method only modifies the input graph, making it plug-and-play and easy to use with existing models. To understand the benefits of slowing down message passing, we provide theoretical and empirical analyses. We report results on several supervised and self-supervised benchmarks, and show improvements across the board, notably in heterophilic conditions where adjacent nodes are more likely to have different labels. Finally, we show how our approach can be used to generate augmentations for self-supervised learning, where slow nodes are randomly introduced into different edges in the graph to generate multi-scale views with variable path lengths.'
volume: 202
URL: https://proceedings.mlr.press/v202/azabou23a.html
PDF: https://proceedings.mlr.press/v202/azabou23a/azabou23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-azabou23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mehdi
family: Azabou
- given: Venkataramana
family: Ganesh
- given: Shantanu
family: Thakoor
- given: Chi-Heng
family: Lin
- given: Lakshmi
family: Sathidevi
- given: Ran
family: Liu
- given: Michal
family: Valko
- given: Petar
family: Veličković
- given: Eva L
family: Dyer
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1341-1360
id: azabou23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1341
lastpage: 1360
published: 2023-07-03 00:00:00 +0000
- title: 'CLUTR: Curriculum Learning via Unsupervised Task Representation Learning'
abstract: 'Reinforcement Learning (RL) algorithms are often known for sample inefficiency and difficult generalization. Recently, Unsupervised Environment Design (UED) emerged as a new paradigm for zero-shot generalization by simultaneously learning a task distribution and agent policies on the generated tasks. This is a non-stationary process where the task distribution evolves along with agent policies; creating an instability over time. While past works demonstrated the potential of such approaches, sampling effectively from the task space remains an open challenge, bottlenecking these approaches. To this end, we introduce CLUTR: a novel unsupervised curriculum learning algorithm that decouples task representation and curriculum learning into a two-stage optimization. It first trains a recurrent variational autoencoder on randomly generated tasks to learn a latent task manifold. Next, a teacher agent creates a curriculum by maximizing a minimax REGRET-based objective on a set of latent tasks sampled from this manifold. Using the fixed-pretrained task manifold, we show that CLUTR successfully overcomes the non-stationarity problem and improves stability. Our experimental results show CLUTR outperforms PAIRED, a principled and popular UED method, in the challenging CarRacing and navigation environments: achieving 10.6X and 45% improvement in zero-shot generalization, respectively. CLUTR also performs comparably to the non-UED state-of-the-art for CarRacing, while requiring 500X fewer environment interactions. We open source our code at https://github.com/clutr/clutr.'
volume: 202
URL: https://proceedings.mlr.press/v202/azad23a.html
PDF: https://proceedings.mlr.press/v202/azad23a/azad23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-azad23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Abdus Salam
family: Azad
- given: Izzeddin
family: Gur
- given: Jasper
family: Emhoff
- given: Nathaniel
family: Alexis
- given: Aleksandra
family: Faust
- given: Pieter
family: Abbeel
- given: Ion
family: Stoica
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1361-1395
id: azad23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1361
lastpage: 1395
published: 2023-07-03 00:00:00 +0000
- title: 'Personalized Subgraph Federated Learning'
abstract: 'Subgraphs of a larger global graph may be distributed across multiple devices, and only locally accessible due to privacy restrictions, although there may be links between subgraphs. Recently proposed subgraph Federated Learning (FL) methods deal with those missing links across local subgraphs while distributively training Graph Neural Networks (GNNs) on them. However, they have overlooked the inevitable heterogeneity between subgraphs comprising different communities of a global graph, consequently collapsing the incompatible knowledge from local GNN models. To this end, we introduce a new subgraph FL problem, personalized subgraph FL, which focuses on the joint improvement of the interrelated local GNNs rather than learning a single global model, and propose a novel framework, FEDerated Personalized sUBgraph learning (FED-PUB), to tackle it. Since the server cannot access the subgraph in each client, FED-PUB utilizes functional embeddings of the local GNNs using random graphs as inputs to compute similarities between them, and uses the similarities to perform weighted averaging for server-side aggregation. Further, it learns a personalized sparse mask at each client to select and update only the subgraph-relevant subset of the aggregated parameters. We validate our FED-PUB for its subgraph FL performance on six datasets, considering both non-overlapping and overlapping subgraphs, on which it significantly outperforms relevant baselines. Our code is available at https://github.com/JinheonBaek/FED-PUB.'
volume: 202
URL: https://proceedings.mlr.press/v202/baek23a.html
PDF: https://proceedings.mlr.press/v202/baek23a/baek23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-baek23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jinheon
family: Baek
- given: Wonyong
family: Jeong
- given: Jiongdao
family: Jin
- given: Jaehong
family: Yoon
- given: Sung Ju
family: Hwang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1396-1415
id: baek23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1396
lastpage: 1415
published: 2023-07-03 00:00:00 +0000
- title: 'Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language'
abstract: 'Current self-supervised learning algorithms are often modality-specific and require large amounts of computational resources. To address these issues, we increase the training efficiency of data2vec, a learning objective that generalizes across several modalities. We do not encode masked tokens, use a fast convolutional decoder and amortize the effort to build teacher representations. data2vec 2.0 benefits from the rich contextualized target representations introduced in data2vec which enable a fast self-supervised learner. Experiments on ImageNet-1K image classification show that data2vec 2.0 matches the accuracy of Masked Autoencoders in 16.4x lower pre-training time, on Librispeech speech recognition it performs as well as wav2vec 2.0 in 10.6x less time, and on GLUE natural language understanding it matches a retrained RoBERTa model in half the time. Trading some speed for accuracy results in ImageNet-1K top-1 accuracy of 86.8% with a ViT-L model trained for 150 epochs.'
volume: 202
URL: https://proceedings.mlr.press/v202/baevski23a.html
PDF: https://proceedings.mlr.press/v202/baevski23a/baevski23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-baevski23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Alexei
family: Baevski
- given: Arun
family: Babu
- given: Wei-Ning
family: Hsu
- given: Michael
family: Auli
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1416-1429
id: baevski23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1416
lastpage: 1429
published: 2023-07-03 00:00:00 +0000
- title: 'Efficient preconditioned stochastic gradient descent for estimation in latent variable models'
abstract: 'Latent variable models are powerful tools for modeling complex phenomena involving, in particular, partially observed data, unobserved variables, or underlying complex unknown structures. Inference is often difficult due to the latent structure of the model. To deal with parameter estimation in the presence of latent variables, well-known efficient methods exist, such as gradient-based and EM-type algorithms, but with practical and theoretical limitations. In this paper, we propose as an alternative for parameter estimation an efficient preconditioned stochastic gradient algorithm. Our method includes a preconditioning step based on a positive definite Fisher information matrix estimate. We prove convergence results for the proposed algorithm under mild assumptions for very general latent variable models. We illustrate through relevant simulations the performance of the proposed methodology in a nonlinear mixed effects model and in a stochastic block model.'
volume: 202
URL: https://proceedings.mlr.press/v202/baey23a.html
PDF: https://proceedings.mlr.press/v202/baey23a/baey23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-baey23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Charlotte
family: Baey
- given: Maud
family: Delattre
- given: Estelle
family: Kuhn
- given: Jean-Benoist
family: Leger
- given: Sarah
family: Lemler
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1430-1453
id: baey23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1430
lastpage: 1453
published: 2023-07-03 00:00:00 +0000
- title: 'Feed Two Birds with One Scone: Exploiting Wild Data for Both Out-of-Distribution Generalization and Detection'
abstract: 'Modern machine learning models deployed in the wild can encounter both covariate and semantic shifts, giving rise to the problems of out-of-distribution (OOD) generalization and OOD detection respectively. While both problems have received significant research attention lately, they have been pursued independently. This may not be surprising, since the two tasks have seemingly conflicting goals. This paper provides a new unified approach that is capable of simultaneously generalizing to covariate shifts while robustly detecting semantic shifts. We propose a margin-based learning framework that exploits freely available unlabeled data in the wild that captures the environmental test-time OOD distributions under both covariate and semantic shifts. We show both empirically and theoretically that the proposed margin constraint is the key to achieving both OOD generalization and detection. Extensive experiments show the superiority of our framework, outperforming competitive baselines that specialize in either OOD generalization or OOD detection. Code is publicly available at https://github.com/deeplearning-wisc/scone.'
volume: 202
URL: https://proceedings.mlr.press/v202/bai23a.html
PDF: https://proceedings.mlr.press/v202/bai23a/bai23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bai23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Haoyue
family: Bai
- given: Gregory
family: Canal
- given: Xuefeng
family: Du
- given: Jeongyeol
family: Kwon
- given: Robert D
family: Nowak
- given: Yixuan
family: Li
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1454-1471
id: bai23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1454
lastpage: 1471
published: 2023-07-03 00:00:00 +0000
- title: 'Answering Complex Logical Queries on Knowledge Graphs via Query Computation Tree Optimization'
abstract: 'Answering complex logical queries on incomplete knowledge graphs is a challenging task, and has been widely studied. Embedding-based methods require training on complex queries and may not generalize well to out-of-distribution query structures. Recent work frames this task as an end-to-end optimization problem, and it only requires a pretrained link predictor. However, due to the exponentially large combinatorial search space, the optimal solution can only be approximated, limiting the final accuracy. In this work, we propose QTO (Query Computation Tree Optimization) that can efficiently find the exact optimal solution. QTO finds the optimal solution by a forward-backward propagation on the tree-like computation graph, i.e., query computation tree. In particular, QTO utilizes the independence encoded in the query computation tree to reduce the search space, where only local computations are involved during the optimization procedure. Experiments on 3 datasets show that QTO obtains state-of-the-art performance on complex query answering, outperforming previous best results by an average of 22%. Moreover, QTO can interpret the intermediate solutions for each of the one-hop atoms in the query with over 90% accuracy.'
volume: 202
URL: https://proceedings.mlr.press/v202/bai23b.html
PDF: https://proceedings.mlr.press/v202/bai23b/bai23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bai23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yushi
family: Bai
- given: Xin
family: Lv
- given: Juanzi
family: Li
- given: Lei
family: Hou
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1472-1491
id: bai23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1472
lastpage: 1491
published: 2023-07-03 00:00:00 +0000
- title: 'Linear optimal partial transport embedding'
abstract: 'Optimal transport (OT) has gained popularity due to its various applications in fields such as machine learning, statistics, and signal processing. However, the balanced mass requirement limits its performance in practical problems. To address these limitations, variants of the OT problem, including unbalanced OT, optimal partial transport (OPT), and Hellinger-Kantorovich (HK), have been proposed. In this paper, we propose the Linear optimal partial transport (LOPT) embedding, which extends the (local) linearization technique on OT and HK to the OPT problem. The proposed embedding allows for faster computation of the OPT distance between pairs of positive measures. Besides our theoretical contributions, we demonstrate the LOPT embedding technique in point-cloud interpolation and PCA analysis. Our code is available at https://github.com/Baio0/LinearOPT.'
volume: 202
URL: https://proceedings.mlr.press/v202/bai23c.html
PDF: https://proceedings.mlr.press/v202/bai23c/bai23c.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bai23c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yikun
family: Bai
- given: Ivan Vladimir
family: Medri
- given: Rocio
family: Diaz Martin
- given: Rana
family: Shahroz
- given: Soheil
family: Kolouri
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1492-1520
id: bai23c
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1492
lastpage: 1520
published: 2023-07-03 00:00:00 +0000
- title: 'Implicit Graph Neural Networks: A Monotone Operator Viewpoint'
abstract: 'Implicit graph neural networks (IGNNs) – that solve a fixed-point equilibrium equation using Picard iteration for representation learning – have shown remarkable performance in learning long-range dependencies (LRD) in the underlying graphs. However, IGNNs suffer from several issues, including 1) their expressivity is limited by their parameterizations for the well-posedness guarantee, 2) IGNNs are unstable in learning LRD, and 3) IGNNs become computationally inefficient when learning LRD. In this paper, we provide a new well-posedness characterization for IGNNs leveraging monotone operator theory, resulting in a much more expressive parameterization than the existing one. We also propose an orthogonal parameterization for IGNN based on Cayley transform to stabilize learning LRD. Furthermore, we leverage Anderson-accelerated operator splitting schemes to efficiently solve for the fixed point of the equilibrium equation of IGNN with monotone or orthogonal parameterization. We verify the computational efficiency and accuracy of the new models over existing IGNNs on various graph learning tasks at both graph and node levels.'
volume: 202
URL: https://proceedings.mlr.press/v202/baker23a.html
PDF: https://proceedings.mlr.press/v202/baker23a/baker23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-baker23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Justin
family: Baker
- given: Qingsong
family: Wang
- given: Cory D
family: Hauck
- given: Bao
family: Wang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1521-1548
id: baker23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1521
lastpage: 1548
published: 2023-07-03 00:00:00 +0000
- title: 'Tensor Decompositions Meet Control Theory: Learning General Mixtures of Linear Dynamical Systems'
abstract: 'Recently Chen and Poor initiated the study of learning mixtures of linear dynamical systems. While linear dynamical systems already have wide-ranging applications in modeling time-series data, using mixture models can lead to a better fit or even a richer understanding of underlying subpopulations represented in the data. In this work we give a new approach to learning mixtures of linear dynamical systems that is based on tensor decompositions. As a result, our algorithm succeeds without strong separation conditions on the components, and can be used to compete with the Bayes optimal clustering of the trajectories. Moreover our algorithm works in the challenging partially-observed setting. Our starting point is the simple but powerful observation that the classic Ho-Kalman algorithm is a relative of modern tensor decomposition methods for learning latent variable models. This gives us a playbook for how to extend it to work with more complicated generative models.'
volume: 202
URL: https://proceedings.mlr.press/v202/bakshi23a.html
PDF: https://proceedings.mlr.press/v202/bakshi23a/bakshi23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bakshi23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ainesh
family: Bakshi
- given: Allen
family: Liu
- given: Ankur
family: Moitra
- given: Morris
family: Yau
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1549-1563
id: bakshi23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1549
lastpage: 1563
published: 2023-07-03 00:00:00 +0000
- title: 'Block Subsampled Randomized Hadamard Transform for Nyström Approximation on Distributed Architectures'
abstract: 'This article introduces a novel structured random matrix composed blockwise from subsampled randomized Hadamard transforms (SRHTs). The block SRHT is expected to outperform well-known dimension reduction maps, including SRHT and Gaussian matrices on distributed architectures. We prove that a block SRHT with enough rows is an oblivious subspace embedding, i.e., an approximate isometry for an arbitrary low-dimensional subspace with high probability. Our estimate of the required number of rows is similar to that of the standard SRHT. This suggests that the two transforms should provide the same accuracy of approximation in the algorithms. The block SRHT can be readily incorporated into randomized methods for computing a low-rank approximation of a large-scale matrix, such as the Nyström method. For completeness, we revisit this method with a discussion of its implementation on distributed architectures.'
volume: 202
URL: https://proceedings.mlr.press/v202/balabanov23a.html
PDF: https://proceedings.mlr.press/v202/balabanov23a/balabanov23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-balabanov23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Oleg
family: Balabanov
- given: Matthias
family: Beaupère
- given: Laura
family: Grigori
- given: Victor
family: Lederer
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1564-1576
id: balabanov23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1564
lastpage: 1576
published: 2023-07-03 00:00:00 +0000
- title: 'Efficient Online Reinforcement Learning with Offline Data'
abstract: 'Sample efficiency and exploration remain major challenges in online reinforcement learning (RL). A powerful approach that can be applied to address these issues is the inclusion of offline data, such as prior trajectories from a human expert or a sub-optimal exploration policy. Previous methods have relied on extensive modifications and additional complexity to ensure the effective use of this data. Instead, we ask: *can we simply apply existing off-policy methods to leverage offline data when learning online?* In this work, we demonstrate that the answer is yes; however, a set of minimal but important changes to existing off-policy RL algorithms are required to achieve reliable performance. We extensively ablate these design choices, demonstrating the key factors that most affect performance, and arrive at a set of recommendations that practitioners can readily apply, whether their data comprise a small number of expert demonstrations or large volumes of sub-optimal trajectories. We see that correct application of these simple recommendations can provide a $\mathbf{2.5\times}$ improvement over existing approaches across a diverse set of competitive benchmarks, with no additional computational overhead.'
volume: 202
URL: https://proceedings.mlr.press/v202/ball23a.html
PDF: https://proceedings.mlr.press/v202/ball23a/ball23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-ball23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Philip J.
family: Ball
- given: Laura
family: Smith
- given: Ilya
family: Kostrikov
- given: Sergey
family: Levine
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1577-1594
id: ball23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1577
lastpage: 1594
published: 2023-07-03 00:00:00 +0000
- title: 'Mirror Sinkhorn: Fast Online Optimization on Transport Polytopes'
abstract: 'Optimal transport is an important tool in machine learning, allowing one to capture geometric properties of the data through a linear program on transport polytopes. We present a single-loop optimization algorithm for minimizing general convex objectives on these domains, utilizing the principles of Sinkhorn matrix scaling and mirror descent. The proposed algorithm is robust to noise, and can be used in an online setting. We provide theoretical guarantees for convex objectives and experimental results showcasing its effectiveness on both synthetic and real-world data.'
volume: 202
URL: https://proceedings.mlr.press/v202/ballu23a.html
PDF: https://proceedings.mlr.press/v202/ballu23a/ballu23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-ballu23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Marin
family: Ballu
- given: Quentin
family: Berthet
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1595-1613
id: ballu23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1595
lastpage: 1613
published: 2023-07-03 00:00:00 +0000
- title: 'On the Functional Similarity of Robust and Non-Robust Neural Representations'
abstract: 'Model stitching—where the internal representations of two neural networks are aligned linearly—helped demonstrate that the representations of different neural networks for the same task are surprisingly similar in a functional sense. At the same time, the representations of adversarially robust networks are considered to be different from non-robust representations. For example, robust image classifiers are invertible, while non-robust networks are not. Here, we investigate the functional similarity of robust and non-robust representations for image classification with the help of model stitching. We find that robust and non-robust networks indeed have different representations. However, these representations are compatible regarding accuracy. From the point of view of robust accuracy, compatibility decreases quickly after the first few layers but the representations become compatible again in the last layers, in the sense that the properties of the front model can be recovered. Moreover, this is true even in the case of cross-task stitching. Our results suggest that stitching in the initial, preprocessing layers and the final, abstract layers test different kinds of compatibilities. In particular, the final layers are easy to match, because their representations depend mostly on the same abstract task specification, in our case, the classification of the input into $n$ classes.'
volume: 202
URL: https://proceedings.mlr.press/v202/balogh23a.html
PDF: https://proceedings.mlr.press/v202/balogh23a/balogh23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-balogh23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: András
family: Balogh
- given: Márk
family: Jelasity
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1614-1635
id: balogh23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1614
lastpage: 1635
published: 2023-07-03 00:00:00 +0000
- title: 'Robust Budget Pacing with a Single Sample'
abstract: 'Major Internet advertising platforms offer budget pacing tools as a standard service for advertisers to manage their ad campaigns. Given the inherent non-stationarity in an advertiser’s value and also competing advertisers’ values over time, a commonly used approach is to learn a target expenditure plan that specifies a target spend as a function of time, and then run a controller that tracks this plan. This raises the question: *how many historical samples are required to learn a good expenditure plan*? We study this question by considering an advertiser repeatedly participating in $T$ second-price auctions, where the tuple of her value and the highest competing bid is drawn from an unknown time-varying distribution. The advertiser seeks to maximize her total utility subject to her budget constraint. Prior work has shown the sufficiency of *$T\log T$ samples per distribution* to achieve the optimal $O(\sqrt{T})$-regret. We dramatically improve this state-of-the-art and show that *just one sample per distribution* is enough to achieve the near-optimal $\tilde O(\sqrt{T})$-regret, while still being robust to noise in the sampling distributions.'
volume: 202
URL: https://proceedings.mlr.press/v202/balseiro23a.html
PDF: https://proceedings.mlr.press/v202/balseiro23a/balseiro23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-balseiro23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Santiago R.
family: Balseiro
- given: Rachitesh
family: Kumar
- given: Vahab
family: Mirrokni
- given: Balasubramanian
family: Sivan
- given: Di
family: Wang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1636-1659
id: balseiro23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1636
lastpage: 1659
published: 2023-07-03 00:00:00 +0000
- title: 'Dynamic Constrained Submodular Optimization with Polylogarithmic Update Time'
abstract: 'Maximizing a monotone submodular function under cardinality constraint $k$ is a core problem in machine learning and databases, with many basic applications, including video and data summarization, recommendation systems, feature extraction, exemplar clustering, and coverage problems. We study this classic problem in the fully dynamic model where a stream of insertions and deletions of elements of an underlying ground set is given and the goal is to maintain an approximate solution using a fast update time. A recent paper at NeurIPS’20 by Lattanzi, Mitrovic, Norouzi-Fard, Tarnawski, Zadimoghaddam claims to obtain a dynamic algorithm for this problem with a $(\frac{1}{2} -\epsilon)$ approximation ratio and a query complexity bounded by $\mathrm{poly}(\log(n),\log(k),\epsilon^{-1})$. However, as we explain in this paper, the analysis has some important gaps. Having a dynamic algorithm for the problem with polylogarithmic update time is even more important in light of a recent result by Chen and Peng at STOC’22 who show a matching lower bound for the problem – any randomized algorithm with a $\frac{1}{2}+\epsilon$ approximation ratio must have an amortized query complexity that is polynomial in $n$. In this paper, we develop a simpler algorithm for the problem that maintains a $(\frac{1}{2}-\epsilon)$-approximate solution for submodular maximization under cardinality constraint $k$ using a polylogarithmic amortized update time.'
volume: 202
URL: https://proceedings.mlr.press/v202/banihashem23a.html
PDF: https://proceedings.mlr.press/v202/banihashem23a/banihashem23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-banihashem23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Kiarash
family: Banihashem
- given: Leyla
family: Biabani
- given: Samira
family: Goudarzi
- given: Mohammadtaghi
family: Hajiaghayi
- given: Peyman
family: Jabbarzade
- given: Morteza
family: Monemizadeh
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1660-1691
id: banihashem23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1660
lastpage: 1691
published: 2023-07-03 00:00:00 +0000
- title: 'One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale'
abstract: 'This paper proposes a unified diffusion framework (dubbed UniDiffuser) to fit all distributions relevant to a set of multi-modal data in one model. Our key insight is – learning diffusion models for marginal, conditional, and joint distributions can be unified as predicting the noise in the perturbed data, where the perturbation levels (i.e. timesteps) can be different for different modalities. Inspired by the unified view, UniDiffuser learns all distributions simultaneously with a minimal modification to the original diffusion model – perturbs data in all modalities instead of a single modality, inputs individual timesteps in different modalities, and predicts the noise of all modalities instead of a single modality. UniDiffuser is parameterized by a transformer for diffusion models to handle input types of different modalities. Implemented on large-scale paired image-text data, UniDiffuser is able to perform image, text, text-to-image, image-to-text, and image-text pair generation by setting proper timesteps without additional overhead. In particular, UniDiffuser is able to produce perceptually realistic samples in all tasks and its quantitative results (e.g., the FID and CLIP score) are not only superior to existing general-purpose models but also comparable to the bespoke models (e.g., Stable Diffusion and DALL-E 2) in representative tasks (e.g., text-to-image generation).'
volume: 202
URL: https://proceedings.mlr.press/v202/bao23a.html
PDF: https://proceedings.mlr.press/v202/bao23a/bao23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bao23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Fan
family: Bao
- given: Shen
family: Nie
- given: Kaiwen
family: Xue
- given: Chongxuan
family: Li
- given: Shi
family: Pu
- given: Yaole
family: Wang
- given: Gang
family: Yue
- given: Yue
family: Cao
- given: Hang
family: Su
- given: Jun
family: Zhu
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1692-1717
id: bao23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1692
lastpage: 1717
published: 2023-07-03 00:00:00 +0000
- title: 'Optimizing the Collaboration Structure in Cross-Silo Federated Learning'
abstract: 'In federated learning (FL), multiple clients collaborate to train machine learning models together while keeping their data decentralized. Despite utilizing more training data, FL suffers from a potential negative transfer problem: the global FL model may even perform worse than the models trained with local data only. In this paper, we propose FedCollab, a novel FL framework that alleviates negative transfer by clustering clients into non-overlapping coalitions based on their distribution distances and data quantities. As a result, each client only collaborates with the clients having similar data distributions, and tends to collaborate with more clients when it has less data. We evaluate our framework with a variety of datasets, models, and types of non-IIDness. Our results demonstrate that FedCollab effectively mitigates negative transfer across a wide range of FL algorithms and consistently outperforms other clustered FL algorithms.'
volume: 202
URL: https://proceedings.mlr.press/v202/bao23b.html
PDF: https://proceedings.mlr.press/v202/bao23b/bao23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bao23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Wenxuan
family: Bao
- given: Haohan
family: Wang
- given: Jun
family: Wu
- given: Jingrui
family: He
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1718-1736
id: bao23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1718
lastpage: 1736
published: 2023-07-03 00:00:00 +0000
- title: 'MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation'
abstract: 'Recent advances in text-to-image generation with diffusion models present transformative capabilities in image quality. However, user controllability of the generated image and fast adaptation to new tasks still remain an open challenge, currently mostly addressed by costly and long re-training and fine-tuning or ad-hoc adaptations to specific image generation tasks. In this work, we present MultiDiffusion, a unified framework that enables versatile and controllable image generation, using a pre-trained text-to-image diffusion model, without any further training or finetuning. At the center of our approach is a new generation process, based on an optimization task that binds together multiple diffusion generation processes with a shared set of parameters or constraints. We show that MultiDiffusion can be readily applied to generate high quality and diverse images that adhere to user-provided controls, such as desired aspect ratio (e.g., panorama), and spatial guiding signals, ranging from tight segmentation masks to bounding boxes.'
volume: 202
URL: https://proceedings.mlr.press/v202/bar-tal23a.html
PDF: https://proceedings.mlr.press/v202/bar-tal23a/bar-tal23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bar-tal23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Omer
family: Bar-Tal
- given: Lior
family: Yariv
- given: Yaron
family: Lipman
- given: Tali
family: Dekel
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1737-1752
id: bar-tal23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1737
lastpage: 1752
published: 2023-07-03 00:00:00 +0000
- title: 'Reinforcement Learning with General Utilities: Simpler Variance Reduction and Large State-Action Space'
abstract: 'We consider the reinforcement learning (RL) problem with general utilities which consists in maximizing a function of the state-action occupancy measure. Beyond the standard cumulative reward RL setting, this problem includes as particular cases constrained RL, pure exploration and learning from demonstrations among others. For this problem, we propose a simpler single-loop parameter-free normalized policy gradient algorithm. Implementing a recursive momentum variance reduction mechanism, our algorithm achieves $\tilde{\mathcal{O}}(\epsilon^{-3})$ and $\tilde{\mathcal{O}}(\epsilon^{-2})$ sample complexities for $\epsilon$-first-order stationarity and $\epsilon$-global optimality respectively, under adequate assumptions. We further address the setting of large finite state action spaces via linear function approximation of the occupancy measure and show a $\tilde{\mathcal{O}}(\epsilon^{-4})$ sample complexity for a simple policy gradient method with a linear regression subroutine.'
volume: 202
URL: https://proceedings.mlr.press/v202/barakat23a.html
PDF: https://proceedings.mlr.press/v202/barakat23a/barakat23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-barakat23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Anas
family: Barakat
- given: Ilyas
family: Fatkhullin
- given: Niao
family: He
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1753-1800
id: barakat23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1753
lastpage: 1800
published: 2023-07-03 00:00:00 +0000
- title: 'Interpretable Neural-Symbolic Concept Reasoning'
abstract: 'Deep learning methods are highly accurate, yet their opaque decision process prevents them from earning full human trust. Concept-based models aim to address this issue by learning tasks based on a set of human-understandable concepts. However, state-of-the-art concept-based models rely on high-dimensional concept embedding representations which lack a clear semantic meaning, thus questioning the interpretability of their decision process. To overcome this limitation, we propose the Deep Concept Reasoner (DCR), the first interpretable concept-based model that builds upon concept embeddings. In DCR, neural networks do not make task predictions directly, but they build syntactic rule structures using concept embeddings. DCR then executes these rules on meaningful concept truth degrees to provide a final interpretable and semantically-consistent prediction in a differentiable manner. Our experiments show that DCR: (i) improves up to +25% w.r.t. state-of-the-art interpretable concept-based models on challenging benchmarks, (ii) discovers meaningful logic rules matching known ground truths even in the absence of concept supervision during training, and (iii) facilitates the generation of counterfactual examples, providing the learnt rules as guidance.'
volume: 202
URL: https://proceedings.mlr.press/v202/barbiero23a.html
PDF: https://proceedings.mlr.press/v202/barbiero23a/barbiero23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-barbiero23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Pietro
family: Barbiero
- given: Gabriele
family: Ciravegna
- given: Francesco
family: Giannini
- given: Mateo
family: Espinosa Zarlenga
- given: Lucie Charlotte
family: Magister
- given: Alberto
family: Tonda
- given: Pietro
family: Lio
- given: Frederic
family: Precioso
- given: Mateja
family: Jamnik
- given: Giuseppe
family: Marra
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1801-1825
id: barbiero23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1801
lastpage: 1825
published: 2023-07-03 00:00:00 +0000
- title: 'Moccasin: Efficient Tensor Rematerialization for Neural Networks'
abstract: 'The deployment and training of neural networks on edge computing devices pose many challenges. The low memory nature of edge devices is often one of the biggest limiting factors encountered in the deployment of large neural network models. Tensor rematerialization or recompute is a way to address high memory requirements for neural network training and inference. In this paper we consider the problem of execution time minimization of compute graphs subject to a memory budget. In particular, we develop a new constraint programming formulation called Moccasin with only $O(n)$ integer variables, where $n$ is the number of nodes in the compute graph. This is a significant improvement over the works in the recent literature that propose formulations with $O(n^2)$ Boolean variables. We present numerical studies that show that our approach is up to an order of magnitude faster than recent work especially for large-scale graphs.'
volume: 202
URL: https://proceedings.mlr.press/v202/bartan23a.html
PDF: https://proceedings.mlr.press/v202/bartan23a/bartan23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bartan23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Burak
family: Bartan
- given: Haoming
family: Li
- given: Harris
family: Teague
- given: Christopher
family: Lott
- given: Bistra
family: Dilkina
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1826-1837
id: bartan23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1826
lastpage: 1837
published: 2023-07-03 00:00:00 +0000
- title: 'User-level Private Stochastic Convex Optimization with Optimal Rates'
abstract: 'We study the problem of differentially private (DP) stochastic convex optimization (SCO) under the notion of user-level differential privacy. In this problem, there are $n$ users, each contributing $m>1$ samples to the input dataset of the private SCO algorithm, and the notion of indistinguishability embedded in DP is w.r.t. replacing the entire local dataset of any given user. Under smoothness conditions of the loss, we establish the optimal rates for user-level DP-SCO in both the central and local models of DP. In particular, we show, roughly, that the optimal rate is $\frac{1}{\sqrt{nm}}+\frac{\sqrt{d}}{\varepsilon n \sqrt{m}}$ in the central setting and is $\frac{\sqrt{d}}{\varepsilon \sqrt{nm}}$ in the local setting, where $d$ is the dimensionality of the problem and $\varepsilon$ is the privacy parameter. Our algorithms combine new user-level DP mean estimation techniques with carefully designed first-order stochastic optimization methods. For the central DP setting, our optimal rate improves over the rate attained for the same setting in Levy et al. (2021) by a $\sqrt{d}$ factor. One of the main ingredients that enabled such an improvement is a novel application of the generalization properties of DP in the context of multi-pass stochastic gradient methods.'
volume: 202
URL: https://proceedings.mlr.press/v202/bassily23a.html
PDF: https://proceedings.mlr.press/v202/bassily23a/bassily23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bassily23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Raef
family: Bassily
- given: Ziteng
family: Sun
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1838-1851
id: bassily23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1838
lastpage: 1851
published: 2023-07-03 00:00:00 +0000
- title: 'A Statistical Perspective on Retrieval-Based Models'
abstract: 'Many modern high-performing machine learning models increasingly rely on scaling up models, e.g., transformer networks. Simultaneously, a parallel line of work aims to improve the model performance by augmenting an input instance with other (labeled) instances during inference. Examples of such augmentations include task-specific prompts and similar examples retrieved from the training data by a nonparametric component. Despite a growing literature showcasing the promise of these retrieval-based models, their theoretical underpinnings remain under-explored. In this paper, we present a formal treatment of retrieval-based models to characterize their performance via a novel statistical perspective. In particular, we study two broad classes of retrieval-based classification approaches: First, we analyze a local learning framework that employs an explicit local empirical risk minimization based on retrieved examples for each input instance. Interestingly, we show that breaking down the underlying learning task into local sub-tasks enables the model to employ a low complexity parametric component to ensure good overall performance. The second class of retrieval-based approaches we explore learns a global model using kernel methods to directly map an input instance and retrieved examples to a prediction, without explicitly solving a local learning task.'
volume: 202
URL: https://proceedings.mlr.press/v202/basu23a.html
PDF: https://proceedings.mlr.press/v202/basu23a/basu23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-basu23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Soumya
family: Basu
- given: Ankit Singh
family: Rawat
- given: Manzil
family: Zaheer
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1852-1886
id: basu23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1852
lastpage: 1886
published: 2023-07-03 00:00:00 +0000
- title: 'Human-Timescale Adaptation in an Open-Ended Task Space'
abstract: 'Foundation models have shown impressive adaptation and scalability in supervised and self-supervised learning problems, but so far these successes have not fully translated to reinforcement learning (RL). In this work, we demonstrate that training an RL agent at scale leads to a general in-context learning algorithm that can adapt to open-ended novel embodied 3D problems as quickly as humans. In a vast space of held-out environment dynamics, our adaptive agent (AdA) displays on-the-fly hypothesis-driven exploration, efficient exploitation of acquired knowledge, and can successfully be prompted with first-person demonstrations. Adaptation emerges from three ingredients: (1) meta-reinforcement learning across a vast, smooth and diverse task distribution, (2) a policy parameterised as a large-scale attention-based memory architecture, and (3) an effective automated curriculum that prioritises tasks at the frontier of an agent’s capabilities. We demonstrate characteristic scaling laws with respect to network size, memory length, and richness of the training task distribution. We believe our results lay the foundation for increasingly general and adaptive RL agents that perform well across ever-larger open-ended domains.'
volume: 202
URL: https://proceedings.mlr.press/v202/bauer23a.html
PDF: https://proceedings.mlr.press/v202/bauer23a/bauer23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bauer23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jakob
family: Bauer
- given: Kate
family: Baumli
- given: Feryal
family: Behbahani
- given: Avishkar
family: Bhoopchand
- given: Nathalie
family: Bradley-Schmieg
- given: Michael
family: Chang
- given: Natalie
family: Clay
- given: Adrian
family: Collister
- given: Vibhavari
family: Dasagi
- given: Lucy
family: Gonzalez
- given: Karol
family: Gregor
- given: Edward
family: Hughes
- given: Sheleem
family: Kashem
- given: Maria
family: Loks-Thompson
- given: Hannah
family: Openshaw
- given: Jack
family: Parker-Holder
- given: Shreya
family: Pathak
- given: Nicolas
family: Perez-Nieves
- given: Nemanja
family: Rakicevic
- given: Tim
family: Rocktäschel
- given: Yannick
family: Schroecker
- given: Satinder
family: Singh
- given: Jakub
family: Sygnowski
- given: Karl
family: Tuyls
- given: Sarah
family: York
- given: Alexander
family: Zacherl
- given: Lei M
family: Zhang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1887-1935
id: bauer23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1887
lastpage: 1935
published: 2023-07-03 00:00:00 +0000
- title: 'A Kernel Stein Test of Goodness of Fit for Sequential Models'
abstract: 'We propose a goodness-of-fit measure for probability densities modeling observations with varying dimensionality, such as text documents of differing lengths or variable-length sequences. The proposed measure is an instance of the kernel Stein discrepancy (KSD), which has been used to construct goodness-of-fit tests for unnormalized densities. The KSD is defined by its Stein operator: current operators used in testing apply to fixed-dimensional spaces. As our main contribution, we extend the KSD to the variable-dimension setting by identifying appropriate Stein operators, and propose a novel KSD goodness-of-fit test. As with the previous variants, the proposed KSD does not require the density to be normalized, allowing the evaluation of a large class of models. Our test is shown to perform well in practice on discrete sequential data benchmarks.'
volume: 202
URL: https://proceedings.mlr.press/v202/baum23a.html
PDF: https://proceedings.mlr.press/v202/baum23a/baum23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-baum23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jerome
family: Baum
- given: Heishiro
family: Kanagawa
- given: Arthur
family: Gretton
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1936-1953
id: baum23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1936
lastpage: 1953
published: 2023-07-03 00:00:00 +0000
- title: 'Individually Fair Learning with One-Sided Feedback'
abstract: 'We consider an online learning problem with one-sided feedback, in which the learner is able to observe the true label only for positively predicted instances. On each round, $k$ instances arrive and receive classification outcomes according to a randomized policy deployed by the learner, whose goal is to maximize accuracy while deploying individually fair policies. We first present a novel auditing scheme, capable of utilizing feedback from dynamically-selected panels of multiple, possibly inconsistent, auditors regarding fairness violations. In particular, we show how our proposed auditing scheme allows for algorithmically exploring the resulting accuracy-fairness frontier, with no need for additional feedback from auditors. We then present an efficient reduction from our problem of online learning with one-sided feedback and a panel reporting fairness violations to the contextual combinatorial semi-bandit problem (Cesa-Bianchi & Lugosi, 2009; Gyorgy et al., 2007), allowing us to leverage algorithms for contextual combinatorial semi-bandits to establish multi-criteria no-regret guarantees in our setting, simultaneously for accuracy and fairness. Our results eliminate two potential sources of bias from prior work: the “hidden outcomes” that are not available to an algorithm operating in the full information setting, and human biases that might be present in any single human auditor, but can be mitigated by selecting a well-chosen panel.'
volume: 202
URL: https://proceedings.mlr.press/v202/bechavod23a.html
PDF: https://proceedings.mlr.press/v202/bechavod23a/bechavod23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bechavod23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yahav
family: Bechavod
- given: Aaron
family: Roth
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1954-1977
id: bechavod23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1954
lastpage: 1977
published: 2023-07-03 00:00:00 +0000
- title: 'Predicting Ordinary Differential Equations with Transformers'
abstract: 'We develop a transformer-based sequence-to-sequence model that recovers scalar ordinary differential equations (ODEs) in symbolic form from irregularly sampled and noisy observations of a single solution trajectory. We demonstrate in extensive empirical evaluations that our model performs better or on par with existing methods in terms of accurate recovery across various settings. Moreover, our method is efficiently scalable: after one-time pretraining on a large set of ODEs, we can infer the governing law of a new observed solution in a few forward passes of the model.'
volume: 202
URL: https://proceedings.mlr.press/v202/becker23a.html
PDF: https://proceedings.mlr.press/v202/becker23a/becker23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-becker23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Sören
family: Becker
- given: Michal
family: Klein
- given: Alexander
family: Neitz
- given: Giambattista
family: Parascandolo
- given: Niki
family: Kilbertus
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 1978-2002
id: becker23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 1978
lastpage: 2002
published: 2023-07-03 00:00:00 +0000
- title: 'Explaining Reinforcement Learning with Shapley Values'
abstract: 'For reinforcement learning systems to be widely adopted, their users must understand and trust them. We present a theoretical analysis of explaining reinforcement learning using Shapley values, following a principled approach from game theory for identifying the contribution of individual players to the outcome of a cooperative game. We call this general framework Shapley Values for Explaining Reinforcement Learning (SVERL). Our analysis exposes the limitations of earlier uses of Shapley values in reinforcement learning. We then develop an approach that uses Shapley values to explain agent performance. In a variety of domains, SVERL produces meaningful explanations that match and supplement human intuition.'
volume: 202
URL: https://proceedings.mlr.press/v202/beechey23a.html
PDF: https://proceedings.mlr.press/v202/beechey23a/beechey23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-beechey23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Daniel
family: Beechey
- given: Thomas M. S.
family: Smith
- given: Özgür
family: Şimşek
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2003-2014
id: beechey23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2003
lastpage: 2014
published: 2023-07-03 00:00:00 +0000
- title: 'TIDE: Time Derivative Diffusion for Deep Learning on Graphs'
abstract: 'A prominent paradigm for graph neural networks is based on the message-passing framework. In this framework, information communication is realized only between neighboring nodes. The challenge of approaches that use this paradigm is to ensure efficient and accurate long-distance communication between nodes, as deep convolutional networks are prone to over-smoothing. In this paper, we present a novel method based on time derivative graph diffusion (TIDE) to overcome these structural limitations of the message-passing framework. Our approach allows for optimizing the spatial extent of diffusion across various tasks and network channels, thus enabling medium and long-distance communication efficiently. Furthermore, we show that our architecture design also enables local message-passing and thus inherits the capabilities of local message-passing approaches. We show that on both widely used graph benchmarks and synthetic mesh and graph datasets, the proposed framework outperforms state-of-the-art methods by a significant margin.'
volume: 202
URL: https://proceedings.mlr.press/v202/behmanesh23a.html
PDF: https://proceedings.mlr.press/v202/behmanesh23a/behmanesh23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-behmanesh23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Maysam
family: Behmanesh
- given: Maximilian
family: Krahn
- given: Maks
family: Ovsjanikov
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2015-2030
id: behmanesh23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2015
lastpage: 2030
published: 2023-07-03 00:00:00 +0000
- title: 'Fast as CHITA: Neural Network Pruning with Combinatorial Optimization'
abstract: 'The sheer size of modern neural networks makes model serving a serious computational challenge. A popular class of compression techniques overcomes this challenge by pruning or sparsifying the weights of pretrained networks. While useful, these techniques often face serious tradeoffs between computational requirements and compression quality. In this work, we propose a novel optimization-based pruning framework that considers the combined effect of pruning (and updating) multiple weights subject to a sparsity constraint. Our approach, CHITA, extends the classical Optimal Brain Surgeon framework and results in significant improvements in speed, memory, and performance over existing optimization-based approaches for network pruning. CHITA’s main workhorse performs combinatorial optimization updates on a memory-friendly representation of local quadratic approximation(s) of the loss function. On a standard benchmark of pretrained models and datasets, CHITA leads to superior sparsity-accuracy tradeoffs compared to competing methods. For example, for MLPNet with only 2% of the weights retained, our approach improves the accuracy by 63% relative to the state of the art. Furthermore, when used in conjunction with fine-tuning SGD steps, our method achieves significant accuracy gains over state-of-the-art approaches. Our code is publicly available at: https://github.com/mazumder-lab/CHITA .'
volume: 202
URL: https://proceedings.mlr.press/v202/benbaki23a.html
PDF: https://proceedings.mlr.press/v202/benbaki23a/benbaki23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-benbaki23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Riade
family: Benbaki
- given: Wenyu
family: Chen
- given: Xiang
family: Meng
- given: Hussein
family: Hazimeh
- given: Natalia
family: Ponomareva
- given: Zhe
family: Zhao
- given: Rahul
family: Mazumder
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2031-2049
id: benbaki23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2031
lastpage: 2049
published: 2023-07-03 00:00:00 +0000
- title: 'Continuously Parameterized Mixture Models'
abstract: 'Mixture models are universal approximators of smooth densities but are difficult to utilize in complicated datasets due to restrictions on typically available modes and challenges with initializations. We show that by continuously parameterizing a mixture of factor analyzers using a learned ordinary differential equation, we can improve the fit of mixture models over direct methods. Once trained, the mixture components can be extracted and the neural ODE can be discarded, leaving us with an effective, but low-resource model. We additionally explore the use of a training curriculum from an easy-to-model latent space extracted from a normalizing flow to the more complex input space and show that the smooth curriculum helps to stabilize and improve results with and without the continuous parameterization. Finally, we introduce a hierarchical version of the model to enable more flexible, robust classification and clustering, and show substantial improvements over traditional parameterizations of GMMs.'
volume: 202
URL: https://proceedings.mlr.press/v202/bender23a.html
PDF: https://proceedings.mlr.press/v202/bender23a/bender23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bender23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Christopher M
family: Bender
- given: Yifeng
family: Shi
- given: Marc
family: Niethammer
- given: Junier
family: Oliva
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2050-2062
id: bender23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2050
lastpage: 2062
published: 2023-07-03 00:00:00 +0000
- title: 'Controllable Neural Symbolic Regression'
abstract: 'In symbolic regression, the objective is to find an analytical expression that accurately fits experimental data with the minimal use of mathematical symbols such as operators, variables, and constants. However, the combinatorial space of possible expressions can make it challenging for traditional evolutionary algorithms to find the correct expression in a reasonable amount of time. To address this issue, Neural Symbolic Regression (NSR) algorithms have been developed that can quickly identify patterns in the data and generate analytical expressions. However, these methods, in their current form, lack the capability to incorporate user-defined prior knowledge, which is often required in natural sciences and engineering fields. To overcome this limitation, we propose a novel neural symbolic regression method, named Neural Symbolic Regression with Hypothesis (NSRwH) that enables the explicit incorporation of assumptions about the expected structure of the ground-truth expression into the prediction process. Our experiments demonstrate that the proposed conditioned deep learning model outperforms its unconditioned counterparts in terms of accuracy while also providing control over the predicted expression structure.'
volume: 202
URL: https://proceedings.mlr.press/v202/bendinelli23a.html
PDF: https://proceedings.mlr.press/v202/bendinelli23a/bendinelli23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bendinelli23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Tommaso
family: Bendinelli
- given: Luca
family: Biggio
- given: Pierre-Alexandre
family: Kamienny
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2063-2077
id: bendinelli23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2063
lastpage: 2077
published: 2023-07-03 00:00:00 +0000
- title: 'On Second-Order Scoring Rules for Epistemic Uncertainty Quantification'
abstract: 'It is well known that accurate probabilistic predictors can be trained through empirical risk minimisation with proper scoring rules as loss functions. While such learners capture so-called aleatoric uncertainty of predictions, various machine learning methods have recently been developed with the goal to let the learner also represent its epistemic uncertainty, i.e., the uncertainty caused by a lack of knowledge and data. An emerging branch of the literature proposes the use of a second-order learner that provides predictions in terms of distributions on probability distributions. However, recent work has revealed serious theoretical shortcomings for second-order predictors based on loss minimisation. In this paper, we generalise these findings and prove a more fundamental result: There seems to be no loss function that provides an incentive for a second-order learner to faithfully represent its epistemic uncertainty in the same manner as proper scoring rules do for standard (first-order) learners. As a main mathematical tool to prove this result, we introduce the generalised notion of second-order scoring rules.'
volume: 202
URL: https://proceedings.mlr.press/v202/bengs23a.html
PDF: https://proceedings.mlr.press/v202/bengs23a/bengs23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bengs23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Viktor
family: Bengs
- given: Eyke
family: Hüllermeier
- given: Willem
family: Waegeman
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2078-2091
id: bengs23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2078
lastpage: 2091
published: 2023-07-03 00:00:00 +0000
- title: 'Certified Robust Neural Networks: Generalization and Corruption Resistance'
abstract: 'Recent work has demonstrated that robustness (to "corruption") can be at odds with generalization. Adversarial training, for instance, aims to reduce the problematic susceptibility of modern neural networks to small data perturbations. Surprisingly, overfitting is a major concern in adversarial training despite being mostly absent in standard training. We provide here theoretical evidence for this peculiar “robust overfitting” phenomenon. Subsequently, we advance a novel distributionally robust loss function bridging robustness and generalization. We demonstrate, both theoretically and empirically, that the loss enjoys a certified level of robustness against two common types of corruption, data evasion and poisoning attacks, while ensuring guaranteed generalization. We show through careful numerical experiments that our resulting holistic robust (HR) training procedure yields SOTA performance. Finally, we indicate that HR training can be interpreted as a direct extension of adversarial training and comes with a negligible additional computational burden. A ready-to-use Python library implementing our algorithm is available at https://github.com/RyanLucas3/HR_Neural_Networks.'
volume: 202
URL: https://proceedings.mlr.press/v202/bennouna23a.html
PDF: https://proceedings.mlr.press/v202/bennouna23a/bennouna23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bennouna23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Amine
family: Bennouna
- given: Ryan
family: Lucas
- given: Bart
family: Van Parys
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2092-2112
id: bennouna23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2092
lastpage: 2112
published: 2023-07-03 00:00:00 +0000
- title: 'Gaussian processes at the Helm(holtz): A more fluid model for ocean currents'
abstract: 'Oceanographers are interested in predicting ocean currents and identifying divergences in a current vector field based on sparse observations of buoy velocities. Since we expect current dynamics to be smooth but highly non-linear, Gaussian processes (GPs) offer an attractive model. But we show that applying a GP with a standard stationary kernel directly to buoy data can struggle at both current prediction and divergence identification – due to some physically unrealistic prior assumptions. To better reflect known physical properties of currents, we propose to instead put a standard stationary kernel on the divergence- and curl-free components of a vector field obtained through a Helmholtz decomposition. We show that, because this decomposition relates to the original vector field just via mixed partial derivatives, we can still perform inference given the original data with only a small constant multiple of additional computational expense. We illustrate the benefits of our method on synthetic and real ocean data.'
volume: 202
URL: https://proceedings.mlr.press/v202/berlinghieri23a.html
PDF: https://proceedings.mlr.press/v202/berlinghieri23a/berlinghieri23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-berlinghieri23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Renato
family: Berlinghieri
- given: Brian L.
family: Trippe
- given: David R.
family: Burt
- given: Ryan James
family: Giordano
- given: Kaushik
family: Srinivasan
- given: Tamay
family: Özgökmen
- given: Junfei
family: Xia
- given: Tamara
family: Broderick
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2113-2163
id: berlinghieri23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2113
lastpage: 2163
published: 2023-07-03 00:00:00 +0000
- title: 'Optimal Rates and Efficient Algorithms for Online Bayesian Persuasion'
abstract: 'Bayesian persuasion studies how an informed sender should influence beliefs of rational receivers that take decisions through Bayesian updating of a common prior. We focus on the online Bayesian persuasion framework, in which the sender repeatedly faces one or more receivers with unknown and adversarially selected types. First, we show how to obtain a tight $\tilde O(T^{1/2})$ regret bound in the case in which the sender faces a single receiver and has bandit feedback, improving over the best previously known bound of $\tilde O(T^{4/5})$. Then, we provide the first no-regret guarantees for the multi-receiver setting under bandit feedback. Finally, we show how to design no-regret algorithms with polynomial per-iteration running time by exploiting type reporting, thereby circumventing known complexity results on online Bayesian persuasion. We provide efficient algorithms guaranteeing a $O(T^{1/2})$ regret upper bound both in the single- and multi-receiver scenario when type reporting is allowed.'
volume: 202
URL: https://proceedings.mlr.press/v202/bernasconi23a.html
PDF: https://proceedings.mlr.press/v202/bernasconi23a/bernasconi23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bernasconi23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Martino
family: Bernasconi
- given: Matteo
family: Castiglioni
- given: Andrea
family: Celli
- given: Alberto
family: Marchesi
- given: Francesco
family: Trovò
- given: Nicola
family: Gatti
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2164-2183
id: bernasconi23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2164
lastpage: 2183
published: 2023-07-03 00:00:00 +0000
- title: 'Constrained Phi-Equilibria'
abstract: 'The computational study of equilibria involving constraints on players’ strategies has been largely neglected. However, in real-world applications, players are usually subject to constraints ruling out the feasibility of some of their strategies, such as, e.g., safety requirements and budget caps. Computational studies on constrained versions of the Nash equilibrium have led to some results under very stringent assumptions, while finding constrained versions of the correlated equilibrium (CE) is still unexplored. In this paper, we introduce and computationally characterize constrained Phi-equilibria—a more general notion than constrained CEs—in normal-form games. We show that computing such equilibria is in general computationally intractable, and also that the set of the equilibria may not be convex, providing a sharp divide with unconstrained CEs. Nevertheless, we provide a polynomial-time algorithm for computing a constrained (approximate) Phi-equilibrium maximizing a given linear function, when either the number of constraints or that of players’ actions is fixed. Moreover, in the special case in which a player’s constraints do not depend on other players’ strategies, we show that an exact, function-maximizing equilibrium can be computed in polynomial time, while one (approximate) equilibrium can be found with an efficient decentralized no-regret learning algorithm.'
volume: 202
URL: https://proceedings.mlr.press/v202/bernasconi23b.html
PDF: https://proceedings.mlr.press/v202/bernasconi23b/bernasconi23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bernasconi23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Martino
family: Bernasconi
- given: Matteo
family: Castiglioni
- given: Alberto
family: Marchesi
- given: Francesco
family: Trovò
- given: Nicola
family: Gatti
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2184-2205
id: bernasconi23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2184
lastpage: 2205
published: 2023-07-03 00:00:00 +0000
- title: 'Differentiable and Transportable Structure Learning'
abstract: 'Directed acyclic graphs (DAGs) encode a lot of information about a particular distribution in their structure. However, the compute required to infer these structures is typically super-exponential in the number of variables, as inference requires a sweep of a combinatorially large space of potential structures. That is, until recent advances made it possible to search this space using a differentiable metric, drastically reducing search time. While this technique, named NOTEARS, is widely considered a seminal work in DAG-discovery, it concedes an important property in favour of differentiability: transportability. To be transportable, the structures discovered on one dataset must apply to another dataset from the same domain. We introduce D-Struct which recovers transportability in the discovered structures through a novel architecture and loss function while remaining fully differentiable. Because D-Struct remains differentiable, our method can be easily adopted in existing differentiable architectures, as was previously done with NOTEARS. In our experiments, we empirically validate D-Struct with respect to edge accuracy and structural Hamming distance in a variety of settings.'
volume: 202
URL: https://proceedings.mlr.press/v202/berrevoets23a.html
PDF: https://proceedings.mlr.press/v202/berrevoets23a/berrevoets23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-berrevoets23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jeroen
family: Berrevoets
- given: Nabeel
family: Seedat
- given: Fergus
family: Imrie
- given: Mihaela
family: Van Der Schaar
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2206-2233
id: berrevoets23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2206
lastpage: 2233
published: 2023-07-03 00:00:00 +0000
- title: 'Polyhedral Complex Extraction from ReLU Networks using Edge Subdivision'
abstract: 'A neural network consisting of piecewise affine building blocks, such as fully-connected layers and ReLU activations, is itself a piecewise affine function supported on a polyhedral complex. This complex has been previously studied to characterize theoretical properties of neural networks, but, in practice, extracting it remains a challenge due to its high combinatorial complexity. A natural idea described in previous works is to subdivide the regions via intersections with hyperplanes induced by each neuron. However, we argue that this view leads to computational redundancy. Instead of regions, we propose to subdivide edges, leading to a novel method for polyhedral complex extraction. Key to this are sign-vectors, which encode the combinatorial structure of the complex. Our approach allows us to use standard tensor operations on a GPU, taking seconds for millions of cells on a consumer grade machine. Motivated by the growing interest in neural shape representation, we use the speed and differentiability of our method to optimize geometric properties of the complex. The code is available at https://github.com/arturs-berzins/relu_edge_subdivision.'
volume: 202
URL: https://proceedings.mlr.press/v202/berzins23a.html
PDF: https://proceedings.mlr.press/v202/berzins23a/berzins23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-berzins23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Arturs
family: Berzins
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2234-2244
id: berzins23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2234
lastpage: 2244
published: 2023-07-03 00:00:00 +0000
- title: 'Robust One-Class Classification with Signed Distance Function using 1-Lipschitz Neural Networks'
abstract: 'We propose a new method, dubbed One Class Signed Distance Function (OCSDF), to perform One Class Classification (OCC) by provably learning the Signed Distance Function (SDF) to the boundary of the support of any distribution. The distance to the support can be interpreted as a normality score, and its approximation using 1-Lipschitz neural networks provides robustness bounds against $\ell_2$ adversarial attacks, an under-explored weakness of deep learning-based OCC algorithms. As a result, OCSDF comes with a new metric, certified AUROC, that can be computed at the same cost as any classical AUROC. We show that OCSDF is competitive against concurrent methods on tabular and image data while being considerably more robust to adversarial attacks, illustrating its theoretical properties. Finally, as exploratory research perspectives, we theoretically and empirically show how OCSDF connects OCC with image generation and implicit neural surface parametrization.'
volume: 202
URL: https://proceedings.mlr.press/v202/bethune23a.html
PDF: https://proceedings.mlr.press/v202/bethune23a/bethune23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bethune23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Louis
family: Béthune
- given: Paul
family: Novello
- given: Guillaume
family: Coiffier
- given: Thibaut
family: Boissin
- given: Mathieu
family: Serrurier
- given: Quentin
family: Vincenot
- given: Andres
family: Troya-Galvis
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2245-2271
id: bethune23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2245
lastpage: 2271
published: 2023-07-03 00:00:00 +0000
- title: 'Neural Algorithmic Reasoning with Causal Regularisation'
abstract: 'Recent work on neural algorithmic reasoning has investigated the reasoning capabilities of neural networks, effectively demonstrating they can learn to execute classical algorithms on unseen data coming from the train distribution. However, the performance of existing neural reasoners significantly degrades on out-of-distribution (OOD) test data, where inputs have larger sizes. In this work, we make an important observation: there are many different inputs for which an algorithm will perform certain intermediate computations identically. This insight allows us to develop data augmentation procedures that, given an algorithm’s intermediate trajectory, produce inputs for which the target algorithm would have exactly the same next trajectory step. We ensure invariance in the next-step prediction across such inputs by employing a self-supervised objective derived from our observation, formalised in a causal graph. We prove that the resulting method, which we call Hint-ReLIC, improves the OOD generalisation capabilities of the reasoner. We evaluate our method on the CLRS algorithmic reasoning benchmark, where we show up to 3x improvements on the OOD test data.'
volume: 202
URL: https://proceedings.mlr.press/v202/bevilacqua23a.html
PDF: https://proceedings.mlr.press/v202/bevilacqua23a/bevilacqua23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bevilacqua23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Beatrice
family: Bevilacqua
- given: Kyriacos
family: Nikiforou
- given: Borja
family: Ibarz
- given: Ioana
family: Bica
- given: Michela
family: Paganini
- given: Charles
family: Blundell
- given: Jovana
family: Mitrovic
- given: Petar
family: Veličković
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2272-2288
id: bevilacqua23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2272
lastpage: 2288
published: 2023-07-03 00:00:00 +0000
- title: 'Optimally-weighted Estimators of the Maximum Mean Discrepancy for Likelihood-Free Inference'
abstract: 'Likelihood-free inference methods typically make use of a distance between simulated and real data. A common example is the maximum mean discrepancy (MMD), which has previously been used for approximate Bayesian computation, minimum distance estimation, generalised Bayesian inference, and within the nonparametric learning framework. The MMD is commonly estimated at a root-$m$ rate, where $m$ is the number of simulated samples. This can lead to significant computational challenges since a large $m$ is required to obtain an accurate estimate, which is crucial for parameter estimation. In this paper, we propose a novel estimator for the MMD with significantly improved sample complexity. The estimator is particularly well suited for computationally expensive smooth simulators with low- to mid-dimensional inputs. This claim is supported through both theoretical results and an extensive simulation study on benchmark simulators.'
volume: 202
URL: https://proceedings.mlr.press/v202/bharti23a.html
PDF: https://proceedings.mlr.press/v202/bharti23a/bharti23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bharti23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ayush
family: Bharti
- given: Masha
family: Naslidnyk
- given: Oscar
family: Key
- given: Samuel
family: Kaski
- given: Francois-Xavier
family: Briol
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2289-2312
id: bharti23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2289
lastpage: 2312
published: 2023-07-03 00:00:00 +0000
- title: 'Bandit Online Linear Optimization with Hints and Queries'
abstract: 'We study variants of the online linear optimization (OLO) problem with bandit feedback, where the algorithm has access to external information about the unknown cost vector. Our motivation is the recent body of work on using such “hints” towards improving regret bounds for OLO problems in the full-information setting. Unlike in the full-information OLO setting, with bandit feedback, we first show that one cannot improve the standard regret bounds of $\tilde{O}(\sqrt{T})$ by using hints, even if they are always well-correlated with the cost vector. In contrast, if the algorithm is empowered to issue queries and if all the responses are correct, then we show $O(\log T)$ regret is achievable. We then show how to make this result more robust—when some of the query responses can be adversarial—by using a little feedback on the quality of the responses.'
volume: 202
URL: https://proceedings.mlr.press/v202/bhaskara23a.html
PDF: https://proceedings.mlr.press/v202/bhaskara23a/bhaskara23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bhaskara23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Aditya
family: Bhaskara
- given: Ashok
family: Cutkosky
- given: Ravi
family: Kumar
- given: Manish
family: Purohit
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2313-2336
id: bhaskara23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2313
lastpage: 2336
published: 2023-07-03 00:00:00 +0000
- title: 'Improved Online Conformal Prediction via Strongly Adaptive Online Learning'
abstract: 'We study the problem of uncertainty quantification via prediction sets, in an online setting where the data distribution may vary arbitrarily over time. Recent work develops *online conformal prediction* techniques that leverage regret minimization algorithms from the online learning literature to learn prediction sets with approximately valid coverage and small regret. However, standard regret minimization is insufficient for handling changing environments, where performance guarantees may be desired not only over the full time horizon but also in all (sub-)intervals of time. We develop new online conformal prediction methods that minimize the *strongly adaptive regret*, which measures the worst-case regret over all intervals of a fixed length. We prove that our methods achieve near-optimal strongly adaptive regret for all interval lengths simultaneously, and approximately valid coverage. Experiments show that our methods consistently obtain better coverage and smaller prediction sets than existing methods on real-world tasks such as time series forecasting and image classification under distribution shift.'
volume: 202
URL: https://proceedings.mlr.press/v202/bhatnagar23a.html
PDF: https://proceedings.mlr.press/v202/bhatnagar23a/bhatnagar23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bhatnagar23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Aadyot
family: Bhatnagar
- given: Huan
family: Wang
- given: Caiming
family: Xiong
- given: Yu
family: Bai
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2337-2363
id: bhatnagar23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2337
lastpage: 2363
published: 2023-07-03 00:00:00 +0000
- title: 'Data-Copying in Generative Models: A Formal Framework'
abstract: 'There has been some recent interest in detecting and addressing memorization of training data by deep neural networks. A formal framework for memorization in generative models, called “data-copying”, was proposed by Meehan et al. (2020). We build upon their work to show that their framework may fail to detect certain kinds of blatant memorization. Motivated by this and the theory of non-parametric methods, we provide an alternative definition of data-copying that applies more locally. We provide a method to detect data-copying, and provably show that it works with high probability when enough data is available. We also provide lower bounds that characterize the sample requirement for reliable detection.'
volume: 202
URL: https://proceedings.mlr.press/v202/bhattacharjee23a.html
PDF: https://proceedings.mlr.press/v202/bhattacharjee23a/bhattacharjee23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bhattacharjee23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Robi
family: Bhattacharjee
- given: Sanjoy
family: Dasgupta
- given: Kamalika
family: Chaudhuri
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2364-2396
id: bhattacharjee23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2364
lastpage: 2396
published: 2023-07-03 00:00:00 +0000
- title: 'Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling'
abstract: 'How do large language models (LLMs) develop and evolve over the course of training? How do these patterns change as models scale? To answer these questions, we introduce *Pythia*, a suite of 16 LLMs all trained on public data seen in the exact same order and ranging in size from 70M to 12B parameters. We provide public access to 154 checkpoints for each one of the 16 models, alongside tools to download and reconstruct their exact training dataloaders for further study. We intend *Pythia* to facilitate research in many areas, and we present several case studies including novel results in memorization, term frequency effects on few-shot performance, and reducing gender bias. We demonstrate that this highly controlled setup can be used to yield novel insights into LLMs and their training dynamics. Trained models, analysis code, training code, and training data can be found at https://github.com/EleutherAI/pythia.'
volume: 202
URL: https://proceedings.mlr.press/v202/biderman23a.html
PDF: https://proceedings.mlr.press/v202/biderman23a/biderman23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-biderman23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Stella
family: Biderman
- given: Hailey
family: Schoelkopf
- given: Quentin Gregory
family: Anthony
- given: Herbie
family: Bradley
- given: Kyle
family: O’Brien
- given: Eric
family: Hallahan
- given: Mohammad Aflah
family: Khan
- given: Shivanshu
family: Purohit
- given: Usvsn Sai
family: Prashanth
- given: Edward
family: Raff
- given: Aviya
family: Skowron
- given: Lintang
family: Sutawika
- given: Oskar
family: Van Der Wal
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2397-2430
id: biderman23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2397
lastpage: 2430
published: 2023-07-03 00:00:00 +0000
- title: 'StriderNet: A Graph Reinforcement Learning Approach to Optimize Atomic Structures on Rough Energy Landscapes'
abstract: 'Optimization of atomic structures presents a challenging problem, due to their highly rough and non-convex energy landscape, with wide applications in the fields of drug design, materials discovery, and mechanics. Here, we present a graph reinforcement learning approach, StriderNet, that learns a policy to displace the atoms towards low energy configurations. We evaluate the performance of StriderNet on three complex atomic systems, namely, binary Lennard-Jones particles, calcium silicate hydrates gel, and disordered silicon. We show that StriderNet outperforms all classical optimization algorithms and enables the discovery of a lower energy minimum. In addition, StriderNet exhibits a higher rate of reaching low-energy minima, as confirmed by averages over multiple realizations. Finally, we show that StriderNet exhibits inductivity to unseen system sizes that are an order of magnitude different from the training system. All the codes and datasets are available at https://github.com/M3RG-IITD/StriderNET.'
volume: 202
URL: https://proceedings.mlr.press/v202/bihani23a.html
PDF: https://proceedings.mlr.press/v202/bihani23a/bihani23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bihani23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Vaibhav
family: Bihani
- given: Sahil
family: Manchanda
- given: Srikanth
family: Sastry
- given: Sayan
family: Ranu
- given: N M Anoop
family: Krishnan
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2431-2451
id: bihani23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2431
lastpage: 2451
published: 2023-07-03 00:00:00 +0000
- title: 'Modeling Temporal Data as Continuous Functions with Stochastic Process Diffusion'
abstract: 'Temporal data such as time series can be viewed as discretized measurements of the underlying function. To build a generative model for such data we have to model the stochastic process that governs it. We propose a solution by defining the denoising diffusion model in the function space which also allows us to naturally handle irregularly-sampled observations. The forward process gradually adds noise to functions, preserving their continuity, while the learned reverse process removes the noise and returns functions as new samples. To this end, we define suitable noise sources and introduce novel denoising and score-matching models. We show how our method can be used for multivariate probabilistic forecasting and imputation, and how our model can be interpreted as a neural process.'
volume: 202
URL: https://proceedings.mlr.press/v202/bilos23a.html
PDF: https://proceedings.mlr.press/v202/bilos23a/bilos23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bilos23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Marin
family: Biloš
- given: Kashif
family: Rasul
- given: Anderson
family: Schneider
- given: Yuriy
family: Nevmyvaka
- given: Stephan
family: Günnemann
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2452-2470
id: bilos23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2452
lastpage: 2470
published: 2023-07-03 00:00:00 +0000
- title: 'In or Out? Fixing ImageNet Out-of-Distribution Detection Evaluation'
abstract: 'Out-of-distribution (OOD) detection is the problem of identifying inputs which are unrelated to the in-distribution task. The OOD detection performance when the in-distribution (ID) is ImageNet-1K is commonly tested on a small range of test OOD datasets. We find that most of the currently used test OOD datasets, including datasets from the open set recognition (OSR) literature, have severe issues: in some cases more than 50% of the dataset contains objects belonging to one of the ID classes. These erroneous samples heavily distort the evaluation of OOD detectors. As a solution, we introduce NINCO, a novel test OOD dataset in which each sample is checked to be ID-free, and whose fine-grained range of OOD classes allows for a detailed analysis of an OOD detector’s strengths and failure modes, particularly when paired with a number of synthetic “OOD unit-tests”. We provide detailed evaluations across a large set of architectures and OOD detection methods on NINCO and the unit-tests, revealing new insights about model weaknesses and the effects of pretraining on OOD detection performance. We provide code and data at https://github.com/j-cb/NINCO.'
volume: 202
URL: https://proceedings.mlr.press/v202/bitterwolf23a.html
PDF: https://proceedings.mlr.press/v202/bitterwolf23a/bitterwolf23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bitterwolf23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Julian
family: Bitterwolf
- given: Maximilian
family: Müller
- given: Matthias
family: Hein
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2471-2506
id: bitterwolf23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2471
lastpage: 2506
published: 2023-07-03 00:00:00 +0000
- title: 'Invariant Slot Attention: Object Discovery with Slot-Centric Reference Frames'
abstract: 'Automatically discovering composable abstractions from raw perceptual data is a long-standing challenge in machine learning. Recent slot-based neural networks that learn about objects in a self-supervised manner have made exciting progress in this direction. However, they typically fall short at adequately capturing spatial symmetries present in the visual world, which leads to sample inefficiency, such as when entangling object appearance and pose. In this paper, we present a simple yet highly effective method for incorporating spatial symmetries via slot-centric reference frames. We incorporate equivariance to per-object pose transformations into the attention and generation mechanism of Slot Attention by translating, scaling, and rotating position encodings. These changes result in little computational overhead, are easy to implement, and can result in large gains in terms of data efficiency and overall improvements to object discovery. We evaluate our method on a wide range of synthetic object discovery benchmarks, namely CLEVR, Tetrominoes, CLEVRTex, Objects Room and MultiShapeNet, and show promising improvements on the challenging real-world Waymo Open dataset.'
volume: 202
URL: https://proceedings.mlr.press/v202/biza23a.html
PDF: https://proceedings.mlr.press/v202/biza23a/biza23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-biza23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ondrej
family: Biza
- given: Sjoerd Van
family: Steenkiste
- given: Mehdi S. M.
family: Sajjadi
- given: Gamaleldin Fathy
family: Elsayed
- given: Aravindh
family: Mahendran
- given: Thomas
family: Kipf
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2507-2527
id: biza23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2507
lastpage: 2527
published: 2023-07-03 00:00:00 +0000
- title: 'Understanding Oversquashing in GNNs through the Lens of Effective Resistance'
abstract: 'Message passing graph neural networks (GNNs) are popular learning architectures for graph-structured data. However, one problem GNNs experience is oversquashing, where a GNN has difficulty sending information between distant nodes. Understanding and mitigating oversquashing has recently received significant attention from the research community. In this paper, we continue this line of work by analyzing oversquashing through the lens of the *effective resistance* between nodes in the input graph. Effective resistance intuitively captures the “strength” of connection between two nodes by paths in the graph, and has a rich literature spanning many areas of graph theory. We propose to use *total effective resistance* as a bound on the total amount of oversquashing in a graph and provide theoretical justification for its use. We further develop an algorithm to identify edges to be added to an input graph to minimize the total effective resistance, thereby alleviating oversquashing. We provide empirical evidence of the effectiveness of our total effective resistance based rewiring strategies for improving the performance of GNNs.'
volume: 202
URL: https://proceedings.mlr.press/v202/black23a.html
PDF: https://proceedings.mlr.press/v202/black23a/black23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-black23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mitchell
family: Black
- given: Zhengchao
family: Wan
- given: Amir
family: Nayyeri
- given: Yusu
family: Wang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2528-2547
id: black23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2528
lastpage: 2547
published: 2023-07-03 00:00:00 +0000
- title: 'Unit Scaling: Out-of-the-Box Low-Precision Training'
abstract: 'We present unit scaling, a paradigm for designing deep learning models that simplifies the use of low-precision number formats. Training in FP16 or the recently proposed FP8 formats offers substantial efficiency gains, but can lack sufficient range for out-of-the-box training. Unit scaling addresses this by introducing a principled approach to model numerics: seeking unit variance of all weights, activations and gradients at initialisation. Unlike alternative methods, this approach neither requires multiple training runs to find a suitable scale nor has significant computational overhead. We demonstrate the efficacy of unit scaling across a range of models and optimisers. We further show that existing models can be adapted to be unit-scaled, training BERT-Large in FP16 and then FP8 with no degradation in accuracy.'
volume: 202
URL: https://proceedings.mlr.press/v202/blake23a.html
PDF: https://proceedings.mlr.press/v202/blake23a/blake23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-blake23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Charlie
family: Blake
- given: Douglas
family: Orr
- given: Carlo
family: Luschi
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2548-2576
id: blake23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2548
lastpage: 2576
published: 2023-07-03 00:00:00 +0000
- title: 'FLEX: an Adaptive Exploration Algorithm for Nonlinear Systems'
abstract: 'Model-based reinforcement learning is a powerful tool, but collecting data to fit an accurate model of the system can be costly. Exploring an unknown environment in a sample-efficient manner is hence of great importance. However, the complexity of dynamics and the computational limitations of real systems make this task challenging. In this work, we introduce FLEX, an exploration algorithm for nonlinear dynamics based on optimal experimental design. Our policy maximizes the information of the next step and results in an adaptive exploration algorithm, compatible with arbitrary parametric learning models, and requiring minimal computing resources. We test our method on a number of nonlinear environments covering different settings, including time-varying dynamics. Keeping in mind that exploration is intended to serve an exploitation objective, we also test our algorithm on downstream model-based classical control tasks and compare it to other state-of-the-art model-based and model-free approaches. The performance achieved by FLEX is competitive and its computational cost is low.'
volume: 202
URL: https://proceedings.mlr.press/v202/blanke23a.html
PDF: https://proceedings.mlr.press/v202/blanke23a/blanke23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-blanke23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Matthieu
family: Blanke
- given: Marc
family: Lelarge
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2577-2591
id: blanke23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2577
lastpage: 2591
published: 2023-07-03 00:00:00 +0000
- title: 'Not all Strongly Rayleigh Distributions Have Small Probabilistic Generating Circuits'
abstract: 'Probabilistic modeling is a central task in machine learning. Probabilistic models should be tractable, i.e., allowing tractable probabilistic inference, but also efficient, i.e., being able to represent a large set of probability distributions. Zhang et al. (ICML 2021) recently proposed a new model, probabilistic generating circuits. They raised the question whether every strongly Rayleigh distribution can be efficiently represented by such circuits. We prove that this question has a negative answer: there are strongly Rayleigh distributions that cannot be represented by polynomial-sized probabilistic generating circuits, assuming a widely accepted complexity-theoretic conjecture.'
volume: 202
URL: https://proceedings.mlr.press/v202/blaser23a.html
PDF: https://proceedings.mlr.press/v202/blaser23a/blaser23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-blaser23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Markus
family: Bläser
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2592-2602
id: blaser23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2592
lastpage: 2602
published: 2023-07-03 00:00:00 +0000
- title: 'Learning the Dynamics of Sparsely Observed Interacting Systems'
abstract: 'We address the problem of learning the dynamics of an unknown non-parametric system linking a target and a feature time series. The feature time series is measured on a sparse and irregular grid, while we have access to only a few points of the target time series. Once learned, we can use these dynamics to predict values of the target from the previous values of the feature time series. We frame this task as learning the solution map of a controlled differential equation (CDE). By leveraging the rich theory of signatures, we are able to cast this non-linear problem as a high-dimensional linear regression. We provide an oracle bound on the prediction error which exhibits explicit dependencies on the individual-specific sampling schemes. Our theoretical results are illustrated by simulations which show that our method outperforms existing algorithms for recovering the full time series while being computationally cheap. We conclude by demonstrating its potential on real-world epidemiological data.'
volume: 202
URL: https://proceedings.mlr.press/v202/bleistein23a.html
PDF: https://proceedings.mlr.press/v202/bleistein23a/bleistein23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bleistein23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Linus
family: Bleistein
- given: Adeline
family: Fermanian
- given: Anne-Sophie
family: Jannot
- given: Agathe
family: Guilloux
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2603-2640
id: bleistein23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2603
lastpage: 2640
published: 2023-07-03 00:00:00 +0000
- title: 'Subset Selection Based On Multiple Rankings in the Presence of Bias: Effectiveness of Fairness Constraints for Multiwinner Voting Score Functions'
abstract: 'We consider the problem of subset selection where one is given multiple rankings of items and the goal is to select the highest "quality" subset. Score functions from the multiwinner voting literature have been used to aggregate rankings into quality scores for subsets. We study this setting of subset selection problems when, in addition, rankings may contain systemic or unconscious biases toward a group of items. For a general model of input rankings and biases, we show that requiring the selected subset to satisfy group fairness constraints can improve the quality of the selection with respect to unbiased rankings. Importantly, we show that for fairness constraints to be effective, different multiwinner score functions may require a drastically different number of rankings: While for some functions, fairness constraints need an exponential number of rankings to recover a close-to-optimal solution, for others, this dependency is only polynomial. This result relies on a novel notion of "smoothness" of submodular functions in this setting that quantifies how well a function can "correctly" assess the quality of items in the presence of bias. The results in this paper can be used to guide the choice of multiwinner score functions for the subset selection setting considered here; we additionally provide a tool to empirically enable this.'
volume: 202
URL: https://proceedings.mlr.press/v202/boehmer23a.html
PDF: https://proceedings.mlr.press/v202/boehmer23a/boehmer23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-boehmer23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Niclas
family: Boehmer
- given: L. Elisa
family: Celis
- given: Lingxiao
family: Huang
- given: Anay
family: Mehrotra
- given: Nisheeth K.
family: Vishnoi
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2641-2688
id: boehmer23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2641
lastpage: 2688
published: 2023-07-03 00:00:00 +0000
- title: 'Properties of the Mallows Model Depending on the Number of Alternatives: A Warning for an Experimentalist'
abstract: 'The Mallows model is a popular distribution for ranked data. We empirically and theoretically analyze how the properties of rankings sampled from the Mallows model change when increasing the number of alternatives. We find that real-world data behaves differently from the Mallows model, yet is in line with its recent variant proposed by Boehmer et al. [IJCAI ’21]. As part of our study, we issue several warnings about using the classic Mallows model. For instance, we find that one should be extremely careful when using the Mallows model to generate data for experiments with a varying number of alternatives, as observed trends in such experiments might be due to the changing nature of the generated data.'
volume: 202
URL: https://proceedings.mlr.press/v202/boehmer23b.html
PDF: https://proceedings.mlr.press/v202/boehmer23b/boehmer23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-boehmer23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Niclas
family: Boehmer
- given: Piotr
family: Faliszewski
- given: Sonja
family: Kraiczy
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2689-2711
id: boehmer23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2689
lastpage: 2711
published: 2023-07-03 00:00:00 +0000
- title: 'A Robust Optimisation Perspective on Counterexample-Guided Repair of Neural Networks'
abstract: 'Counterexample-guided repair aims at creating neural networks with mathematical safety guarantees, facilitating the application of neural networks in safety-critical domains. However, whether counterexample-guided repair is guaranteed to terminate remains an open question. We approach this question by showing that counterexample-guided repair can be viewed as a robust optimisation algorithm. While termination guarantees for neural network repair itself remain beyond our reach, we prove termination for more restrained machine learning models and disprove termination in a general setting. We empirically study the practical implications of our theoretical results, demonstrating the suitability of common verifiers and falsifiers for repair despite a disadvantageous theoretical result. Additionally, we use our theoretical insights to devise a novel algorithm for repairing linear regression models based on quadratic programming, surpassing existing approaches.'
volume: 202
URL: https://proceedings.mlr.press/v202/boetius23a.html
PDF: https://proceedings.mlr.press/v202/boetius23a/boetius23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-boetius23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: David
family: Boetius
- given: Stefan
family: Leue
- given: Tobias
family: Sutter
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2712-2737
id: boetius23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2712
lastpage: 2737
published: 2023-07-03 00:00:00 +0000
- title: 'Beyond the Universal Law of Robustness: Sharper Laws for Random Features and Neural Tangent Kernels'
abstract: 'Machine learning models are vulnerable to adversarial perturbations, and a thought-provoking paper by Bubeck and Sellke has analyzed this phenomenon through the lens of over-parameterization: interpolating smoothly the data requires significantly more parameters than simply memorizing it. However, this "universal" law provides only a necessary condition for robustness, and it is unable to discriminate between models. In this paper, we address these gaps by focusing on empirical risk minimization in two prototypical settings, namely, random features and the neural tangent kernel (NTK). We prove that, for random features, the model is not robust for any degree of over-parameterization, even when the necessary condition coming from the universal law of robustness is satisfied. In contrast, for even activations, the NTK model meets the universal lower bound, and it is robust as soon as the necessary condition on over-parameterization is fulfilled. This also addresses a conjecture in prior work by Bubeck, Li and Nagaraj. Our analysis decouples the effect of the kernel of the model from an "interaction matrix", which describes the interaction with the test data and captures the effect of the activation. Our theoretical results are corroborated by numerical evidence on both synthetic and standard datasets (MNIST, CIFAR-10).'
volume: 202
URL: https://proceedings.mlr.press/v202/bombari23a.html
PDF: https://proceedings.mlr.press/v202/bombari23a/bombari23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bombari23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Simone
family: Bombari
- given: Shayan
family: Kiyani
- given: Marco
family: Mondelli
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2738-2776
id: bombari23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2738
lastpage: 2776
published: 2023-07-03 00:00:00 +0000
- title: 'Sliced-Wasserstein on Symmetric Positive Definite Matrices for M/EEG Signals'
abstract: 'When dealing with electro or magnetoencephalography records, many supervised prediction tasks are solved by working with covariance matrices to summarize the signals. Learning with these matrices requires the use of Riemannian geometry to account for their structure. In this paper, we propose a new method to deal with distributions of covariance matrices, and demonstrate its computational efficiency on M/EEG multivariate time series. More specifically, we define a Sliced-Wasserstein distance between measures of symmetric positive definite matrices that comes with strong theoretical guarantees. Then, we take advantage of its properties and kernel methods to apply this discrepancy to brain-age prediction from MEG data, and compare it to state-of-the-art algorithms based on Riemannian geometry. Finally, we show that it is an efficient surrogate to the Wasserstein distance in domain adaptation for Brain Computer Interface applications.'
volume: 202
URL: https://proceedings.mlr.press/v202/bonet23a.html
PDF: https://proceedings.mlr.press/v202/bonet23a/bonet23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bonet23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Clément
family: Bonet
- given: Benoît
family: Malézieux
- given: Alain
family: Rakotomamonjy
- given: Lucas
family: Drumetz
- given: Thomas
family: Moreau
- given: Matthieu
family: Kowalski
- given: Nicolas
family: Courty
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2777-2805
id: bonet23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2777
lastpage: 2805
published: 2023-07-03 00:00:00 +0000
- title: 'Spherical Fourier Neural Operators: Learning Stable Dynamics on the Sphere'
abstract: 'Fourier Neural Operators (FNOs) have proven to be an efficient and effective method for resolution-independent operator learning in a broad variety of application areas across scientific machine learning. A key reason for their success is their ability to accurately model long-range dependencies in spatio-temporal data by learning global convolutions in a computationally efficient manner. To this end, FNOs rely on the discrete Fourier transform (DFT), however, DFTs cause visual and spectral artifacts as well as pronounced dissipation when learning operators in spherical coordinates by incorrectly assuming flat geometry. To overcome this limitation, we generalize FNOs on the sphere, introducing Spherical FNOs (SFNOs) for learning operators on spherical geometries. We apply SFNOs to forecasting atmospheric dynamics, and demonstrate stable autoregressive rollouts for a year of simulated time (1,460 steps), while retaining physically plausible dynamics. The SFNO has important implications for machine learning-based simulation of climate dynamics that could eventually help accelerate our response to climate change.'
volume: 202
URL: https://proceedings.mlr.press/v202/bonev23a.html
PDF: https://proceedings.mlr.press/v202/bonev23a/bonev23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bonev23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Boris
family: Bonev
- given: Thorsten
family: Kurth
- given: Christian
family: Hundt
- given: Jaideep
family: Pathak
- given: Maximilian
family: Baust
- given: Karthik
family: Kashinath
- given: Anima
family: Anandkumar
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2806-2823
id: bonev23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2806
lastpage: 2823
published: 2023-07-03 00:00:00 +0000
- title: 'The Regret of Exploration and the Control of Bad Episodes in Reinforcement Learning'
abstract: 'The first contribution of this paper is the introduction of a new performance measure of an RL algorithm that is more discriminating than the regret, which we call the *regret of exploration*, that measures the asymptotic cost of exploration. The second contribution is a new *performance test* (PT) to end episodes in RL optimistic algorithms. This test is based on the performance of the current policy with respect to the best policy over the current confidence set. This is in contrast with all existing RL algorithms whose episode lengths are only based on the number of visits to the states. This modification does not harm the regret and brings an additional property. We show that while all current episodic RL algorithms have a linear regret of exploration, our method has a $O(\log{T})$ regret of exploration for non-degenerate deterministic MDPs.'
volume: 202
URL: https://proceedings.mlr.press/v202/boone23a.html
PDF: https://proceedings.mlr.press/v202/boone23a/boone23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-boone23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Victor
family: Boone
- given: Bruno
family: Gaujal
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2824-2856
id: boone23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2824
lastpage: 2856
published: 2023-07-03 00:00:00 +0000
- title: 'Model-agnostic Measure of Generalization Difficulty'
abstract: 'The measure of a machine learning algorithm is the difficulty of the tasks it can perform, and sufficiently difficult tasks are critical drivers of strong machine learning models. However, quantifying the generalization difficulty of machine learning benchmarks has remained challenging. We propose what is to our knowledge the first model-agnostic measure of the inherent generalization difficulty of tasks. Our inductive bias complexity measure quantifies the total information required to generalize well on a task minus the information provided by the data. It does so by measuring the fractional volume occupied by hypotheses that generalize on a task given that they fit the training data. It scales exponentially with the intrinsic dimensionality of the space over which the model must generalize but only polynomially in resolution per dimension, showing that tasks which require generalizing over many dimensions are drastically more difficult than tasks involving more detail in fewer dimensions. Our measure can be applied to compute and compare supervised learning, reinforcement learning and meta-learning generalization difficulties against each other. We show that applied empirically, it formally quantifies intuitively expected trends, e.g. that in terms of required inductive bias, MNIST $<$ CIFAR10 $<$ Imagenet and fully observable Markov decision processes (MDPs) $<$ partially observable MDPs. Further, we show that classification of complex images $<$ few-shot meta-learning with simple images. Our measure provides a quantitative metric to guide the construction of more complex tasks requiring greater inductive bias, and thereby encourages the development of more sophisticated architectures and learning algorithms with more powerful generalization capabilities.'
volume: 202
URL: https://proceedings.mlr.press/v202/boopathy23a.html
PDF: https://proceedings.mlr.press/v202/boopathy23a/boopathy23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-boopathy23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Akhilan
family: Boopathy
- given: Kevin
family: Liu
- given: Jaedong
family: Hwang
- given: Shu
family: Ge
- given: Asaad
family: Mohammedsaleh
- given: Ila R
family: Fiete
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2857-2884
id: boopathy23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2857
lastpage: 2884
published: 2023-07-03 00:00:00 +0000
- title: 'Returning The Favour: When Regression Benefits From Probabilistic Causal Knowledge'
abstract: 'A directed acyclic graph (DAG) provides valuable prior knowledge that is often discarded in regression tasks in machine learning. We show that the independences arising from the presence of collider structures in DAGs provide meaningful inductive biases, which constrain the regression hypothesis space and improve predictive performance. We introduce collider regression, a framework to incorporate probabilistic causal knowledge from a collider in a regression problem. When the hypothesis space is a reproducing kernel Hilbert space, we prove a strictly positive generalisation benefit under mild assumptions and provide closed-form estimators of the empirical risk minimiser. Experiments on synthetic and climate model data demonstrate performance gains of the proposed methodology.'
volume: 202
URL: https://proceedings.mlr.press/v202/bouabid23a.html
PDF: https://proceedings.mlr.press/v202/bouabid23a/bouabid23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bouabid23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Shahine
family: Bouabid
- given: Jake
family: Fawkes
- given: Dino
family: Sejdinovic
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2885-2913
id: bouabid23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2885
lastpage: 2913
published: 2023-07-03 00:00:00 +0000
- title: 'In Search for a Generalizable Method for Source Free Domain Adaptation'
abstract: 'Source-free domain adaptation (SFDA) is compelling because it allows adapting an off-the-shelf model to a new domain using only unlabelled data. In this work, we apply existing SFDA techniques to a challenging set of naturally-occurring distribution shifts in bioacoustics, which are very different from the ones commonly studied in computer vision. We find existing methods perform differently relative to each other than observed in vision benchmarks, and sometimes perform worse than no adaptation at all. We propose a new simple method which outperforms the existing methods on our new shifts while exhibiting strong performance on a range of vision datasets. Our findings suggest that existing SFDA methods are not as generalizable as previously thought and that considering diverse modalities can be a useful avenue for designing more robust models.'
volume: 202
URL: https://proceedings.mlr.press/v202/boudiaf23a.html
PDF: https://proceedings.mlr.press/v202/boudiaf23a/boudiaf23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-boudiaf23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Malik
family: Boudiaf
- given: Tom
family: Denton
- given: Bart
family: Van Merrienboer
- given: Vincent
family: Dumoulin
- given: Eleni
family: Triantafillou
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2914-2931
id: boudiaf23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2914
lastpage: 2931
published: 2023-07-03 00:00:00 +0000
- title: 'Quantum Speedups for Zero-Sum Games via Improved Dynamic Gibbs Sampling'
abstract: 'We give a quantum algorithm for computing an $\epsilon$-approximate Nash equilibrium of a zero-sum game in an $m \times n$ payoff matrix with bounded entries. Given a standard quantum oracle for accessing the payoff matrix our algorithm runs in time $\widetilde{O}(\sqrt{m + n}\cdot \epsilon^{-2.5} + \epsilon^{-3})$ and outputs a classical representation of the $\epsilon$-approximate Nash equilibrium. This improves upon the best prior quantum runtime of $\widetilde{O}(\sqrt{m + n} \cdot \epsilon^{-3})$ obtained by [van Apeldoorn, Gilyen ’19] and the classical $\widetilde{O}((m + n) \cdot \epsilon^{-2})$ runtime due to [Grigoriadis, Khachiyan ’95] whenever $\epsilon = \Omega((m + n)^{-1})$. We obtain this result by designing new quantum data structures for efficiently sampling from a slowly-changing Gibbs distribution.'
volume: 202
URL: https://proceedings.mlr.press/v202/bouland23a.html
PDF: https://proceedings.mlr.press/v202/bouland23a/bouland23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bouland23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Adam
family: Bouland
- given: Yosheb M
family: Getachew
- given: Yujia
family: Jin
- given: Aaron
family: Sidford
- given: Kevin
family: Tian
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2932-2952
id: bouland23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2932
lastpage: 2952
published: 2023-07-03 00:00:00 +0000
- title: 'Diffusion Models as Artists: Are we Closing the Gap between Humans and Machines?'
abstract: 'An important milestone for AI is the development of algorithms that can produce drawings that are indistinguishable from those of humans. Here, we adapt the "diversity vs. recognizability" scoring framework from Boutin et al (2022) and find that one-shot diffusion models have indeed started to close the gap between humans and machines. However, using a finer-grained measure of the originality of individual samples, we show that strengthening the guidance of diffusion models helps improve the humanness of their drawings, but they still fall short of approximating the originality and recognizability of human drawings. Comparing human category diagnostic features, collected through an online psychophysics experiment, against those derived from diffusion models reveals that humans rely on fewer and more localized features. Overall, our study suggests that diffusion models have significantly helped improve the quality of machine-generated drawings; however, a gap between humans and machines remains – in part explainable by discrepancies in visual strategies.'
volume: 202
URL: https://proceedings.mlr.press/v202/boutin23a.html
PDF: https://proceedings.mlr.press/v202/boutin23a/boutin23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-boutin23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Victor
family: Boutin
- given: Thomas
family: Fel
- given: Lakshya
family: Singhal
- given: Rishav
family: Mukherji
- given: Akash
family: Nagaraj
- given: Julien
family: Colin
- given: Thomas
family: Serre
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 2953-3002
id: boutin23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 2953
lastpage: 3002
published: 2023-07-03 00:00:00 +0000
- title: 'Settling the Reward Hypothesis'
abstract: 'The *reward hypothesis* posits that, "all of what we mean by goals and purposes can be well thought of as maximization of the expected value of the cumulative sum of a received scalar signal (reward)." We aim to fully settle this hypothesis. This will not conclude with a simple affirmation or refutation, but rather specify completely the implicit requirements on goals and purposes under which the hypothesis holds.'
volume: 202
URL: https://proceedings.mlr.press/v202/bowling23a.html
PDF: https://proceedings.mlr.press/v202/bowling23a/bowling23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bowling23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Michael
family: Bowling
- given: John D
family: Martin
- given: David
family: Abel
- given: Will
family: Dabney
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3003-3020
id: bowling23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3003
lastpage: 3020
published: 2023-07-03 00:00:00 +0000
- title: 'ILLUME: Rationalizing Vision-Language Models through Human Interactions'
abstract: 'Bootstrapping from pre-trained language models has been proven to be an efficient approach for building vision-language models (VLM) for tasks such as image captioning or visual question answering. However, outputs of these models rarely align with user’s rationales for specific answers. In order to improve this alignment and reinforce commonsense reasons, we propose a tuning paradigm based on human interactions with machine-generated data. Our ILLUME executes the following loop: Given an image-question-answer prompt, the VLM samples multiple candidate rationales, and a human critic provides feedback via preference selection, used for fine-tuning. This loop increases the training data and gradually carves out the VLM’s rationalization capabilities that are aligned with human intent. Our exhaustive experiments demonstrate that ILLUME is competitive with standard supervised finetuning while using significantly less training data and only requiring minimal feedback.'
volume: 202
URL: https://proceedings.mlr.press/v202/brack23a.html
PDF: https://proceedings.mlr.press/v202/brack23a/brack23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-brack23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Manuel
family: Brack
- given: Patrick
family: Schramowski
- given: Björn
family: Deiseroth
- given: Kristian
family: Kersting
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3021-3037
id: brack23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3021
lastpage: 3037
published: 2023-07-03 00:00:00 +0000
- title: 'Provably Learning Object-Centric Representations'
abstract: 'Learning structured representations of the visual world in terms of objects promises to significantly improve the generalization abilities of current machine learning models. While recent efforts to this end have shown promising empirical progress, a theoretical account of when unsupervised object-centric representation learning is possible is still lacking. Consequently, understanding the reasons for the success of existing object-centric methods as well as designing new theoretically grounded methods remains challenging. In the present work, we analyze when object-centric representations can provably be learned without supervision. To this end, we first introduce two assumptions on the generative process for scenes comprised of several objects, which we call compositionality and irreducibility. Under this generative process, we prove that the ground-truth object representations can be identified by an invertible and compositional inference model, even in the presence of dependencies between objects. We empirically validate our results through experiments on synthetic data. Finally, we provide evidence that our theory holds predictive power for existing object-centric models by showing a close correspondence between models’ compositionality and invertibility and their empirical identifiability.'
volume: 202
URL: https://proceedings.mlr.press/v202/brady23a.html
PDF: https://proceedings.mlr.press/v202/brady23a/brady23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-brady23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jack
family: Brady
- given: Roland S.
family: Zimmermann
- given: Yash
family: Sharma
- given: Bernhard
family: Schölkopf
- given: Julius
family: Von Kügelgen
- given: Wieland
family: Brendel
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3038-3062
id: brady23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3038
lastpage: 3062
published: 2023-07-03 00:00:00 +0000
- title: 'Quantifying Human Priors over Social and Navigation Networks'
abstract: 'Human knowledge is largely implicit and relational — do we have a friend in common? can I walk from here to there? In this work, we leverage the combinatorial structure of graphs to quantify human priors over such relational data. Our experiments focus on two domains that have been continuously relevant over evolutionary timescales: social interaction and spatial navigation. We find that some features of the inferred priors are remarkably consistent, such as the tendency for sparsity as a function of graph size. Other features are domain-specific, such as the propensity for triadic closure in social interactions. More broadly, our work demonstrates how nonclassical statistical analysis of indirect behavioral experiments can be used to efficiently model latent biases in the data.'
volume: 202
URL: https://proceedings.mlr.press/v202/bravo-hermsdorff23a.html
PDF: https://proceedings.mlr.press/v202/bravo-hermsdorff23a/bravo-hermsdorff23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bravo-hermsdorff23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Gecia
family: Bravo-Hermsdorff
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3063-3105
id: bravo-hermsdorff23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3063
lastpage: 3105
published: 2023-07-03 00:00:00 +0000
- title: 'Critical Points and Convergence Analysis of Generative Deep Linear Networks Trained with Bures-Wasserstein Loss'
abstract: 'We consider a deep matrix factorization model of covariance matrices trained with the Bures-Wasserstein distance. While recent works have made advances in the study of the optimization problem for overparametrized low-rank matrix approximation, much emphasis has been placed on discriminative settings and the square loss. In contrast, our model considers another type of loss and connects with the generative setting. We characterize the critical points and minimizers of the Bures-Wasserstein distance over the space of rank-bounded matrices. The Hessian of this loss at low-rank matrices can theoretically blow up, which creates challenges to analyze convergence of gradient optimization methods. We establish convergence results for gradient flow using a smooth perturbative version of the loss as well as convergence results for finite step size gradient descent under certain assumptions on the initial weights.'
volume: 202
URL: https://proceedings.mlr.press/v202/brechet23a.html
PDF: https://proceedings.mlr.press/v202/brechet23a/brechet23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-brechet23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Pierre
family: Bréchet
- given: Katerina
family: Papagiannouli
- given: Jing
family: An
- given: Guido
family: Montufar
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3106-3147
id: brechet23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3106
lastpage: 3147
published: 2023-07-03 00:00:00 +0000
- title: 'Emergence of Sparse Representations from Noise'
abstract: 'A hallmark of biological neural networks, which distinguishes them from their artificial counterparts, is the high degree of sparsity in their activations. This discrepancy raises three questions our work helps to answer: (i) Why are biological networks so sparse? (ii) What are the benefits of this sparsity? (iii) How can these benefits be utilized by deep learning models? Our answers to all of these questions center around training networks to handle random noise. Surprisingly, we discover that noisy training introduces three implicit loss terms that result in sparsely firing neurons specializing to high variance features of the dataset. When trained to reconstruct noisy-CIFAR10, neurons learn biological receptive fields. More broadly, noisy training presents a new approach to potentially increase model interpretability with additional benefits to robustness and computational efficiency.'
volume: 202
URL: https://proceedings.mlr.press/v202/bricken23a.html
PDF: https://proceedings.mlr.press/v202/bricken23a/bricken23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bricken23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Trenton
family: Bricken
- given: Rylan
family: Schaeffer
- given: Bruno
family: Olshausen
- given: Gabriel
family: Kreiman
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3148-3191
id: bricken23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3148
lastpage: 3191
published: 2023-07-03 00:00:00 +0000
- title: 'Differentially Private Optimization on Large Model at Small Cost'
abstract: 'Differentially private (DP) optimization is the standard paradigm to learn large neural networks that are accurate and privacy-preserving. The computational cost for DP deep learning, however, is notoriously heavy due to the per-sample gradient clipping. Existing DP implementations are 2$\sim$1000$\times$ more costly in time and space complexity than the standard (non-private) training. In this work, we develop a novel Book-Keeping (BK) technique that implements existing DP optimizers (thus achieving the same accuracy), with a substantial improvement on the computational cost. Specifically, BK enables DP training on large models and high dimensional data to be roughly as fast and memory-saving as the standard training, whereas previous DP algorithms can be inefficient or incapable of training due to memory error. The computational advantage of BK is supported by the complexity analysis as well as extensive experiments on vision and language tasks. Our implementation achieves state-of-the-art (SOTA) accuracy with very small extra cost: on GPT2 and at almost the same memory cost ($<$1% overhead), BK has 1.03$\times$ the time complexity of the standard training (0.83$\times$ training speed in practice), and 0.61$\times$ the time complexity of the most efficient DP implementation (1.36$\times$ training speed in practice). We open-source the codebase for the BK algorithm at https://github.com/awslabs/fast-differential-privacy.'
volume: 202
URL: https://proceedings.mlr.press/v202/bu23a.html
PDF: https://proceedings.mlr.press/v202/bu23a/bu23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bu23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Zhiqi
family: Bu
- given: Yu-Xiang
family: Wang
- given: Sheng
family: Zha
- given: George
family: Karypis
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3192-3218
id: bu23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3192
lastpage: 3218
published: 2023-07-03 00:00:00 +0000
- title: 'Machine Learning Force Fields with Data Cost Aware Training'
abstract: 'Machine learning force fields (MLFF) have been proposed to accelerate molecular dynamics (MD) simulation, which finds widespread applications in chemistry and biomedical research. Even for the most data-efficient MLFFs, reaching chemical accuracy can require hundreds of frames of force and energy labels generated by expensive quantum mechanical algorithms, which may scale as $O(n^3)$ to $O(n^7)$, with $n$ proportional to the number of basis functions. To address this issue, we propose a multi-stage computational framework – ASTEROID, which lowers the data cost of MLFFs by leveraging a combination of cheap inaccurate data and expensive accurate data. The motivation behind ASTEROID is that inaccurate data, though incurring large bias, can help capture the sophisticated structures of the underlying force field. Therefore, we first train a MLFF model on a large amount of inaccurate training data, employing a bias-aware loss function to prevent the model from overfitting the potential bias of this data. We then fine-tune the obtained model using a small amount of accurate training data, which preserves the knowledge learned from the inaccurate training data while significantly improving the model’s accuracy. Moreover, we propose a variant of ASTEROID based on score matching for the setting where the inaccurate training data are unlabeled. Extensive experiments on MD datasets and downstream tasks validate the efficacy of ASTEROID. Our code and data are available at https://github.com/abukharin3/asteroid.'
volume: 202
URL: https://proceedings.mlr.press/v202/bukharin23a.html
PDF: https://proceedings.mlr.press/v202/bukharin23a/bukharin23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-bukharin23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Alexander
family: Bukharin
- given: Tianyi
family: Liu
- given: Shengjie
family: Wang
- given: Simiao
family: Zuo
- given: Weihao
family: Gao
- given: Wen
family: Yan
- given: Tuo
family: Zhao
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3219-3232
id: bukharin23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3219
lastpage: 3232
published: 2023-07-03 00:00:00 +0000
- title: 'Label differential privacy and private training data release'
abstract: 'We study differentially private mechanisms for sharing training data in machine learning settings. Our goal is to enable learning of an accurate predictive model while protecting the privacy of each user’s label. Previous work established privacy guarantees that assumed the features are public and given exogenously, a setting known as label differential privacy. In some scenarios, this can be a strong assumption that removes the interplay between features and labels from the privacy analysis. We relax this approach and instead assume the features are drawn from a distribution that depends on the private labels. We first show that simply adding noise to the label, as in previous work, can lead to an arbitrarily weak privacy guarantee, and also present methods for estimating this privacy loss from data. We then present a new mechanism that replaces some training examples with synthetically generated data, and show that our mechanism has a much better privacy-utility tradeoff if the synthetic data is ‘realistic’, in a certain quantifiable sense. Finally, we empirically validate our theoretical analysis.'
volume: 202
URL: https://proceedings.mlr.press/v202/busa-fekete23a.html
PDF: https://proceedings.mlr.press/v202/busa-fekete23a/busa-fekete23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-busa-fekete23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Robert Istvan
family: Busa-Fekete
- given: Andres
family: Munoz Medina
- given: Umar
family: Syed
- given: Sergei
family: Vassilvitskii
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3233-3251
id: busa-fekete23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3233
lastpage: 3251
published: 2023-07-03 00:00:00 +0000
- title: 'The SSL Interplay: Augmentations, Inductive Bias, and Generalization'
abstract: 'Self-supervised learning (SSL) has emerged as a powerful framework to learn representations from raw data without supervision. Yet in practice, engineers face issues such as instability in tuning optimizers and collapse of representations during training. Such challenges motivate the need for a theory to shed light on the complex interplay between the choice of data augmentation, network architecture, and training algorithm. We study such an interplay with a precise analysis of generalization performance on both pretraining and downstream tasks in kernel regimes, and highlight several insights for SSL practitioners that arise from our theory.'
volume: 202
URL: https://proceedings.mlr.press/v202/cabannes23a.html
PDF: https://proceedings.mlr.press/v202/cabannes23a/cabannes23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cabannes23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Vivien
family: Cabannes
- given: Bobak
family: Kiani
- given: Randall
family: Balestriero
- given: Yann
family: Lecun
- given: Alberto
family: Bietti
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3252-3298
id: cabannes23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3252
lastpage: 3298
published: 2023-07-03 00:00:00 +0000
- title: 'Online Mechanism Design for Information Acquisition'
abstract: 'We study the problem of designing mechanisms for information acquisition scenarios. This setting models strategic interactions between an uninformed receiver and a set of informed senders. In our model the senders receive information about the underlying state of nature and communicate their observation (either truthfully or not) to the receiver, which, based on this information, selects an action. Our goal is to design mechanisms maximizing the receiver’s utility while incentivizing the senders to report truthfully their information. First, we provide an algorithm that efficiently computes an optimal incentive compatible (IC) mechanism. Then, we focus on the online problem in which the receiver sequentially interacts in an unknown game, with the objective of minimizing the cumulative regret w.r.t. the optimal IC mechanism, and the cumulative violation of the incentive compatibility constraints. We investigate two different online scenarios, i.e., the full and bandit feedback settings. For the full feedback problem, we propose an algorithm that guarantees $\tilde{O}(\sqrt{T})$ regret and violation, while for the bandit feedback setting we present an algorithm that attains $\tilde{O}(T^{\alpha})$ regret and $\tilde{O}(T^{1-\alpha/2})$ violation for any $\alpha \in [1/2, 1]$. Finally, we complement our results providing a tight lower bound.'
volume: 202
URL: https://proceedings.mlr.press/v202/cacciamani23a.html
PDF: https://proceedings.mlr.press/v202/cacciamani23a/cacciamani23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cacciamani23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Federico
family: Cacciamani
- given: Matteo
family: Castiglioni
- given: Nicola
family: Gatti
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3299-3326
id: cacciamani23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3299
lastpage: 3326
published: 2023-07-03 00:00:00 +0000
- title: 'MyoDex: A Generalizable Prior for Dexterous Manipulation'
abstract: 'Human dexterity is a hallmark of motor control behaviors. Our hands can rapidly synthesize new behaviors despite the complexity (multi-articular and multi-joint, with 23 joints controlled by more than 40 muscles) of musculoskeletal control. In this work, we take inspiration from how human dexterity builds on a diversity of prior experiences, instead of being acquired through a single task. Motivated by this observation, we set out to develop agents that can build upon previous experience to quickly acquire new (previously unattainable) behaviors. Specifically, our approach leverages multi-task learning to implicitly capture a task-agnostic behavioral prior (MyoDex) for human-like dexterity, using a physiologically realistic human hand model – MyoHand. We demonstrate MyoDex’s effectiveness in few-shot generalization as well as positive transfer to a large repertoire of unseen dexterous manipulation tasks. MyoDex can solve approximately 3x more tasks and it can accelerate the achievement of solutions by about 4x in comparison to a distillation baseline. While prior work has synthesized single musculoskeletal control behaviors, MyoDex is the first generalizable manipulation prior that catalyzes the learning of dexterous physiological control across a large variety of contact-rich behaviors.'
volume: 202
URL: https://proceedings.mlr.press/v202/caggiano23a.html
PDF: https://proceedings.mlr.press/v202/caggiano23a/caggiano23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-caggiano23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Vittorio
family: Caggiano
- given: Sudeep
family: Dasari
- given: Vikash
family: Kumar
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3327-3346
id: caggiano23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3327
lastpage: 3346
published: 2023-07-03 00:00:00 +0000
- title: 'What Can Be Learnt With Wide Convolutional Neural Networks?'
abstract: 'Understanding how convolutional neural networks (CNNs) can efficiently learn high-dimensional functions remains a fundamental challenge. A popular belief is that these models harness the local and hierarchical structure of natural data such as images. Yet, we lack a quantitative understanding of how such structure affects performance, e.g., the rate of decay of the generalisation error with the number of training samples. In this paper, we study infinitely-wide deep CNNs in the kernel regime. First, we show that the spectrum of the corresponding kernel inherits the hierarchical structure of the network, and we characterise its asymptotics. Then, we use this result together with generalisation bounds to prove that deep CNNs adapt to the spatial scale of the target function. In particular, we find that if the target function depends on low-dimensional subsets of adjacent input variables, then the decay of the error is controlled by the effective dimensionality of these subsets. Conversely, if the target function depends on the full set of input variables, then the error decay is controlled by the input dimension. We conclude by computing the generalisation error of a deep CNN trained on the output of another deep CNN with randomly-initialised parameters. Interestingly, we find that, despite their hierarchical structure, the functions generated by infinitely-wide deep CNNs are too rich to be efficiently learnable in high dimension.'
volume: 202
URL: https://proceedings.mlr.press/v202/cagnetta23a.html
PDF: https://proceedings.mlr.press/v202/cagnetta23a/cagnetta23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cagnetta23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Francesco
family: Cagnetta
- given: Alessandro
family: Favero
- given: Matthieu
family: Wyart
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3347-3379
id: cagnetta23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3347
lastpage: 3379
published: 2023-07-03 00:00:00 +0000
- title: 'Causal Discovery with Latent Confounders Based on Higher-Order Cumulants'
abstract: 'Causal discovery with latent confounders is an important but challenging task in many scientific areas. Despite the success of some overcomplete independent component analysis (OICA) based methods in certain domains, they are computationally expensive and can easily get stuck into local optima. We notice that interestingly, by making use of higher-order cumulants, there exists a closed-form solution to OICA in specific cases, e.g., when the mixing procedure follows the One-Latent-Component structure. In light of the power of the closed-form solution to OICA corresponding to the One-Latent-Component structure, we formulate a way to estimate the mixing matrix using the higher-order cumulants, and further propose the testable One-Latent-Component condition to identify the latent variables and determine causal orders. By iteratively removing the shared identified latent components, we successfully extend the results on the One-Latent-Component structure to the Multi-Latent-Component structure and finally provide a practical and asymptotically correct algorithm to learn the causal structure with latent variables. Experimental results illustrate the asymptotic correctness and effectiveness of the proposed method.'
volume: 202
URL: https://proceedings.mlr.press/v202/cai23a.html
PDF: https://proceedings.mlr.press/v202/cai23a/cai23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cai23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ruichu
family: Cai
- given: Zhiyi
family: Huang
- given: Wei
family: Chen
- given: Zhifeng
family: Hao
- given: Kun
family: Zhang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3380-3407
id: cai23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3380
lastpage: 3407
published: 2023-07-03 00:00:00 +0000
- title: 'On the Connection Between MPNN and Graph Transformer'
abstract: 'Graph Transformer (GT) recently has emerged as a new paradigm of graph learning algorithms, outperforming the previously popular Message Passing Neural Network (MPNN) on multiple benchmarks. Previous work shows that with proper position embedding, GT can approximate MPNN arbitrarily well, implying that GT is at least as powerful as MPNN. In this paper, we study the inverse connection and show that MPNN with virtual node (VN), a commonly used heuristic with little theoretical understanding, is powerful enough to arbitrarily approximate the self-attention layer of GT. In particular, we first show that if we consider one type of linear transformer, the so-called Performer/Linear Transformer, then MPNN + VN with only $\mathcal{O}(1)$ depth and $\mathcal{O}(1)$ width can approximate a self-attention layer in Performer/Linear Transformer. Next, via a connection between MPNN + VN and DeepSets, we prove the MPNN + VN with $\mathcal{O}(n^d)$ width and $\mathcal{O}(1)$ depth can approximate the self-attention layer arbitrarily well, where $d$ is the input feature dimension. Lastly, under some assumptions, we provide an explicit construction of MPNN + VN with $\mathcal{O}(1)$ width and $\mathcal{O}(n)$ depth approximating the self-attention layer in GT arbitrarily well. On the empirical side, we demonstrate that 1) MPNN + VN is a surprisingly strong baseline, outperforming GT on the recently proposed Long Range Graph Benchmark (LRGB) dataset, 2) our MPNN + VN improves over early implementation on a wide range of OGB datasets and 3) MPNN + VN outperforms Linear Transformer and MPNN on the climate modeling task.'
volume: 202
URL: https://proceedings.mlr.press/v202/cai23b.html
PDF: https://proceedings.mlr.press/v202/cai23b/cai23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cai23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Chen
family: Cai
- given: Truong Son
family: Hy
- given: Rose
family: Yu
- given: Yusu
family: Wang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3408-3430
id: cai23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3408
lastpage: 3430
published: 2023-07-03 00:00:00 +0000
- title: 'Ske2Grid: Skeleton-to-Grid Representation Learning for Action Recognition'
abstract: 'This paper presents Ske2Grid, a new representation learning framework for improved skeleton-based action recognition. In Ske2Grid, we define a regular convolution operation upon a novel grid representation of human skeleton, which is a compact image-like grid patch constructed and learned through three novel designs. Specifically, we propose a graph-node index transform (GIT) to construct a regular grid patch through assigning the nodes in the skeleton graph one by one to the desired grid cells. To ensure that GIT is a bijection and enrich the expressiveness of the grid representation, an up-sampling transform (UPT) is learned to interpolate the skeleton graph nodes for filling the grid patch to the full. To resolve the problem when the one-step UPT is aggressive and further exploit the representation capability of the grid patch with increasing spatial size, a progressive learning strategy (PLS) is proposed which decouples the UPT into multiple steps and aligns them to multiple paired GITs through a compact cascaded design learned progressively. We construct networks upon prevailing graph convolution networks and conduct experiments on six mainstream skeleton-based action recognition datasets. Experiments show that our Ske2Grid significantly outperforms existing GCN-based solutions under different benchmark settings, without bells and whistles. Code and models are available at https://github.com/OSVAI/Ske2Grid.'
volume: 202
URL: https://proceedings.mlr.press/v202/cai23c.html
PDF: https://proceedings.mlr.press/v202/cai23c/cai23c.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cai23c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Dongqi
family: Cai
- given: Yangyuxuan
family: Kang
- given: Anbang
family: Yao
- given: Yurong
family: Chen
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3431-3441
id: cai23c
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3431
lastpage: 3441
published: 2023-07-03 00:00:00 +0000
- title: 'Extrapolated Random Tree for Regression'
abstract: 'In this paper, we propose a novel tree-based algorithm named *Extrapolated Random Tree for Regression* (ERTR) that adapts to arbitrary smoothness of the regression function while maintaining the interpretability of the tree. We first put forward the *homothetic random tree for regression* (HRTR) that converges to the target function as the homothetic ratio approaches zero. Then ERTR uses a linear regression model to extrapolate HRTR estimations with different ratios to the ratio zero. From the theoretical perspective, we for the first time establish the optimal convergence rates for ERTR when the target function resides in the general Hölder space $C^{k,\alpha}$ for $k\in \mathbb{N}$, whereas the lower bound of the convergence rate of the random tree for regression (RTR) is strictly slower than ERTR in the space $C^{k,\alpha}$ for $k\geq 1$. This shows that ERTR outperforms RTR for the target function with high-order smoothness due to the extrapolation. In the experiments, we compare ERTR with state-of-the-art tree algorithms on real datasets to show the superior performance of our model. Moreover, promising improvements are brought by using the extrapolated trees as base learners in the extension of ERTR to ensemble methods.'
volume: 202
URL: https://proceedings.mlr.press/v202/cai23d.html
PDF: https://proceedings.mlr.press/v202/cai23d/cai23d.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cai23d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yuchao
family: Cai
- given: Yuheng
family: Ma
- given: Yiwei
family: Dong
- given: Hanfang
family: Yang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3442-3468
id: cai23d
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3442
lastpage: 3468
published: 2023-07-03 00:00:00 +0000
- title: 'Cyclic Block Coordinate Descent With Variance Reduction for Composite Nonconvex Optimization'
abstract: 'Nonconvex optimization is central in solving many machine learning problems, in which block-wise structure is commonly encountered. In this work, we propose cyclic block coordinate methods for nonconvex optimization problems with non-asymptotic gradient norm guarantees. Our convergence analysis is based on a gradient Lipschitz condition with respect to a Mahalanobis norm, inspired by a recent progress on cyclic block coordinate methods. In deterministic settings, our convergence guarantee matches the guarantee of (full-gradient) gradient descent, but with the gradient Lipschitz constant being defined w.r.t. a Mahalanobis norm. In stochastic settings, we use recursive variance reduction to decrease the per-iteration cost and match the arithmetic operation complexity of current optimal stochastic full-gradient methods, with a unified analysis for both finite-sum and infinite-sum cases. We prove a faster linear convergence result when a Polyak-Łojasiewicz (PŁ) condition holds. To our knowledge, this work is the first to provide non-asymptotic convergence guarantees — variance-reduced or not — for a cyclic block coordinate method in general composite (smooth + nonsmooth) nonconvex settings. Our experimental results demonstrate the efficacy of the proposed cyclic scheme in training deep neural nets.'
volume: 202
URL: https://proceedings.mlr.press/v202/cai23e.html
PDF: https://proceedings.mlr.press/v202/cai23e/cai23e.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cai23e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Xufeng
family: Cai
- given: Chaobing
family: Song
- given: Stephen
family: Wright
- given: Jelena
family: Diakonikolas
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3469-3494
id: cai23e
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3469
lastpage: 3494
published: 2023-07-03 00:00:00 +0000
- title: 'Robust Weight Signatures: Gaining Robustness as Easy as Patching Weights?'
abstract: 'Given a robust model trained to be resilient to one or multiple types of distribution shifts (e.g., natural image corruptions), how is that "robustness" encoded in the model weights, and how easily can it be disentangled and/or "zero-shot" transferred to some other models? This paper empirically suggests a surprisingly simple answer: linearly - by straightforward model weight arithmetic! We start by drawing several key observations: (i) assuming that we train the same model architecture on both a clean dataset and its corrupted version, a comparison between the two resultant models shows their weights to mostly differ in shallow layers; (ii) the weight difference after projection, which we call "Robust Weight Signature" (RWS), appears to be discriminative and indicative of different corruption types; (iii) perhaps most strikingly, for the same corruption type, the RWSs obtained by one model architecture are highly consistent and transferable across different datasets. Based on those RWS observations, we propose a minimalistic model robustness "patching" framework that carries a model trained on clean data together with its pre-extracted RWSs. In this way, injecting certain robustness to the model is reduced to directly adding the corresponding RWS to its weight. We experimentally verify our proposed framework to be remarkably (1) lightweight: since RWSs concentrate on the shallowest few layers and we further show they can be painlessly quantized, storing an RWS is up to 13$\times$ more compact than storing the full weight copy; (2) in-situ adjustable: RWSs can be appended as needed and later taken off to restore the intact clean model, and we further demonstrate one can linearly re-scale the RWS to control the patched robustness strength; (3) composable: multiple RWSs can be added simultaneously to patch more comprehensive robustness at once; and (4) transferable: even when the clean model backbone is continually adapted or updated, RWSs remain as effective patches due to their outstanding cross-dataset transferability.'
volume: 202
URL: https://proceedings.mlr.press/v202/cai23f.html
PDF: https://proceedings.mlr.press/v202/cai23f/cai23f.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cai23f.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ruisi
family: Cai
- given: Zhenyu
family: Zhang
- given: Zhangyang
family: Wang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3495-3506
id: cai23f
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3495
lastpage: 3506
published: 2023-07-03 00:00:00 +0000
- title: 'Doubly Optimal No-Regret Learning in Monotone Games'
abstract: 'We consider online learning in multi-player smooth monotone games. Existing algorithms have limitations such as (1) being only applicable to strongly monotone games; (2) lacking the no-regret guarantee; (3) having only asymptotic or slow $\mathcal{O}(\frac{1}{\sqrt{T}})$ last-iterate convergence rate to a Nash equilibrium. While the $\mathcal{O}(\frac{1}{\sqrt{T}})$ rate is tight for a large class of algorithms including the well-studied extragradient algorithm and optimistic gradient algorithm, it is not optimal for all gradient-based algorithms. We propose the *accelerated optimistic gradient* (AOG) algorithm, the first doubly optimal no-regret learning algorithm for smooth monotone games. Namely, our algorithm achieves both (i) the optimal $\mathcal{O}(\sqrt{T})$ regret in the adversarial setting under smooth and convex loss functions and (ii) the optimal $\mathcal{O}(\frac{1}{T})$ last-iterate convergence rate to a Nash equilibrium in multi-player smooth monotone games. As a byproduct of the accelerated last-iterate convergence rate, we further show that each player suffers only an $\mathcal{O}(\log T)$ individual *worst-case dynamic regret*, providing an exponential improvement over the previous state-of-the-art $\mathcal{O}(\sqrt{T})$ bound.'
volume: 202
URL: https://proceedings.mlr.press/v202/cai23g.html
PDF: https://proceedings.mlr.press/v202/cai23g/cai23g.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cai23g.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yang
family: Cai
- given: Weiqiang
family: Zheng
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3507-3524
id: cai23g
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3507
lastpage: 3524
published: 2023-07-03 00:00:00 +0000
- title: 'Multi-Agent Learning from Learners'
abstract: 'A large body of the "Inverse Reinforcement Learning" (IRL) literature focuses on recovering the reward function from a set of demonstrations of an expert agent who acts optimally or noisily optimally. Nevertheless, some recent works move away from the optimality assumption to study the "Learning from a Learner (LfL)" problem, where the challenge is inferring the reward function of a learning agent from a sequence of demonstrations produced by progressively improving policies. In this work, we take one of the initial steps in addressing the multi-agent version of this problem and propose a new algorithm, MA-LfL (Multiagent Learning from a Learner). Unlike the state-of-the-art literature, which recovers the reward functions from trajectories produced by agents in some equilibrium, we study the problem of inferring the reward functions of interacting agents in a general sum stochastic game without assuming any equilibrium state. The MA-LfL algorithm is rigorously built on a theoretical result that ensures its validity in the case of agents learning according to a multi-agent soft policy iteration scheme. We empirically test MA-LfL and we observe high positive correlation between the recovered reward functions and the ground truth.'
volume: 202
URL: https://proceedings.mlr.press/v202/caliskan23a.html
PDF: https://proceedings.mlr.press/v202/caliskan23a/caliskan23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-caliskan23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mine Melodi
family: Caliskan
- given: Francesco
family: Chini
- given: Setareh
family: Maghsudi
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3525-3540
id: caliskan23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3525
lastpage: 3540
published: 2023-07-03 00:00:00 +0000
- title: 'Efficient Learning of Mesh-Based Physical Simulation with Bi-Stride Multi-Scale Graph Neural Network'
abstract: 'Learning the long-range interactions on large-scale mesh-based physical systems with flat Graph Neural Networks (GNNs) and stacking Message Passings (MPs) is challenging due to the scaling complexity w.r.t. the number of nodes and over-smoothing. Therefore, there has been growing interest in the community to introduce *multi-scale* structures to GNNs for physics simulation. However, current state-of-the-art methods are limited by their reliance on the labor-heavy drawing of coarser meshes or building coarser levels based on spatial proximity, which can introduce wrong edges across geometry boundaries. Inspired by the bipartite graph determination, we propose a novel pooling strategy, *bi-stride*, to tackle the aforementioned limitations. Bi-stride pools nodes on every other frontier of the Breadth-First-Search (BFS), without the need for the manual drawing of coarser meshes, and avoids wrong edges introduced by spatial proximity. Additionally, it enables a reduced number of MP steps on each level and non-parametrized pooling and unpooling by interpolations, similar to Convolutional Neural Networks (CNNs), which significantly reduces computational requirements. Experiments show that the proposed framework, *BSMS-GNN*, significantly outperforms existing methods in terms of both accuracy and computational efficiency in representative physics-based simulation scenarios.'
volume: 202
URL: https://proceedings.mlr.press/v202/cao23a.html
PDF: https://proceedings.mlr.press/v202/cao23a/cao23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cao23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yadi
family: Cao
- given: Menglei
family: Chai
- given: Minchen
family: Li
- given: Chenfanfu
family: Jiang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3541-3558
id: cao23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3541
lastpage: 3558
published: 2023-07-03 00:00:00 +0000
- title: 'Variational Sparse Inverse Cholesky Approximation for Latent Gaussian Processes via Double Kullback-Leibler Minimization'
abstract: 'To achieve scalable and accurate inference for latent Gaussian processes, we propose a variational approximation based on a family of Gaussian distributions whose covariance matrices have sparse inverse Cholesky (SIC) factors. We combine this variational approximation of the posterior with a similar and efficient SIC-restricted Kullback-Leibler-optimal approximation of the prior. We then focus on a particular SIC ordering and nearest-neighbor-based sparsity pattern resulting in highly accurate prior and posterior approximations. For this setting, our variational approximation can be computed via stochastic gradient descent in polylogarithmic time per iteration. We provide numerical comparisons showing that the proposed double-Kullback-Leibler-optimal Gaussian-process approximation (DKLGP) can sometimes be vastly more accurate for stationary kernels than alternative approaches such as inducing-point and mean-field approximations at similar computational complexity.'
volume: 202
URL: https://proceedings.mlr.press/v202/cao23b.html
PDF: https://proceedings.mlr.press/v202/cao23b/cao23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cao23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jian
family: Cao
- given: Myeongjong
family: Kang
- given: Felix
family: Jimenez
- given: Huiyan
family: Sang
- given: Florian Tobias
family: Schaefer
- given: Matthias
family: Katzfuss
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3559-3576
id: cao23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3559
lastpage: 3576
published: 2023-07-03 00:00:00 +0000
- title: 'Learning Lightweight Object Detectors via Multi-Teacher Progressive Distillation'
abstract: 'Resource-constrained perception systems such as edge computing and vision-for-robotics require vision models to be both accurate and lightweight in computation and memory usage. While knowledge distillation is a proven strategy to enhance the performance of lightweight classification models, its application to structured outputs like object detection and instance segmentation remains a complicated task, due to the variability in outputs and complex internal network modules involved in the distillation process. In this paper, we propose a simple yet surprisingly effective sequential approach to knowledge distillation that progressively transfers the knowledge of a set of teacher detectors to a given lightweight student. To distill knowledge from a highly accurate but complex teacher model, we construct a sequence of teachers to help the student gradually adapt. Our progressive strategy can be easily combined with existing detection distillation mechanisms to consistently maximize student performance in various settings. To the best of our knowledge, we are the first to successfully distill knowledge from Transformer-based teacher detectors to convolution-based students, and unprecedentedly boost the performance of ResNet-50 based RetinaNet from 36.5% to 42.0% AP and Mask R-CNN from 38.2% to 42.5% AP on the MS COCO benchmark. Code available at https://github.com/Shengcao-Cao/MTPD.'
volume: 202
URL: https://proceedings.mlr.press/v202/cao23c.html
PDF: https://proceedings.mlr.press/v202/cao23c/cao23c.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cao23c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Shengcao
family: Cao
- given: Mengtian
family: Li
- given: James
family: Hays
- given: Deva
family: Ramanan
- given: Yu-Xiong
family: Wang
- given: Liangyan
family: Gui
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3577-3598
id: cao23c
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3577
lastpage: 3598
published: 2023-07-03 00:00:00 +0000
- title: 'One-sided Matrix Completion from Two Observations Per Row'
abstract: 'Given only a few observed entries from a low-rank matrix $X$, matrix completion is the problem of imputing the missing entries, and it formalizes a wide range of real-world settings that involve estimating missing data. However, when there are too few observed entries to complete the matrix, what other aspects of the underlying matrix can be reliably recovered? We study one such problem setting, that of “one-sided” matrix completion, where our goal is to recover the right singular vectors of $X$, even in the regime where recovering the left singular vectors is impossible, which arises when there are more rows than columns and very few observations. We propose a natural algorithm that involves imputing the missing values of the matrix $X^TX$ and show that even with only two observations per row in $X$, we can provably recover $X^TX$ as long as we have at least $\Omega(r^2 d \log d)$ rows, where $r$ is the rank and $d$ is the number of columns. We evaluate our algorithm on one-sided recovery of synthetic data and low-coverage genome sequencing. In these settings, our algorithm substantially outperforms standard matrix completion and a variety of direct factorization methods.'
volume: 202
URL: https://proceedings.mlr.press/v202/cao23d.html
PDF: https://proceedings.mlr.press/v202/cao23d/cao23d.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cao23d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Steven
family: Cao
- given: Percy
family: Liang
- given: Gregory
family: Valiant
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3599-3624
id: cao23d
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3599
lastpage: 3624
published: 2023-07-03 00:00:00 +0000
- title: 'State and parameter learning with PARIS particle Gibbs'
abstract: 'Non-linear state-space models, also known as general hidden Markov models (HMM), are ubiquitous in statistical machine learning, being the most classical generative models for serial data and sequences. Learning in HMM, either via Maximum Likelihood Estimation (MLE) or Markov Score Climbing (MSC), requires the estimation of the smoothing expectation of some additive functionals. Controlling the bias and the variance of this estimation is crucial to establish the convergence of learning algorithms. Our first contribution is to design a novel additive smoothing algorithm, the Parisian particle Gibbs (PPG) sampler, which can be viewed as a PaRIS (Olsson, Westerborn 2017) algorithm driven by conditional SMC moves, resulting in bias-reduced estimates of the targeted quantities. We substantiate the PPG algorithm with theoretical results, including new bounds on bias and variance as well as deviation inequalities. We then establish, in the learning context, and under standard assumptions, non-asymptotic bounds highlighting the value of bias reduction and the implicit Rao–Blackwellization of PPG. These are the first non-asymptotic results of this kind in this setting. We illustrate our theoretical results with numerical experiments supporting our claims.'
volume: 202
URL: https://proceedings.mlr.press/v202/cardoso23a.html
PDF: https://proceedings.mlr.press/v202/cardoso23a/cardoso23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cardoso23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Gabriel
family: Cardoso
- given: Yazid
family: Janati El Idrissi
- given: Sylvain
family: Le Corff
- given: Eric
family: Moulines
- given: Jimmy
family: Olsson
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3625-3675
id: cardoso23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3625
lastpage: 3675
published: 2023-07-03 00:00:00 +0000
- title: 'Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning'
abstract: 'Recent works successfully leveraged Large Language Models’ (LLM) abilities to capture abstract knowledge about the world’s physics to solve decision-making problems. Yet, the alignment between LLMs’ knowledge and the environment can be wrong and limit functional competence due to lack of grounding. In this paper, we study an approach (named GLAM) to achieve this alignment through functional grounding: we consider an agent using an LLM as a policy that is progressively updated as the agent interacts with the environment, leveraging online Reinforcement Learning to improve its performance to solve goals. Using an interactive textual environment designed to study higher-level forms of functional grounding, and a set of spatial and navigation tasks, we study several scientific questions: 1) Can LLMs boost sample efficiency for online learning of various RL tasks? 2) How can it boost different forms of generalization? 3) What is the impact of online learning? We study these questions by functionally grounding several variants (size, architecture) of FLAN-T5.'
volume: 202
URL: https://proceedings.mlr.press/v202/carta23a.html
PDF: https://proceedings.mlr.press/v202/carta23a/carta23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-carta23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Thomas
family: Carta
- given: Clément
family: Romac
- given: Thomas
family: Wolf
- given: Sylvain
family: Lamprier
- given: Olivier
family: Sigaud
- given: Pierre-Yves
family: Oudeyer
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3676-3713
id: carta23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3676
lastpage: 3713
published: 2023-07-03 00:00:00 +0000
- title: 'Stein Variational Goal Generation for adaptive Exploration in Multi-Goal Reinforcement Learning'
abstract: 'In multi-goal Reinforcement Learning, an agent can share experience between related training tasks, resulting in better generalization for new tasks at test time. However, when the goal space has discontinuities and the reward is sparse, a majority of goals are difficult to reach. In this context, a curriculum over goals helps agents learn by adapting training tasks to their current capabilities. In this work, we propose Stein Variational Goal Generation (SVGG), which samples goals of intermediate difficulty for the agent, by leveraging a learned predictive model of its goal reaching capabilities. The distribution of goals is modeled with particles that are attracted in areas of appropriate difficulty using Stein Variational Gradient Descent. We show that SVGG outperforms state-of-the-art multi-goal Reinforcement Learning methods in terms of success coverage in hard exploration problems, and demonstrate that it is endowed with a useful recovery property when the environment changes.'
volume: 202
URL: https://proceedings.mlr.press/v202/castanet23a.html
PDF: https://proceedings.mlr.press/v202/castanet23a/castanet23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-castanet23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Nicolas
family: Castanet
- given: Olivier
family: Sigaud
- given: Sylvain
family: Lamprier
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3714-3731
id: castanet23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3714
lastpage: 3731
published: 2023-07-03 00:00:00 +0000
- title: 'Scalable Safe Policy Improvement via Monte Carlo Tree Search'
abstract: 'Algorithms for safely improving policies are important to deploy reinforcement learning approaches in real-world scenarios. In this work, we propose an algorithm, called MCTS-SPIBB, that computes safe policy improvement online using a Monte Carlo Tree Search based strategy. We theoretically prove that the policy generated by MCTS-SPIBB converges, as the number of simulations grows, to the optimal safely improved policy generated by Safe Policy Improvement with Baseline Bootstrapping (SPIBB), a popular algorithm based on policy iteration. Moreover, our empirical analysis performed on three standard benchmark domains shows that MCTS-SPIBB scales to significantly larger problems than SPIBB because it computes the policy online and locally, i.e., only in the states actually visited by the agent.'
volume: 202
URL: https://proceedings.mlr.press/v202/castellini23a.html
PDF: https://proceedings.mlr.press/v202/castellini23a/castellini23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-castellini23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Alberto
family: Castellini
- given: Federico
family: Bianchi
- given: Edoardo
family: Zorzi
- given: Thiago D.
family: Simão
- given: Alessandro
family: Farinelli
- given: Matthijs T. J.
family: Spaan
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3732-3756
id: castellini23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3732
lastpage: 3756
published: 2023-07-03 00:00:00 +0000
- title: 'LESS-VFL: Communication-Efficient Feature Selection for Vertical Federated Learning'
abstract: 'We propose LESS-VFL, a communication-efficient feature selection method for distributed systems with vertically partitioned data. We consider a system of a server and several parties with local datasets that share a sample ID space but have different feature sets. The parties wish to collaboratively train a model for a prediction task. As part of the training, the parties wish to remove unimportant features in the system to improve generalization, efficiency, and explainability. In LESS-VFL, after a short pre-training period, the server optimizes its part of the global model to determine the relevant outputs from party models. This information is shared with the parties to then allow local feature selection without communication. We analytically prove that LESS-VFL removes spurious features from model training. We provide extensive empirical evidence that LESS-VFL can achieve high accuracy and remove spurious features at a fraction of the communication cost of other feature selection approaches.'
volume: 202
URL: https://proceedings.mlr.press/v202/castiglia23a.html
PDF: https://proceedings.mlr.press/v202/castiglia23a/castiglia23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-castiglia23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Timothy
family: Castiglia
- given: Yi
family: Zhou
- given: Shiqiang
family: Wang
- given: Swanand
family: Kadhe
- given: Nathalie
family: Baracaldo
- given: Stacy
family: Patterson
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3757-3781
id: castiglia23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3757
lastpage: 3781
published: 2023-07-03 00:00:00 +0000
- title: 'On the Robustness of Text Vectorizers'
abstract: 'A fundamental issue in machine learning is the robustness of the model with respect to changes in the input. In natural language processing, models typically contain a first embedding layer, transforming a sequence of tokens into vector representations. While the robustness with respect to changes of continuous inputs is well-understood, the situation is less clear when considering discrete changes, for instance replacing a word by another in an input sentence. Our work formally proves that popular embedding schemes, such as concatenation, TF-IDF, and Paragraph Vector (a.k.a. doc2vec), exhibit robustness in the Hölder or Lipschitz sense with respect to the Hamming distance. We provide quantitative bounds for these schemes and demonstrate how the constants involved are affected by the length of the document. These findings are exemplified through a series of numerical examples.'
volume: 202
URL: https://proceedings.mlr.press/v202/catellier23a.html
PDF: https://proceedings.mlr.press/v202/catellier23a/catellier23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-catellier23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Rémi
family: Catellier
- given: Samuel
family: Vaiter
- given: Damien
family: Garreau
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3782-3814
id: catellier23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3782
lastpage: 3814
published: 2023-07-03 00:00:00 +0000
- title: 'Learning Globally Smooth Functions on Manifolds'
abstract: 'Smoothness and low dimensional structures play central roles in improving generalization and stability in learning and statistics. This work combines techniques from semi-infinite constrained learning and manifold regularization to learn representations that are globally smooth on a manifold. To do so, it shows that under typical conditions the problem of learning a Lipschitz continuous function on a manifold is equivalent to a dynamically weighted manifold regularization problem. This observation leads to a practical algorithm based on a weighted Laplacian penalty whose weights are adapted using stochastic gradient techniques. It is shown that under mild conditions, this method estimates the Lipschitz constant of the solution, learning a globally smooth solution as a byproduct. Experiments on real world data illustrate the advantages of the proposed method relative to existing alternatives. Our code is available at https://github.com/JuanCervino/smoothbench.'
volume: 202
URL: https://proceedings.mlr.press/v202/cervino23a.html
PDF: https://proceedings.mlr.press/v202/cervino23a/cervino23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cervino23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Juan
family: Cervino
- given: Luiz F. O.
family: Chamon
- given: Benjamin David
family: Haeffele
- given: Rene
family: Vidal
- given: Alejandro
family: Ribeiro
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3815-3854
id: cervino23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3815
lastpage: 3854
published: 2023-07-03 00:00:00 +0000
- title: 'Tighter Lower Bounds for Shuffling SGD: Random Permutations and Beyond'
abstract: 'We study convergence lower bounds of without-replacement stochastic gradient descent (SGD) for solving smooth (strongly-)convex finite-sum minimization problems. Unlike most existing results focusing on final iterate lower bounds in terms of the number of components $n$ and the number of epochs $K$, we seek bounds for arbitrary weighted average iterates that are tight in all factors including the condition number $\kappa$. For SGD with Random Reshuffling, we present lower bounds that have tighter $\kappa$ dependencies than existing bounds. Our results are the first to perfectly close the gap between lower and upper bounds for weighted average iterates in both strongly-convex and convex cases. We also prove weighted average iterate lower bounds for arbitrary permutation-based SGD, which apply to all variants that carefully choose the best permutation. Our bounds improve the existing bounds in factors of $n$ and $\kappa$ and thereby match the upper bounds shown for a recently proposed algorithm called GraB.'
volume: 202
URL: https://proceedings.mlr.press/v202/cha23a.html
PDF: https://proceedings.mlr.press/v202/cha23a/cha23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cha23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jaeyoung
family: Cha
- given: Jaewook
family: Lee
- given: Chulhee
family: Yun
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3855-3912
id: cha23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3855
lastpage: 3912
published: 2023-07-03 00:00:00 +0000
- title: 'Orthogonality-Enforced Latent Space in Autoencoders: An Approach to Learning Disentangled Representations'
abstract: 'Noting the importance of factorizing (or disentangling) the latent space, we propose a novel, non-probabilistic disentangling framework for autoencoders, based on the principles of symmetry transformations that are independent of one another. To the best of our knowledge, this is the first deterministic model aiming to achieve disentanglement based on autoencoders using only a reconstruction loss without pairs of images or labels, by explicitly introducing inductive biases into a model architecture through Euler encoding. The proposed model is then compared with a number of state-of-the-art models relevant to disentanglement, including symmetry-based models and generative models. Our evaluation using six different disentanglement metrics, including the unsupervised disentanglement metric we propose here in this paper, shows that the proposed model can offer better disentanglement, especially when variances of the features are different, where other methods may struggle. We believe that this model opens several opportunities for linear disentangled representation learning based on deterministic autoencoders.'
volume: 202
URL: https://proceedings.mlr.press/v202/cha23b.html
PDF: https://proceedings.mlr.press/v202/cha23b/cha23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cha23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jaehoon
family: Cha
- given: Jeyan
family: Thiyagalingam
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3913-3948
id: cha23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3913
lastpage: 3948
published: 2023-07-03 00:00:00 +0000
- title: 'STEERING : Stein Information Directed Exploration for Model-Based Reinforcement Learning'
abstract: 'Directed Exploration is a crucial challenge in reinforcement learning (RL), especially when rewards are sparse. Information-directed sampling (IDS), which optimizes the information ratio, seeks to do so by augmenting regret with information gain. However, estimating information gain is computationally intractable or relies on restrictive assumptions which prohibit its use in many practical instances. In this work, we posit an alternative exploration incentive in terms of the integral probability metric (IPM) between a current estimate of the transition model and the unknown optimal, which under suitable conditions, can be computed in closed form with the kernelized Stein discrepancy (KSD). Based on KSD, we develop a novel algorithm STEERING: STEin information dirEcted exploration for model-based Reinforcement LearnING. To enable its derivation, we develop fundamentally new variants of KSD for discrete conditional distributions. We further establish that STEERING achieves sublinear Bayesian regret, improving upon prior learning rates of information-augmented MBRL, IDS included. Experimentally, we show that the proposed algorithm is computationally affordable and outperforms several prior approaches.'
volume: 202
URL: https://proceedings.mlr.press/v202/chakraborty23a.html
PDF: https://proceedings.mlr.press/v202/chakraborty23a/chakraborty23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chakraborty23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Souradip
family: Chakraborty
- given: Amrit
family: Bedi
- given: Alec
family: Koppel
- given: Mengdi
family: Wang
- given: Furong
family: Huang
- given: Dinesh
family: Manocha
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3949-3978
id: chakraborty23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3949
lastpage: 3978
published: 2023-07-03 00:00:00 +0000
- title: 'Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits'
abstract: 'We consider the stochastic linear contextual bandit problem with high-dimensional features. We analyze the Thompson sampling algorithm using special classes of sparsity-inducing priors (e.g., spike-and-slab) to model the unknown parameter and provide a nearly optimal upper bound on the expected cumulative regret. To the best of our knowledge, this is the first work that provides theoretical guarantees of Thompson sampling in high-dimensional and sparse contextual bandits. For faster computation, we use variational inference instead of Markov Chain Monte Carlo (MCMC) to approximate the posterior distribution. Extensive simulations demonstrate the improved performance of our proposed algorithm over existing ones.'
volume: 202
URL: https://proceedings.mlr.press/v202/chakraborty23b.html
PDF: https://proceedings.mlr.press/v202/chakraborty23b/chakraborty23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chakraborty23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Sunrit
family: Chakraborty
- given: Saptarshi
family: Roy
- given: Ambuj
family: Tewari
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 3979-4008
id: chakraborty23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 3979
lastpage: 4008
published: 2023-07-03 00:00:00 +0000
- title: 'Representations and Exploration for Deep Reinforcement Learning using Singular Value Decomposition'
abstract: 'Representation learning and exploration are among the key challenges for any deep reinforcement learning agent. In this work, we provide a singular value decomposition based method that can be used to obtain representations that preserve the underlying transition structure in the domain. Perhaps interestingly, we show that these representations also capture the relative frequency of state visitations, thereby providing an estimate for pseudo-counts for free. To scale this decomposition method to large-scale domains, we provide an algorithm that never requires building the transition matrix, can make use of deep networks, and also permits mini-batch training. Further, we draw inspiration from predictive state representations and extend our decomposition method to partially observable environments. With experiments on multi-task settings with partially observable domains, we show that the proposed method can not only learn useful representation on DM-Lab-30 environments (that have inputs involving language instructions, pixel images, rewards, among others) but it can also be effective at hard exploration tasks in DM-Hard-8 environments.'
volume: 202
URL: https://proceedings.mlr.press/v202/chandak23a.html
PDF: https://proceedings.mlr.press/v202/chandak23a/chandak23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chandak23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yash
family: Chandak
- given: Shantanu
family: Thakoor
- given: Zhaohan Daniel
family: Guo
- given: Yunhao
family: Tang
- given: Remi
family: Munos
- given: Will
family: Dabney
- given: Diana L
family: Borsa
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4009-4034
id: chandak23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4009
lastpage: 4034
published: 2023-07-03 00:00:00 +0000
- title: 'Memory-Based Dual Gaussian Processes for Sequential Learning'
abstract: 'Sequential learning with Gaussian processes (GPs) is challenging when access to past data is limited, for example, in continual and active learning. In such cases, errors can accumulate over time due to inaccuracies in the posterior, hyperparameters, and inducing points, making accurate learning challenging. Here, we present a method to keep all such errors in check using the recently proposed dual sparse variational GP. Our method enables accurate inference for generic likelihoods and improves learning by actively building and updating a memory of past data. We demonstrate its effectiveness in several applications involving Bayesian optimization, active learning, and continual learning.'
volume: 202
URL: https://proceedings.mlr.press/v202/chang23a.html
PDF: https://proceedings.mlr.press/v202/chang23a/chang23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chang23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Paul Edmund
family: Chang
- given: Prakhar
family: Verma
- given: S. T.
family: John
- given: Arno
family: Solin
- given: Mohammad Emtiyaz
family: Khan
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4035-4054
id: chang23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4035
lastpage: 4054
published: 2023-07-03 00:00:00 +0000
- title: 'Muse: Text-To-Image Generation via Masked Generative Transformers'
abstract: 'We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse learns to predict randomly masked image tokens. Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requires fewer sampling iterations; compared to autoregressive models such as Parti, Muse is more efficient due to the use of parallel decoding. The use of a pre-trained LLM enables fine-grained language understanding, which translates to high-fidelity image generation and the understanding of visual concepts such as objects, their spatial relationships, pose, cardinality etc. Our 900M parameter model achieves a new SOTA on CC3M, with an FID score of 6.06. The Muse 3B parameter model achieves an FID of 7.88 on zero-shot COCO evaluation, along with a CLIP score of 0.32. Muse also directly enables a number of image editing applications without the need to fine-tune or invert the model: inpainting, outpainting, and mask-free editing. More results and videos demonstrating editing are available at https://muse-icml.github.io/'
volume: 202
URL: https://proceedings.mlr.press/v202/chang23b.html
PDF: https://proceedings.mlr.press/v202/chang23b/chang23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chang23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Huiwen
family: Chang
- given: Han
family: Zhang
- given: Jarred
family: Barber
- given: Aaron
family: Maschinot
- given: Jose
family: Lezama
- given: Lu
family: Jiang
- given: Ming-Hsuan
family: Yang
- given: Kevin Patrick
family: Murphy
- given: William T.
family: Freeman
- given: Michael
family: Rubinstein
- given: Yuanzhen
family: Li
- given: Dilip
family: Krishnan
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4055-4075
id: chang23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4055
lastpage: 4075
published: 2023-07-03 00:00:00 +0000
- title: 'On Investigating the Conservative Property of Score-Based Generative Models'
abstract: 'Existing Score-Based Models (SBMs) can be categorized into constrained SBMs (CSBMs) or unconstrained SBMs (USBMs) according to their parameterization approaches. CSBMs model probability density functions as Boltzmann distributions, and assign their predictions as the negative gradients of some scalar-valued energy functions. On the other hand, USBMs employ flexible architectures capable of directly estimating scores without the need to explicitly model energy functions. In this paper, we demonstrate that the architectural constraints of CSBMs may limit their modeling ability. In addition, we show that USBMs’ inability to preserve the property of conservativeness may lead to degraded performance in practice. To address the above issues, we propose Quasi-Conservative Score-Based Models (QCSBMs) for keeping the advantages of both CSBMs and USBMs. Our theoretical derivations demonstrate that the training objective of QCSBMs can be efficiently integrated into the training processes by leveraging the Hutchinson’s trace estimator. In addition, our experimental results on the CIFAR-10, CIFAR-100, ImageNet, and SVHN datasets validate the effectiveness of QCSBMs. Finally, we justify the advantage of QCSBMs using an example of a one-layered autoencoder.'
volume: 202
URL: https://proceedings.mlr.press/v202/chao23a.html
PDF: https://proceedings.mlr.press/v202/chao23a/chao23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chao23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Chen-Hao
family: Chao
- given: Wei-Fang
family: Sun
- given: Bo-Wun
family: Cheng
- given: Chun-Yi
family: Lee
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4076-4095
id: chao23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4076
lastpage: 4095
published: 2023-07-03 00:00:00 +0000
- title: 'Robust and private stochastic linear bandits'
abstract: 'In this paper, we study the stochastic linear bandit problem under the additional requirements of *differential privacy*, *robustness* and *batched observations*. In particular, we assume an adversary randomly chooses a constant fraction of the observed rewards in each batch, replacing them with arbitrary numbers. We present differentially private and robust variants of the arm elimination algorithm using logarithmic batch queries under two privacy models and provide regret bounds in both settings. In the first model, every reward in each round is reported by a potentially different client, which reduces to standard local differential privacy (LDP). In the second model, every action is "owned" by a different client, who may aggregate the rewards over multiple queries and privatize the aggregate response instead. To the best of our knowledge, our algorithms are the first simultaneously providing differential privacy and adversarial robustness in the stochastic linear bandits problem.'
volume: 202
URL: https://proceedings.mlr.press/v202/charisopoulos23a.html
PDF: https://proceedings.mlr.press/v202/charisopoulos23a/charisopoulos23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-charisopoulos23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Vasileios
family: Charisopoulos
- given: Hossein
family: Esfandiari
- given: Vahab
family: Mirrokni
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4096-4115
id: charisopoulos23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4096
lastpage: 4115
published: 2023-07-03 00:00:00 +0000
- title: 'Streaming Submodular Maximization with Differential Privacy'
abstract: 'In this work, we study the problem of privately maximizing a submodular function in the streaming setting. Extensive work has been done on privately maximizing submodular functions in the general case when the function depends upon the private data of individuals. However, when the size of the data stream drawn from the domain of the objective function is large or arrives very fast, one must privately optimize the objective within the constraints of the streaming setting. We establish fundamental differentially private baselines for this problem and then derive better trade-offs between privacy and utility for the special case of decomposable submodular functions. A submodular function is decomposable when it can be written as a sum of submodular functions; this structure arises naturally when each summand function models the utility of an individual and the goal is to study the total utility of the whole population as in the well-known Combinatorial Public Projects Problem. Finally, we complement our theoretical analysis with experimental corroboration.'
volume: 202
URL: https://proceedings.mlr.press/v202/chaturvedi23a.html
PDF: https://proceedings.mlr.press/v202/chaturvedi23a/chaturvedi23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chaturvedi23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Anamay
family: Chaturvedi
- given: Huy
family: Nguyen
- given: Thy Dinh
family: Nguyen
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4116-4143
id: chaturvedi23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4116
lastpage: 4143
published: 2023-07-03 00:00:00 +0000
- title: 'Why does Throwing Away Data Improve Worst-Group Error?'
abstract: 'When facing data with imbalanced classes or groups, practitioners follow an intriguing strategy to achieve best results. They throw away examples until the classes or groups are balanced in size, and then perform empirical risk minimization on the reduced training set. This opposes common wisdom in learning theory, where the expected error is supposed to decrease as the dataset grows in size. In this work, we leverage extreme value theory to address this apparent contradiction. Our results show that the tails of the data distribution play an important role in determining the worst-group-accuracy of linear classifiers. When learning on data with heavy tails, throwing away data restores the geometric symmetry of the resulting classifier, and therefore improves its worst-group generalization.'
volume: 202
URL: https://proceedings.mlr.press/v202/chaudhuri23a.html
PDF: https://proceedings.mlr.press/v202/chaudhuri23a/chaudhuri23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chaudhuri23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Kamalika
family: Chaudhuri
- given: Kartik
family: Ahuja
- given: Martin
family: Arjovsky
- given: David
family: Lopez-Paz
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4144-4188
id: chaudhuri23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4144
lastpage: 4188
published: 2023-07-03 00:00:00 +0000
- title: 'Collaborative Multi-Agent Heterogeneous Multi-Armed Bandits'
abstract: 'The study of collaborative multi-agent bandits has attracted significant attention recently. In light of this, we initiate the study of a new collaborative setting, consisting of $N$ agents such that each agent is learning one of $M$ stochastic multi-armed bandits to minimize their group cumulative regret. We develop decentralized algorithms which facilitate collaboration between the agents under two scenarios. We characterize the performance of these algorithms by deriving the per agent cumulative regret and group regret upper bounds. We also prove lower bounds for the group regret in this setting, which demonstrates the near-optimal behavior of the proposed algorithms.'
volume: 202
URL: https://proceedings.mlr.press/v202/chawla23a.html
PDF: https://proceedings.mlr.press/v202/chawla23a/chawla23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chawla23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ronshee
family: Chawla
- given: Daniel
family: Vial
- given: Sanjay
family: Shakkottai
- given: R.
family: Srikant
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4189-4217
id: chawla23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4189
lastpage: 4217
published: 2023-07-03 00:00:00 +0000
- title: 'Correcting discount-factor mismatch in on-policy policy gradient methods'
abstract: 'The policy gradient theorem gives a convenient form of the policy gradient in terms of three factors: an action value, a gradient of the action likelihood, and a state distribution involving discounting called the *discounted stationary distribution*. But commonly used on-policy methods based on the policy gradient theorem ignore the discount factor in the state distribution, which is technically incorrect and may even cause degenerate learning behavior in some environments. An existing solution corrects this discrepancy by using $\gamma^t$ as a factor in the gradient estimate. However, this solution is not widely adopted and does not work well in tasks where the later states are similar to earlier states. We introduce a novel distribution correction to account for the discounted stationary distribution that can be plugged into many existing gradient estimators. Our correction circumvents the performance degradation associated with the $\gamma^t$ correction with a lower variance. Importantly, compared to the uncorrected estimators, our algorithm provides improved state emphasis to evade suboptimal policies in certain environments and consistently matches or exceeds the original performance on several OpenAI gym and DeepMind suite benchmarks.'
volume: 202
URL: https://proceedings.mlr.press/v202/che23a.html
PDF: https://proceedings.mlr.press/v202/che23a/che23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-che23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Fengdi
family: Che
- given: Gautham
family: Vasan
- given: A. Rupam
family: Mahmood
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4218-4240
id: che23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4218
lastpage: 4240
published: 2023-07-03 00:00:00 +0000
- title: 'Fast Federated Machine Unlearning with Nonlinear Functional Theory'
abstract: 'Federated machine unlearning (FMU) aims to remove the influence of a specified subset of training data upon request from a trained federated learning model. Despite achieving remarkable performance, existing FMU techniques suffer from inefficiency due to two sequential operations of training and retraining/unlearning on large-scale datasets. Our prior study, PCMU, was proposed to improve the efficiency of centralized machine unlearning (CMU) with certified guarantees, by simultaneously executing the training and unlearning operations. This paper proposes a fast FMU algorithm, FFMU, for improving the FMU efficiency while maintaining the unlearning quality. The PCMU method is leveraged to train a local machine unlearning (MU) model on each edge device. We propose to employ nonlinear functional analysis techniques to refine the local MU models as output functions of a Nemytskii operator. We conduct theoretical analysis to derive that the Nemytskii operator has a global Lipschitz constant, which allows us to bound the difference between two MU models regarding the distance between their gradients. Based on the Nemytskii operator and average smooth local gradients, the global MU model on the server is guaranteed to achieve close performance to each local MU model with the certified guarantees.'
volume: 202
URL: https://proceedings.mlr.press/v202/che23b.html
PDF: https://proceedings.mlr.press/v202/che23b/che23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-che23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Tianshi
family: Che
- given: Yang
family: Zhou
- given: Zijie
family: Zhang
- given: Lingjuan
family: Lyu
- given: Ji
family: Liu
- given: Da
family: Yan
- given: Dejing
family: Dou
- given: Jun
family: Huan
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4241-4268
id: che23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4241
lastpage: 4268
published: 2023-07-03 00:00:00 +0000
- title: 'On the Statistical Benefits of Temporal Difference Learning'
abstract: 'Given a dataset on actions and resulting long-term rewards, a direct estimation approach fits value functions that minimize prediction error on the training data. Temporal difference learning (TD) methods instead fit value functions by minimizing the degree of temporal inconsistency between estimates made at successive time-steps. Focusing on finite state Markov chains, we provide a crisp asymptotic theory of the statistical advantages of this approach. First, we show that an intuitive inverse trajectory pooling coefficient completely characterizes the percent reduction in mean-squared error of value estimates. Depending on problem structure, the reduction could be enormous or nonexistent. Next, we prove that there can be dramatic improvements in estimates of the difference in value-to-go for two states: TD’s errors are bounded in terms of a novel measure – the problem’s trajectory crossing time – which can be much smaller than the problem’s time horizon.'
volume: 202
URL: https://proceedings.mlr.press/v202/cheikhi23a.html
PDF: https://proceedings.mlr.press/v202/cheikhi23a/cheikhi23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cheikhi23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: David
family: Cheikhi
- given: Daniel
family: Russo
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4269-4293
id: cheikhi23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4269
lastpage: 4293
published: 2023-07-03 00:00:00 +0000
- title: 'Multi-Layer Neural Networks as Trainable Ladders of Hilbert Spaces'
abstract: 'To characterize the function spaces explored by multi-layer neural networks (NNs), we introduce Neural Hilbert Ladders (NHLs), a collection of reproducing kernel Hilbert spaces (RKHSes) that are defined iteratively and adaptive to training. First, we prove a correspondence between functions expressed by L-layer NNs and those belonging to L-level NHLs. Second, we prove generalization guarantees for learning the NHL based on a new complexity measure. Third, corresponding to the training of multi-layer NNs in the infinite-width mean-field limit, we derive an evolution of the NHL characterized by the dynamics of multiple random fields. Finally, we examine linear and shallow NNs from the new perspective and complement the theory with numerical results.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23a.html
PDF: https://proceedings.mlr.press/v202/chen23a/chen23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Zhengdao
family: Chen
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4294-4329
id: chen23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4294
lastpage: 4329
published: 2023-07-03 00:00:00 +0000
- title: 'Beyond the Edge of Stability via Two-step Gradient Updates'
abstract: 'Gradient Descent (GD) is a powerful workhorse of modern machine learning thanks to its scalability and efficiency in high-dimensional spaces. Its ability to find local minimisers is only guaranteed for losses with Lipschitz gradients, where it can be seen as a "bona-fide" discretisation of an underlying gradient flow. Yet, many ML setups involving overparametrised models do not fall into this problem class, which has motivated research beyond the so-called "Edge of Stability" (EoS), where the step-size crosses the admissibility threshold inversely proportional to the Lipschitz constant above. Perhaps surprisingly, GD has been empirically observed to still converge regardless of local instability and oscillatory behavior. The incipient theoretical analysis of this phenomenon has mainly focused on the overparametrised regime, where the effect of choosing a large learning rate may be associated with a "Sharpness-Minimisation" implicit regularisation within the manifold of minimisers, under appropriate asymptotic limits. In contrast, in this work we directly examine the conditions for such unstable convergence, focusing on simple, yet representative, learning problems, via analysis of two-step gradient updates. Specifically, we characterize a local condition involving third-order derivatives that guarantees existence and convergence to fixed points of the two-step updates, and leverage such property in a teacher-student setting, under population loss. Finally, starting from Matrix Factorization, we provide observations of period-2 orbits of GD in high-dimensional settings with intuition of its dynamics, along with exploration into more general settings.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23b.html
PDF: https://proceedings.mlr.press/v202/chen23b/chen23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lei
family: Chen
- given: Joan
family: Bruna
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4330-4391
id: chen23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4330
lastpage: 4391
published: 2023-07-03 00:00:00 +0000
- title: 'Trompt: Towards a Better Deep Neural Network for Tabular Data'
abstract: 'Tabular data is arguably one of the most commonly used data structures in various practical domains, including finance, healthcare and e-commerce. The inherent heterogeneity allows tabular data to store rich information. However, based on a recently published tabular benchmark, we can see deep neural networks still fall behind tree-based models on tabular datasets. In this paper, we propose Trompt–which stands for Tabular Prompt–a novel architecture inspired by prompt learning of language models. The essence of prompt learning is to adjust a large pre-trained model through a set of prompts outside the model without directly modifying the model. Based on this idea, Trompt separates the learning strategy of tabular data into two parts. The first part, analogous to pre-trained models, focuses on learning the intrinsic information of a table. The second part, analogous to prompts, focuses on learning the variations among samples. Trompt is evaluated with the benchmark mentioned above. The experimental results demonstrate that Trompt outperforms state-of-the-art deep neural networks and is comparable to tree-based models.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23c.html
PDF: https://proceedings.mlr.press/v202/chen23c/chen23c.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Kuan-Yu
family: Chen
- given: Ping-Han
family: Chiang
- given: Hsin-Rung
family: Chou
- given: Ting-Wei
family: Chen
- given: Tien-Hao
family: Chang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4392-4434
id: chen23c
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4392
lastpage: 4434
published: 2023-07-03 00:00:00 +0000
- title: 'Differentially Private Stochastic Convex Optimization under a Quantile Loss Function'
abstract: 'We study $(\varepsilon,\delta)$-differentially private (DP) stochastic convex optimization under an $r$-th quantile loss function taking the form $c(u) = ru^+ + (1-r)(-u)^+$. The function is non-smooth, and we propose to approximate it with a smooth function obtained by convolution smoothing, which enjoys both structure and bandwidth flexibility and can address outliers. This leads to a better approximation than those obtained from existing methods such as Moreau Envelope. We then design private algorithms based on DP stochastic gradient descent and objective perturbation, and show that both algorithms achieve (near) optimal excess generalization risk $O(\max\{\frac{1}{\sqrt{n}}, \frac{\sqrt{d\ln(1/\delta)}}{n\varepsilon}\})$. Through objective perturbation, we further derive an upper bound $O(\max\{\sqrt{\frac{d}{n}}, \sqrt{\frac{d\ln(1/\delta)}{n\varepsilon}}\})$ on the parameter estimation error under mild assumptions on data generating processes. Some applications in private quantile regression and private inventory control will be discussed.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23d.html
PDF: https://proceedings.mlr.press/v202/chen23d/chen23d.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Du
family: Chen
- given: Geoffrey A.
family: Chua
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4435-4461
id: chen23d
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4435
lastpage: 4461
published: 2023-07-03 00:00:00 +0000
- title: 'Restoration-Degradation Beyond Linear Diffusions: A Non-Asymptotic Analysis For DDIM-type Samplers'
abstract: 'We develop a framework for non-asymptotic analysis of deterministic samplers used for diffusion generative modeling. Several recent works have analyzed stochastic samplers using tools like Girsanov’s theorem and a chain rule variant of the interpolation argument. Unfortunately, these techniques give vacuous bounds when applied to deterministic samplers. We give a new operational interpretation for deterministic sampling by showing that one step along the probability flow ODE can be expressed as two steps: 1) a restoration step that runs gradient ascent on the conditional log-likelihood at some infinitesimally previous time, and 2) a degradation step that runs the forward process using noise pointing back towards the current iterate. This perspective allows us to extend denoising diffusion implicit models to general, non-linear forward processes. We then develop the first polynomial convergence bounds for these samplers under mild conditions on the data distribution.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23e.html
PDF: https://proceedings.mlr.press/v202/chen23e/chen23e.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Sitan
family: Chen
- given: Giannis
family: Daras
- given: Alex
family: Dimakis
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4462-4484
id: chen23e
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4462
lastpage: 4484
published: 2023-07-03 00:00:00 +0000
- title: 'Provably Convergent Schrödinger Bridge with Applications to Probabilistic Time Series Imputation'
abstract: 'The Schrödinger bridge problem (SBP) is gaining increasing attention in generative modeling and showing promising potential even in comparison with score-based generative models (SGMs). SBP can be interpreted as an entropy-regularized optimal transport problem, which alternately projects onto each marginal. However, in practice, only approximate projections are accessible and their convergence is not well understood. To fill this gap, we present the first convergence analysis of the Schrödinger bridge algorithm based on approximate projections. As for its practical applications, we apply SBP to probabilistic time series imputation by generating missing values conditioned on observed data. We show that optimizing the transport cost improves the performance, and the proposed algorithm achieves state-of-the-art results on healthcare and environmental data while exhibiting the advantage of exploring both temporal and feature patterns in probabilistic time series imputation.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23f.html
PDF: https://proceedings.mlr.press/v202/chen23f/chen23f.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23f.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yu
family: Chen
- given: Wei
family: Deng
- given: Shikai
family: Fang
- given: Fengpei
family: Li
- given: Nicole Tianjiao
family: Yang
- given: Yikai
family: Zhang
- given: Kashif
family: Rasul
- given: Shandian
family: Zhe
- given: Anderson
family: Schneider
- given: Yuriy
family: Nevmyvaka
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4485-4513
id: chen23f
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4485
lastpage: 4513
published: 2023-07-03 00:00:00 +0000
- title: 'ED-Batch: Efficient Automatic Batching of Dynamic Neural Networks via Learned Finite State Machines'
abstract: 'Batching has a fundamental influence on the efficiency of deep neural network (DNN) execution. However, for dynamic DNNs, efficient batching is particularly challenging as the dataflow graph varies per input instance. As a result, state-of-the-art frameworks use heuristics that result in suboptimal batching decisions. Further, batching puts strict restrictions on memory adjacency and can lead to high data movement costs. In this paper, we provide an approach for batching dynamic DNNs based on finite state machines, which enables the automatic discovery of batching policies specialized for each DNN via reinforcement learning. Moreover, we find that memory planning that is aware of the batching policy can save significant data movement overheads, which is automated by a PQ tree-based algorithm we introduce. Experimental results show that our framework speeds up state-of-the-art frameworks by on average 1.15x, 1.39x, and 2.45x for chain-based, tree-based, and lattice-based DNNs across CPU and GPU. The framework is open-sourced at https://github.com/gulang2019/ED-Batch.git.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23g.html
PDF: https://proceedings.mlr.press/v202/chen23g/chen23g.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23g.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Siyuan
family: Chen
- given: Pratik Pramod
family: Fegade
- given: Tianqi
family: Chen
- given: Phillip
family: Gibbons
- given: Todd
family: Mowry
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4514-4528
id: chen23g
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4514
lastpage: 4528
published: 2023-07-03 00:00:00 +0000
- title: 'Is Learning Summary Statistics Necessary for Likelihood-free Inference?'
abstract: 'Likelihood-free inference (LFI) is a set of techniques for inference in implicit statistical models. A longstanding question in LFI has been how to design or learn good summary statistics of data, but this might now seem unnecessary due to the advent of recent end-to-end (i.e. neural network-based) LFI methods. In this work, we rethink this question with a new method for learning summary statistics. We show that learning sufficient statistics may be easier than direct posterior inference, as the former problem can be reduced to a set of low-dimensional, easy-to-solve learning problems. This suggests explicitly decoupling summary statistics learning from posterior inference in LFI. Experiments on diverse inference tasks with different data types validate our hypothesis.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23h.html
PDF: https://proceedings.mlr.press/v202/chen23h/chen23h.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23h.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yanzhi
family: Chen
- given: Michael U.
family: Gutmann
- given: Adrian
family: Weller
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4529-4544
id: chen23h
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4529
lastpage: 4544
published: 2023-07-03 00:00:00 +0000
- title: 'Subequivariant Graph Reinforcement Learning in 3D Environments'
abstract: 'Learning a shared policy that guides the locomotion of different agents is of core interest in Reinforcement Learning (RL), which leads to the study of morphology-agnostic RL. However, existing benchmarks are highly restrictive in the choice of starting point and target point, constraining the movement of the agents within 2D space. In this work, we propose a novel setup for morphology-agnostic RL, dubbed Subequivariant Graph RL in 3D environments (3D-SGRL). Specifically, we first introduce a new set of more practical yet challenging benchmarks in 3D space that allow the agent full degrees of freedom to explore in arbitrary directions starting from arbitrary configurations. Moreover, to optimize the policy over the enlarged state-action space, we propose to inject geometric symmetry, i.e., subequivariance, into the modeling of the policy and Q-function such that the policy can generalize to all directions, improving exploration efficiency. This goal is achieved by a novel SubEquivariant Transformer (SET) that permits expressive message exchange. Finally, we evaluate the proposed method on the proposed benchmarks, where our method consistently and significantly outperforms existing approaches in single-task, multi-task, and zero-shot generalization scenarios. Extensive ablations are also conducted to verify our design.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23i.html
PDF: https://proceedings.mlr.press/v202/chen23i/chen23i.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23i.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Runfa
family: Chen
- given: Jiaqi
family: Han
- given: Fuchun
family: Sun
- given: Wenbing
family: Huang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4545-4565
id: chen23i
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4545
lastpage: 4565
published: 2023-07-03 00:00:00 +0000
- title: 'GuardHFL: Privacy Guardian for Heterogeneous Federated Learning'
abstract: 'Heterogeneous federated learning (HFL) enables clients with different computation and communication capabilities to collaboratively train their own customized models via a query-response paradigm on auxiliary datasets. However, such a paradigm raises serious privacy concerns due to the leakage of highly sensitive query samples and response predictions. We put forth GuardHFL, the first-of-its-kind efficient and privacy-preserving HFL framework. GuardHFL is equipped with a novel HFL-friendly secure querying scheme built on lightweight secret sharing and symmetric-key techniques. The core of GuardHFL is two customized multiplication and comparison protocols, which substantially boost the execution efficiency. Extensive evaluations demonstrate that GuardHFL significantly outperforms the alternative instantiations based on existing state-of-the-art techniques in both runtime and communication cost.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23j.html
PDF: https://proceedings.mlr.press/v202/chen23j/chen23j.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23j.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hanxiao
family: Chen
- given: Meng
family: Hao
- given: Hongwei
family: Li
- given: Kangjie
family: Chen
- given: Guowen
family: Xu
- given: Tianwei
family: Zhang
- given: Xilin
family: Zhang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4566-4584
id: chen23j
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4566
lastpage: 4584
published: 2023-07-03 00:00:00 +0000
- title: 'Efficient and Degree-Guided Graph Generation via Discrete Diffusion Modeling'
abstract: 'Diffusion-based generative graph models have been proven effective in generating high-quality small graphs. However, they are not scalable enough to generate large graphs containing thousands of nodes with the desired graph statistics. In this work, we propose EDGE, a new diffusion-based generative graph model that addresses generative tasks with large graphs. To improve computation efficiency, we encourage graph sparsity by using a discrete diffusion process that randomly removes edges at each time step and finally obtains an empty graph. EDGE only focuses on a portion of nodes in the graph at each denoising step, so it makes far fewer edge predictions than previous diffusion-based models. Moreover, EDGE admits explicitly modeling the node degrees of the graphs, further improving the model performance. The empirical study shows that EDGE is much more efficient than competing methods and can generate large graphs with thousands of nodes. It also outperforms baseline models in generation quality: graphs generated by our approach have graph statistics more similar to those of the training graphs.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23k.html
PDF: https://proceedings.mlr.press/v202/chen23k/chen23k.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23k.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Xiaohui
family: Chen
- given: Jiaxing
family: He
- given: Xu
family: Han
- given: Liping
family: Liu
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4585-4610
id: chen23k
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4585
lastpage: 4610
published: 2023-07-03 00:00:00 +0000
- title: 'Evolving Semantic Prototype Improves Generative Zero-Shot Learning'
abstract: 'In zero-shot learning (ZSL), generative methods synthesize class-related sample features based on predefined semantic prototypes. They advance the ZSL performance by synthesizing unseen class sample features for better training the classifier. We observe that each class’s predefined semantic prototype (also referred to as semantic embedding or condition) does not accurately match its real semantic prototype. So the synthesized visual sample features do not faithfully represent the real sample features, limiting the classifier training and existing ZSL performance. In this paper, we formulate this mismatch phenomenon as the visual-semantic domain shift problem. We propose a dynamic semantic prototype evolving (DSP) method to align the empirically predefined semantic prototypes and the real prototypes for class-related feature synthesis. The alignment is learned by refining sample features and semantic prototypes in a unified framework and making the synthesized visual sample features approach real sample features. After alignment, synthesized sample features from unseen classes are closer to the real sample features and benefit DSP to improve existing generative ZSL methods by 8.5%, 8.0%, and 9.7% on the standard CUB, SUN, and AWA2 datasets. This significant performance improvement indicates that evolving the semantic prototype explores a promising new direction in ZSL.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23l.html
PDF: https://proceedings.mlr.press/v202/chen23l/chen23l.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23l.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Shiming
family: Chen
- given: Wenjin
family: Hou
- given: Ziming
family: Hong
- given: Xiaohan
family: Ding
- given: Yibing
family: Song
- given: Xinge
family: You
- given: Tongliang
family: Liu
- given: Kun
family: Zhang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4611-4622
id: chen23l
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4611
lastpage: 4622
published: 2023-07-03 00:00:00 +0000
- title: 'Explore and Exploit the Diverse Knowledge in Model Zoo for Domain Generalization'
abstract: 'The proliferation of pretrained models, as a result of advancements in pretraining techniques, has led to the emergence of a vast zoo of publicly available models. Effectively utilizing these resources to obtain models with robust out-of-distribution generalization capabilities for downstream tasks has become a crucial area of research. Previous research has primarily focused on identifying the most powerful models within the model zoo, neglecting to fully leverage the diverse inductive biases contained within. This paper argues that the knowledge contained in weaker models is valuable and presents a method for leveraging the diversity within the model zoo to improve out-of-distribution generalization capabilities. Specifically, we investigate the behaviors of various pretrained models across different domains of downstream tasks by characterizing the variations in their encoded representations in terms of two dimensions: diversity shift and correlation shift. This characterization enables us to propose a new algorithm for integrating diverse pretrained models, not limited to the strongest models, in order to achieve enhanced out-of-distribution generalization performance. Our proposed method demonstrates state-of-the-art empirical results on a variety of datasets, thus validating the benefits of utilizing diverse knowledge.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23m.html
PDF: https://proceedings.mlr.press/v202/chen23m/chen23m.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23m.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yimeng
family: Chen
- given: Tianyang
family: Hu
- given: Fengwei
family: Zhou
- given: Zhenguo
family: Li
- given: Zhi-Ming
family: Ma
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4623-4640
id: chen23m
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4623
lastpage: 4640
published: 2023-07-03 00:00:00 +0000
- title: 'Decentralized Stochastic Bilevel Optimization with Improved per-Iteration Complexity'
abstract: 'Bilevel optimization has recently received tremendous attention due to its great success in solving important machine learning problems like meta learning, reinforcement learning, and hyperparameter optimization. Extending single-agent training on bilevel problems to the decentralized setting is a natural generalization, and there has been a flurry of work studying decentralized bilevel optimization algorithms. However, it remains unknown how to design a distributed algorithm with sample complexity and convergence rate comparable to SGD for stochastic optimization, and at the same time without directly computing the exact Hessian or Jacobian matrices. In this paper we propose such an algorithm. More specifically, we propose a novel decentralized stochastic bilevel optimization (DSBO) algorithm that only requires first-order stochastic oracles and Hessian-vector and Jacobian-vector product oracles. The sample complexity of our algorithm matches the currently best known results for DSBO, while our algorithm does not require estimating the full Hessian and Jacobian matrices, thereby achieving improved per-iteration complexity.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23n.html
PDF: https://proceedings.mlr.press/v202/chen23n/chen23n.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23n.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Xuxing
family: Chen
- given: Minhui
family: Huang
- given: Shiqian
family: Ma
- given: Krishna
family: Balasubramanian
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4641-4671
id: chen23n
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4641
lastpage: 4671
published: 2023-07-03 00:00:00 +0000
- title: 'Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data'
abstract: 'Diffusion models achieve state-of-the-art performance in various generation tasks. However, their theoretical foundations fall far behind. This paper studies score approximation, estimation, and distribution recovery of diffusion models, when data are supported on an unknown low-dimensional linear subspace. Our result provides sample complexity bounds for distribution estimation using diffusion models. We show that with a properly chosen neural network architecture, the score function can be both accurately approximated and efficiently estimated. Further, the generated distribution based on the estimated score function captures the data geometric structures and converges to a close vicinity of the data distribution. The convergence rate depends on subspace dimension, implying that diffusion models can circumvent the curse of data ambient dimensionality.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23o.html
PDF: https://proceedings.mlr.press/v202/chen23o/chen23o.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23o.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Minshuo
family: Chen
- given: Kaixuan
family: Huang
- given: Tuo
family: Zhao
- given: Mengdi
family: Wang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4672-4712
id: chen23o
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4672
lastpage: 4712
published: 2023-07-03 00:00:00 +0000
- title: 'Sample Complexity of Probability Divergences under Group Symmetry'
abstract: 'We rigorously quantify the improvement in the sample complexity of variational divergence estimations for group-invariant distributions. In the cases of the Wasserstein-1 metric and the Lipschitz-regularized $\alpha$-divergences, the reduction of sample complexity is proportional to an ambient-dimension-dependent power of the group size. For the maximum mean discrepancy (MMD), the improvement of sample complexity is more nuanced, as it depends on not only the group size but also the choice of kernel. Numerical simulations verify our theories.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23p.html
PDF: https://proceedings.mlr.press/v202/chen23p/chen23p.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23p.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ziyu
family: Chen
- given: Markos
family: Katsoulakis
- given: Luc
family: Rey-Bellet
- given: Wei
family: Zhu
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4713-4734
id: chen23p
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4713
lastpage: 4734
published: 2023-07-03 00:00:00 +0000
- title: 'Improved Analysis of Score-based Generative Modeling: User-Friendly Bounds under Minimal Smoothness Assumptions'
abstract: 'We give an improved theoretical analysis of score-based generative modeling. Under a score estimate with small $L^2$ error (averaged across timesteps), we provide efficient convergence guarantees for any data distribution with a finite second moment, by either employing early stopping or assuming a smoothness condition on the score function of the data distribution. Our result does not rely on any log-concavity or functional inequality assumption and has a logarithmic dependence on the smoothness. In particular, we show that under only a finite second moment condition, approximating the following to $\epsilon$-accuracy in reverse KL divergence can be done in $\tilde O\left(\frac{d \log (1/\delta)}{\epsilon}\right)$ steps: 1) the variance-$\delta$ Gaussian perturbation of any data distribution; 2) data distributions with $1/\delta$-smooth score functions. Our analysis also provides a quantitative comparison between different discrete approximations and may guide the choice of discretization points in practice.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23q.html
PDF: https://proceedings.mlr.press/v202/chen23q/chen23q.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23q.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hongrui
family: Chen
- given: Holden
family: Lee
- given: Jianfeng
family: Lu
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4735-4763
id: chen23q
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4735
lastpage: 4763
published: 2023-07-03 00:00:00 +0000
- title: 'Bidirectional Looking with A Novel Double Exponential Moving Average to Adaptive and Non-adaptive Momentum Optimizers'
abstract: 'The optimizer is an essential component for the success of deep learning, as it guides the neural network to update its parameters according to the loss on the training set. SGD and Adam are two classical and effective optimizers on which researchers have proposed many variants, such as SGDM and RAdam. In this paper, we innovatively combine the backward-looking and forward-looking aspects of the optimizer algorithm and propose a novel Admeta (**A** **D**ouble exponential **M**oving averag**E** **T**o **A**daptive and non-adaptive momentum) optimizer framework. For the backward-looking part, we propose a DEMA variant scheme, which is motivated by a metric in the stock market, to replace the common exponential moving average scheme. For the forward-looking part, we present a dynamic lookahead strategy which asymptotically approaches a set value, maintaining speed in the early stage and high convergence performance in the final stage. Based on this idea, we provide two optimizer implementations, AdmetaR and AdmetaS, the former based on RAdam and the latter based on SGDM. Through extensive experiments on diverse tasks, we find that the proposed Admeta optimizer outperforms our base optimizers and shows advantages over recently proposed competitive optimizers. We also provide theoretical proofs for these two algorithms, which verify the convergence of our proposed Admeta.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23r.html
PDF: https://proceedings.mlr.press/v202/chen23r/chen23r.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23r.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yineng
family: Chen
- given: Zuchao
family: Li
- given: Lefei
family: Zhang
- given: Bo
family: Du
- given: Hai
family: Zhao
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4764-4803
id: chen23r
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4764
lastpage: 4803
published: 2023-07-03 00:00:00 +0000
- title: 'HarsanyiNet: Computing Accurate Shapley Values in a Single Forward Propagation'
abstract: 'The Shapley value is widely regarded as a trustworthy attribution metric. However, when people use Shapley values to explain the attribution of input variables of a deep neural network (DNN), it usually requires a very high computational cost to approximate relatively accurate Shapley values in real-world applications. Therefore, we propose a novel network architecture, the HarsanyiNet, which makes inferences on the input sample and simultaneously computes the exact Shapley values of the input variables in a single forward propagation. The HarsanyiNet is designed on the theoretical foundation that the Shapley value can be reformulated as the redistribution of Harsanyi interactions encoded by the network.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23s.html
PDF: https://proceedings.mlr.press/v202/chen23s/chen23s.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23s.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lu
family: Chen
- given: Siyu
family: Lou
- given: Keyan
family: Zhang
- given: Jin
family: Huang
- given: Quanshi
family: Zhang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4804-4825
id: chen23s
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4804
lastpage: 4825
published: 2023-07-03 00:00:00 +0000
- title: 'Generalized Implicit Follow-The-Regularized-Leader'
abstract: 'We propose a new class of online learning algorithms, generalized implicit Follow-The-Regularized-Leader (FTRL), that expands the scope of FTRL framework. Generalized implicit FTRL can recover known algorithms, such as FTRL with linearized losses and implicit FTRL, and it allows the design of new update rules, as extensions of aProx and Mirror-Prox to FTRL. Our theory is constructive in the sense that it provides a simple unifying framework to design updates that directly improve the worst-case upper bound on the regret. The key idea is substituting the linearization of the losses with a Fenchel-Young inequality. We show the flexibility of the framework by proving that some known algorithms, like the Mirror-Prox updates, are instantiations of the generalized implicit FTRL. Finally, the new framework allows us to recover the temporal variation bound of implicit OMD, with the same computational complexity.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23t.html
PDF: https://proceedings.mlr.press/v202/chen23t/chen23t.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23t.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Keyi
family: Chen
- given: Francesco
family: Orabona
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4826-4838
id: chen23t
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4826
lastpage: 4838
published: 2023-07-03 00:00:00 +0000
- title: 'Fisher Information Embedding for Node and Graph Learning'
abstract: 'Attention-based graph neural networks (GNNs), such as graph attention networks (GATs), have become popular neural architectures for processing graph-structured data and learning node embeddings. Despite their empirical success, these models rely on labeled data, and their theoretical properties have yet to be fully understood. In this work, we propose a novel attention-based node embedding framework for graphs. Our framework builds upon a hierarchical kernel for multisets of subgraphs around nodes (e.g. neighborhoods) and each kernel leverages the geometry of a smooth statistical manifold to compare pairs of multisets, by “projecting” the multisets onto the manifold. By explicitly computing node embeddings with a manifold of Gaussian mixtures, our method leads to a new attention mechanism for neighborhood aggregation. We provide theoretical insights into generalizability and expressivity of our embeddings, contributing to a deeper understanding of attention-based GNNs. We propose both efficient unsupervised and supervised methods for learning the embeddings. Through experiments on several node classification benchmarks, we demonstrate that our proposed method outperforms existing attention-based graph models like GATs. Our code is available at https://github.com/BorgwardtLab/fisher_information_embedding.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23u.html
PDF: https://proceedings.mlr.press/v202/chen23u/chen23u.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23u.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Dexiong
family: Chen
- given: Paolo
family: Pellizzoni
- given: Karsten
family: Borgwardt
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4839-4855
id: chen23u
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4839
lastpage: 4855
published: 2023-07-03 00:00:00 +0000
- title: 'Rethinking Visual Reconstruction: Experience-Based Content Completion Guided by Visual Cues'
abstract: 'Decoding seen images from brain activities has been an absorbing field. However, the images reconstructed by existing studies still suffer from low quality. This can be because our visual system is not like a camera that “remembers” every pixel. Instead, only part of the information can be perceived with our selective attention, and the brain “guesses” the rest to form what we think we see. Most existing approaches ignored the brain completion mechanism. In this work, we propose to reconstruct seen images with both the visual perception and the brain completion process, and design a simple, yet effective visual decoding framework to achieve this goal. Specifically, we first construct a shared discrete representation space for both brain signals and images. Then, a novel self-supervised token-to-token inpainting network is designed to implement visual content completion by building context and prior knowledge about the visual objects from the discrete latent space. Our approach improved the quality of visual reconstruction significantly and achieved state-of-the-art performance.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23v.html
PDF: https://proceedings.mlr.press/v202/chen23v/chen23v.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23v.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jiaxuan
family: Chen
- given: Yu
family: Qi
- given: Gang
family: Pan
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4856-4866
id: chen23v
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4856
lastpage: 4866
published: 2023-07-03 00:00:00 +0000
- title: 'Stratified Adversarial Robustness with Rejection'
abstract: 'Recently, there has been emerging interest in adversarially training a classifier with a rejection option (also known as a selective classifier) for boosting adversarial robustness. While rejection can incur a cost in many applications, existing studies typically associate zero cost with rejecting perturbed inputs, which can result in the rejection of numerous slightly-perturbed inputs that could be correctly classified. In this work, we study adversarially-robust classification with rejection in the stratified rejection setting, where the rejection cost is modeled by rejection loss functions monotonically non-increasing in the perturbation magnitude. We theoretically analyze the stratified rejection setting and propose a novel defense method – Adversarial Training with Consistent Prediction-based Rejection (CPR) – for building a robust selective classifier. Experiments on image datasets demonstrate that the proposed method significantly outperforms existing methods under strong adaptive attacks. For instance, on CIFAR-10, CPR reduces the total robust loss (for different rejection losses) by at least 7.3% under both seen and unseen attacks.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23w.html
PDF: https://proceedings.mlr.press/v202/chen23w/chen23w.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23w.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jiefeng
family: Chen
- given: Jayaram
family: Raghuram
- given: Jihye
family: Choi
- given: Xi
family: Wu
- given: Yingyu
family: Liang
- given: Somesh
family: Jha
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4867-4894
id: chen23w
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4867
lastpage: 4894
published: 2023-07-03 00:00:00 +0000
- title: 'Multi-task Hierarchical Adversarial Inverse Reinforcement Learning'
abstract: 'Multi-task Imitation Learning (MIL) aims to train a policy capable of performing a distribution of tasks based on multi-task expert demonstrations, which is essential for general-purpose robots. Existing MIL algorithms suffer from low data efficiency and poor performance on complex long-horizon tasks. We develop Multi-task Hierarchical Adversarial Inverse Reinforcement Learning (MH-AIRL) to learn hierarchically-structured multi-task policies, which is more beneficial for compositional tasks with long horizons and has higher expert data efficiency through identifying and transferring reusable basic skills across tasks. To realize this, MH-AIRL effectively synthesizes context-based multi-task learning, AIRL (an IL approach), and hierarchical policy learning. Further, MH-AIRL can be applied to demonstrations without task or skill annotations (i.e., state-action pairs only), which are more accessible in practice. Theoretical justifications are provided for each module of MH-AIRL, and evaluations on challenging multi-task settings demonstrate superior performance and transferability of the multi-task policies learned with MH-AIRL as compared to SOTA MIL baselines.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23x.html
PDF: https://proceedings.mlr.press/v202/chen23x/chen23x.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23x.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jiayu
family: Chen
- given: Dipesh
family: Tamboli
- given: Tian
family: Lan
- given: Vaneet
family: Aggarwal
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4895-4920
id: chen23x
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4895
lastpage: 4920
published: 2023-07-03 00:00:00 +0000
- title: 'Model Transferability with Responsive Decision Subjects'
abstract: 'Given an algorithmic predictor that is accurate on some source population consisting of strategic human decision subjects, will it remain accurate if the population responds to it? In our setting, an agent or a user corresponds to a sample $(X,Y)$ drawn from a distribution $\cal{D}$ and will face a model $h$ and its classification result $h(X)$. Agents can modify $X$ to adapt to $h$, which will incur a distribution shift on $(X,Y)$. Our formulation is motivated by applications where the deployed machine learning models are subjected to human agents, and will ultimately face responsive and interactive data distributions. We formalize the discussions of the transferability of a model by studying how the performance of the model trained on the available source distribution (data) would translate to the performance on its induced domain. We provide both upper bounds for the performance gap due to the induced domain shift, as well as lower bounds for the trade-offs that a classifier has to suffer on either the source training distribution or the induced target distribution. We provide further instantiated analysis for two popular domain adaptation settings, including covariate shift and target shift.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23y.html
PDF: https://proceedings.mlr.press/v202/chen23y/chen23y.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23y.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yatong
family: Chen
- given: Zeyu
family: Tang
- given: Kun
family: Zhang
- given: Yang
family: Liu
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4921-4952
id: chen23y
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4921
lastpage: 4952
published: 2023-07-03 00:00:00 +0000
- title: 'Layered State Discovery for Incremental Autonomous Exploration'
abstract: 'We study the autonomous exploration (AX) problem proposed by Lim & Auer (2012). In this setting, the objective is to discover a set of $\epsilon$-optimal policies reaching a set $\mathcal{S}_L^{\rightarrow}$ of incrementally $L$-controllable states. We introduce a novel layered decomposition of the set of incrementally $L$-controllable states that is based on the iterative application of a state-expansion operator. We leverage these results to design Layered Autonomous Exploration (LAE), a novel algorithm for AX that attains a sample complexity of $\tilde{\mathcal{O}}(LS^{\rightarrow}_{L(1+\epsilon)}\Gamma_{L(1+\epsilon)} A \ln^{12}(S^{\rightarrow}_{L(1+\epsilon)})/\epsilon^2)$, where $S^{\rightarrow}_{L(1+\epsilon)}$ is the number of states that are incrementally $L(1+\epsilon)$-controllable, $A$ is the number of actions, and $\Gamma_{L(1+\epsilon)}$ is the branching factor of the transitions over such states. LAE improves over the algorithm of Tarbouriech et al. (2020a) by a factor of $L^2$ and it is the first algorithm for AX that works in a countably-infinite state space. Moreover, we show that, under a certain identifiability assumption, LAE achieves minimax-optimal sample complexity of $\tilde{\mathcal{O}}(LS^{\rightarrow}_{L}A\ln^{12}(S^{\rightarrow}_{L})/\epsilon^2)$, outperforming existing algorithms and matching for the first time the lower bound proved by Cai et al. (2022) up to logarithmic factors.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23z.html
PDF: https://proceedings.mlr.press/v202/chen23z/chen23z.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23z.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Liyu
family: Chen
- given: Andrea
family: Tirinzoni
- given: Alessandro
family: Lazaric
- given: Matteo
family: Pirotta
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 4953-5001
id: chen23z
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 4953
lastpage: 5001
published: 2023-07-03 00:00:00 +0000
- title: 'Optimistic Online Mirror Descent for Bridging Stochastic and Adversarial Online Convex Optimization'
abstract: 'Stochastically Extended Adversarial (SEA) model is introduced by Sachs et al. (2022) as an interpolation between stochastic and adversarial online convex optimization. Under the smoothness condition, they demonstrate that the expected regret of optimistic follow-the-regularized-leader (FTRL) depends on the cumulative stochastic variance $\sigma_{1:T}^2$ and the cumulative adversarial variation $\Sigma_{1:T}^2$ for convex functions. They also provide a slightly weaker bound based on the maximal stochastic variance $\sigma_{\max}^2$ and the maximal adversarial variation $\Sigma_{\max}^2$ for strongly convex functions. Inspired by their work, we investigate the theoretical guarantees of optimistic online mirror descent (OMD) for the SEA model. For convex and smooth functions, we obtain the same $\mathcal{O}(\sqrt{\sigma_{1:T}^2}+\sqrt{\Sigma_{1:T}^2})$ regret bound, without the convexity requirement of individual functions. For strongly convex and smooth functions, we establish an $\mathcal{O}(\min\{\log (\sigma_{1:T}^2+\Sigma_{1:T}^2), (\sigma_{\max}^2 + \Sigma_{\max}^2) \log T\})$ bound, better than their $\mathcal{O}((\sigma_{\max}^2 + \Sigma_{\max}^2) \log T)$ result. For exp-concave and smooth functions, we achieve a new $\mathcal{O}(d\log(\sigma_{1:T}^2+\Sigma_{1:T}^2))$ bound. Owing to the OMD framework, we further establish dynamic regret for convex and smooth functions, which is more favorable in non-stationary online scenarios.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23aa.html
PDF: https://proceedings.mlr.press/v202/chen23aa/chen23aa.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23aa.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Sijia
family: Chen
- given: Wei-Wei
family: Tu
- given: Peng
family: Zhao
- given: Lijun
family: Zhang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5002-5035
id: chen23aa
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5002
lastpage: 5035
published: 2023-07-03 00:00:00 +0000
- title: 'Learning to Optimize Differentiable Games'
abstract: 'Many machine learning problems can be abstracted as game-theoretic formulations that boil down to optimizing nested objectives, such as generative adversarial networks (GANs) and multi-agent reinforcement learning. Solving these games requires finding their stable fixed points or Nash equilibrium. However, existing algorithms for solving games suffer from empirical instability, hence demanding heavy ad-hoc tuning in practice. To tackle these challenges, we resort to the emerging scheme of Learning to Optimize (L2O), which discovers problem-specific efficient optimization algorithms through data-driven training. Our customized L2O framework for differentiable game theory problems, dubbed “Learning to Play Games” (L2PG), seeks a stable fixed point solution, by predicting the fast update direction from the past trajectory, with a novel gradient stability-aware, sign-based loss function. We further incorporate curriculum learning and self-learning to strengthen the empirical training stability and generalization of L2PG. On test problems including quadratic games and GANs, L2PG can substantially accelerate the convergence, and demonstrates a remarkably more stable trajectory. Codes are available at https://github.com/VITA-Group/L2PG.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23ab.html
PDF: https://proceedings.mlr.press/v202/chen23ab/chen23ab.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23ab.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Xuxi
family: Chen
- given: Nelson
family: Vadori
- given: Tianlong
family: Chen
- given: Zhangyang
family: Wang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5036-5051
id: chen23ab
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5036
lastpage: 5051
published: 2023-07-03 00:00:00 +0000
- title: 'Coordinated Dynamic Bidding in Repeated Second-Price Auctions with Budgets'
abstract: 'In online ad markets, a rising number of advertisers are employing bidding agencies to participate in ad auctions. These agencies are specialized in designing online algorithms and bidding on behalf of their clients. Typically, an agency has information on multiple advertisers, so she can potentially coordinate bids to help her clients achieve higher utilities than those under independent bidding. In this paper, we study coordinated online bidding algorithms in repeated second-price auctions with budgets. We propose algorithms that guarantee every client a higher utility than the best she can get under independent bidding. We show that these algorithms achieve maximal social welfare and discuss bidders’ incentives to misreport their budgets, in symmetric cases. Our proofs combine the techniques of online learning and equilibrium analysis, overcoming the difficulty of competing with a multi-dimensional benchmark. The performance of our algorithms is further evaluated by experiments on both synthetic and real data. To the best of our knowledge, we are the first to consider bidder coordination in online repeated auctions with constraints.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23ac.html
PDF: https://proceedings.mlr.press/v202/chen23ac/chen23ac.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23ac.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yurong
family: Chen
- given: Qian
family: Wang
- given: Zhijian
family: Duan
- given: Haoran
family: Sun
- given: Zhaohua
family: Chen
- given: Xiang
family: Yan
- given: Xiaotie
family: Deng
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5052-5086
id: chen23ac
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5052
lastpage: 5086
published: 2023-07-03 00:00:00 +0000
- title: 'Semi-Offline Reinforcement Learning for Optimized Text Generation'
abstract: 'Existing reinforcement learning (RL) methods mainly operate in online or offline settings. The online methods explore the environment at an expensive time cost, and the offline methods efficiently obtain reward signals by sacrificing the exploration capability. We propose semi-offline RL, a novel paradigm that can smoothly transition from the offline setting to the online setting, balances the exploration capability and training cost, and provides a theoretical foundation for comparing different RL settings. Based on the semi-offline MDP formulation, we present the RL setting that is optimal in terms of optimization cost, asymptotic error, and overfitting error bound. Extensive experiments show that our semi-offline RL approach is effective in various text generation tasks and datasets, and yields performance comparable to, and often better than, state-of-the-art methods.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23ad.html
PDF: https://proceedings.mlr.press/v202/chen23ad/chen23ad.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23ad.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Changyu
family: Chen
- given: Xiting
family: Wang
- given: Yiqiao
family: Jin
- given: Victor Ye
family: Dong
- given: Li
family: Dong
- given: Jie
family: Cao
- given: Yi
family: Liu
- given: Rui
family: Yan
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5087-5103
id: chen23ad
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5087
lastpage: 5103
published: 2023-07-03 00:00:00 +0000
- title: 'Lower Bounds for Learning in Revealing POMDPs'
abstract: 'This paper studies the fundamental limits of reinforcement learning (RL) in the challenging *partially observable* setting. While it is well-established that learning in Partially Observable Markov Decision Processes (POMDPs) requires exponentially many samples in the worst case, a surge of recent work shows that polynomial sample complexities are achievable under the *revealing condition*—a natural condition that requires the observables to reveal some information about the unobserved latent states. However, the fundamental limits for learning in revealing POMDPs are much less understood, with existing lower bounds being rather preliminary and having substantial gaps from the current best upper bounds. We establish strong PAC and regret lower bounds for learning in revealing POMDPs. Our lower bounds scale polynomially in all relevant problem parameters in a multiplicative fashion, and achieve significantly smaller gaps against the current best upper bounds, providing a solid starting point for future studies. In particular, for *multi-step* revealing POMDPs, we show that (1) the latent state-space dependence is at least $\Omega(S^{1.5})$ in the PAC sample complexity, which is notably harder than the $\widetilde{\Theta}(S)$ scaling for fully-observable MDPs; (2) any polynomial sublinear regret is at least $\Omega(T^{2/3})$, suggesting its fundamental difference from the *single-step* case where $\widetilde{\mathcal{O}}(\sqrt{T})$ regret is achievable. Technically, our hard instance construction adapts techniques in *distribution testing*, which is new to the RL literature and may be of independent interest. We also complement our results with new sharp regret upper bounds for *strongly B-stable PSRs*, which include single-step revealing POMDPs as a special case.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23ae.html
PDF: https://proceedings.mlr.press/v202/chen23ae/chen23ae.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23ae.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Fan
family: Chen
- given: Huan
family: Wang
- given: Caiming
family: Xiong
- given: Song
family: Mei
- given: Yu
family: Bai
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5104-5161
id: chen23ae
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5104
lastpage: 5161
published: 2023-07-03 00:00:00 +0000
- title: 'Implicit Neural Spatial Representations for Time-dependent PDEs'
abstract: 'Implicit Neural Spatial Representation (INSR) has emerged as an effective representation of spatially-dependent vector fields. This work explores solving time-dependent PDEs with INSR. Classical PDE solvers introduce both temporal and spatial discretizations. Common spatial discretizations include meshes and meshless point clouds, where each degree-of-freedom corresponds to a location in space. While these explicit spatial correspondences are intuitive to model and understand, these representations are not necessarily optimal for accuracy, memory usage, or adaptivity. Keeping the classical temporal discretization unchanged (e.g., explicit/implicit Euler), we explore INSR as an alternative spatial discretization, where spatial information is implicitly stored in the neural network weights. The network weights then evolve over time via time integration. Our approach does not require any training data generated by existing solvers because our approach is the solver itself. We validate our approach on various PDEs with examples involving large elastic deformations, turbulent fluids, and multi-scale phenomena. While slower to compute than traditional representations, our approach exhibits higher accuracy and lower memory consumption. Whereas classical solvers can dynamically adapt their spatial representation only by resorting to complex remeshing algorithms, our INSR approach is intrinsically adaptive. By tapping into the rich literature of classic time integrators, e.g., operator-splitting schemes, our method enables challenging simulations in contact mechanics and turbulent flows where previous neural-physics approaches struggle. Videos and codes are available on the project page: http://www.cs.columbia.edu/cg/INSR-PDE/'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23af.html
PDF: https://proceedings.mlr.press/v202/chen23af/chen23af.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23af.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Honglin
family: Chen
- given: Rundi
family: Wu
- given: Eitan
family: Grinspun
- given: Changxi
family: Zheng
- given: Peter Yichen
family: Chen
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5162-5177
id: chen23af
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5162
lastpage: 5177
published: 2023-07-03 00:00:00 +0000
- title: 'BEATs: Audio Pre-Training with Acoustic Tokenizers'
abstract: 'We introduce a self-supervised learning (SSL) framework BEATs for general audio representation pre-training, where we iteratively optimize an acoustic tokenizer and an audio SSL model. Unlike the previous audio SSL models that employ reconstruction loss for pre-training, our audio SSL model is trained with the discrete label prediction task, where the labels are generated by a semantic-rich acoustic tokenizer. We propose an iterative pipeline to jointly optimize the tokenizer and the pre-trained model, aiming to abstract high-level semantics and discard the redundant details for audio. The experimental results demonstrate our acoustic tokenizers can generate discrete labels with rich audio semantics and our audio SSL models achieve state-of-the-art (SOTA) results across various audio classification benchmarks, even outperforming previous models that use significantly more training data and model parameters. Specifically, we set a new SOTA mAP 50.6% on AudioSet-2M without using any external data, and 98.1% accuracy on ESC-50. The code and pre-trained models are available at https://aka.ms/beats.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23ag.html
PDF: https://proceedings.mlr.press/v202/chen23ag/chen23ag.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23ag.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Sanyuan
family: Chen
- given: Yu
family: Wu
- given: Chengyi
family: Wang
- given: Shujie
family: Liu
- given: Daniel
family: Tompkins
- given: Zhuo
family: Chen
- given: Wanxiang
family: Che
- given: Xiangzhan
family: Yu
- given: Furu
family: Wei
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5178-5193
id: chen23ag
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5178
lastpage: 5193
published: 2023-07-03 00:00:00 +0000
- title: 'Learning to Incentivize Information Acquisition: Proper Scoring Rules Meet Principal-Agent Model'
abstract: 'We study the incentivized information acquisition problem, where a principal hires an agent to gather information on her behalf. Such a problem is modeled as a Stackelberg game between the principal and the agent, where the principal announces a scoring rule that specifies the payment, and the agent then chooses an effort level that maximizes her own profit and reports the information. We study the online setting of such a problem from the principal’s perspective, i.e., designing the optimal scoring rule by repeatedly interacting with the strategic agent. We design a provably sample efficient algorithm that tailors the UCB algorithm (Auer et al., 2002) to our model, which achieves a $\mathcal{O} (K^2\cdot T^{2/3})$ regret after $T$ iterations, where $K$ is the number of effort levels of the agent. Our algorithm features a delicate estimation procedure for the optimal profit of the principal, and a conservative correction scheme that ensures the desired agent’s actions are incentivized. Furthermore, a key feature of our regret bound is that it is independent of the number of states of the environment.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23ah.html
PDF: https://proceedings.mlr.press/v202/chen23ah/chen23ah.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23ah.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Siyu
family: Chen
- given: Jibang
family: Wu
- given: Yifan
family: Wu
- given: Zhuoran
family: Yang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5194-5218
id: chen23ah
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5194
lastpage: 5218
published: 2023-07-03 00:00:00 +0000
- title: 'Faster Gradient-Free Algorithms for Nonsmooth Nonconvex Stochastic Optimization'
abstract: 'We consider the optimization problem of the form $\min_{x \in \mathbb{R}^d} f(x) \triangleq \mathbb{E}[F(x;\xi)]$, where the component $F(x;\xi)$ is $L$-mean-squared Lipschitz but possibly nonconvex and nonsmooth. The recently proposed gradient-free method requires at most $\mathcal{O}( L^4 d^{3/2} \epsilon^{-4} + \Delta L^3 d^{3/2} \delta^{-1} \epsilon^{-4})$ stochastic zeroth-order oracle complexity to find a $(\delta,\epsilon)$-Goldstein stationary point of the objective function, where $\Delta = f(x_0) - \inf_{x \in \mathbb{R}^d} f(x)$ and $x_0$ is the initial point of the algorithm. This paper proposes a more efficient algorithm using stochastic recursive gradient estimators, which improves the complexity to $\mathcal{O}(L^3 d^{3/2} \epsilon^{-3}+ \Delta L^2 d^{3/2} \delta^{-1} \epsilon^{-3})$.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23ai.html
PDF: https://proceedings.mlr.press/v202/chen23ai/chen23ai.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23ai.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lesi
family: Chen
- given: Jing
family: Xu
- given: Luo
family: Luo
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5219-5233
id: chen23ai
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5219
lastpage: 5233
published: 2023-07-03 00:00:00 +0000
- title: 'Efficient Personalized Federated Learning via Sparse Model-Adaptation'
abstract: 'Federated Learning (FL) aims to train machine learning models for multiple clients without sharing their own private data. Due to the heterogeneity of clients’ local data distribution, recent studies explore the personalized FL that learns and deploys distinct local models with the help of auxiliary global models. However, the clients can be heterogeneous in terms of not only local data distribution, but also their computation and communication resources. The capacity and efficiency of personalized models are restricted by the lowest-resource clients, leading to sub-optimal performance and limited practicality of personalized FL. To overcome these challenges, we propose a novel approach named pFedGate for efficient personalized FL by adaptively and efficiently learning sparse local models. With a lightweight trainable gating layer, pFedGate enables clients to reach their full potential in model capacity by generating different sparse models accounting for both the heterogeneous data distributions and resource constraints. Meanwhile, the computation and communication efficiency are both improved thanks to the adaptability between the model sparsity and clients’ resources. Further, we theoretically show that the proposed pFedGate has superior complexity with guaranteed convergence and generalization error. Extensive experiments show that pFedGate achieves superior global accuracy, individual accuracy and efficiency simultaneously over state-of-the-art methods. We also demonstrate that pFedGate performs better than competitors in the novel client participation and partial client participation scenarios, and can learn meaningful sparse local models adapted to different data distributions.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23aj.html
PDF: https://proceedings.mlr.press/v202/chen23aj/chen23aj.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23aj.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Daoyuan
family: Chen
- given: Liuyi
family: Yao
- given: Dawei
family: Gao
- given: Bolin
family: Ding
- given: Yaliang
family: Li
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5234-5256
id: chen23aj
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5234
lastpage: 5256
published: 2023-07-03 00:00:00 +0000
- title: 'A Gromov-Wasserstein Geometric View of Spectrum-Preserving Graph Coarsening'
abstract: 'Graph coarsening is a technique for solving large-scale graph problems by working on a smaller version of the original graph, and possibly interpolating the results back to the original graph. It has a long history in scientific computing and has recently gained popularity in machine learning, particularly in methods that preserve the graph spectrum. This work studies graph coarsening from a different perspective, developing a theory for preserving graph distances and proposing a method to achieve this. The geometric approach is useful when working with a collection of graphs, such as in graph classification and regression. In this study, we consider a graph as an element on a metric space equipped with the Gromov–Wasserstein (GW) distance, and bound the difference between the distance of two graphs and their coarsened versions. Minimizing this difference can be done using the popular weighted kernel $K$-means method, which improves existing spectrum-preserving methods with the proper choice of the kernel. The study includes a set of experiments to support the theory and method, including approximating the GW distance, preserving the graph spectrum, classifying graphs using spectral information, and performing regression using graph convolutional networks. Code is available at https://github.com/ychen-stat-ml/GW-Graph-Coarsening.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23ak.html
PDF: https://proceedings.mlr.press/v202/chen23ak/chen23ak.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23ak.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yifan
family: Chen
- given: Rentian
family: Yao
- given: Yun
family: Yang
- given: Jie
family: Chen
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5257-5281
id: chen23ak
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5257
lastpage: 5281
published: 2023-07-03 00:00:00 +0000
- title: 'How to address monotonicity for model risk management?'
abstract: 'In this paper, we study the problem of establishing the accountability and fairness of transparent machine learning models through monotonicity. Although there have been numerous studies on individual monotonicity, pairwise monotonicity is often overlooked in the existing literature. This paper studies transparent neural networks in the presence of three types of monotonicity: individual monotonicity, weak pairwise monotonicity, and strong pairwise monotonicity. As a means of achieving monotonicity while maintaining transparency, we propose the monotonic groves of neural additive models. Through empirical examples, we demonstrate that monotonicity is often violated in practice and that monotonic groves of neural additive models are transparent, accountable, and fair.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23al.html
PDF: https://proceedings.mlr.press/v202/chen23al/chen23al.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23al.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Dangxing
family: Chen
- given: Weicheng
family: Ye
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5282-5295
id: chen23al
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5282
lastpage: 5295
published: 2023-07-03 00:00:00 +0000
- title: 'Sketched Ridgeless Linear Regression: The Role of Downsampling'
abstract: 'Overparametrization often helps improve the generalization performance. This paper presents a dual view of overparametrization suggesting that downsampling may also help generalize. Focusing on the proportional regime $m\asymp n \asymp p$, where $m$ represents the sketching size, $n$ is the sample size, and $p$ is the feature dimensionality, we investigate two out-of-sample prediction risks of the sketched ridgeless least square estimator. Our findings challenge conventional beliefs by showing that downsampling does not always harm generalization but can actually improve it in certain cases. We identify the optimal sketching size that minimizes out-of-sample prediction risks and demonstrate that the optimally sketched estimator exhibits stabler risk curves, eliminating the peaks of those for the full-sample estimator. To facilitate practical implementation, we propose an empirical procedure to determine the optimal sketching size. Finally, we extend our analysis to cover central limit theorems and misspecified models. Numerical studies strongly support our theory.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23am.html
PDF: https://proceedings.mlr.press/v202/chen23am/chen23am.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23am.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Xin
family: Chen
- given: Yicheng
family: Zeng
- given: Siyue
family: Yang
- given: Qiang
family: Sun
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5296-5326
id: chen23am
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5296
lastpage: 5326
published: 2023-07-03 00:00:00 +0000
- title: 'Context-Aware Bayesian Network Actor-Critic Methods for Cooperative Multi-Agent Reinforcement Learning'
abstract: 'Executing actions in a correlated manner is a common strategy for human coordination that often leads to better cooperation, which is also potentially beneficial for cooperative multi-agent reinforcement learning (MARL). However, the recent success of MARL relies heavily on the convenient paradigm of purely decentralized execution, where there is no action correlation among agents for scalability considerations. In this work, we introduce a Bayesian network to inaugurate correlations between agents’ action selections in their joint policy. Theoretically, we justify why action dependencies are beneficial by deriving the multi-agent policy gradient formula under such a Bayesian network joint policy and proving its global convergence to Nash equilibria under tabular softmax policy parameterization in cooperative Markov games. Further, by equipping existing MARL algorithms with a recent method of differentiable directed acyclic graphs (DAGs), we develop practical algorithms to learn the context-aware Bayesian network policies in scenarios with partial observability and varying difficulty. We also dynamically decrease the sparsity of the learned DAG throughout the training process, which leads to weakly or even purely independent policies for decentralized execution. Empirical results on a range of MARL benchmarks show the benefits of our approach.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23an.html
PDF: https://proceedings.mlr.press/v202/chen23an/chen23an.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23an.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Dingyang
family: Chen
- given: Qi
family: Zhang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5327-5350
id: chen23an
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5327
lastpage: 5350
published: 2023-07-03 00:00:00 +0000
- title: 'Bidirectional Learning for Offline Model-based Biological Sequence Design'
abstract: 'Offline model-based optimization aims to maximize a black-box objective function with a static dataset of designs and their scores. In this paper, we focus on biological sequence design to maximize some sequence score. A recent approach employs bidirectional learning, combining a forward mapping for exploitation and a backward mapping for constraint, and it relies on the neural tangent kernel (NTK) of an infinitely wide network to build a proxy model. Though effective, the NTK cannot learn features because of its parametrization, and its use prevents the incorporation of powerful pre-trained Language Models (LMs) that can capture the rich biophysical information in millions of biological sequences. We adopt an alternative proxy model, adding a linear head to a pre-trained LM, and propose a linearization scheme. This yields a closed-form loss and also takes into account the biophysical information in the pre-trained LM. In addition, the forward mapping and the backward mapping play different roles and thus deserve different weights during sequence optimization. To achieve this, we train an auxiliary model and leverage its weak supervision signal via a bi-level optimization framework to effectively learn how to balance the two mappings. Further, by extending the framework, we develop the first learning rate adaptation module *Adaptive*-$\eta$, which is compatible with all gradient-based algorithms for offline model-based optimization. Experimental results on DNA/protein sequence design tasks verify the effectiveness of our algorithm. Our code is available at https://github.com/GGchen1997/BIB-ICML2023-Submission.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23ao.html
PDF: https://proceedings.mlr.press/v202/chen23ao/chen23ao.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23ao.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Can
family: Chen
- given: Yingxue
family: Zhang
- given: Xue
family: Liu
- given: Mark
family: Coates
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5351-5366
id: chen23ao
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5351
lastpage: 5366
published: 2023-07-03 00:00:00 +0000
- title: 'Learning to Jump: Thinning and Thickening Latent Counts for Generative Modeling'
abstract: 'Learning to denoise has emerged as a prominent paradigm to design state-of-the-art deep generative models for natural images. How to use it to model the distributions of both continuous real-valued data and categorical data has been well studied in recently proposed diffusion models. However, we find in this paper that it has limited ability in modeling some other types of data, such as count and non-negative continuous data, that are often highly sparse, skewed, heavy-tailed, and/or overdispersed. To this end, we propose learning to jump as a general recipe for generative modeling of various types of data. Using a forward count thinning process to construct learning objectives to train a deep neural network, it employs a reverse count thickening process to iteratively refine its generation through that network. We demonstrate when learning to jump is expected to perform comparably to learning to denoise, and when it is expected to perform better. For example, learning to jump is recommended when the training data is non-negative and exhibits strong sparsity, skewness, heavy-tailedness, and/or heterogeneity.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23ap.html
PDF: https://proceedings.mlr.press/v202/chen23ap/chen23ap.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23ap.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Tianqi
family: Chen
- given: Mingyuan
family: Zhou
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5367-5382
id: chen23ap
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5367
lastpage: 5382
published: 2023-07-03 00:00:00 +0000
- title: 'Lifelong Language Pretraining with Distribution-Specialized Experts'
abstract: 'Pretraining on a large-scale corpus has become a standard method to build general language models (LMs). Adapting a model to new data distributions targeting different downstream tasks poses significant challenges. Naive fine-tuning may incur catastrophic forgetting when the over-parameterized LMs overfit the new data but fail to preserve the pretrained features. Lifelong learning (LLL) aims to enable information systems to learn from a continuous data stream across time. However, most prior work modifies the training recipe assuming a static fixed network architecture. We find that additional model capacity and proper regularization are key elements to achieving strong LLL performance. Thus, we propose Lifelong-MoE, an extensible MoE (Mixture-of-Experts) architecture that dynamically adds model capacity via adding experts with regularized pretraining. Our results show that by only introducing a limited number of extra experts while keeping the computation cost constant, our model can steadily adapt to data distribution shifts while preserving the previous knowledge. Compared to existing lifelong learning approaches, Lifelong-MoE achieves better few-shot performance on NLP tasks. More impressively, Lifelong-MoE surpasses multi-task learning on 19 downstream NLU tasks.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23aq.html
PDF: https://proceedings.mlr.press/v202/chen23aq/chen23aq.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23aq.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Wuyang
family: Chen
- given: Yanqi
family: Zhou
- given: Nan
family: Du
- given: Yanping
family: Huang
- given: James
family: Laudon
- given: Zhifeng
family: Chen
- given: Claire
family: Cui
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5383-5395
id: chen23aq
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5383
lastpage: 5395
published: 2023-07-03 00:00:00 +0000
- title: 'Generalized-Smooth Nonconvex Optimization is As Efficient As Smooth Nonconvex Optimization'
abstract: 'Various optimal gradient-based algorithms have been developed for smooth nonconvex optimization. However, many nonconvex machine learning problems do not belong to the class of smooth functions and therefore the existing algorithms are sub-optimal. Instead, these problems have been shown to satisfy certain generalized-smooth conditions, which have not been well understood in the existing literature. In this paper, we propose a notion of $\alpha$-symmetric generalized-smoothness that substantially extends the existing notions and covers many important functions such as high-order polynomials and exponential functions. We study the fundamental properties and establish descent lemmas for the functions in this class. Then, to solve such a large class of nonconvex problems, we design a special deterministic normalized gradient descent algorithm that achieves the optimal iteration complexity $\mathcal{O}(\epsilon^{-2})$, and also prove that the popular SPIDER variance reduction algorithm achieves the optimal sample complexity $\mathcal{O}(\epsilon^{-3})$. Our results show that solving generalized-smooth nonconvex problems is as efficient as solving smooth nonconvex problems.'
volume: 202
URL: https://proceedings.mlr.press/v202/chen23ar.html
PDF: https://proceedings.mlr.press/v202/chen23ar/chen23ar.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chen23ar.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ziyi
family: Chen
- given: Yi
family: Zhou
- given: Yingbin
family: Liang
- given: Zhaosong
family: Lu
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5396-5427
id: chen23ar
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5396
lastpage: 5427
published: 2023-07-03 00:00:00 +0000
- title: 'Weakly Supervised Regression with Interval Targets'
abstract: 'This paper investigates an interesting weakly supervised regression setting called regression with interval targets (RIT). Although some of the previous methods on relevant regression settings can be adapted to RIT, they are not statistically consistent, and thus their empirical performance is not guaranteed. In this paper, we provide a thorough study on RIT. First, we propose a novel statistical model to describe the data generation process for RIT and demonstrate its validity. Second, we analyze a simple selecting method for RIT, which selects a particular value in the interval as the target value to train the model. Third, we propose a statistically consistent limiting method for RIT to train the model by limiting the predictions to the interval. We further derive an estimation error bound for our limiting method. Finally, extensive experiments on various datasets demonstrate the effectiveness of our proposed method.'
volume: 202
URL: https://proceedings.mlr.press/v202/cheng23a.html
PDF: https://proceedings.mlr.press/v202/cheng23a/cheng23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cheng23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Xin
family: Cheng
- given: Yuzhou
family: Cao
- given: Ximing
family: Li
- given: Bo
family: An
- given: Lei
family: Feng
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5428-5448
id: cheng23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5428
lastpage: 5448
published: 2023-07-03 00:00:00 +0000
- title: 'PLay: Parametrically Conditioned Layout Generation using Latent Diffusion'
abstract: 'Layout design is an important task in various design fields, including user interface, document, and graphic design. As this task requires tedious manual effort by designers, prior works have attempted to automate this process using generative models, but commonly fell short of providing intuitive user controls and achieving design objectives. In this paper, we build a conditional latent diffusion model, PLay, that generates parametrically conditioned layouts in vector graphic space from user-specified guidelines, which are commonly used by designers for representing their design intents in current practices. Our method outperforms prior works across three datasets on metrics including FID and FD-VG, and in user tests. Moreover, it brings a novel and interactive experience to professional layout design processes.'
volume: 202
URL: https://proceedings.mlr.press/v202/cheng23b.html
PDF: https://proceedings.mlr.press/v202/cheng23b/cheng23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cheng23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Chin-Yi
family: Cheng
- given: Forrest
family: Huang
- given: Gang
family: Li
- given: Yang
family: Li
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5449-5471
id: cheng23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5449
lastpage: 5471
published: 2023-07-03 00:00:00 +0000
- title: 'Identification of the Adversary from a Single Adversarial Example'
abstract: 'Deep neural networks have been shown vulnerable to adversarial examples. Even though many defense methods have been proposed to enhance the robustness, it is still a long way toward providing an attack-free method to build a trustworthy machine learning system. In this paper, instead of enhancing the robustness, we take the investigator’s perspective and propose a new framework to trace the first compromised model copy in a forensic investigation manner. Specifically, we focus on the following setting: the machine learning service provider provides model copies for a set of customers. However, one of the customers conducted adversarial attacks to fool the system. Therefore, the investigator’s objective is to identify the first compromised copy by collecting and analyzing evidence from only available adversarial examples. To make the tracing viable, we design a random mask watermarking mechanism to differentiate adversarial examples from different copies. First, we propose a tracing approach in the data-limited case where the original example is also available. Then, we design a data-free approach to identify the adversary without accessing the original example. Finally, the effectiveness of our proposed framework is evaluated by extensive experiments with different model architectures, adversarial attacks, and datasets.'
volume: 202
URL: https://proceedings.mlr.press/v202/cheng23c.html
PDF: https://proceedings.mlr.press/v202/cheng23c/cheng23c.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cheng23c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Minhao
family: Cheng
- given: Rui
family: Min
- given: Haochen
family: Sun
- given: Pin-Yu
family: Chen
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5472-5484
id: cheng23c
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5472
lastpage: 5484
published: 2023-07-03 00:00:00 +0000
- title: 'Parallel Online Clustering of Bandits via Hedonic Game'
abstract: 'Contextual bandit algorithms appear in several applications, such as online advertisement and recommendation systems like personalized education or personalized medicine. Individually-tailored recommendations boost the performance of the underlying application; nevertheless, providing individual suggestions becomes costly and even implausible as the number of users grows. As such, to efficiently serve the demands of several users in modern applications, it is imperative to identify the underlying users’ clusters, i.e., the groups of users for which a single recommendation might be (near-)optimal. We propose CLUB-HG, a novel algorithm that integrates a game-theoretic approach into clustering inference. Our algorithm achieves Nash equilibrium at each inference step and discovers the underlying clusters. We also provide regret analysis within a standard linear stochastic noise setting. Finally, experiments on synthetic and real-world datasets show the superior performance of our proposed algorithm compared to the state-of-the-art algorithms.'
volume: 202
URL: https://proceedings.mlr.press/v202/cheng23d.html
PDF: https://proceedings.mlr.press/v202/cheng23d/cheng23d.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cheng23d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Xiaotong
family: Cheng
- given: Cheng
family: Pan
- given: Setareh
family: Maghsudi
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5485-5503
id: cheng23d
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5485
lastpage: 5503
published: 2023-07-03 00:00:00 +0000
- title: 'Mu$^2$SLAM: Multitask, Multilingual Speech and Language Models'
abstract: 'We present Mu$^2$SLAM, a multilingual sequence-to-sequence model pre-trained jointly on unlabeled speech, unlabeled text and supervised data spanning Automatic Speech Recognition (ASR), Automatic Speech Translation (AST) and Machine Translation (MT), in over 100 languages. By leveraging a quantized representation of speech as a target, Mu$^2$SLAM trains the speech-text models with a sequence-to-sequence masked denoising objective similar to T5 on the decoder and a masked language modeling objective (MLM) on the encoder, for both unlabeled speech and text, while utilizing the supervised tasks to improve cross-lingual and cross-modal representation alignment within the model. On CoVoST AST, Mu$^2$SLAM establishes a new state-of-the-art for models trained on public datasets, improving on xx-en translation over the previous best by 1.9 BLEU points and on en-xx translation by 1.1 BLEU points. On Voxpopuli ASR, our model matches the performance of an mSLAM model fine-tuned with an RNN-T decoder, despite using a relatively weaker Transformer decoder. On text understanding tasks, our model improves by more than 6% over mSLAM on XNLI, getting closer to the performance of mT5 models of comparable capacity on XNLI and TydiQA, paving the way towards a single model for all speech and text understanding tasks.'
volume: 202
URL: https://proceedings.mlr.press/v202/cheng23e.html
PDF: https://proceedings.mlr.press/v202/cheng23e/cheng23e.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cheng23e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yong
family: Cheng
- given: Yu
family: Zhang
- given: Melvin
family: Johnson
- given: Wolfgang
family: Macherey
- given: Ankur
family: Bapna
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5504-5520
id: cheng23e
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5504
lastpage: 5520
published: 2023-07-03 00:00:00 +0000
- title: 'Understanding the Role of Feedback in Online Learning with Switching Costs'
abstract: 'In this paper, we study the role of feedback in online learning with switching costs. It has been shown that the minimax regret is $\widetilde{\Theta}(T^{2/3})$ under bandit feedback and improves to $\widetilde{\Theta}(\sqrt{T})$ under full-information feedback, where $T$ is the length of the time horizon. However, it remains largely unknown how the amount and type of feedback generally impact regret. To this end, we first consider the setting of bandit learning with extra observations; that is, in addition to the typical bandit feedback, the learner can freely make a total of $B_{\mathrm{ex}}$ *extra observations*. We fully characterize the minimax regret in this setting, which exhibits an interesting *phase-transition phenomenon*: when $B_{\mathrm{ex}} = O(T^{2/3})$, the regret remains $\widetilde{\Theta}(T^{2/3})$, but when $B_{\mathrm{ex}} = \Omega(T^{2/3})$, it becomes $\widetilde{\Theta}(T/\sqrt{B_{\mathrm{ex}}})$, which improves as the budget $B_{\mathrm{ex}}$ increases. To design algorithms that can achieve the minimax regret, it is instructive to consider a more general setting where the learner has a budget of $B$ *total* observations. We fully characterize the minimax regret in this setting as well and show that it is $\widetilde{\Theta}(T/\sqrt{B})$, which scales smoothly with the total budget $B$. Furthermore, we propose a generic algorithmic framework, which enables us to design different learning algorithms that can achieve matching upper bounds for both settings based on the amount and type of feedback. One interesting finding is that while bandit feedback can still guarantee optimal regret when the budget is relatively limited, it no longer suffices to achieve optimal regret when the budget is relatively large.'
volume: 202
URL: https://proceedings.mlr.press/v202/cheng23f.html
PDF: https://proceedings.mlr.press/v202/cheng23f/cheng23f.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cheng23f.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Duo
family: Cheng
- given: Xingyu
family: Zhou
- given: Bo
family: Ji
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5521-5543
id: cheng23f
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5521
lastpage: 5543
published: 2023-07-03 00:00:00 +0000
- title: 'Tighter Bounds on the Expressivity of Transformer Encoders'
abstract: 'Characterizing neural networks in terms of better-understood formal systems has the potential to yield new insights into the power and limitations of these networks. Doing so for transformers remains an active area of research. Bhattamishra and others have shown that transformer encoders are at least as expressive as a certain kind of counter machine, while Merrill and Sabharwal have shown that fixed-precision transformer encoders recognize only languages in uniform $TC^0$. We connect and strengthen these results by identifying a variant of first-order logic with counting quantifiers that is simultaneously an upper bound for fixed-precision transformer encoders and a lower bound for transformer encoders. This brings us much closer than before to an exact characterization of the languages that transformer encoders recognize.'
volume: 202
URL: https://proceedings.mlr.press/v202/chiang23a.html
PDF: https://proceedings.mlr.press/v202/chiang23a/chiang23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chiang23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: David
family: Chiang
- given: Peter
family: Cholak
- given: Anand
family: Pillay
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5544-5562
id: chiang23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5544
lastpage: 5562
published: 2023-07-03 00:00:00 +0000
- title: 'Provably Learning Diverse Features in Multi-View Data with Midpoint Mixup'
abstract: 'Mixup is a data augmentation technique that relies on training using random convex combinations of data points and their labels. In recent years, Mixup has become a standard primitive used in the training of state-of-the-art image classification models due to its demonstrated benefits over empirical risk minimization with regards to generalization and robustness. In this work, we try to explain some of this success from a feature learning perspective. We focus our attention on classification problems in which each class may have multiple associated features (or $\textit{views}$) that can be used to predict the class correctly. Our main theoretical results demonstrate that, for a non-trivial class of data distributions with two features per class, training a 2-layer convolutional network using empirical risk minimization can lead to learning only one feature for almost all classes while training with a specific instantiation of Mixup succeeds in learning both features for every class. We also show empirically that these theoretical insights extend to the practical settings of image benchmarks modified to have multiple features.'
volume: 202
URL: https://proceedings.mlr.press/v202/chidambaram23a.html
PDF: https://proceedings.mlr.press/v202/chidambaram23a/chidambaram23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chidambaram23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Muthu
family: Chidambaram
- given: Xiang
family: Wang
- given: Chenwei
family: Wu
- given: Rong
family: Ge
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5563-5599
id: chidambaram23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5563
lastpage: 5599
published: 2023-07-03 00:00:00 +0000
- title: 'Hiding Data Helps: On the Benefits of Masking for Sparse Coding'
abstract: 'Sparse coding, which refers to modeling a signal as sparse linear combinations of the elements of a learned dictionary, has proven to be a successful (and interpretable) approach in applications such as signal processing, computer vision, and medical imaging. While this success has spurred much work on provable guarantees for dictionary recovery when the learned dictionary is the same size as the ground-truth dictionary, work on the setting where the learned dictionary is larger (or $\textit{over-realized}$) with respect to the ground truth is comparatively nascent. Existing theoretical results in this setting have been constrained to the case of noise-less data. We show in this work that, in the presence of noise, minimizing the standard dictionary learning objective can fail to recover the elements of the ground-truth dictionary in the over-realized regime, regardless of the magnitude of the signal in the data-generating process. Furthermore, drawing from the growing body of work on self-supervised learning, we propose a novel masking objective for which recovering the ground-truth dictionary is in fact optimal as the signal increases for a large class of data-generating processes. We corroborate our theoretical results with experiments across several parameter regimes showing that our proposed objective also enjoys better empirical performance than the standard reconstruction objective.'
volume: 202
URL: https://proceedings.mlr.press/v202/chidambaram23b.html
PDF: https://proceedings.mlr.press/v202/chidambaram23b/chidambaram23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chidambaram23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Muthu
family: Chidambaram
- given: Chenwei
family: Wu
- given: Yu
family: Cheng
- given: Rong
family: Ge
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5600-5615
id: chidambaram23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5600
lastpage: 5615
published: 2023-07-03 00:00:00 +0000
- title: 'PINA: Leveraging Side Information in eXtreme Multi-label Classification via Predicted Instance Neighborhood Aggregation'
abstract: 'The eXtreme Multi-label Classification (XMC) problem seeks to find relevant labels from an exceptionally large label space. Most of the existing XMC learners focus on the extraction of semantic features from input query text. However, conventional XMC studies usually neglect the side information of instances and labels, which can be of use in many real-world applications such as recommendation systems and e-commerce product search. We propose Predicted Instance Neighborhood Aggregation (PINA), a data augmentation method for the general XMC problem that leverages beneficial side information. Unlike most existing XMC frameworks that treat labels and input instances as featureless indicators and independent entries, PINA extracts information from the label metadata and the correlations among training instances. Extensive experimental results demonstrate the consistent gain of PINA on various XMC tasks compared to the state-of-the-art methods: PINA offers a gain in accuracy compared to standard XR-Transformers on five public benchmark datasets. Moreover, PINA achieves a $\sim 5$% gain in accuracy on the largest dataset LF-AmazonTitles-1.3M.'
volume: 202
URL: https://proceedings.mlr.press/v202/chien23a.html
PDF: https://proceedings.mlr.press/v202/chien23a/chien23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chien23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Eli
family: Chien
- given: Jiong
family: Zhang
- given: Cho-Jui
family: Hsieh
- given: Jyun-Yu
family: Jiang
- given: Wei-Cheng
family: Chang
- given: Olgica
family: Milenkovic
- given: Hsiang-Fu
family: Yu
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5616-5630
id: chien23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5616
lastpage: 5630
published: 2023-07-03 00:00:00 +0000
- title: 'Tight Certification of Adversarially Trained Neural Networks via Nonconvex Low-Rank Semidefinite Relaxations'
abstract: 'Adversarial training is well-known to produce high-quality neural network models that are empirically robust against adversarial perturbations. Nevertheless, once a model has been adversarially trained, one often desires a certification that the model is truly robust against all future attacks. Unfortunately, when faced with adversarially trained models, all existing approaches have significant trouble making certifications that are strong enough to be practically useful. Linear programming (LP) techniques in particular face a “convex relaxation barrier” that prevents them from making high-quality certifications, even after refinement with mixed-integer linear programming (MILP) and branch-and-bound (BnB) techniques. In this paper, we propose a nonconvex certification technique, based on a low-rank restriction of a semidefinite programming (SDP) relaxation. The nonconvex relaxation makes strong certifications comparable to much more expensive SDP methods, while optimizing over dramatically fewer variables comparable to much weaker LP methods. Despite nonconvexity, we show how off-the-shelf local optimization algorithms can be used to achieve and to certify global optimality in polynomial time. Our experiments find that the nonconvex relaxation almost completely closes the gap towards exact certification of adversarially trained models.'
volume: 202
URL: https://proceedings.mlr.press/v202/chiu23a.html
PDF: https://proceedings.mlr.press/v202/chiu23a/chiu23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chiu23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hong-Ming
family: Chiu
- given: Richard Y.
family: Zhang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5631-5660
id: chiu23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5631
lastpage: 5660
published: 2023-07-03 00:00:00 +0000
- title: 'Neural Latent Aligner: Cross-trial Alignment for Learning Representations of Complex, Naturalistic Neural Data'
abstract: 'Understanding the neural implementation of complex human behaviors is one of the major goals in neuroscience. To this end, it is crucial to find a true representation of the neural data, which is challenging due to the high complexity of behaviors and the low signal-to-noise ratio (SNR) of the signals. Here, we propose a novel unsupervised learning framework, Neural Latent Aligner (NLA), to find well-constrained, behaviorally relevant neural representations of complex behaviors. The key idea is to align representations across repeated trials to learn cross-trial consistent information. Furthermore, we propose a novel, fully differentiable time warping model (TWM) to resolve the temporal misalignment of trials. When applied to intracranial electrocorticography (ECoG) of natural speaking, our model learns better representations for decoding behaviors than the baseline models, especially in lower dimensional space. The TWM is empirically validated by measuring behavioral coherence between aligned trials. The proposed framework learns more cross-trial consistent representations than the baselines, and when visualized, the manifold reveals shared neural trajectories across trials.'
volume: 202
URL: https://proceedings.mlr.press/v202/cho23a.html
PDF: https://proceedings.mlr.press/v202/cho23a/cho23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cho23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Cheol Jun
family: Cho
- given: Edward
family: Chang
- given: Gopala
family: Anumanchipalli
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5661-5676
id: cho23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5661
lastpage: 5676
published: 2023-07-03 00:00:00 +0000
- title: 'On the Convergence of Federated Averaging with Cyclic Client Participation'
abstract: 'Federated Averaging (FedAvg) and its variants are the most popular optimization algorithms in federated learning (FL). Previous convergence analyses of FedAvg either assume full client participation or partial client participation where the clients can be uniformly sampled. However, in practical cross-device FL systems, only a subset of clients that satisfy local criteria such as battery status, network connectivity, and maximum participation frequency requirements (to ensure privacy) are available for training at a given time. As a result, client availability follows a *natural cyclic pattern*. We provide (to our knowledge) the first theoretical framework to analyze the convergence of FedAvg with cyclic client participation with several different client optimizers such as GD, SGD, and shuffled SGD. Our analysis discovers that cyclic client participation can achieve a faster asymptotic convergence rate than vanilla FedAvg with uniform client participation under suitable conditions, providing valuable insights into the design of client sampling protocols.'
volume: 202
URL: https://proceedings.mlr.press/v202/cho23b.html
PDF: https://proceedings.mlr.press/v202/cho23b/cho23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cho23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yae Jee
family: Cho
- given: Pranay
family: Sharma
- given: Gauri
family: Joshi
- given: Zheng
family: Xu
- given: Satyen
family: Kale
- given: Tong
family: Zhang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5677-5721
id: cho23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5677
lastpage: 5721
published: 2023-07-03 00:00:00 +0000
- title: 'GREAD: Graph Neural Reaction-Diffusion Networks'
abstract: 'Graph neural networks (GNNs) are one of the most popular research topics for deep learning. GNN methods typically have been designed on top of the graph signal processing theory. In particular, diffusion equations have been widely used for designing the core processing layer of GNNs, and therefore they are inevitably vulnerable to the notorious oversmoothing problem. Recently, a couple of papers paid attention to reaction equations in conjunction with diffusion equations. However, they all consider limited forms of reaction equations. To this end, we present a reaction-diffusion equation-based GNN method that considers all popular types of reaction equations in addition to one special reaction equation designed by us. To our knowledge, our paper is one of the most comprehensive studies on reaction-diffusion equation-based GNNs. In our experiments with 9 datasets and 28 baselines, our method, called GREAD, outperforms them in a majority of cases. Further synthetic data experiments show that it mitigates the oversmoothing problem and works well for various homophily rates.'
volume: 202
URL: https://proceedings.mlr.press/v202/choi23a.html
PDF: https://proceedings.mlr.press/v202/choi23a/choi23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-choi23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jeongwhan
family: Choi
- given: Seoyoung
family: Hong
- given: Noseong
family: Park
- given: Sung-Bae
family: Cho
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5722-5747
id: choi23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5722
lastpage: 5747
published: 2023-07-03 00:00:00 +0000
- title: 'Is Overfitting Necessary for Implicit Video Representation?'
abstract: 'Compact representation of multimedia signals using implicit neural representations (INRs) has advanced significantly over the past few years, and recent works address their applications to video. Existing studies on video INR have focused on network architecture design as all video information is contained within network parameters. Here, we propose a new paradigm in efficient INR for videos based on the idea of the strong lottery ticket (SLT) hypothesis (Zhou et al., 2019), which demonstrates the possibility of finding an accurate subnetwork mask, called a supermask, for a randomly initialized classification network without weight training. Specifically, we train multiple supermasks with a hierarchical structure for a randomly initialized image-wise video representation model without weight updates. Different from a previous approach employing hierarchical supermasks (Okoshi et al., 2022), a trainable scale parameter for each mask is used instead of multiplying by the same fixed scale for all levels. This simple modification widens the parameter search space to sufficiently explore various sparsity patterns, leading the proposed algorithm to find stronger subnetworks. Moreover, extensive experiments on the popular UVG benchmark show that random subnetworks obtained from our framework achieve higher reconstruction and visual quality than fully trained models with similar encoding sizes. Our study is the first to demonstrate the existence of SLTs in video INR models and propose an efficient method for finding them.'
volume: 202
URL: https://proceedings.mlr.press/v202/choi23b.html
PDF: https://proceedings.mlr.press/v202/choi23b/choi23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-choi23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hee Min
family: Choi
- given: Hyoa
family: Kang
- given: Dokwan
family: Oh
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5748-5770
id: choi23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5748
lastpage: 5770
published: 2023-07-03 00:00:00 +0000
- title: 'Semi-Parametric Contextual Pricing Algorithm using Cox Proportional Hazards Model'
abstract: 'Contextual dynamic pricing is a problem of setting prices based on current contextual information and previous sales history to maximize revenue. A popular approach is to postulate a distribution of customer valuation as a function of contextual information and the baseline valuation. A semi-parametric setting, where the context effect is parametric and the baseline is nonparametric, is of growing interest due to its flexibility. A challenge is that customer valuation is almost never observable in practice and is instead *type-I interval censored* by the offered price. To address this challenge, we propose a novel semi-parametric contextual pricing algorithm for stochastic contexts, called the epoch-based Cox proportional hazards Contextual Pricing (CoxCP) algorithm. To our best knowledge, our work is the first to employ the Cox model for customer valuation. The CoxCP algorithm has a high-probability regret upper bound of $\tilde{O}( T^{\frac{2}{3}}d )$, where $T$ is the length of horizon and $d$ is the dimension of context. In addition, if the baseline is known, the regret bound can improve to $O( d \log T )$ under certain assumptions. We demonstrate empirically that the proposed algorithm performs better than existing semi-parametric contextual pricing algorithms when the model assumptions of all algorithms are correct.'
volume: 202
URL: https://proceedings.mlr.press/v202/choi23c.html
PDF: https://proceedings.mlr.press/v202/choi23c/choi23c.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-choi23c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Young-Geun
family: Choi
- given: Gi-Soo
family: Kim
- given: Yunseo
family: Choi
- given: Wooseong
family: Cho
- given: Myunghee Cho
family: Paik
- given: Min-Hwan
family: Oh
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5771-5786
id: choi23c
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5771
lastpage: 5786
published: 2023-07-03 00:00:00 +0000
- title: 'Restoration based Generative Models'
abstract: 'Denoising diffusion models (DDMs) have recently attracted increasing attention by showing impressive synthesis quality. DDMs are built on a diffusion process that pushes data to the noise distribution and the models learn to denoise. In this paper, we establish the interpretation of DDMs in terms of image restoration (IR). Integrating IR literature allows us to use an alternative objective and diverse forward processes, not confining to the diffusion process. By imposing prior knowledge on the loss function grounded on MAP-based estimation, we eliminate the need for the expensive sampling of DDMs. Also, we propose a multi-scale training, which improves the performance compared to the diffusion process, by taking advantage of the flexibility of the forward process. Experimental results demonstrate that our model improves the quality and efficiency of both training and inference. Furthermore, we show the applicability of our model to inverse problems. We believe that our framework paves the way for designing a new type of flexible general generative model.'
volume: 202
URL: https://proceedings.mlr.press/v202/choi23d.html
PDF: https://proceedings.mlr.press/v202/choi23d/choi23d.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-choi23d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jaemoo
family: Choi
- given: Yesom
family: Park
- given: Myungjoo
family: Kang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5787-5816
id: choi23d
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5787
lastpage: 5816
published: 2023-07-03 00:00:00 +0000
- title: 'Concept-based Explanations for Out-of-Distribution Detectors'
abstract: 'Out-of-distribution (OOD) detection plays a crucial role in ensuring the safe deployment of deep neural network (DNN) classifiers. While a myriad of methods have focused on improving the performance of OOD detectors, a critical gap remains in interpreting their decisions. We help bridge this gap by providing explanations for OOD detectors based on learned high-level concepts. We first propose two new metrics for assessing the effectiveness of a particular set of concepts for explaining OOD detectors: 1) detection completeness, which quantifies the sufficiency of concepts for explaining an OOD-detector’s decisions, and 2) concept separability, which captures the distributional separation between in-distribution and OOD data in the concept space. Based on these metrics, we propose an unsupervised framework for learning a set of concepts that satisfy the desired properties of high detection completeness and concept separability, and demonstrate its effectiveness in providing concept-based explanations for diverse off-the-shelf OOD detectors. We also show how to identify prominent concepts contributing to the detection results, and provide further reasoning about their decisions.'
volume: 202
URL: https://proceedings.mlr.press/v202/choi23e.html
PDF: https://proceedings.mlr.press/v202/choi23e/choi23e.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-choi23e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jihye
family: Choi
- given: Jayaram
family: Raghuram
- given: Ryan
family: Feng
- given: Jiefeng
family: Chen
- given: Somesh
family: Jha
- given: Atul
family: Prakash
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5817-5837
id: choi23e
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5817
lastpage: 5837
published: 2023-07-03 00:00:00 +0000
- title: 'Active causal structure learning with advice'
abstract: 'We introduce the problem of active causal structure learning with advice. In the typical well-studied setting, the learning algorithm is given the essential graph for the observational distribution and is asked to recover the underlying causal directed acyclic graph (DAG) $G^*$ while minimizing the number of interventions made. In our setting, we are additionally given side information about $G^*$ as advice, e.g. a DAG $G$ purported to be $G^*$. We ask whether the learning algorithm can benefit from the advice when it is close to being correct, while still having worst-case guarantees even when the advice is arbitrarily bad. Our work is in the same space as the growing body of research on *algorithms with predictions*. When the advice is a DAG $G$, we design an adaptive search algorithm to recover $G^*$ whose intervention cost is at most $\mathcal{O}(\max\{1, \log \psi\})$ times the cost for verifying $G^*$; here, $\psi$ is a distance measure between $G$ and $G^*$ that is upper bounded by the number of variables $n$, and is exactly 0 when $G=G^*$. Our approximation factor matches the state-of-the-art for the advice-less setting.'
volume: 202
URL: https://proceedings.mlr.press/v202/choo23a.html
PDF: https://proceedings.mlr.press/v202/choo23a/choo23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-choo23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Davin
family: Choo
- given: Themistoklis
family: Gouleakis
- given: Arnab
family: Bhattacharyya
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5838-5867
id: choo23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5838
lastpage: 5867
published: 2023-07-03 00:00:00 +0000
- title: 'New metrics and search algorithms for weighted causal DAGs'
abstract: 'Recovering causal relationships from data is an important problem. Using observational data, one can typically only recover causal graphs up to a Markov equivalence class and additional assumptions or interventional data are needed for complete recovery. In this work, under some standard assumptions, we study causal graph discovery via *adaptive interventions with node-dependent interventional costs*. For this setting, we show that no algorithm can achieve an approximation guarantee that is asymptotically better than linear in the number of vertices with respect to the verification number; a well-established benchmark for adaptive search algorithms. Motivated by this negative result, we define a *new benchmark* that captures the worst-case interventional cost for any search algorithm. Furthermore, with respect to this new benchmark, we provide adaptive search algorithms that achieve logarithmic approximations under various settings: atomic, bounded size interventions and generalized cost objectives.'
volume: 202
URL: https://proceedings.mlr.press/v202/choo23b.html
PDF: https://proceedings.mlr.press/v202/choo23b/choo23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-choo23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Davin
family: Choo
- given: Kirankumar
family: Shiragur
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5868-5903
id: choo23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5868
lastpage: 5903
published: 2023-07-03 00:00:00 +0000
- title: 'Computational Doob h-transforms for Online Filtering of Discretely Observed Diffusions'
abstract: 'This paper is concerned with online filtering of discretely observed nonlinear diffusion processes. Our approach is based on the fully adapted auxiliary particle filter, which involves Doob’s $h$-transforms that are typically intractable. We propose a computational framework to approximate these $h$-transforms by solving the underlying backward Kolmogorov equations using nonlinear Feynman-Kac formulas and neural networks. The methodology allows one to train a locally optimal particle filter prior to the data-assimilation procedure. Numerical experiments illustrate that the proposed approach can be orders of magnitude more efficient than state-of-the-art particle filters in the regime of highly informative observations, when the observations are extreme under the model, and if the state dimension is large.'
volume: 202
URL: https://proceedings.mlr.press/v202/chopin23a.html
PDF: https://proceedings.mlr.press/v202/chopin23a/chopin23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chopin23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Nicolas
family: Chopin
- given: Andras
family: Fulop
- given: Jeremy
family: Heng
- given: Alexandre H.
family: Thiery
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5904-5923
id: chopin23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5904
lastpage: 5923
published: 2023-07-03 00:00:00 +0000
- title: 'Multi-Epoch Matrix Factorization Mechanisms for Private Machine Learning'
abstract: 'We introduce new differentially private (DP) mechanisms for gradient-based machine learning (ML) with multiple passes (epochs) over a dataset, substantially improving the achievable privacy-utility-computation tradeoffs. We formalize the problem of DP mechanisms for adaptive streams with multiple participations and introduce a non-trivial extension of online matrix factorization DP mechanisms to our setting. This includes establishing the necessary theory for sensitivity calculations and efficient computation of optimal matrices. For some applications like $>\!\! 10,000$ SGD steps, applying these optimal techniques becomes computationally expensive. We thus design an efficient Fourier-transform-based mechanism with only a minor utility loss. Extensive empirical evaluation on both example-level DP for image classification and user-level DP for language modeling demonstrates substantial improvements over all previous methods, including the widely-used DP-SGD. Though our primary application is to ML, our main DP results are applicable to arbitrary linear queries and hence may have much broader applicability.'
volume: 202
URL: https://proceedings.mlr.press/v202/choquette-choo23a.html
PDF: https://proceedings.mlr.press/v202/choquette-choo23a/choquette-choo23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-choquette-choo23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Christopher A.
family: Choquette-Choo
- given: Hugh Brendan
  family: McMahan
- given: J Keith
family: Rush
- given: Abhradeep
family: Guha Thakurta
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5924-5963
id: choquette-choo23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5924
lastpage: 5963
published: 2023-07-03 00:00:00 +0000
- title: 'Taming graph kernels with random features'
abstract: 'We introduce in this paper the mechanism of graph random features (GRFs). GRFs can be used to construct unbiased randomized estimators of several important kernels defined on graphs’ nodes, in particular the regularized Laplacian kernel. As with regular RFs for non-graph kernels, they provide means to scale up kernel methods defined on graphs to larger networks. Importantly, they give substantial computational gains also for smaller graphs, when applied in downstream applications. Consequently, GRFs address the notoriously difficult problem of cubic (in the number of the nodes of the graph) time complexity of graph kernel algorithms. We provide a detailed theoretical analysis of GRFs and an extensive empirical evaluation: from speed tests, through Frobenius relative error analysis, to k-means graph-clustering with graph kernels. We show that the computation of GRFs admits an embarrassingly simple distributed algorithm that can be applied if the graph under consideration needs to be split across several machines. We also introduce a (still unbiased) quasi-Monte Carlo variant of GRFs, q-GRFs, relying on the so-called reinforced random walks, which might be used to optimize the variance of GRFs. As a byproduct, we obtain a novel approach to solving certain classes of linear equations with positive and symmetric matrices.'
volume: 202
URL: https://proceedings.mlr.press/v202/choromanski23a.html
PDF: https://proceedings.mlr.press/v202/choromanski23a/choromanski23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-choromanski23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Krzysztof Marcin
family: Choromanski
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5964-5977
id: choromanski23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5964
lastpage: 5977
published: 2023-07-03 00:00:00 +0000
- title: 'Efficient Graph Field Integrators Meet Point Clouds'
abstract: 'We present two new classes of algorithms for efficient field integration on graphs encoding point cloud data. The first class, $\mathrm{SeparatorFactorization}$ (SF), leverages the bounded genus of point cloud mesh graphs, while the second class, $\mathrm{RFDiffusion}$ (RFD), uses popular $\epsilon$-nearest-neighbor graph representations for point clouds. Both can be viewed as providing the functionality of Fast Multipole Methods (FMMs), which have had a tremendous impact on efficient integration, but for non-Euclidean spaces. We focus on geometries induced by distributions of walk lengths between points (e.g. shortest-path distance). We provide an extensive theoretical analysis of our algorithms, obtaining new results in structural graph theory as a byproduct. We also perform exhaustive empirical evaluation, including on-surface interpolation for rigid and deformable objects (in particular for mesh-dynamics modeling) as well as Wasserstein distance computations for point clouds, including the Gromov-Wasserstein variant.'
volume: 202
URL: https://proceedings.mlr.press/v202/choromanski23b.html
PDF: https://proceedings.mlr.press/v202/choromanski23b/choromanski23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-choromanski23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Krzysztof Marcin
family: Choromanski
- given: Arijit
family: Sehanobish
- given: Han
family: Lin
- given: Yunfan
family: Zhao
- given: Eli
family: Berger
- given: Tetiana
family: Parshakova
- given: Alvin
family: Pan
- given: David
family: Watkins
- given: Tianyi
family: Zhang
- given: Valerii
family: Likhosherstov
- given: Somnath
family: Basu Roy Chowdhury
- given: Kumar Avinava
family: Dubey
- given: Deepali
family: Jain
- given: Tamas
family: Sarlos
- given: Snigdha
family: Chaturvedi
- given: Adrian
family: Weller
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 5978-6004
id: choromanski23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 5978
lastpage: 6004
published: 2023-07-03 00:00:00 +0000
- title: 'ContraBAR: Contrastive Bayes-Adaptive Deep RL'
abstract: 'In meta reinforcement learning (meta RL), an agent seeks a Bayes-optimal policy – the optimal policy when facing an unknown task that is sampled from some known task distribution. Previous approaches tackled this problem by inferring a $\textit{belief}$ over task parameters, using variational inference methods. Motivated by recent successes of contrastive learning approaches in RL, such as contrastive predictive coding (CPC), we investigate whether contrastive methods can be used for learning Bayes-optimal behavior. We begin by proving that representations learned by CPC are indeed sufficient for Bayes optimality. Based on this observation, we propose a simple meta RL algorithm that uses CPC in lieu of variational belief inference. Our method, $\textit{ContraBAR}$, achieves comparable performance to state-of-the-art in domains with state-based observation and circumvents the computational toll of future observation reconstruction, enabling learning in domains with image-based observations. It can also be combined with image augmentations for domain randomization and used seamlessly in both online and offline meta RL settings.'
volume: 202
URL: https://proceedings.mlr.press/v202/choshen23a.html
PDF: https://proceedings.mlr.press/v202/choshen23a/choshen23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-choshen23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Era
family: Choshen
- given: Aviv
family: Tamar
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6005-6027
id: choshen23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6005
lastpage: 6027
published: 2023-07-03 00:00:00 +0000
- title: 'Forget Unlearning: Towards True Data-Deletion in Machine Learning'
abstract: 'Unlearning algorithms aim to remove deleted data’s influence from trained models at a cost lower than full retraining. However, prior guarantees of unlearning in the literature are flawed and don’t protect the privacy of deleted records. We show that when people delete their data as a function of published models, records in a database become interdependent. So, even retraining a fresh model after deletion of a record doesn’t ensure its privacy. Secondly, unlearning algorithms that cache partial computations to speed up the processing can leak deleted information over a series of releases, violating the privacy of deleted records in the long run. To address these, we propose a sound deletion guarantee and show that ensuring the privacy of existing records is necessary for the privacy of deleted records. Under this notion, we propose an optimal, computationally efficient, and sound machine unlearning algorithm based on noisy gradient descent.'
volume: 202
URL: https://proceedings.mlr.press/v202/chourasia23a.html
PDF: https://proceedings.mlr.press/v202/chourasia23a/chourasia23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chourasia23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Rishav
family: Chourasia
- given: Neil
family: Shah
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6028-6073
id: chourasia23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6028
lastpage: 6073
published: 2023-07-03 00:00:00 +0000
- title: 'Patch-level Routing in Mixture-of-Experts is Provably Sample-efficient for Convolutional Neural Networks'
abstract: 'In deep learning, mixture-of-experts (MoE) activates one or few experts (sub-networks) on a per-sample or per-token basis, resulting in significant computation reduction. The recently proposed patch-level routing in MoE (pMoE) divides each input into $n$ patches (or tokens) and sends $l$ patches ($l\ll n$) to each expert through prioritized routing. pMoE has demonstrated great empirical success in reducing training and inference costs while maintaining test accuracy. However, the theoretical explanation of pMoE and the general MoE remains elusive. Focusing on a supervised classification task using a mixture of two-layer convolutional neural networks (CNNs), we show for the first time that pMoE provably reduces the required number of training samples to achieve desirable generalization (referred to as the sample complexity) by a factor in the polynomial order of $n/l$, and outperforms its single-expert counterpart of the same or even larger capacity. The advantage results from the discriminative routing property, which we justify in both theory and practice: pMoE routers can filter label-irrelevant patches and route similar class-discriminative patches to the same expert. Our experimental results on MNIST, CIFAR-10, and CelebA support our theoretical findings on pMoE’s generalization and show that pMoE can avoid learning spurious correlations.'
volume: 202
URL: https://proceedings.mlr.press/v202/chowdhury23a.html
PDF: https://proceedings.mlr.press/v202/chowdhury23a/chowdhury23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chowdhury23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mohammed Nowaz Rabbani
family: Chowdhury
- given: Shuai
family: Zhang
- given: Meng
family: Wang
- given: Sijia
family: Liu
- given: Pin-Yu
family: Chen
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6074-6114
id: chowdhury23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6074
lastpage: 6114
published: 2023-07-03 00:00:00 +0000
- title: 'What do CNNs Learn in the First Layer and Why? A Linear Systems Perspective'
abstract: 'It has previously been reported that the representation that is learned in the first layer of deep Convolutional Neural Networks (CNNs) is highly consistent across initializations and architectures. In this work, we quantify this consistency by considering the first layer as a filter bank and measuring its energy distribution. We find that the energy distribution is very different from that of the initial weights and is remarkably consistent across random initializations, datasets, architectures and even when the CNNs are trained with *random labels*. In order to explain this consistency, we derive an analytical formula for the energy profile of linear CNNs and show that this profile is mostly dictated by the second order statistics of image patches in the training set and it will approach a whitening transformation when the number of iterations goes to infinity. Finally, we show that this formula for linear CNNs also gives an excellent fit for the energy profiles learned by commonly used *nonlinear* CNNs such as ResNet and VGG, and that the first layer of these CNNs indeed performs approximate whitening of their inputs.'
volume: 202
URL: https://proceedings.mlr.press/v202/chowers23a.html
PDF: https://proceedings.mlr.press/v202/chowers23a/chowers23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chowers23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Rhea
family: Chowers
- given: Yair
family: Weiss
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6115-6139
id: chowers23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6115
lastpage: 6139
published: 2023-07-03 00:00:00 +0000
- title: 'Unifying Molecular and Textual Representations via Multi-task Language Modelling'
abstract: 'The recent advances in neural language models have also been successfully applied to the field of chemistry, offering generative solutions for classical problems in molecular design and synthesis planning. These new methods have the potential to fuel a new era of data-driven automation in scientific discovery. However, specialized models are still typically required for each task, leading to the need for problem-specific fine-tuning and neglecting task interrelations. The main obstacle in this field is the lack of a unified representation between natural language and chemical representations, complicating and limiting human-machine interaction. Here, we propose the first multi-domain, multi-task language model that can solve a wide range of tasks in both the chemical and natural language domains. Our model can handle chemical and natural language concurrently, without requiring expensive pre-training on single domains or task-specific models. Interestingly, sharing weights across domains remarkably improves our model when benchmarked against state-of-the-art baselines on single-domain and cross-domain tasks. In particular, sharing information across domains and tasks gives rise to large improvements in cross-domain tasks, the magnitude of which increases with scale, as measured by more than a dozen relevant metrics. Our work suggests that such models can robustly and efficiently accelerate discovery in physical sciences by superseding problem-specific fine-tuning and enhancing human-model interactions.'
volume: 202
URL: https://proceedings.mlr.press/v202/christofidellis23a.html
PDF: https://proceedings.mlr.press/v202/christofidellis23a/christofidellis23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-christofidellis23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Dimitrios
family: Christofidellis
- given: Giorgio
family: Giannone
- given: Jannis
family: Born
- given: Ole
family: Winther
- given: Teodoro
family: Laino
- given: Matteo
family: Manica
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6140-6157
id: christofidellis23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6140
lastpage: 6157
published: 2023-07-03 00:00:00 +0000
- title: 'Wasserstein Barycenter Matching for Graph Size Generalization of Message Passing Neural Networks'
abstract: 'Graph size generalization is hard for Message Passing Neural Networks (MPNNs). The graph-level classification performance of MPNNs degrades across various graph sizes. Recently, theoretical studies reveal that a slow uncontrollable convergence rate w.r.t. graph size could adversely affect the size generalization. To address the uncontrollable convergence rate caused by correlations across nodes in the underlying dimensional signal-generating space, we propose to use Wasserstein barycenters as graph-level consensus to combat node-level correlations. Methodologically, we propose a Wasserstein barycenter matching (WBM) layer that represents an input graph by Wasserstein distances between its MPNN-filtered node embeddings versus some learned class-wise barycenters. Theoretically, we show that the convergence rate of an MPNN with a WBM layer is controllable and independent of the dimensionality of the signal-generating space. Thus, MPNNs with WBM layers are less susceptible to slow uncontrollable convergence rates and size variations. Empirically, the WBM layer improves the size generalization over vanilla MPNNs with different backbones (e.g., GCN, GIN, and PNA) significantly on real-world graph datasets.'
volume: 202
URL: https://proceedings.mlr.press/v202/chu23a.html
PDF: https://proceedings.mlr.press/v202/chu23a/chu23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chu23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Xu
family: Chu
- given: Yujie
family: Jin
- given: Xin
family: Wang
- given: Shanghang
family: Zhang
- given: Yasha
family: Wang
- given: Wenwu
family: Zhu
- given: Hong
family: Mei
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6158-6184
id: chu23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6158
lastpage: 6184
published: 2023-07-03 00:00:00 +0000
- title: 'Shape-Guided Dual-Memory Learning for 3D Anomaly Detection'
abstract: 'We present a shape-guided expert-learning framework to tackle the problem of unsupervised 3D anomaly detection. Our method is established on the effectiveness of two specialized expert models and their synergy to localize anomalous regions from color and shape modalities. The first expert utilizes geometric information to probe 3D structural anomalies by modeling the implicit distance fields around local shapes. The second expert considers the 2D RGB features associated with the first expert to identify color appearance irregularities on the local shapes. We use the two experts to build the dual memory banks from the anomaly-free training samples and perform shape-guided inference to pinpoint the defects in the testing samples. Owing to the per-point 3D representation and the effective fusion scheme of complementary modalities, our method efficiently achieves state-of-the-art performance on the MVTec 3D-AD dataset with better recall and lower false positive rates, as preferred in real applications.'
volume: 202
URL: https://proceedings.mlr.press/v202/chu23b.html
PDF: https://proceedings.mlr.press/v202/chu23b/chu23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chu23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yu-Min
family: Chu
- given: Chieh
family: Liu
- given: Ting-I
family: Hsieh
- given: Hwann-Tzong
family: Chen
- given: Tyng-Luh
family: Liu
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6185-6194
id: chu23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6185
lastpage: 6194
published: 2023-07-03 00:00:00 +0000
- title: 'Multiply Robust Off-policy Evaluation and Learning under Truncation by Death'
abstract: 'Typical off-policy evaluation (OPE) and off-policy learning (OPL) are not well-defined problems under "truncation by death", where the outcome of interest is not defined after some events, such as death. The standard OPE no longer yields consistent estimators, and the standard OPL results in suboptimal policies. In this paper, we formulate OPE and OPL using principal stratification under "truncation by death". We propose a survivor value function for a subpopulation whose outcomes are always defined regardless of treatment conditions. We establish a novel identification strategy under principal ignorability, and derive the semiparametric efficiency bound of an OPE estimator. Then, we propose multiply robust estimators for OPE and OPL. We show that the proposed estimators are consistent and asymptotically normal even with flexible semi/nonparametric models for nuisance functions approximation. Moreover, under mild rate conditions of nuisance functions approximation, the estimators achieve the semiparametric efficiency bound. Finally, we conduct experiments to demonstrate the empirical performance of the proposed estimators.'
volume: 202
URL: https://proceedings.mlr.press/v202/chu23c.html
PDF: https://proceedings.mlr.press/v202/chu23c/chu23c.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chu23c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jianing
family: Chu
- given: Shu
family: Yang
- given: Wenbin
family: Lu
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6195-6227
id: chu23c
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6195
lastpage: 6227
published: 2023-07-03 00:00:00 +0000
- title: 'InfoOT: Information Maximizing Optimal Transport'
abstract: 'Optimal transport aligns samples across distributions by minimizing the transportation cost between them, e.g., the geometric distances. Yet, it ignores coherence structure in the data such as clusters, does not handle outliers well, and cannot integrate new data points. To address these drawbacks, we propose InfoOT, an information-theoretic extension of optimal transport that maximizes the mutual information between domains while minimizing geometric distances. The resulting objective can still be formulated as a (generalized) optimal transport problem, and can be efficiently solved by projected gradient descent. This formulation yields a new projection method that is robust to outliers and generalizes to unseen samples. Empirically, InfoOT improves the quality of alignments across benchmarks in domain adaptation, cross-domain retrieval, and single-cell alignment.'
volume: 202
URL: https://proceedings.mlr.press/v202/chuang23a.html
PDF: https://proceedings.mlr.press/v202/chuang23a/chuang23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chuang23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ching-Yao
family: Chuang
- given: Stefanie
family: Jegelka
- given: David
family: Alvarez-Melis
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6228-6242
id: chuang23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6228
lastpage: 6242
published: 2023-07-03 00:00:00 +0000
- title: 'A Toy Model of Universality: Reverse Engineering how Networks Learn Group Operations'
abstract: 'Universality is a key hypothesis in mechanistic interpretability – that different models learn similar features and circuits when trained on similar tasks. In this work, we study the universality hypothesis by examining how small networks learn to implement group compositions. We present a novel algorithm by which neural networks may implement composition for any finite group via mathematical representation theory. We then show that these networks consistently learn this algorithm by reverse engineering model logits and weights, and confirm our understanding using ablations. By studying networks trained on various groups and architectures, we find mixed evidence for universality: using our algorithm, we can completely characterize the family of circuits and features that networks learn on this task, but for a given network the precise circuits learned – as well as the order they develop – are arbitrary.'
volume: 202
URL: https://proceedings.mlr.press/v202/chughtai23a.html
PDF: https://proceedings.mlr.press/v202/chughtai23a/chughtai23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-chughtai23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Bilal
family: Chughtai
- given: Lawrence
family: Chan
- given: Neel
family: Nanda
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6243-6267
id: chughtai23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6243
lastpage: 6267
published: 2023-07-03 00:00:00 +0000
- title: 'Distribution Free Prediction Sets for Node Classification'
abstract: 'Graph Neural Networks (GNNs) are able to achieve high classification accuracy on many important real world datasets, but provide no rigorous notion of predictive uncertainty. Quantifying the confidence of GNN models is difficult due to the dependence between datapoints induced by the graph structure. We leverage recent advances in conformal prediction to construct prediction sets for node classification in inductive learning scenarios. We do this by taking an existing approach for conformal classification that relies on *exchangeable* data and modifying it by appropriately weighting the conformal scores to reflect the network structure. We show through experiments on standard benchmark datasets using popular GNN models that our approach provides tighter and better calibrated prediction sets than a naive application of conformal prediction.'
volume: 202
URL: https://proceedings.mlr.press/v202/clarkson23a.html
PDF: https://proceedings.mlr.press/v202/clarkson23a/clarkson23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-clarkson23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jase
family: Clarkson
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6268-6278
id: clarkson23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6268
lastpage: 6278
published: 2023-07-03 00:00:00 +0000
- title: 'Sequential Strategic Screening'
abstract: 'We initiate the study of strategic behavior in screening processes with multiple classifiers. We focus on two contrasting settings: a "conjunctive" setting in which an individual must satisfy all classifiers simultaneously, and a sequential setting in which an individual must satisfy classifiers one at a time in order to succeed. In other words, we introduce the combination of strategic classification with screening processes. We show that sequential screening pipelines exhibit new and surprising behavior where individuals can exploit the sequential ordering of the tests to "zig-zag" between classifiers without having to simultaneously satisfy all of them. We demonstrate that an individual can obtain a positive outcome using a limited manipulation budget even when far from the intersection of the positive regions of every classifier. Finally, we consider a learner whose goal is to design a sequential screening process that is robust to such manipulations, and provide a construction for the learner that optimizes a natural objective.'
volume: 202
URL: https://proceedings.mlr.press/v202/cohen23a.html
PDF: https://proceedings.mlr.press/v202/cohen23a/cohen23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cohen23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lee
family: Cohen
- given: Saeed
  family: Sharifi-Malvajerdi
- given: Kevin
family: Stangl
- given: Ali
family: Vakilian
- given: Juba
family: Ziani
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6279-6295
id: cohen23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6279
lastpage: 6295
published: 2023-07-03 00:00:00 +0000
- title: 'Few-Sample Feature Selection via Feature Manifold Learning'
abstract: 'In this paper, we present a new method for few-sample supervised feature selection (FS). Our method first learns the manifold of the feature space of each class using kernels capturing multi-feature associations. Then, based on Riemannian geometry, a composite kernel is computed, extracting the differences between the learned feature associations. Finally, a FS score based on spectral analysis is proposed. Considering multi-feature associations makes our method multivariate by design. This in turn allows for the extraction of the hidden manifold underlying the features and avoids overfitting, facilitating few-sample FS. We showcase the efficacy of our method on illustrative examples and several benchmarks, where our method demonstrates higher accuracy in selecting the informative features compared to competing methods. In addition, we show that our FS leads to improved classification and better generalization when applied to test data.'
volume: 202
URL: https://proceedings.mlr.press/v202/cohen23b.html
PDF: https://proceedings.mlr.press/v202/cohen23b/cohen23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cohen23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: David
family: Cohen
- given: Tal
family: Shnitzer
- given: Yuval
family: Kluger
- given: Ronen
family: Talmon
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6296-6319
id: cohen23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6296
lastpage: 6319
published: 2023-07-03 00:00:00 +0000
- title: 'Spatial Implicit Neural Representations for Global-Scale Species Mapping'
abstract: 'Estimating the geographical range of a species from sparse observations is a challenging and important geospatial prediction problem. Given a set of locations where a species has been observed, the goal is to build a model to predict whether the species is present or absent at any location. This problem has a long history in ecology, but traditional methods struggle to take advantage of emerging large-scale crowdsourced datasets which can include tens of millions of records for hundreds of thousands of species. In this work, we use Spatial Implicit Neural Representations (SINRs) to jointly estimate the geographical range of 47k species simultaneously. We find that our approach scales gracefully, making increasingly better predictions as we increase the number of species and the amount of data per species when training. To make this problem accessible to machine learning researchers, we provide four new benchmarks that measure different aspects of species range estimation and spatial representation learning. Using these benchmarks, we demonstrate that noisy and biased crowdsourced data can be combined with implicit neural representations to approximate expert-developed range maps for many species.'
volume: 202
URL: https://proceedings.mlr.press/v202/cole23a.html
PDF: https://proceedings.mlr.press/v202/cole23a/cole23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cole23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Elijah
family: Cole
- given: Grant Van
family: Horn
- given: Christian
family: Lange
- given: Alexander
family: Shepard
- given: Patrick
family: Leary
- given: Pietro
family: Perona
- given: Scott
family: Loarie
- given: Oisin
family: Mac Aodha
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6320-6342
id: cole23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6320
lastpage: 6342
published: 2023-07-03 00:00:00 +0000
- title: 'K-SHAP: Policy Clustering Algorithm for Anonymous Multi-Agent State-Action Pairs'
abstract: 'Learning agent behaviors from observational data has been shown to improve our understanding of their decision-making processes, advancing our ability to explain their interactions with the environment and other agents. While multiple learning techniques have been proposed in the literature, there is one particular setting that has not been explored yet: multi-agent systems where agent identities remain anonymous. For instance, in financial markets, labeled data that identifies market participant strategies is typically proprietary, and only the anonymous state-action pairs that result from the interaction of multiple market participants are publicly available. As a result, sequences of agent actions are not observable, restricting the applicability of existing work. In this paper, we propose a Policy Clustering algorithm, called K-SHAP, that learns to group anonymous state-action pairs according to the agent policies. We frame the problem as an Imitation Learning (IL) task, and we learn a world-policy able to mimic all the agent behaviors upon different environmental states. We leverage the world-policy to explain each anonymous observation through an additive feature attribution method called SHAP (SHapley Additive exPlanations). Finally, by clustering the explanations we show that we are able to identify different agent policies and group observations accordingly. We evaluate our approach on simulated synthetic market data and a real-world financial dataset. We show that our proposal significantly and consistently outperforms the existing methods, identifying different agent strategies.'
volume: 202
URL: https://proceedings.mlr.press/v202/coletta23a.html
PDF: https://proceedings.mlr.press/v202/coletta23a/coletta23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-coletta23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Andrea
family: Coletta
- given: Svitlana
family: Vyetrenko
- given: Tucker
family: Balch
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6343-6363
id: coletta23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6343
lastpage: 6363
published: 2023-07-03 00:00:00 +0000
- title: 'Inferring Relational Potentials in Interacting Systems'
abstract: 'Systems consisting of interacting agents are prevalent in the world, ranging from dynamical systems in physics to complex biological networks. To build systems which can interact robustly in the real world, it is thus important to be able to infer the precise interactions governing such systems. Existing approaches typically discover such interactions by explicitly modeling the feed-forward dynamics of the trajectories. In this work, we propose Neural Interaction Inference with Potentials (NIIP) as an alternative approach to discover such interactions that enables greater flexibility in trajectory modeling: it discovers a set of relational potentials, represented as energy functions, which when minimized reconstruct the original trajectory. NIIP assigns low energy to the subset of trajectories which respect the relational constraints observed. We illustrate that with these representations NIIP displays unique capabilities at test time. First, it allows trajectory manipulation, such as interchanging interaction types across separately trained models, as well as trajectory forecasting. Additionally, it allows adding external hand-crafted potentials at test time. Finally, NIIP enables the detection of out-of-distribution samples and anomalies without explicit training.'
volume: 202
URL: https://proceedings.mlr.press/v202/comas23a.html
PDF: https://proceedings.mlr.press/v202/comas23a/comas23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-comas23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Armand
family: Comas
- given: Yilun
family: Du
- given: Christian
family: Fernandez Lopez
- given: Sandesh
family: Ghimire
- given: Mario
family: Sznaier
- given: Joshua B.
family: Tenenbaum
- given: Octavia
family: Camps
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6364-6383
id: comas23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6364
lastpage: 6383
published: 2023-07-03 00:00:00 +0000
- title: 'Task-specific experimental design for treatment effect estimation'
abstract: 'Understanding causality should be a core requirement of any attempt to build real impact through AI. Due to the inherent unobservability of counterfactuals, large randomised trials (RCTs) are the standard for causal inference. But large experiments are generically expensive, and randomisation carries its own costs, e.g. when suboptimal decisions are trialed. Recent work has proposed more sample-efficient alternatives to RCTs, but these are not adaptable to the downstream application for which the causal effect is sought. In this work, we develop a task-specific approach to experimental design and derive sampling strategies customised to particular downstream applications. Across a range of important tasks, real-world datasets, and sample sizes, our method outperforms other benchmarks, e.g. requiring an order-of-magnitude less data to match RCT performance on targeted marketing tasks.'
volume: 202
URL: https://proceedings.mlr.press/v202/connolly23a.html
PDF: https://proceedings.mlr.press/v202/connolly23a/connolly23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-connolly23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Bethany
family: Connolly
- given: Kim
family: Moore
- given: Tobias
family: Schwedes
- given: Alexander
family: Adam
- given: Gary
family: Willis
- given: Ilya
family: Feige
- given: Christopher
family: Frye
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6384-6401
id: connolly23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6384
lastpage: 6401
published: 2023-07-03 00:00:00 +0000
- title: 'A Mathematical Model for Curriculum Learning for Parities'
abstract: 'Curriculum learning (CL) - training using samples that are generated and presented in a meaningful order - was introduced in the machine learning context around a decade ago. While CL has been extensively used and analysed empirically, there has been very little mathematical justification for its advantages. We introduce a CL model for learning the class of k-parities on d bits of a binary string with a neural network trained by stochastic gradient descent (SGD). We show that a wise choice of training examples, involving two or more product distributions, allows for a significant reduction in the computational cost of learning this class of functions, compared to learning under the uniform distribution. We conduct experiments to support our analysis. Furthermore, we show that for another class of functions - namely the ‘Hamming mixtures’ - CL strategies involving a bounded number of product distributions are not beneficial.'
volume: 202
URL: https://proceedings.mlr.press/v202/cornacchia23a.html
PDF: https://proceedings.mlr.press/v202/cornacchia23a/cornacchia23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cornacchia23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Elisabetta
family: Cornacchia
- given: Elchanan
family: Mossel
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6402-6423
id: cornacchia23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6402
lastpage: 6423
published: 2023-07-03 00:00:00 +0000
- title: 'Learning to Maximize Mutual Information for Dynamic Feature Selection'
abstract: 'Feature selection helps reduce data acquisition costs in ML, but the standard approach is to train models with static feature subsets. Here, we consider the dynamic feature selection (DFS) problem where a model sequentially queries features based on the presently available information. DFS is often addressed with reinforcement learning, but we explore a simpler approach of greedily selecting features based on their conditional mutual information. This method is theoretically appealing but requires oracle access to the data distribution, so we develop a learning approach based on amortized optimization. The proposed method is shown to recover the greedy policy when trained to optimality, and it outperforms numerous existing feature selection methods in our experiments, thus validating it as a simple but powerful approach for this problem.'
volume: 202
URL: https://proceedings.mlr.press/v202/covert23a.html
PDF: https://proceedings.mlr.press/v202/covert23a/covert23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-covert23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ian Connick
family: Covert
- given: Wei
family: Qiu
- given: Mingyu
family: Lu
- given: Na Yoon
family: Kim
- given: Nathan J
family: White
- given: Su-In
family: Lee
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6424-6447
id: covert23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6424
lastpage: 6447
published: 2023-07-03 00:00:00 +0000
- title: 'Rethinking Weak Supervision in Helping Contrastive Learning'
abstract: 'Contrastive learning has shown outstanding performances in both supervised and unsupervised learning, and has recently been introduced to solve weakly supervised learning problems such as semi-supervised learning and noisy label learning. Despite the empirical evidence showing that semi-supervised labels improve the representations of contrastive learning, it remains unknown if noisy supervised information can be directly used in training instead of after manual denoising. Therefore, to explore the mechanical differences between semi-supervised and noisy-labeled information in helping contrastive learning, we establish a unified theoretical framework of contrastive learning under weak supervision. Specifically, we investigate the most intuitive paradigm of jointly training supervised and unsupervised contrastive losses. By translating the weakly supervised information into a similarity graph under the framework of spectral clustering based on the posterior probability of weak labels, we establish the downstream classification error bound. We prove that semi-supervised labels improve the downstream error bound whereas noisy labels have limited effects under such a paradigm. Our theoretical findings here provide new insights for the community to rethink the role of weak supervision in helping contrastive learning.'
volume: 202
URL: https://proceedings.mlr.press/v202/cui23a.html
PDF: https://proceedings.mlr.press/v202/cui23a/cui23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cui23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jingyi
family: Cui
- given: Weiran
family: Huang
- given: Yifei
family: Wang
- given: Yisen
family: Wang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6448-6467
id: cui23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6448
lastpage: 6467
published: 2023-07-03 00:00:00 +0000
- title: 'Bayes-optimal Learning of Deep Random Networks of Extensive-width'
abstract: 'We consider the problem of learning a target function corresponding to a deep, extensive-width, non-linear neural network with random Gaussian weights. We consider the asymptotic limit where the number of samples, the input dimension and the network width are proportionally large and propose a closed-form expression for the Bayes-optimal test error, for regression and classification tasks. We further compute closed-form expressions for the test errors of ridge regression, kernel and random features regression. We find, in particular, that optimally regularized ridge regression, as well as kernel regression, achieve Bayes-optimal performances, while the logistic loss yields a near-optimal test error for classification. We further show numerically that when the number of samples grows faster than the dimension, ridge and kernel methods become suboptimal, while neural networks achieve test error close to zero from quadratically many samples.'
volume: 202
URL: https://proceedings.mlr.press/v202/cui23b.html
PDF: https://proceedings.mlr.press/v202/cui23b/cui23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cui23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hugo
family: Cui
- given: Florent
family: Krzakala
- given: Lenka
family: Zdeborova
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6468-6521
id: cui23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6468
lastpage: 6521
published: 2023-07-03 00:00:00 +0000
- title: 'A General Representation Learning Framework with Generalization Performance Guarantees'
abstract: 'The generalization performance of machine learning methods depends heavily on the quality of data representation. However, existing research rarely considers representation learning from the perspective of generalization error. In this paper, we prove that the generalization error of a representation learning function can be estimated effectively by solving two convex optimization problems. Based on this, we propose a general representation learning framework. We then apply the proposed framework to the two most commonly used nonlinear mapping methods, i.e., kernel-based methods and deep neural networks (DNNs), and accordingly design a kernel selection method and a DNN boosting framework. Finally, extensive experiments verify the effectiveness of the proposed methods.'
volume: 202
URL: https://proceedings.mlr.press/v202/cui23c.html
PDF: https://proceedings.mlr.press/v202/cui23c/cui23c.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cui23c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Junbiao
family: Cui
- given: Jianqing
family: Liang
- given: Qin
family: Yue
- given: Jiye
family: Liang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6522-6544
id: cui23c
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6522
lastpage: 6544
published: 2023-07-03 00:00:00 +0000
- title: 'IRNeXt: Rethinking Convolutional Network Design for Image Restoration'
abstract: 'We present IRNeXt, a simple yet effective convolutional network architecture for image restoration. Recently, Transformer models have dominated the field of image restoration due to their powerful ability to model long-range pixel interactions. In this paper, we excavate the potential of the convolutional neural network (CNN) and show that our CNN-based model can achieve comparable or better performance than Transformer models with low computation overhead on several image restoration tasks. By re-examining the characteristics possessed by advanced image restoration algorithms, we discover several key factors leading to the performance improvement of restoration models. This motivates us to develop a novel network for image restoration based on cheap convolution operators. Comprehensive experiments demonstrate that IRNeXt delivers state-of-the-art performance on numerous datasets for a range of image restoration tasks, including image dehazing, single-image defocus/motion deblurring, image deraining, and image desnowing, with low computational complexity. https://github.com/c-yn/IRNeXt.'
volume: 202
URL: https://proceedings.mlr.press/v202/cui23d.html
PDF: https://proceedings.mlr.press/v202/cui23d/cui23d.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cui23d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yuning
family: Cui
- given: Wenqi
family: Ren
- given: Sining
family: Yang
- given: Xiaochun
family: Cao
- given: Alois
family: Knoll
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6545-6564
id: cui23d
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6545
lastpage: 6564
published: 2023-07-03 00:00:00 +0000
- title: 'Scaling Up Dataset Distillation to ImageNet-1K with Constant Memory'
abstract: 'Dataset Distillation is a newly emerging area that aims to distill large datasets into much smaller and highly informative synthetic ones to accelerate training and reduce storage. Among various dataset distillation methods, trajectory-matching-based methods (MTT) have achieved SOTA performance in many tasks, e.g., on CIFAR-10/100. However, due to exorbitant memory consumption when unrolling optimization through SGD steps, MTT fails to scale to large-scale datasets such as ImageNet-1K. Can we scale this SOTA method to ImageNet-1K and does its effectiveness on CIFAR transfer to ImageNet-1K? To answer these questions, we first propose a procedure to exactly compute the unrolled gradient with constant memory complexity, which allows us to scale MTT to ImageNet-1K seamlessly with $\sim 6$x reduction in memory footprint. We further discover that it is challenging for MTT to handle datasets with a large number of classes, and propose a novel soft label assignment that drastically improves its convergence. The resulting algorithm sets new SOTA on ImageNet-1K: we can scale up to 50 IPCs (Image Per Class) on ImageNet-1K on a single GPU (all previous methods can only scale to 2 IPCs on ImageNet-1K), leading to the best accuracy (only 5.9% accuracy drop against full dataset training) while utilizing only 4.2% of the number of data points - an 18.2% absolute gain over prior SOTA.'
volume: 202
URL: https://proceedings.mlr.press/v202/cui23e.html
PDF: https://proceedings.mlr.press/v202/cui23e/cui23e.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cui23e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Justin
family: Cui
- given: Ruochen
family: Wang
- given: Si
family: Si
- given: Cho-Jui
family: Hsieh
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6565-6590
id: cui23e
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6565
lastpage: 6590
published: 2023-07-03 00:00:00 +0000
- title: 'Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation'
abstract: 'Transformer-based detection and segmentation methods use a list of learned detection queries to retrieve information from the transformer network and learn to predict the location and category of one specific object from each query. We empirically find that random convex combinations of the learned queries still work well for the corresponding models. We then propose to learn a convex combination with dynamic coefficients based on the high-level semantics of the image. The generated dynamic queries, named modulated queries, better capture the prior of object locations and categories in different images. Equipped with our modulated queries, a wide range of DETR-based models achieve consistent and superior performance across multiple tasks (object detection, instance segmentation, panoptic segmentation) and on different benchmarks (MS COCO, CityScapes, YoutubeVIS).'
volume: 202
URL: https://proceedings.mlr.press/v202/cui23f.html
PDF: https://proceedings.mlr.press/v202/cui23f/cui23f.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cui23f.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yiming
family: Cui
- given: Linjie
family: Yang
- given: Haichao
family: Yu
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6591-6602
id: cui23f
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6591
lastpage: 6602
published: 2023-07-03 00:00:00 +0000
- title: 'Adaptive Identification of Populations with Treatment Benefit in Clinical Trials: Machine Learning Challenges and Solutions'
abstract: 'We study the problem of adaptively identifying patient subpopulations that benefit from a given treatment during a confirmatory clinical trial. This type of adaptive clinical trial has been thoroughly studied in biostatistics, but has been allowed only limited adaptivity so far. Here, we aim to relax classical restrictions on such designs and investigate how to incorporate ideas from the recent machine learning literature on adaptive and online experimentation to make trials more flexible and efficient. We find that the unique characteristics of the subpopulation selection problem – most importantly that (i) one is usually interested in finding subpopulations with any treatment benefit (and not necessarily the single subgroup with largest effect) given a limited budget and that (ii) effectiveness only has to be demonstrated across the subpopulation on average – give rise to interesting challenges and new desiderata when designing algorithmic solutions. Building on these findings, we propose AdaGGI and AdaGCPI, two meta-algorithms for subpopulation construction. We empirically investigate their performance across a range of simulation scenarios and derive insights into their (dis)advantages across different settings.'
volume: 202
URL: https://proceedings.mlr.press/v202/curth23a.html
PDF: https://proceedings.mlr.press/v202/curth23a/curth23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-curth23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Alicia
family: Curth
- given: Alihan
family: Hüyük
- given: Mihaela
family: Van Der Schaar
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6603-6622
id: curth23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6603
lastpage: 6622
published: 2023-07-03 00:00:00 +0000
- title: 'In Search of Insights, Not Magic Bullets: Towards Demystification of the Model Selection Dilemma in Heterogeneous Treatment Effect Estimation'
abstract: 'Personalized treatment effect estimates are often of interest in high-stakes applications - thus, before deploying a model estimating such effects in practice, one needs to be sure that the best candidate from the ever-growing machine learning toolbox for this task was chosen. Unfortunately, due to the absence of counterfactual information in practice, it is usually not possible to rely on standard validation metrics for doing so, leading to a well-known model selection dilemma in the treatment effect estimation literature. While some solutions have recently been investigated, systematic understanding of the strengths and weaknesses of different model selection criteria is still lacking. In this paper, instead of attempting to declare a global ‘winner’, we therefore empirically investigate success and failure modes of different selection criteria. We highlight that there is a complex interplay between selection strategies, candidate estimators and the data used for comparing them, and provide interesting insights into the relative (dis)advantages of different criteria alongside desiderata for the design of further illuminating empirical studies in this context.'
volume: 202
URL: https://proceedings.mlr.press/v202/curth23b.html
PDF: https://proceedings.mlr.press/v202/curth23b/curth23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-curth23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Alicia
family: Curth
- given: Mihaela
family: Van Der Schaar
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6623-6642
id: curth23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6623
lastpage: 6642
published: 2023-07-03 00:00:00 +0000
- title: 'Optimal Stochastic Non-smooth Non-convex Optimization through Online-to-Non-convex Conversion'
abstract: 'We present new algorithms for optimizing non-smooth, non-convex stochastic objectives based on a novel analysis technique. This improves the current best-known complexity for finding a $(\delta,\epsilon)$-stationary point from $O(\epsilon^{-4}\delta^{-1})$ stochastic gradient queries to $O(\epsilon^{-3}\delta^{-1})$, which we also show to be optimal. Our primary technique is a reduction from non-smooth non-convex optimization to *online learning*, after which our results follow from standard regret bounds in online learning. For *deterministic and second-order smooth* objectives, applying more advanced optimistic online learning techniques enables a new complexity of $O(\epsilon^{-1.5}\delta^{-0.5})$. Our improved non-smooth analysis also immediately recovers all optimal or best-known results for finding $\epsilon$ stationary points of smooth or second-order smooth objectives in both stochastic and deterministic settings.'
volume: 202
URL: https://proceedings.mlr.press/v202/cutkosky23a.html
PDF: https://proceedings.mlr.press/v202/cutkosky23a/cutkosky23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cutkosky23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ashok
family: Cutkosky
- given: Harsh
family: Mehta
- given: Francesco
family: Orabona
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6643-6670
id: cutkosky23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6643
lastpage: 6670
published: 2023-07-03 00:00:00 +0000
- title: 'Monge, Bregman and Occam: Interpretable Optimal Transport in High-Dimensions with Feature-Sparse Maps'
abstract: 'Optimal transport (OT) theory focuses, among all maps $T:\mathbb{R}^d\rightarrow \mathbb{R}^d$ that can morph a probability measure $\mu$ onto another $\nu$, on those that are the “thriftiest”, i.e. such that the average cost $c(x, T(x))$ between $x$ and its image $T(x)$ is as small as possible. Many computational approaches have been proposed to estimate such *Monge* maps when $c$ is the squared-Euclidean distance, e.g., using entropic maps [Pooladian+2021], or input convex neural networks [Makkuva+2020, Korotin+2020]. We propose a new research direction, that leverages a specific translation invariant cost $c(x, y):=h(x-y)$ inspired by the elastic net. Here, $h:=\tfrac{1}{2}\|\cdot\|_2^2+\tau(\cdot)$, where $\tau$ is a convex function. We highlight a surprising link tying together a generalized entropic map for $h$, *Bregman* centroids induced by $h$, and the proximal operator of $\tau$. We show how setting $\tau$ to be a sparsity-inducing norm results in the first application of *Occam*’s razor to transport. These maps yield, mechanically, displacement vectors $\Delta(x):= T(x)-x$ that are sparse, with sparsity patterns that vary depending on $x$. We showcase the ability of our method to estimate meaningful OT maps for high-dimensional single-cell transcription data. We use our methods in the $34000$-d space of gene counts for cells, *without* using a prior dimensionality reduction, thus retaining the ability to interpret all displacements at the gene level.'
volume: 202
URL: https://proceedings.mlr.press/v202/cuturi23a.html
PDF: https://proceedings.mlr.press/v202/cuturi23a/cuturi23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cuturi23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Marco
family: Cuturi
- given: Michal
family: Klein
- given: Pierre
family: Ablin
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6671-6682
id: cuturi23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6671
lastpage: 6682
published: 2023-07-03 00:00:00 +0000
- title: 'From Noisy Fixed-Point Iterations to Private ADMM for Centralized and Federated Learning'
abstract: 'We study differentially private (DP) machine learning algorithms as instances of noisy fixed-point iterations, in order to derive privacy and utility results from this well-studied framework. We show that this new perspective recovers popular private gradient-based methods like DP-SGD and provides a principled way to design and analyze new private optimization algorithms in a flexible manner. Focusing on the widely-used Alternating Directions Method of Multipliers (ADMM) method, we use our general framework to derive novel private ADMM algorithms for centralized, federated and fully decentralized learning. We establish strong privacy guarantees for these algorithms, leveraging privacy amplification by iteration and by subsampling. Finally, we provide utility guarantees for the three algorithms using a unified analysis that exploits a recent linear convergence result for noisy fixed-point iterations.'
volume: 202
URL: https://proceedings.mlr.press/v202/cyffers23a.html
PDF: https://proceedings.mlr.press/v202/cyffers23a/cyffers23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-cyffers23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Edwige
family: Cyffers
- given: Aurélien
family: Bellet
- given: Debabrota
family: Basu
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6683-6711
id: cyffers23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6683
lastpage: 6711
published: 2023-07-03 00:00:00 +0000
- title: 'Chameleon: Adapting to Peer Images for Planting Durable Backdoors in Federated Learning'
abstract: 'In a federated learning (FL) system, distributed clients upload their local models to a central server to aggregate into a global model. Malicious clients may plant backdoors into the global model through uploading poisoned local models, causing images with specific patterns to be misclassified into some target labels. Backdoors planted by current attacks are not durable, and vanish quickly once the attackers stop model poisoning. In this paper, we investigate the connection between the durability of FL backdoors and the relationships between benign images and poisoned images (i.e., the images whose labels are flipped to the target label during local training). Specifically, benign images with the original and the target labels of the poisoned images are found to have key effects on backdoor durability. Consequently, we propose a novel attack, Chameleon, which utilizes contrastive learning to further amplify such effects towards a more durable backdoor. Extensive experiments demonstrate that Chameleon significantly extends the backdoor lifespan over baselines by $1.2\times \sim 4\times$, for a wide range of image datasets, backdoor types, and model architectures.'
volume: 202
URL: https://proceedings.mlr.press/v202/dai23a.html
PDF: https://proceedings.mlr.press/v202/dai23a/dai23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-dai23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yanbo
family: Dai
- given: Songze
family: Li
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6712-6725
id: dai23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6712
lastpage: 6725
published: 2023-07-03 00:00:00 +0000
- title: 'Refined Regret for Adversarial MDPs with Linear Function Approximation'
abstract: 'We consider learning in an adversarial Markov Decision Process (MDP) where the loss functions can change arbitrarily over $K$ episodes and the state space can be arbitrarily large. We assume that the Q-function of any policy is linear in some known features, that is, a linear function approximation exists. The best existing regret upper bound for this setting (Luo et al., 2021) is of order $\tilde{\mathcal O}(K^{2/3})$ (omitting all other dependencies), given access to a simulator. This paper provides two algorithms that improve the regret to $\tilde{\mathcal O}(\sqrt K)$ in the same setting. Our first algorithm makes use of a refined analysis of the Follow-the-Regularized-Leader (FTRL) algorithm with the log-barrier regularizer. This analysis allows the loss estimators to be arbitrarily negative and might be of independent interest. Our second algorithm develops a magnitude-reduced loss estimator, further removing the polynomial dependency on the number of actions in the first algorithm and leading to the optimal regret bound (up to logarithmic terms and dependency on the horizon). Moreover, we also extend the first algorithm to simulator-free linear MDPs, which achieves $\tilde{\mathcal O}(K^{8/9})$ regret and greatly improves over the best existing bound $\tilde{\mathcal O}(K^{14/15})$. This algorithm relies on a better alternative to the Matrix Geometric Resampling procedure by Neu & Olkhovskaya (2020), which could again be of independent interest.'
volume: 202
URL: https://proceedings.mlr.press/v202/dai23b.html
PDF: https://proceedings.mlr.press/v202/dai23b/dai23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-dai23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yan
family: Dai
- given: Haipeng
family: Luo
- given: Chen-Yu
family: Wei
- given: Julian
family: Zimmert
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6726-6759
id: dai23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6726
lastpage: 6759
published: 2023-07-03 00:00:00 +0000
- title: 'MultiRobustBench: Benchmarking Robustness Against Multiple Attacks'
abstract: 'The bulk of existing research in defending against adversarial examples focuses on defending against a single (typically bounded $\ell_p$-norm) attack, but for a practical setting, machine learning (ML) models should be robust to a wide variety of attacks. In this paper, we present the first unified framework for considering multiple attacks against ML models. Our framework is able to model different levels of learner’s knowledge about the test-time adversary, allowing us to model robustness against unforeseen attacks and robustness against unions of attacks. Using our framework, we present the first leaderboard, MultiRobustBench (https://multirobustbench.github.io), for benchmarking multiattack evaluation which captures performance across attack types and attack strengths. We evaluate the performance of 16 defended models for robustness against a set of 9 different attack types, including $\ell_p$-based threat models, spatial transformations, and color changes, at 20 different attack strengths (180 attacks total). Additionally, we analyze the state of current defenses against multiple attacks. Our analysis shows that while existing defenses have made progress in terms of average robustness across the set of attacks used, robustness against the worst-case attack is still a big open problem as all existing models perform worse than random guessing.'
volume: 202
URL: https://proceedings.mlr.press/v202/dai23c.html
PDF: https://proceedings.mlr.press/v202/dai23c/dai23c.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-dai23c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Sihui
family: Dai
- given: Saeed
family: Mahloujifar
- given: Chong
family: Xiang
- given: Vikash
family: Sehwag
- given: Pin-Yu
family: Chen
- given: Prateek
family: Mittal
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6760-6785
id: dai23c
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6760
lastpage: 6785
published: 2023-07-03 00:00:00 +0000
- title: 'Moderately Distributional Exploration for Domain Generalization'
abstract: 'Domain generalization (DG) aims to tackle the distribution shift between training domains and unknown target domains. Generating new domains is one of the most effective approaches, yet its performance gain depends on the distribution discrepancy between the generated and target domains. Distributionally robust optimization is promising to tackle distribution discrepancy by exploring domains in an uncertainty set. However, the uncertainty set may be overwhelmingly large, leading to low-confidence prediction in DG. It is because a large uncertainty set could introduce domains containing semantically different factors from training domains. To address this issue, we propose to perform a $\textit{mo}$derately $\textit{d}$istributional $\textit{e}$xploration (MODE) for domain generalization. Specifically, MODE performs distribution exploration in an uncertainty $\textit{subset}$ that shares the same semantic factors with the training domains. We show that MODE can endow models with provable generalization performance on unknown target domains. The experimental results show that MODE achieves competitive performance compared to state-of-the-art baselines.'
volume: 202
URL: https://proceedings.mlr.press/v202/dai23d.html
PDF: https://proceedings.mlr.press/v202/dai23d/dai23d.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-dai23d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Rui
family: Dai
- given: Yonggang
family: Zhang
- given: Zhen
family: Fang
- given: Bo
family: Han
- given: Xinmei
family: Tian
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6786-6817
id: dai23d
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6786
lastpage: 6817
published: 2023-07-03 00:00:00 +0000
- title: 'Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning'
abstract: 'Off-policy learning from multistep returns is crucial for sample-efficient reinforcement learning, but counteracting off-policy bias without exacerbating variance is challenging. Classically, off-policy bias is corrected in a per-decision manner: past temporal-difference errors are re-weighted by the instantaneous Importance Sampling (IS) ratio after each action via eligibility traces. Many off-policy algorithms rely on this mechanism, along with differing protocols for cutting the IS ratios (traces) to combat the variance of the IS estimator. Unfortunately, once a trace has been cut, the effect cannot be easily reversed. This has led to the development of credit-assignment strategies that account for multiple past experiences at a time. These trajectory-aware methods have not been extensively analyzed, and their theoretical justification remains uncertain. In this paper, we propose a multistep operator that unifies per-decision and trajectory-aware methods. We prove convergence conditions for our operator in the tabular setting, establishing the first guarantees for several existing methods as well as many new ones. Finally, we introduce Recency-Bounded Importance Sampling (RBIS), which leverages trajectory awareness to perform robustly across $\lambda$-values in an off-policy control task.'
volume: 202
URL: https://proceedings.mlr.press/v202/daley23a.html
PDF: https://proceedings.mlr.press/v202/daley23a/daley23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-daley23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Brett
family: Daley
- given: Martha
family: White
- given: Christopher
family: Amato
- given: Marlos
family: C. Machado
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6818-6835
id: daley23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6818
lastpage: 6835
published: 2023-07-03 00:00:00 +0000
- title: 'Efficient displacement convex optimization with particle gradient descent'
abstract: 'Particle gradient descent, which uses particles to represent a probability measure and performs gradient descent on particles in parallel, is widely used to optimize functions of probability measures. This paper considers particle gradient descent with a finite number of particles and establishes its theoretical guarantees to optimize functions that are *displacement convex* in measures. Concretely, for Lipschitz displacement convex functions defined on probability measures over $\mathbb{R}^d$, we prove that $O(1/\epsilon^2)$ particles and $O(d/\epsilon^4)$ iterations are sufficient to find the $\epsilon$-optimal solutions. We further provide improved complexity bounds for optimizing smooth displacement convex functions. An application of our results proves the conjecture of *no optimization-barrier up to permutation invariance*, proposed by Entezari et al. (2022), for specific two-layer neural networks with two-dimensional inputs uniformly drawn from the unit circle.'
volume: 202
URL: https://proceedings.mlr.press/v202/daneshmand23a.html
PDF: https://proceedings.mlr.press/v202/daneshmand23a/daneshmand23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-daneshmand23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hadi
family: Daneshmand
- given: Jason D.
family: Lee
- given: Chi
family: Jin
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6836-6854
id: daneshmand23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6836
lastpage: 6854
published: 2023-07-03 00:00:00 +0000
- title: 'Multiple Thinking Achieving Meta-Ability Decoupling for Object Navigation'
abstract: 'We propose a meta-ability decoupling (MAD) paradigm, which brings together various object navigation methods in an architecture system, allowing them to mutually enhance each other and evolve together. Based on the MAD paradigm, we design a multiple thinking (MT) model that leverages distinct thinking to abstract various meta-abilities. Our method decouples meta-abilities from three aspects: input, encoding, and reward, while employing the multiple thinking collaboration (MTC) module to promote mutual cooperation between the different modes of thinking. MAD introduces a novel qualitative and quantitative interpretability system for object navigation. Through extensive experiments on AI2-Thor and RoboTHOR, we demonstrate that our method outperforms state-of-the-art (SOTA) methods on both typical and zero-shot object navigation tasks.'
volume: 202
URL: https://proceedings.mlr.press/v202/dang23a.html
PDF: https://proceedings.mlr.press/v202/dang23a/dang23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-dang23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ronghao
family: Dang
- given: Lu
family: Chen
- given: Liuyi
family: Wang
- given: Zongtao
family: He
- given: Chengju
family: Liu
- given: Qijun
family: Chen
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6855-6872
id: dang23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6855
lastpage: 6872
published: 2023-07-03 00:00:00 +0000
- title: 'Neural Collapse in Deep Linear Networks: From Balanced to Imbalanced Data'
abstract: 'Modern deep neural networks have achieved impressive performance on tasks from image classification to natural language processing. Surprisingly, these complex systems with massive amounts of parameters exhibit the same structural properties in their last-layer features and classifiers across canonical datasets when training until convergence. In particular, it has been observed that the last-layer features collapse to their class-means, and those class-means are the vertices of a simplex Equiangular Tight Frame (ETF). This phenomenon is known as Neural Collapse (NC). Recent papers have theoretically shown that NC emerges in the global minimizers of training problems with the simplified “unconstrained feature model”. In this context, we take a step further and prove that NC occurs in deep linear networks for the popular mean squared error (MSE) and cross entropy (CE) losses, showing that global solutions exhibit NC properties across the linear layers. Furthermore, we extend our study to imbalanced data for the MSE loss and present the first geometric analysis of NC under the bias-free setting. Our results demonstrate the convergence of the last-layer features and classifiers to a geometry consisting of orthogonal vectors, whose lengths depend on the amount of data in their corresponding classes. Finally, we empirically validate our theoretical analyses on synthetic and practical network architectures with both balanced and imbalanced scenarios.'
volume: 202
URL: https://proceedings.mlr.press/v202/dang23b.html
PDF: https://proceedings.mlr.press/v202/dang23b/dang23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-dang23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hien
family: Dang
- given: Tho Tran
family: Huu
- given: Stanley
family: Osher
- given: Hung The
family: Tran
- given: Nhat
family: Ho
- given: Tan Minh
family: Nguyen
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6873-6947
id: dang23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6873
lastpage: 6947
published: 2023-07-03 00:00:00 +0000
- title: 'Reinforcement Learning Can Be More Efficient with Multiple Rewards'
abstract: 'Reward design is one of the most critical and challenging aspects when formulating a task as a reinforcement learning (RL) problem. In practice, it often takes several attempts of reward specification and learning with it in order to find one that leads to sample-efficient learning of the desired behavior. Instead, in this work, we study whether directly incorporating multiple alternate reward formulations of the same task in a single agent can lead to faster learning. We analyze multi-reward extensions of action-elimination algorithms and prove more favorable instance-dependent regret bounds compared to their single-reward counterparts, both in multi-armed bandits and in tabular Markov decision processes. Our bounds scale for each state-action pair with the inverse of the largest gap among all reward functions. This suggests that learning with multiple rewards can indeed be more sample-efficient, as long as the rewards agree on an optimal policy. We further prove that when rewards do not agree, multi-reward action elimination in multi-armed bandits still learns a policy that is good across all reward functions.'
volume: 202
URL: https://proceedings.mlr.press/v202/dann23a.html
PDF: https://proceedings.mlr.press/v202/dann23a/dann23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-dann23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Christoph
family: Dann
- given: Yishay
family: Mansour
- given: Mehryar
family: Mohri
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6948-6967
id: dann23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6948
lastpage: 6967
published: 2023-07-03 00:00:00 +0000
- title: 'Best of Both Worlds Policy Optimization'
abstract: 'Policy optimization methods are popular reinforcement learning algorithms in practice and recent works have built a theoretical foundation for them by proving $\sqrt{T}$ regret bounds even when the losses are adversarial. Such bounds are tight in the worst case but often overly pessimistic. In this work, we show that by carefully designing the regularizer, bonus terms, and learning rates, one can achieve a more favorable $\text{polylog}(T)$ regret bound when the losses are stochastic, without sacrificing the worst-case guarantee in the adversarial regime. Specifically, we show the first best of both worlds guarantee for policy optimization in tabular MDPs by leveraging either a Tsallis entropy or a Shannon entropy regularizer. Then we show that under known transitions, we can further obtain a first-order regret bound in the adversarial regime by leveraging the log barrier regularizer.'
volume: 202
URL: https://proceedings.mlr.press/v202/dann23b.html
PDF: https://proceedings.mlr.press/v202/dann23b/dann23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-dann23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Christoph
family: Dann
- given: Chen-Yu
family: Wei
- given: Julian
family: Zimmert
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 6968-7008
id: dann23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 6968
lastpage: 7008
published: 2023-07-03 00:00:00 +0000
- title: 'Image generation with shortest path diffusion'
abstract: 'The field of image generation has made significant progress thanks to the introduction of Diffusion Models, which learn to progressively reverse a given image corruption. Recently, a few studies introduced alternative ways of corrupting images in Diffusion Models, with an emphasis on blurring. However, these studies are purely empirical and it remains unclear what is the optimal procedure for corrupting an image. In this work, we hypothesize that the optimal procedure minimizes the length of the path taken when corrupting an image towards a given final state. We propose the Fisher metric for the path length, measured in the space of probability distributions. We compute the shortest path according to this metric, and we show that it corresponds to a combination of image sharpening, rather than blurring, and noise deblurring. While the corruption was chosen arbitrarily in previous work, our Shortest Path Diffusion (SPD) determines uniquely the entire spatiotemporal structure of the corruption. We show that SPD improves on strong baselines without any hyperparameter tuning, and outperforms all previous Diffusion Models based on image blurring. Furthermore, any small deviation from the shortest path leads to worse performance, suggesting that SPD provides the optimal procedure to corrupt images. Our work sheds new light on observations made in recent works and provides a new approach to improve diffusion models on images and other types of data.'
volume: 202
URL: https://proceedings.mlr.press/v202/das23a.html
PDF: https://proceedings.mlr.press/v202/das23a/das23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-das23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ayan
family: Das
- given: Stathi
family: Fotiadis
- given: Anil
family: Batra
- given: Farhang
family: Nabiei
- given: Fengting
family: Liao
- given: Sattar
family: Vakili
- given: Da-Shan
family: Shiu
- given: Alberto
family: Bernacchia
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7009-7024
id: das23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7009
lastpage: 7024
published: 2023-07-03 00:00:00 +0000
- title: 'Efficient List-Decodable Regression using Batches'
abstract: 'We demonstrate the use of batches in studying list-decodable linear regression, in which only $\alpha\in (0,1]$ fraction of batches contain genuine samples from a common distribution and the rest can contain arbitrary or even adversarial samples. When genuine batches have $\ge \tilde\Omega(1/\alpha)$ samples each, our algorithm can efficiently find a small list of potential regression parameters, with a high probability that one of them is close to the true parameter. This is the first polynomial time algorithm for list-decodable linear regression, and its sample complexity scales nearly linearly with the dimension of the covariates. The polynomial time algorithm is made possible by the batch structure and may not be feasible without it, as suggested by a recent Statistical Query lower bound (Diakonikolas et al., 2021b).'
volume: 202
URL: https://proceedings.mlr.press/v202/das23b.html
PDF: https://proceedings.mlr.press/v202/das23b/das23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-das23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Abhimanyu
family: Das
- given: Ayush
family: Jain
- given: Weihao
family: Kong
- given: Rajat
family: Sen
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7025-7065
id: das23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7025
lastpage: 7065
published: 2023-07-03 00:00:00 +0000
- title: 'Beyond Uniform Lipschitz Condition in Differentially Private Optimization'
abstract: 'Most prior results on differentially private stochastic gradient descent (DP-SGD) are derived under the simplistic assumption of uniform Lipschitzness, i.e., the per-sample gradients are uniformly bounded. We generalize uniform Lipschitzness by assuming that the per-sample gradients have sample-dependent upper bounds, i.e., per-sample Lipschitz constants, which themselves may be unbounded. We provide principled guidance on choosing the clip norm in DP-SGD for convex over-parameterized settings satisfying our general version of Lipschitzness when the per-sample Lipschitz constants are bounded; specifically, we recommend tuning the clip norm only till values up to the minimum per-sample Lipschitz constant. This finds application in the private training of a softmax layer on top of a deep network pre-trained on public data. We verify the efficacy of our recommendation via experiments on 8 datasets. Furthermore, we provide new convergence results for DP-SGD on convex and nonconvex functions when the Lipschitz constants are unbounded but have bounded moments, i.e., they are heavy-tailed.'
volume: 202
URL: https://proceedings.mlr.press/v202/das23c.html
PDF: https://proceedings.mlr.press/v202/das23c/das23c.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-das23c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Rudrajit
family: Das
- given: Satyen
family: Kale
- given: Zheng
family: Xu
- given: Tong
family: Zhang
- given: Sujay
family: Sanghavi
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7066-7101
id: das23c
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7066
lastpage: 7101
published: 2023-07-03 00:00:00 +0000
- title: 'Understanding Self-Distillation in the Presence of Label Noise'
abstract: 'Self-distillation (SD) is the process of first training a "teacher" model and then using its predictions to train a "student" model that has the *same* architecture. Specifically, the student’s loss is $\big(\xi*\ell(\text{teacher’s predictions}, \text{ student’s predictions}) + (1-\xi)*\ell(\text{given labels}, \text{ student’s predictions})\big)$, where $\ell$ is the loss function and $\xi$ is some parameter $\in [0,1]$. SD has been empirically observed to provide performance gains in several settings. In this paper, we theoretically characterize the effect of SD in two supervised learning problems with *noisy labels*. We first analyze SD for regularized linear regression and show that in the high label noise regime, the optimal value of $\xi$ that minimizes the expected error in estimating the ground truth parameter is surprisingly greater than 1. Empirically, we show that $\xi > 1$ works better than $\xi \leq 1$ even with the cross-entropy loss for several classification datasets when 50% or 30% of the labels are corrupted. Further, we quantify when optimal SD is better than optimal regularization. Next, we analyze SD in the case of logistic regression for binary classification with random label corruption and quantify the range of label corruption in which the student outperforms the teacher (w.r.t. accuracy). To our knowledge, this is the first result of its kind for the cross-entropy loss.'
volume: 202
URL: https://proceedings.mlr.press/v202/das23d.html
PDF: https://proceedings.mlr.press/v202/das23d/das23d.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-das23d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Rudrajit
family: Das
- given: Sujay
family: Sanghavi
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7102-7140
id: das23d
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7102
lastpage: 7140
published: 2023-07-03 00:00:00 +0000
- title: 'Interval Bound Interpolation for Few-shot Learning with Few Tasks'
abstract: 'Few-shot learning aims to transfer the knowledge acquired from training on a diverse set of tasks to unseen tasks from the same task distribution, with a limited amount of labeled data. The underlying requirement for effective few-shot generalization is to learn a good representation of the task manifold. This becomes more difficult when only a limited number of tasks are available for training. In such a few-task few-shot setting, it is beneficial to explicitly preserve the local neighborhoods from the task manifold and exploit this to generate artificial tasks for training. To this end, we introduce the notion of interval bounds from the provably robust training literature to few-shot learning. The interval bounds are used to characterize neighborhoods around the training tasks. These neighborhoods can then be preserved by minimizing the distance between a task and its respective bounds. We then use a novel strategy to artificially form new tasks for training by interpolating between the available tasks and their respective interval bounds. We apply our framework to both model-agnostic meta-learning and prototype-based metric-learning paradigms. The efficacy of our proposed approach is evident from the improved performance on several datasets from diverse domains in comparison to recent methods.'
volume: 202
URL: https://proceedings.mlr.press/v202/datta23a.html
PDF: https://proceedings.mlr.press/v202/datta23a/datta23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-datta23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Shounak
family: Datta
- given: Sankha Subhra
family: Mullick
- given: Anish
family: Chakrabarty
- given: Swagatam
family: Das
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7141-7166
id: datta23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7141
lastpage: 7166
published: 2023-07-03 00:00:00 +0000
- title: 'Hypervolume Knowledge Gradient: A Lookahead Approach for Multi-Objective Bayesian Optimization with Partial Information'
abstract: 'Bayesian optimization is a popular method for sample efficient multi-objective optimization. However, existing Bayesian optimization techniques fail to effectively exploit common and often-neglected problem structure such as decoupled evaluations, where objectives can be queried independently from one another and each may consume different resources, or multi-fidelity evaluations, where lower-fidelity proxies of the objectives can be evaluated at lower cost. In this work, we propose a general one-step lookahead acquisition function based on the Knowledge Gradient that addresses the complex question of what to evaluate when and at which design points in a principled Bayesian decision-theoretic fashion. Hence, our approach naturally addresses decoupled, multi-fidelity, and standard multi-objective optimization settings in a unified Bayesian decision making framework. By construction, our method is the one-step Bayes-optimal policy for hypervolume maximization. Empirically, we demonstrate that our method improves sample efficiency in a wide variety of synthetic and real-world problems. Furthermore, we show that our method is general-purpose and yields competitive performance in standard (potentially noisy) multi-objective optimization.'
volume: 202
URL: https://proceedings.mlr.press/v202/daulton23a.html
PDF: https://proceedings.mlr.press/v202/daulton23a/daulton23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-daulton23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Sam
family: Daulton
- given: Maximilian
family: Balandat
- given: Eytan
family: Bakshy
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7167-7204
id: daulton23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7167
lastpage: 7204
published: 2023-07-03 00:00:00 +0000
- title: 'Fast Combinatorial Algorithms for Min Max Correlation Clustering'
abstract: 'We introduce fast algorithms for correlation clustering with respect to the Min Max objective that provide constant factor approximations on complete graphs. Our algorithms are the first purely combinatorial approximation algorithms for this problem. We construct a novel semi-metric on the set of vertices, which we call the correlation metric, that indicates to our clustering algorithms whether pairs of nodes should be in the same cluster. The paper demonstrates empirically that, compared to prior work, our algorithms sacrifice little in the objective quality to obtain significantly better run-time. Moreover, our algorithms scale to larger networks that are effectively intractable for known algorithms.'
volume: 202
URL: https://proceedings.mlr.press/v202/davies23a.html
PDF: https://proceedings.mlr.press/v202/davies23a/davies23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-davies23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Sami
family: Davies
- given: Benjamin
family: Moseley
- given: Heather
family: Newman
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7205-7230
id: davies23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7205
lastpage: 7230
published: 2023-07-03 00:00:00 +0000
- title: 'Predictive Flows for Faster Ford-Fulkerson'
abstract: 'Recent work has shown that leveraging learned predictions can improve the running time of algorithms for bipartite matching and similar combinatorial problems. In this work, we build on this idea to improve the performance of the widely used Ford-Fulkerson algorithm for computing maximum flows by seeding Ford-Fulkerson with predicted flows. Our proposed method offers strong theoretical performance in terms of the quality of the prediction. We then consider image segmentation, a common use-case of flows in computer vision, and complement our theoretical analysis with strong empirical results.'
volume: 202
URL: https://proceedings.mlr.press/v202/davies23b.html
PDF: https://proceedings.mlr.press/v202/davies23b/davies23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-davies23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Sami
family: Davies
- given: Benjamin
family: Moseley
- given: Sergei
family: Vassilvitskii
- given: Yuyan
family: Wang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7231-7248
id: davies23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7231
lastpage: 7248
published: 2023-07-03 00:00:00 +0000
- title: 'The Persistent Laplacian for Data Science: Evaluating Higher-Order Persistent Spectral Representations of Data'
abstract: 'Persistent homology is arguably the most successful technique in Topological Data Analysis. It combines homology, a topological feature of a data set, with persistence, which tracks the evolution of homology over different scales. The persistent Laplacian is a recent theoretical development that combines persistence with the combinatorial Laplacian, the higher-order extension of the well-known graph Laplacian. Crucially, the persistent Laplacian encodes both the homology of a data set and some additional geometric information not captured by the homology. Here, we provide the first investigation into the efficacy of the persistent Laplacian as an embedding of data for downstream classification and regression tasks. We extend the persistent Laplacian to cubical complexes so it can be used on images, then evaluate its performance as an embedding method on the MNIST and MoleculeNet datasets, demonstrating that it consistently outperforms persistent homology across tasks.'
volume: 202
URL: https://proceedings.mlr.press/v202/davies23c.html
PDF: https://proceedings.mlr.press/v202/davies23c/davies23c.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-davies23c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Thomas
family: Davies
- given: Zhengchao
family: Wan
- given: Ruben J
family: Sanchez-Garcia
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7249-7263
id: davies23c
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7249
lastpage: 7263
published: 2023-07-03 00:00:00 +0000
- title: 'Mitigating Propagation Failures in Physics-informed Neural Networks using Retain-Resample-Release (R3) Sampling'
abstract: 'Despite the success of physics-informed neural networks (PINNs) in approximating partial differential equations (PDEs), PINNs can sometimes fail to converge to the correct solution in problems involving complicated PDEs. This is reflected in several recent studies on characterizing the "failure modes" of PINNs, although a thorough understanding of the connection between PINN failure modes and sampling strategies is missing. In this paper, we provide a novel perspective of failure modes of PINNs by hypothesizing that training PINNs relies on successful "propagation" of solution from initial and/or boundary condition points to interior points. We show that PINNs with poor sampling strategies can get stuck at trivial solutions if there are propagation failures, characterized by highly imbalanced PDE residual fields. To mitigate propagation failures, we propose a novel Retain-Resample-Release sampling (R3) algorithm that can incrementally accumulate collocation points in regions of high PDE residuals with little to no computational overhead. We provide an extension of R3 sampling to respect the principle of causality while solving time-dependent PDEs. We theoretically analyze the behavior of R3 sampling and empirically demonstrate its efficacy and efficiency in comparison with baselines on a variety of PDE problems.'
volume: 202
URL: https://proceedings.mlr.press/v202/daw23a.html
PDF: https://proceedings.mlr.press/v202/daw23a/daw23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-daw23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Arka
family: Daw
- given: Jie
family: Bu
- given: Sifan
family: Wang
- given: Paris
family: Perdikaris
- given: Anuj
family: Karpatne
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7264-7302
id: daw23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7264
lastpage: 7302
published: 2023-07-03 00:00:00 +0000
- title: 'On the Robustness of Randomized Ensembles to Adversarial Perturbations'
abstract: 'Randomized ensemble classifiers (RECs), where one classifier is randomly selected during inference, have emerged as an attractive alternative to traditional ensembling methods for realizing adversarially robust classifiers with limited compute requirements. However, recent works have shown that existing methods for constructing RECs are more vulnerable than initially claimed, casting major doubts on their efficacy and prompting fundamental questions such as: "When are RECs useful?", "What are their limits?", and "How do we train them?". In this work, we first demystify RECs as we derive fundamental results regarding their theoretical limits, necessary and sufficient conditions for them to be useful, and more. Leveraging this new understanding, we propose a new boosting algorithm (BARRE) for training robust RECs, and empirically demonstrate its effectiveness at defending against strong $\ell_\infty$ norm-bounded adversaries across various network architectures and datasets. Our code can be found at https://github.com/hsndbk4/BARRE.'
volume: 202
URL: https://proceedings.mlr.press/v202/dbouk23a.html
PDF: https://proceedings.mlr.press/v202/dbouk23a/dbouk23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-dbouk23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hassan
family: Dbouk
- given: Naresh
family: Shanbhag
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7303-7328
id: dbouk23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7303
lastpage: 7328
published: 2023-07-03 00:00:00 +0000
- title: 'Pre-computed memory or on-the-fly encoding? A hybrid approach to retrieval augmentation makes the most of your compute'
abstract: 'Retrieval-augmented language models such as Fusion-in-Decoder are powerful, setting the state of the art on a variety of knowledge-intensive tasks. However, they are also expensive, due to the need to encode a large number of retrieved passages. Some work avoids this cost by pre-encoding a text corpus into a memory and retrieving dense representations directly. However, pre-encoding memory incurs a severe quality penalty as the memory representations are not conditioned on the current input. We propose LUMEN, a hybrid between these two extremes, pre-computing the majority of the retrieval representation and completing the encoding on the fly using a live encoder that is conditioned on the question and fine-tuned for the task. We show that LUMEN significantly outperforms pure memory on multiple question-answering tasks while being much cheaper than FiD, and outperforms both for any given compute budget. Moreover, the advantage of LUMEN over FiD increases with model size.'
volume: 202
URL: https://proceedings.mlr.press/v202/de-jong23a.html
PDF: https://proceedings.mlr.press/v202/de-jong23a/de-jong23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-de-jong23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Michiel
family: De Jong
- given: Yury
family: Zemlyanskiy
- given: Nicholas
family: Fitzgerald
- given: Joshua
family: Ainslie
- given: Sumit
family: Sanghai
- given: Fei
family: Sha
- given: William W.
family: Cohen
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7329-7342
id: de-jong23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7329
lastpage: 7342
published: 2023-07-03 00:00:00 +0000
- title: 'Continuous Spatiotemporal Transformer'
abstract: 'Modeling spatiotemporal dynamical systems is a fundamental challenge in machine learning. Transformer models have been very successful in NLP and computer vision where they provide interpretable representations of data. However, a limitation of transformers in modeling continuous dynamical systems is that they are fundamentally discrete time and space models and thus have no guarantees regarding continuous sampling. To address this challenge, we present the Continuous Spatiotemporal Transformer (CST), a new transformer architecture that is designed for modeling continuous systems. This new framework guarantees a continuous and smooth output via optimization in Sobolev space. We benchmark CST against traditional transformers as well as other spatiotemporal dynamics modeling methods and achieve superior performance in a number of tasks on synthetic and real systems, including learning brain dynamics from calcium imaging data.'
volume: 202
URL: https://proceedings.mlr.press/v202/de-oliveira-fonseca23a.html
PDF: https://proceedings.mlr.press/v202/de-oliveira-fonseca23a/de-oliveira-fonseca23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-de-oliveira-fonseca23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Antonio Henrique
family: De Oliveira Fonseca
- given: Emanuele
family: Zappala
- given: Josue
family: Ortega Caro
- given: David Van
family: Dijk
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7343-7365
id: de-oliveira-fonseca23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7343
lastpage: 7365
published: 2023-07-03 00:00:00 +0000
- title: 'The Value of Out-of-Distribution Data'
abstract: 'Generalization error always improves with more in-distribution data. However, it is an open question what happens as we add out-of-distribution (OOD) data. Intuitively, if the OOD data is quite different, it seems more data would harm generalization error, though if the OOD data are sufficiently similar, much empirical evidence suggests that OOD data can actually improve generalization error. We show a counter-intuitive phenomenon: the generalization error of a task can be a non-monotonic function of the amount of OOD data. Specifically, we prove that generalization error can improve with small amounts of OOD data, and then get worse than no OOD data with larger amounts. In other words, there is value in training on small amounts of OOD data. We analytically demonstrate these results via Fisher’s Linear Discriminant on synthetic datasets, and empirically demonstrate them via deep networks on computer vision benchmarks such as MNIST, CIFAR-10, CINIC-10, PACS and DomainNet. In the idealistic setting where we know which samples are OOD, we show that these non-monotonic trends can be exploited using an appropriately weighted objective of the target and OOD empirical risk. While its practical utility is limited, this does suggest that if we can detect OOD samples, then there may be ways to benefit from them. When we do not know which samples are OOD, we show how a number of go-to strategies such as data-augmentation, hyper-parameter optimization and pre-training are not enough to ensure that the target generalization error does not deteriorate with the number of OOD samples in the dataset.'
volume: 202
URL: https://proceedings.mlr.press/v202/de-silva23a.html
PDF: https://proceedings.mlr.press/v202/de-silva23a/de-silva23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-de-silva23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ashwin
family: De Silva
- given: Rahul
family: Ramesh
- given: Carey
family: Priebe
- given: Pratik
family: Chaudhari
- given: Joshua T
family: Vogelstein
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7366-7389
id: de-silva23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7366
lastpage: 7389
published: 2023-07-03 00:00:00 +0000
- title: 'High Fidelity Image Counterfactuals with Probabilistic Causal Models'
abstract: 'We present a general causal generative modelling framework for accurate estimation of high fidelity image counterfactuals with deep structural causal models. Estimation of interventional and counterfactual queries for high-dimensional structured variables, such as images, remains a challenging task. We leverage ideas from causal mediation analysis and advances in generative modelling to design new deep causal mechanisms for structured variables in causal models. Our experiments demonstrate that our proposed mechanisms are capable of accurate abduction and estimation of direct, indirect and total effects as measured by axiomatic soundness of counterfactuals.'
volume: 202
URL: https://proceedings.mlr.press/v202/de-sousa-ribeiro23a.html
PDF: https://proceedings.mlr.press/v202/de-sousa-ribeiro23a/de-sousa-ribeiro23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-de-sousa-ribeiro23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Fabio
family: De Sousa Ribeiro
- given: Tian
family: Xia
- given: Miguel
family: Monteiro
- given: Nick
family: Pawlowski
- given: Ben
family: Glocker
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7390-7425
id: de-sousa-ribeiro23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7390
lastpage: 7425
published: 2023-07-03 00:00:00 +0000
- title: 'Learning Noisy OR Bayesian Networks with Max-Product Belief Propagation'
abstract: 'Noisy-OR Bayesian Networks (BNs) are a family of probabilistic graphical models which express rich statistical dependencies in binary data. Variational inference (VI) has been the main method proposed to learn noisy-OR BNs with complex latent structures (Jaakkola & Jordan, 1999; Ji et al., 2020; Buhai et al., 2020). However, the proposed VI approaches either (a) use a recognition network with standard amortized inference that cannot induce "explaining-away"; or (b) assume a simple mean-field (MF) posterior which is vulnerable to bad local optima. Existing MF VI methods also update the MF parameters sequentially which makes them inherently slow. In this paper, we propose parallel max-product as an alternative algorithm for learning noisy-OR BNs with complex latent structures and we derive a fast stochastic training scheme that scales to large datasets. We evaluate both approaches on several benchmarks where VI is the state-of-the-art and show that our method (a) achieves better test performance than Ji et al. (2020) for learning noisy-OR BNs with hierarchical latent structures on large sparse real datasets; (b) recovers a higher number of ground truth parameters than Buhai et al. (2020) from cluttered synthetic scenes; and (c) solves the 2D blind deconvolution problem from Lazaro-Gredilla et al. (2021) and variants - including binary matrix factorization - while VI catastrophically fails and is up to two orders of magnitude slower.'
volume: 202
URL: https://proceedings.mlr.press/v202/dedieu23a.html
PDF: https://proceedings.mlr.press/v202/dedieu23a/dedieu23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-dedieu23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Antoine
family: Dedieu
- given: Guangyao
family: Zhou
- given: Dileep
family: George
- given: Miguel
family: Lazaro-Gredilla
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7426-7448
id: dedieu23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7426
lastpage: 7448
published: 2023-07-03 00:00:00 +0000
- title: 'Learning-Rate-Free Learning by D-Adaptation'
abstract: 'The speed of gradient descent for convex Lipschitz functions is highly dependent on the choice of learning rate. Setting the learning rate to achieve the optimal convergence rate requires knowing the distance D from the initial point to the solution set. In this work, we describe a single-loop method, with no back-tracking or line searches, which does not require knowledge of D yet asymptotically achieves the optimal rate of convergence for the complexity class of convex Lipschitz functions. Our approach is the first parameter-free method for this class without additional multiplicative log factors in the convergence rate. We present extensive experiments for SGD and Adam variants of our method, where the method automatically matches hand-tuned learning rates across more than a dozen diverse machine learning problems, including large-scale vision and language problems. Our method is practical, efficient and requires no additional function value or gradient evaluations each step. An implementation is provided in the supplementary material.'
volume: 202
URL: https://proceedings.mlr.press/v202/defazio23a.html
PDF: https://proceedings.mlr.press/v202/defazio23a/defazio23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-defazio23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Aaron
family: Defazio
- given: Konstantin
family: Mishchenko
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7449-7479
id: defazio23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7449
lastpage: 7479
published: 2023-07-03 00:00:00 +0000
- title: 'Scaling Vision Transformers to 22 Billion Parameters'
abstract: 'The scaling of Transformers has driven breakthrough capabilities for language models. At present, the largest large language models (LLMs) contain upwards of 100B parameters. Vision Transformers (ViT) have introduced the same architecture to image and video modelling, but these have not yet been successfully scaled to nearly the same degree; the largest dense ViT contains 4B parameters (Chen et al., 2022). We present a recipe for highly efficient and stable training of a 22B-parameter ViT (ViT-22B) and perform a wide variety of experiments on the resulting model. When evaluated on downstream tasks (often with a lightweight linear model on frozen features), ViT-22B demonstrates increasing performance with scale. We further observe other interesting benefits of scale, including an improved tradeoff between fairness and performance, state-of-the-art alignment to human visual perception in terms of shape/texture bias, and improved robustness. ViT-22B demonstrates the potential for "LLM-like" scaling in vision, and provides key steps towards getting there.'
volume: 202
URL: https://proceedings.mlr.press/v202/dehghani23a.html
PDF: https://proceedings.mlr.press/v202/dehghani23a/dehghani23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-dehghani23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mostafa
family: Dehghani
- given: Josip
family: Djolonga
- given: Basil
family: Mustafa
- given: Piotr
family: Padlewski
- given: Jonathan
family: Heek
- given: Justin
family: Gilmer
- given: Andreas Peter
family: Steiner
- given: Mathilde
family: Caron
- given: Robert
family: Geirhos
- given: Ibrahim
family: Alabdulmohsin
- given: Rodolphe
family: Jenatton
- given: Lucas
family: Beyer
- given: Michael
family: Tschannen
- given: Anurag
family: Arnab
- given: Xiao
family: Wang
- given: Carlos
family: Riquelme Ruiz
- given: Matthias
family: Minderer
- given: Joan
family: Puigcerver
- given: Utku
family: Evci
- given: Manoj
family: Kumar
- given: Sjoerd Van
family: Steenkiste
- given: Gamaleldin Fathy
family: Elsayed
- given: Aravindh
family: Mahendran
- given: Fisher
family: Yu
- given: Avital
family: Oliver
- given: Fantine
family: Huot
- given: Jasmijn
family: Bastings
- given: Mark
family: Collier
- given: Alexey A.
family: Gritsenko
- given: Vighnesh
family: Birodkar
- given: Cristina Nader
family: Vasconcelos
- given: Yi
family: Tay
- given: Thomas
family: Mensink
- given: Alexander
family: Kolesnikov
- given: Filip
family: Pavetic
- given: Dustin
family: Tran
- given: Thomas
family: Kipf
- given: Mario
family: Lucic
- given: Xiaohua
family: Zhai
- given: Daniel
family: Keysers
- given: Jeremiah J.
family: Harmsen
- given: Neil
family: Houlsby
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7480-7512
id: dehghani23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7480
lastpage: 7512
published: 2023-07-03 00:00:00 +0000
- title: 'Efficient Bound of Lipschitz Constant for Convolutional Layers by Gram Iteration'
abstract: 'Since the control of the Lipschitz constant has a great impact on the training stability, generalization, and robustness of neural networks, the estimation of this value is nowadays a real scientific challenge. In this paper we introduce a precise, fast, and differentiable upper bound for the spectral norm of convolutional layers using circulant matrix theory and a new alternative to the power iteration. Called the Gram iteration, our approach exhibits superlinear convergence. First, we show through a comprehensive set of experiments that our approach outperforms other state-of-the-art methods in terms of precision, computational cost, and scalability. Then, it proves highly effective for the Lipschitz regularization of convolutional neural networks, with competitive results against concurrent approaches.'
volume: 202
URL: https://proceedings.mlr.press/v202/delattre23a.html
PDF: https://proceedings.mlr.press/v202/delattre23a/delattre23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-delattre23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Blaise
family: Delattre
- given: Quentin
family: Barthélemy
- given: Alexandre
family: Araujo
- given: Alexandre
family: Allauzen
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7513-7532
id: delattre23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7513
lastpage: 7532
published: 2023-07-03 00:00:00 +0000
- title: 'Blossom: an Anytime Algorithm for Computing Optimal Decision Trees'
abstract: 'We propose a simple algorithm to learn optimal decision trees of bounded depth. This algorithm is essentially an anytime version of the state-of-the-art dynamic programming approach. It has virtually no overhead compared to heuristic methods and is comparable to the best exact methods at proving optimality on most data sets. Experiments show that whereas existing exact methods hardly scale to deep trees, this algorithm learns trees comparable to standard heuristics without computational overhead, and can significantly improve their accuracy when given more computation time, even for deep trees.'
volume: 202
URL: https://proceedings.mlr.press/v202/demirovic23a.html
PDF: https://proceedings.mlr.press/v202/demirovic23a/demirovic23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-demirovic23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Emir
family: Demirović
- given: Emmanuel
family: Hebrard
- given: Louis
family: Jean
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7533-7562
id: demirovic23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7533
lastpage: 7562
published: 2023-07-03 00:00:00 +0000
- title: 'Optimizing NOTEARS Objectives via Topological Swaps'
abstract: 'Recently, an intriguing class of non-convex optimization problems has emerged in the context of learning directed acyclic graphs (DAGs). These problems involve minimizing a given loss or score function, subject to a non-convex continuous constraint that penalizes the presence of cycles in a graph. In this work, we delve into the optimality challenges associated with this class of non-convex programs. To address these challenges, we propose a bi-level algorithm that leverages the non-convex constraint in a novel way. The outer level of the algorithm optimizes over topological orders by iteratively swapping pairs of nodes within the topological order of a DAG. A key innovation of our approach is the development of an effective method for generating a set of candidate swapping pairs for each iteration. At the inner level, given a topological order, we utilize off-the-shelf solvers that can handle linear constraints. The key advantage of our proposed algorithm is that it is guaranteed to find a local minimum or a KKT point under weaker conditions compared to previous work and finds solutions with lower scores. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches in terms of achieving a better score. Additionally, our method can also be used as a post-processing algorithm to significantly improve the score of other algorithms. Code implementing the proposed method is available at https://github.com/duntrain/topo.'
volume: 202
URL: https://proceedings.mlr.press/v202/deng23a.html
PDF: https://proceedings.mlr.press/v202/deng23a/deng23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-deng23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Chang
family: Deng
- given: Kevin
family: Bello
- given: Bryon
family: Aragam
- given: Pradeep Kumar
family: Ravikumar
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7563-7595
id: deng23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7563
lastpage: 7595
published: 2023-07-03 00:00:00 +0000
- title: 'Uncertainty Estimation by Fisher Information-based Evidential Deep Learning'
abstract: 'Uncertainty estimation is a key factor that makes deep learning reliable in practical applications. Recently proposed evidential neural networks explicitly account for different uncertainties by treating the network’s outputs as evidence to parameterize the Dirichlet distribution, and achieve impressive performance in uncertainty estimation. However, for samples with high data uncertainty that are nonetheless annotated with one-hot labels, the evidence-learning process for the mislabeled classes is over-penalized and remains hindered. To address this problem, we propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL). In particular, we introduce the Fisher Information Matrix (FIM) to measure the informativeness of the evidence carried by each sample, according to which we can dynamically reweight the objective loss terms to make the network focus more on the representation learning of uncertain classes. The generalization ability of our network is further improved by optimizing the PAC-Bayesian bound. As demonstrated empirically, our proposed method consistently outperforms traditional EDL-related algorithms in multiple uncertainty estimation tasks, especially in the more challenging few-shot classification settings.'
volume: 202
URL: https://proceedings.mlr.press/v202/deng23b.html
PDF: https://proceedings.mlr.press/v202/deng23b/deng23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-deng23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Danruo
family: Deng
- given: Guangyong
family: Chen
- given: Yang
family: Yu
- given: Furui
family: Liu
- given: Pheng-Ann
family: Heng
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7596-7616
id: deng23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7596
lastpage: 7616
published: 2023-07-03 00:00:00 +0000
- title: 'Multi-channel Autobidding with Budget and ROI Constraints'
abstract: 'In digital online advertising, advertisers procure ad impressions simultaneously on multiple platforms, or so-called channels, such as Google Ads, Meta Ads Manager, etc., each of which consists of numerous ad auctions. We study how an advertiser maximizes total conversion (e.g. ad clicks) while satisfying aggregate return-on-investment (ROI) and budget constraints across all channels. In practice, an advertiser does not have control over, and thus cannot globally optimize, which individual ad auctions she participates in for each channel, and instead authorizes a channel to procure impressions on her behalf: the advertiser can only utilize two levers on each channel, namely setting a per-channel budget and a per-channel target ROI. In this work, we first analyze the effectiveness of each of these levers for solving the advertiser’s global multi-channel problem. We show that when an advertiser only optimizes over per-channel ROIs, her total conversion can be arbitrarily worse than what she could have obtained in the global problem. Further, we show that the advertiser can achieve the global optimal conversion when she only optimizes over per-channel budgets. In light of this finding, under a bandit feedback setting that mimics real-world scenarios where advertisers have limited information on the ad auctions in each channel and on how channels procure ads, we present an efficient learning algorithm that produces per-channel budgets whose resulting conversion approximates that of the global optimal problem.'
volume: 202
URL: https://proceedings.mlr.press/v202/deng23c.html
PDF: https://proceedings.mlr.press/v202/deng23c/deng23c.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-deng23c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yuan
family: Deng
- given: Negin
family: Golrezaei
- given: Patrick
family: Jaillet
- given: Jason Cheuk Nam
family: Liang
- given: Vahab
family: Mirrokni
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7617-7644
id: deng23c
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7617
lastpage: 7644
published: 2023-07-03 00:00:00 +0000
- title: 'Surrogate Module Learning: Reduce the Gradient Error Accumulation in Training Spiking Neural Networks'
abstract: 'Spiking neural networks provide an alternative solution to conventional artificial neural networks with energy-saving and high-efficiency characteristics after hardware implementation. However, due to their non-differentiable activation function and the temporally delayed accumulation in outputs, the direct training of SNNs is extraordinarily tough, even when adopting a surrogate gradient to mimic backpropagation. For SNN training, this non-differentiability causes an intrinsic gradient error that is magnified through layerwise backpropagation, especially through multiple layers. In this paper, we propose a novel approach to reducing gradient error from a new perspective called surrogate module learning (SML). Surrogate module learning constructs a shortcut path to back-propagate a more accurate gradient to a given part of the SNN by utilizing surrogate modules. We then develop a new loss function for concurrently training the network and enhancing the surrogate modules’ surrogate capacity. We demonstrate that when the outputs of the surrogate modules are close to the SNN output, the fraction of the gradient error drops significantly. Our method consistently and significantly enhances the performance of SNNs on all experiment datasets, including CIFAR-10/100, ImageNet, and ES-ImageNet. For example, for the spiking ResNet-34 architecture on ImageNet, we increased the SNN accuracy by 3.46%.'
volume: 202
URL: https://proceedings.mlr.press/v202/deng23d.html
PDF: https://proceedings.mlr.press/v202/deng23d/deng23d.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-deng23d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Shikuang
family: Deng
- given: Hao
family: Lin
- given: Yuhang
family: Li
- given: Shi
family: Gu
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7645-7657
id: deng23d
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7645
lastpage: 7657
published: 2023-07-03 00:00:00 +0000
- title: 'Confidence and Dispersity Speak: Characterizing Prediction Matrix for Unsupervised Accuracy Estimation'
abstract: 'This work aims to assess how well a model performs under distribution shifts without using labels. While recent methods study prediction confidence, this work reports that prediction dispersity is another informative cue. Confidence reflects whether the individual prediction is certain; dispersity indicates how the overall predictions are distributed across all categories. Our key insight is that a well-performing model should give predictions with high confidence and high dispersity. That is, we need to consider both properties so as to make more accurate estimates. To this end, we use the nuclear norm, which has been shown to be effective in characterizing both properties. Extensive experiments validate the effectiveness of the nuclear norm for various models (e.g., ViT and ConvNeXt), different datasets (e.g., ImageNet and CUB-200), and diverse types of distribution shifts (e.g., style shift and reproduction shift). We show that the nuclear norm is more accurate and robust in accuracy estimation than existing methods. Furthermore, we validate the feasibility of other measurements (e.g., mutual information maximization) for characterizing dispersity and confidence. Lastly, we investigate the limitations of the nuclear norm, study its improved variant under severe class imbalance, and discuss potential directions.'
volume: 202
URL: https://proceedings.mlr.press/v202/deng23e.html
PDF: https://proceedings.mlr.press/v202/deng23e/deng23e.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-deng23e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Weijian
family: Deng
- given: Yumin
family: Suh
- given: Stephen
family: Gould
- given: Liang
family: Zheng
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7658-7674
id: deng23e
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7658
lastpage: 7674
published: 2023-07-03 00:00:00 +0000
- title: 'Great Models Think Alike: Improving Model Reliability via Inter-Model Latent Agreement'
abstract: 'Reliable application of machine learning is of primary importance to the practical deployment of deep learning methods. A fundamental challenge is that models are often unreliable due to overconfidence. In this paper, we estimate a model’s reliability by measuring the agreement between its latent space, and the latent space of a foundation model. However, it is challenging to measure the agreement between two different latent spaces due to their incoherence, e.g., arbitrary rotations and different dimensionality. To overcome this incoherence issue, we design a neighborhood agreement measure between latent spaces and find that this agreement is surprisingly well-correlated with the reliability of a model’s predictions. Further, we show that fusing neighborhood agreement into a model’s predictive confidence in a post-hoc way significantly improves its reliability. Theoretical analysis and extensive experiments on failure detection across various datasets verify the effectiveness of our method on both in-distribution and out-of-distribution settings.'
volume: 202
URL: https://proceedings.mlr.press/v202/deng23f.html
PDF: https://proceedings.mlr.press/v202/deng23f/deng23f.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-deng23f.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ailin
family: Deng
- given: Miao
family: Xiong
- given: Bryan
family: Hooi
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7675-7693
id: deng23f
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7675
lastpage: 7693
published: 2023-07-03 00:00:00 +0000
- title: 'Hyperbolic Image-text Representations'
abstract: 'Visual and linguistic concepts naturally organize themselves in a hierarchy, where a textual concept "dog" entails all images that contain dogs. Despite being intuitive, current large-scale vision and language models such as CLIP do not explicitly capture such hierarchy. We propose MERU, a contrastive model that yields hyperbolic representations of images and text. Hyperbolic spaces have suitable geometric properties to embed tree-like data, so MERU can better capture the underlying hierarchy in image-text datasets. Our results show that MERU learns a highly interpretable and structured representation space while being competitive with CLIP’s performance on standard multi-modal tasks like image classification and image-text retrieval.'
volume: 202
URL: https://proceedings.mlr.press/v202/desai23a.html
PDF: https://proceedings.mlr.press/v202/desai23a/desai23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-desai23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Karan
family: Desai
- given: Maximilian
family: Nickel
- given: Tanmay
family: Rajpurohit
- given: Justin
family: Johnson
- given: Shanmukha Ramakrishna
family: Vedantam
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7694-7731
id: desai23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7694
lastpage: 7731
published: 2023-07-03 00:00:00 +0000
- title: 'Hardware-Aware Compression with Random Operation Access Specific Tile (ROAST) Hashing'
abstract: 'Advancements in deep learning are often associated with increasing model sizes. Training and deploying large models require sophisticated hardware and incur significantly higher costs. Thus, model compression is a widely explored approach to solving the problem. However, SOTA techniques fall short in one or more desirable aspects of compression - for instance, pruning does not reduce memory for training, quantization can only provide up to 32$\times$ compression, HashedNet is cache-inefficient, etc. This paper proposes a model-agnostic, cache-friendly, and hardware-aware model compression approach: Random Operation Access Specific Tile (ROAST) hashing. ROAST collapses the parameters by clubbing them through a lightweight mapping. While clubbing these parameters, ROAST utilizes cache hierarchies by aligning the memory access pattern with the parameter access pattern. ROAST is up to ${\sim}25\times$ faster to train and ${\sim}50\times$ faster to infer than the popular parameter sharing method HashedNet. Additionally, ROAST introduces global weight sharing, which is empirically and theoretically superior to local weight sharing in HashedNet, and can be of independent interest. With ROAST, we can efficiently train and deploy the model using a much smaller memory footprint ($\sim 10 - 100\times$ smaller) in text and image classification tasks. The ROAST-MM kernel implementation is open-source (https://github.com/apd10/RzLinear/tree/stable).'
volume: 202
URL: https://proceedings.mlr.press/v202/desai23b.html
PDF: https://proceedings.mlr.press/v202/desai23b/desai23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-desai23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Aditya
family: Desai
- given: Keren
family: Zhou
- given: Anshumali
family: Shrivastava
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7732-7749
id: desai23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7732
lastpage: 7749
published: 2023-07-03 00:00:00 +0000
- title: 'The case for 4-bit precision: k-bit Inference Scaling Laws'
abstract: 'Quantization methods reduce the number of bits required to represent each parameter in a model, trading accuracy for smaller memory footprints and inference latencies. However, the final model size depends on both the number of parameters of the original model and the rate of compression. For example, a 30B 8-bit model and a 60B 4-bit model have the same number of bits but may have very different zero-shot accuracies. In this work, we study this trade-off by developing inference scaling laws of zero-shot performance in Large Language Models (LLMs) to determine the bit-precision and model size that maximize zero-shot performance. We run more than 35,000 experiments with 16-bit inputs and k-bit parameters to examine which zero-shot quantization methods improve scaling for 3 to 8-bit precision at scales of 19M to 176B parameters across the LLM families BLOOM, OPT, NeoX/Pythia, and GPT-2. We find that it is challenging to improve the bit-level scaling trade-off, with the only improvements being the use of a small block size – splitting the parameters into small independently quantized blocks – and the quantization data type being used (e.g., Int vs Float). Overall, our findings show that 4-bit precision is almost universally optimal for total model bits and zero-shot accuracy.'
volume: 202
URL: https://proceedings.mlr.press/v202/dettmers23a.html
PDF: https://proceedings.mlr.press/v202/dettmers23a/dettmers23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-dettmers23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Tim
family: Dettmers
- given: Luke
family: Zettlemoyer
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7750-7774
id: dettmers23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7750
lastpage: 7774
published: 2023-07-03 00:00:00 +0000
- title: 'Fairness in Matching under Uncertainty'
abstract: 'The prevalence and importance of algorithmic two-sided marketplaces has drawn attention to the issue of fairness in such settings. Algorithmic decisions are used in assigning students to schools, users to advertisers, and applicants to job interviews. These decisions should heed the preferences of individuals, and simultaneously be fair with respect to their merits (synonymous with fit, future performance, or need). Merits conditioned on observable features are always *uncertain*, a fact that is exacerbated by the widespread use of machine learning algorithms to infer merit from the observables. As our key contribution, we carefully axiomatize a notion of individual fairness in the two-sided marketplace setting which respects the uncertainty in the merits; indeed, it simultaneously recognizes uncertainty as the primary potential cause of unfairness and an approach to address it. We design a linear programming framework to find fair utility-maximizing distributions over allocations, and we show that the linear program is robust to perturbations in the estimated parameters of the uncertain merit distributions, a key property in combining the approach with machine learning techniques.'
volume: 202
URL: https://proceedings.mlr.press/v202/devic23a.html
PDF: https://proceedings.mlr.press/v202/devic23a/devic23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-devic23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Siddartha
family: Devic
- given: David
family: Kempe
- given: Vatsal
family: Sharan
- given: Aleksandra
family: Korolova
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7775-7794
id: devic23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7775
lastpage: 7794
published: 2023-07-03 00:00:00 +0000
- title: 'Efficient Parametric Approximations of Neural Network Function Space Distance'
abstract: 'It is often useful to compactly summarize important properties of model parameters and training data so that they can be used later without storing and/or iterating over the entire dataset. As a specific case, we consider estimating the Function Space Distance (FSD) over a training set, i.e. the average discrepancy between the outputs of two neural networks. We propose a Linearized Activation Function TRick (LAFTR) and derive an efficient approximation to FSD for ReLU neural networks. The key idea is to approximate the architecture as a linear network with stochastic gating. Despite requiring only one parameter per unit of the network, our approach outcompetes other parametric approximations with larger memory requirements. Applied to continual learning, our parametric approximation is competitive with state-of-the-art nonparametric approximations, which require storing many training examples. Furthermore, we show its efficacy in estimating influence functions accurately and detecting mislabeled examples without expensive iterations over the entire dataset.'
volume: 202
URL: https://proceedings.mlr.press/v202/dhawan23a.html
PDF: https://proceedings.mlr.press/v202/dhawan23a/dhawan23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-dhawan23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Nikita
family: Dhawan
- given: Sicong
family: Huang
- given: Juhan
family: Bae
- given: Roger Baker
family: Grosse
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7795-7812
id: dhawan23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7795
lastpage: 7812
published: 2023-07-03 00:00:00 +0000
- title: 'A Large-Scale Study of Probabilistic Calibration in Neural Network Regression'
abstract: 'Accurate probabilistic predictions are essential for optimal decision making. While neural network miscalibration has been studied primarily in classification, we investigate this in the less-explored domain of regression. We conduct the largest empirical study to date to assess the probabilistic calibration of neural networks. We also analyze the performance of recalibration, conformal, and regularization methods to enhance probabilistic calibration. Additionally, we introduce novel differentiable recalibration and regularization methods, uncovering new insights into their effectiveness. Our findings reveal that regularization methods offer a favorable tradeoff between calibration and sharpness. Post-hoc methods exhibit superior probabilistic calibration, which we attribute to the finite-sample coverage guarantee of conformal prediction. Furthermore, we demonstrate that quantile recalibration can be considered as a specific case of conformal prediction. Our study is fully reproducible and implemented in a common code base for fair comparisons.'
volume: 202
URL: https://proceedings.mlr.press/v202/dheur23a.html
PDF: https://proceedings.mlr.press/v202/dheur23a/dheur23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-dheur23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Victor
family: Dheur
- given: Souhaib
family: Ben Taieb
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7813-7836
id: dheur23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7813
lastpage: 7836
published: 2023-07-03 00:00:00 +0000
- title: 'Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path'
abstract: 'We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel, where an agent repeatedly interacts with a stochastic environment and seeks to reach a certain goal state while minimizing the cumulative cost. Existing works often assume a strictly positive lower bound on the cost function or an upper bound on the expected length of the optimal policy. In this paper, we propose a new algorithm to eliminate these restrictive assumptions. Our algorithm is based on extended value iteration with a fine-grained variance-aware confidence set, where the variance is estimated recursively from high-order moments. Our algorithm achieves an $\tilde{\mathcal{O}}(dB_*\sqrt{K})$ regret bound, where $d$ is the dimension of the feature mapping in the linear transition kernel, $B_*$ is the upper bound of the total cumulative cost for the optimal policy, and $K$ is the number of episodes. Our regret upper bound matches the $\Omega(dB_*\sqrt{K})$ lower bound of linear mixture SSPs in Min et al. (2022), which suggests that our algorithm is nearly minimax optimal.'
volume: 202
URL: https://proceedings.mlr.press/v202/di23a.html
PDF: https://proceedings.mlr.press/v202/di23a/di23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-di23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Qiwei
family: Di
- given: Jiafan
family: He
- given: Dongruo
family: Zhou
- given: Quanquan
family: Gu
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7837-7864
id: di23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7837
lastpage: 7864
published: 2023-07-03 00:00:00 +0000
- title: 'On Over-Squashing in Message Passing Neural Networks: The Impact of Width, Depth, and Topology'
abstract: 'Message Passing Neural Networks (MPNNs) are instances of Graph Neural Networks that leverage the graph to send messages over the edges. This inductive bias leads to a phenomenon known as over-squashing, where a node feature is insensitive to information contained at distant nodes. Despite recent methods introduced to mitigate this issue, an understanding of the causes of over-squashing and of possible solutions is lacking. In this theoretical work, we prove that: (i) Neural network width can mitigate over-squashing, but at the cost of making the whole network more sensitive; (ii) Conversely, depth cannot help mitigate over-squashing: increasing the number of layers leads to over-squashing being dominated by vanishing gradients; (iii) The graph topology plays the greatest role, since over-squashing occurs between nodes at high commute time. Our analysis provides a unified framework to study different recent methods introduced to cope with over-squashing and serves as a justification for a class of methods that fall under graph rewiring.'
volume: 202
URL: https://proceedings.mlr.press/v202/di-giovanni23a.html
PDF: https://proceedings.mlr.press/v202/di-giovanni23a/di-giovanni23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-di-giovanni23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Francesco
family: Di Giovanni
- given: Lorenzo
family: Giusti
- given: Federico
family: Barbero
- given: Giulia
family: Luise
- given: Pietro
family: Lio
- given: Michael M.
family: Bronstein
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7865-7885
id: di-giovanni23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7865
lastpage: 7885
published: 2023-07-03 00:00:00 +0000
- title: 'Nearly-Linear Time and Streaming Algorithms for Outlier-Robust PCA'
abstract: 'We study principal component analysis (PCA), where given a dataset in $\mathbb R^d$ from a distribution, the task is to find a unit vector $v$ that approximately maximizes the variance of the distribution after being projected along $v$. Despite being a classical task, standard estimators fail drastically if the data contains even a small fraction of outliers, motivating the problem of robust PCA. Recent work has developed computationally-efficient algorithms for robust PCA that either take super-linear time or have sub-optimal error guarantees. Our main contribution is to develop a nearly linear time algorithm for robust PCA with near-optimal error guarantees. We also develop a single-pass streaming algorithm for robust PCA with memory usage nearly-linear in the dimension.'
volume: 202
URL: https://proceedings.mlr.press/v202/diakonikolas23a.html
PDF: https://proceedings.mlr.press/v202/diakonikolas23a/diakonikolas23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-diakonikolas23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ilias
family: Diakonikolas
- given: Daniel
family: Kane
- given: Ankit
family: Pensia
- given: Thanasis
family: Pittas
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7886-7921
id: diakonikolas23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7886
lastpage: 7921
published: 2023-07-03 00:00:00 +0000
- title: 'Near-Optimal Cryptographic Hardness of Agnostically Learning Halfspaces and ReLU Regression under Gaussian Marginals'
abstract: 'We study the task of agnostically learning halfspaces under the Gaussian distribution. Specifically, given labeled examples $(\mathbf{x},y)$ from an unknown distribution on $\mathbb{R}^n \times \{\pm 1\}$, whose marginal distribution on $\mathbf{x}$ is the standard Gaussian and the labels $y$ can be arbitrary, the goal is to output a hypothesis with 0-1 loss $\mathrm{OPT}+\epsilon$, where $\mathrm{OPT}$ is the 0-1 loss of the best-fitting halfspace. We prove a near-optimal computational hardness result for this task, under the widely believed sub-exponential time hardness of the Learning with Errors (LWE) problem. Prior hardness results are either qualitatively suboptimal or apply to restricted families of algorithms. Our techniques extend to yield near-optimal lower bounds for related problems, including ReLU regression.'
volume: 202
URL: https://proceedings.mlr.press/v202/diakonikolas23b.html
PDF: https://proceedings.mlr.press/v202/diakonikolas23b/diakonikolas23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-diakonikolas23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ilias
family: Diakonikolas
- given: Daniel
family: Kane
- given: Lisheng
family: Ren
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7922-7938
id: diakonikolas23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7922
lastpage: 7938
published: 2023-07-03 00:00:00 +0000
- title: 'Improving Graph Generation by Restricting Graph Bandwidth'
abstract: 'Deep graph generative modeling has proven capable of learning the distribution of complex, multi-scale structures characterizing real-world graphs. However, one of the main limitations of existing methods is their large output space, which limits generation scalability and hinders accurate modeling of the underlying distribution. To overcome these limitations, we propose a novel approach that significantly reduces the output space of existing graph generative models. Specifically, starting from the observation that many real-world graphs have low graph bandwidth, we restrict graph bandwidth during training and generation. Our strategy improves both generation scalability and quality without increasing architectural complexity or reducing expressiveness. Our approach is compatible with existing graph generative methods, and we describe its application to both autoregressive and one-shot models. We extensively validate our strategy on synthetic and real datasets, including molecular graphs. Our experiments show that, in addition to improving generation efficiency, our approach consistently improves generation quality and reconstruction accuracy. The implementation is made available.'
volume: 202
URL: https://proceedings.mlr.press/v202/diamant23a.html
PDF: https://proceedings.mlr.press/v202/diamant23a/diamant23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-diamant23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Nathaniel Lee
family: Diamant
- given: Alex M
family: Tseng
- given: Kangway V.
family: Chuang
- given: Tommaso
family: Biancalani
- given: Gabriele
family: Scalia
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7939-7959
id: diamant23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7939
lastpage: 7959
published: 2023-07-03 00:00:00 +0000
- title: 'Forward-Backward Gaussian Variational Inference via JKO in the Bures-Wasserstein Space'
abstract: 'Variational inference (VI) seeks to approximate a target distribution $\pi$ by an element of a tractable family of distributions. Of key interest in statistics and machine learning is Gaussian VI, which approximates $\pi$ by minimizing the Kullback-Leibler (KL) divergence to $\pi$ over the space of Gaussians. In this work, we develop the (Stochastic) Forward-Backward Gaussian Variational Inference (FB-GVI) algorithm to solve Gaussian VI. Our approach exploits the composite structure of the KL divergence, which can be written as the sum of a smooth term (the potential) and a non-smooth term (the entropy) over the Bures-Wasserstein (BW) space of Gaussians endowed with the Wasserstein distance. For our proposed algorithm, we obtain state-of-the-art convergence guarantees when $\pi$ is log-smooth and log-concave, as well as the first convergence guarantees to first-order stationary solutions when $\pi$ is only log-smooth.'
volume: 202
URL: https://proceedings.mlr.press/v202/diao23a.html
PDF: https://proceedings.mlr.press/v202/diao23a/diao23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-diao23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Michael Ziyang
family: Diao
- given: Krishna
family: Balasubramanian
- given: Sinho
family: Chewi
- given: Adil
family: Salim
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7960-7991
id: diao23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7960
lastpage: 7991
published: 2023-07-03 00:00:00 +0000
- title: 'Subset-Based Instance Optimality in Private Estimation'
abstract: 'We propose a new definition of instance optimality for differentially private estimation algorithms. Our definition requires an optimal algorithm to compete, simultaneously for every dataset $D$, with the best private benchmark algorithm that (a) knows $D$ in advance and (b) is evaluated by its worst-case performance on large subsets of $D$. That is, the benchmark algorithm need not perform well when potentially extreme points are added to $D$; it only has to handle the removal of a small number of real data points that already exist. This makes our benchmark significantly stronger than those proposed in prior work. We nevertheless show, for real-valued datasets, how to construct private algorithms that achieve our notion of instance optimality when estimating a broad class of dataset properties, including means, quantiles, and $\ell_p$-norm minimizers. For means in particular, we provide a detailed analysis and show that our algorithm simultaneously matches or exceeds the asymptotic performance of existing algorithms under a range of distributional assumptions.'
volume: 202
URL: https://proceedings.mlr.press/v202/dick23a.html
PDF: https://proceedings.mlr.press/v202/dick23a/dick23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-dick23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Travis
family: Dick
- given: Alex
family: Kulesza
- given: Ziteng
family: Sun
- given: Ananda Theertha
family: Suresh
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 7992-8014
id: dick23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 7992
lastpage: 8014
published: 2023-07-03 00:00:00 +0000
- title: 'Pareto Manifold Learning: Tackling multiple tasks via ensembles of single-task models'
abstract: 'In Multi-Task Learning (MTL), tasks may compete and limit the performance achieved on each other, rather than guiding the optimization to a solution superior to all its single-task trained counterparts. Since there is often no unique solution optimal for all tasks, practitioners must balance tradeoffs between tasks’ performance and resort to optimality in the Pareto sense. Most MTL methodologies either completely neglect this aspect and, instead of aiming at learning a Pareto Front, produce one solution predefined by their optimization schemes, or produce diverse but discrete solutions. Recent approaches parameterize the Pareto Front via neural networks, leading to complex mappings from tradeoff to objective space. In this paper, we conjecture that the Pareto Front admits a linear parameterization in parameter space, which leads us to propose *Pareto Manifold Learning*, an ensembling method in weight space. Our approach produces a continuous Pareto Front in a single training run, allowing the performance on each task to be modulated during inference. Experiments on multi-task learning benchmarks, ranging from image classification to tabular datasets and scene understanding, show that *Pareto Manifold Learning* outperforms state-of-the-art single-point algorithms, while learning a better Pareto parameterization than multi-point baselines.'
volume: 202
URL: https://proceedings.mlr.press/v202/dimitriadis23a.html
PDF: https://proceedings.mlr.press/v202/dimitriadis23a/dimitriadis23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-dimitriadis23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Nikolaos
family: Dimitriadis
- given: Pascal
family: Frossard
- given: François
family: Fleuret
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8015-8052
id: dimitriadis23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8015
lastpage: 8052
published: 2023-07-03 00:00:00 +0000
- title: 'Bayesian Reparameterization of Reward-Conditioned Reinforcement Learning with Energy-based Models'
abstract: 'Recently, reward-conditioned reinforcement learning (RCRL) has gained popularity due to its simplicity, flexibility, and off-policy nature. However, we show that current RCRL approaches are fundamentally limited and fail to address two critical challenges of RCRL: improving generalization on high reward-to-go (RTG) inputs, and avoiding out-of-distribution (OOD) RTG queries at test time. To address these challenges when training vanilla RCRL architectures, we propose Bayesian Reparameterized RCRL (BR-RCRL), a novel set of inductive biases for RCRL inspired by Bayes’ theorem. BR-RCRL removes a core obstacle preventing vanilla RCRL from generalizing on high RTG inputs: the tendency of the model to treat different RTG inputs as independent values, which we term “RTG Independence”. BR-RCRL also allows us to design an accompanying adaptive inference method, which maximizes total returns while avoiding OOD queries that yield unpredictable behaviors in vanilla RCRL methods. We show that BR-RCRL achieves state-of-the-art performance on the Gym-Mujoco and Atari offline RL benchmarks, improving upon vanilla RCRL by up to 11%.'
volume: 202
URL: https://proceedings.mlr.press/v202/ding23a.html
PDF: https://proceedings.mlr.press/v202/ding23a/ding23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-ding23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Wenhao
family: Ding
- given: Tong
family: Che
- given: Ding
family: Zhao
- given: Marco
family: Pavone
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8053-8066
id: ding23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8053
lastpage: 8066
published: 2023-07-03 00:00:00 +0000
- title: 'DSGD-CECA: Decentralized SGD with Communication-Optimal Exact Consensus Algorithm'
abstract: 'Decentralized Stochastic Gradient Descent (SGD) is an emerging neural network training approach that enables multiple agents to train a model collaboratively and simultaneously. Rather than using a central parameter server to collect gradients from all the agents, each agent keeps a copy of the model parameters and communicates with a small number of other agents to exchange model updates. Their communication, governed by the communication topology and gossip weight matrices, facilitates the exchange of model updates. The state-of-the-art approach uses the dynamic one-peer exponential-2 topology, achieving faster training and better scalability than the ring, grid, torus, and hypercube topologies. However, this approach requires a power-of-2 number of agents, which is impractical at scale. In this paper, we remove this restriction and propose Decentralized SGD with Communication-optimal Exact Consensus Algorithm (DSGD-CECA), which works for any number of agents while still achieving state-of-the-art properties. In particular, DSGD-CECA incurs a unit per-iteration communication overhead and an $\tilde{O}(n^3)$ transient iteration complexity. Our proof is based on newly discovered properties of gossip weight matrices and a novel approach to combining them with DSGD’s convergence analysis. Numerical experiments show the efficiency of DSGD-CECA.'
volume: 202
URL: https://proceedings.mlr.press/v202/ding23b.html
PDF: https://proceedings.mlr.press/v202/ding23b/ding23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-ding23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lisang
family: Ding
- given: Kexin
family: Jin
- given: Bicheng
family: Ying
- given: Kun
family: Yuan
- given: Wotao
family: Yin
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8067-8089
id: ding23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8067
lastpage: 8089
published: 2023-07-03 00:00:00 +0000
- title: 'Open-Vocabulary Universal Image Segmentation with MaskCLIP'
abstract: 'In this paper, we tackle an emerging computer vision task, open-vocabulary universal image segmentation, which aims to perform semantic/instance/panoptic segmentation (background semantic labeling + foreground instance segmentation) for arbitrary categories of text-based descriptions at inference time. We first build a baseline method by directly adopting pre-trained CLIP models without finetuning or distillation. We then develop MaskCLIP, a Transformer-based approach with a MaskCLIP Visual Encoder, an encoder-only module that seamlessly integrates mask tokens with a pre-trained ViT CLIP model for semantic/instance segmentation and class prediction. MaskCLIP learns to efficiently and effectively utilize pre-trained partial/dense CLIP features within the MaskCLIP Visual Encoder, avoiding the time-consuming student-teacher training process. MaskCLIP outperforms previous methods for semantic/instance/panoptic segmentation on the ADE20K and PASCAL datasets. We show qualitative illustrations of MaskCLIP with online custom categories. Project website: https://maskclip.github.io.'
volume: 202
URL: https://proceedings.mlr.press/v202/ding23c.html
PDF: https://proceedings.mlr.press/v202/ding23c/ding23c.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-ding23c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Zheng
family: Ding
- given: Jieke
family: Wang
- given: Zhuowen
family: Tu
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8090-8102
id: ding23c
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8090
lastpage: 8102
published: 2023-07-03 00:00:00 +0000
- title: 'Entity Divider with Language Grounding in Multi-Agent Reinforcement Learning'
abstract: 'We investigate the use of natural language to drive the generalization of policies in multi-agent settings. Unlike single-agent settings, the generalization of policies should also consider the influence of other agents. Besides, with the increasing number of entities in multi-agent settings, more agent-entity interactions are needed for language grounding, and the enormous search space could impede the learning process. Moreover, given a simple general instruction, e.g., beating all enemies, agents are required to decompose it into multiple subgoals and figure out the right one to focus on. Inspired by previous work, we try to address these issues at the entity level and propose a novel framework for language grounding in multi-agent reinforcement learning, entity divider (EnDi). EnDi enables agents to independently learn subgoal division at the entity level and act in the environment based on the associated entities. The subgoal division is regularized by agent modeling to avoid subgoal conflicts and promote coordinated strategies. Empirically, EnDi demonstrates strong generalization to unseen games with new dynamics and outperforms existing methods. The code is available at https://github.com/PKU-RL/EnDi.'
volume: 202
URL: https://proceedings.mlr.press/v202/ding23d.html
PDF: https://proceedings.mlr.press/v202/ding23d/ding23d.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-ding23d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ziluo
family: Ding
- given: Wanpeng
family: Zhang
- given: Junpeng
family: Yue
- given: Xiangjun
family: Wang
- given: Tiejun
family: Huang
- given: Zongqing
family: Lu
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8103-8119
id: ding23d
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8103
lastpage: 8119
published: 2023-07-03 00:00:00 +0000
- title: 'PixelAsParam: A Gradient View on Diffusion Sampling with Guidance'
abstract: 'Diffusion models have recently achieved state-of-the-art results in image generation. They mainly utilize the denoising framework, which leverages the Langevin dynamics process for image sampling. Recently, guidance methods have modified this process to add conditional information, achieving a controllable generator. However, the current guidance on the denoising process suffers from a trade-off among diversity, image quality, and conditional information. In this work, we propose to view this guided sampling process from a gradient view, where image pixels are treated as parameters being optimized and each mathematical term in the sampling process represents one update direction. This perspective reveals more insights into the conflicts between update directions on the pixels, which cause the aforementioned trade-off. We investigate these conflicts and propose to resolve them with a simple projection method. Experimental results show clear improvements over different baselines on datasets with various resolutions.'
volume: 202
URL: https://proceedings.mlr.press/v202/dinh23a.html
PDF: https://proceedings.mlr.press/v202/dinh23a/dinh23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-dinh23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Anh-Dung
family: Dinh
- given: Daochang
family: Liu
- given: Chang
family: Xu
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8120-8137
id: dinh23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8120
lastpage: 8137
published: 2023-07-03 00:00:00 +0000
- title: 'Second-Order Optimization with Lazy Hessians'
abstract: 'We analyze Newton’s method with lazy Hessian updates for solving general possibly non-convex optimization problems. We propose to reuse a previously seen Hessian for several iterations while computing new gradients at each step of the method. This significantly reduces the overall arithmetic complexity of second-order optimization schemes. By using the cubic regularization technique, we establish fast global convergence of our method to a second-order stationary point, while the Hessian does not need to be updated each iteration. For convex problems, we justify global and local superlinear rates for lazy Newton steps with quadratic regularization, which is easier to compute. The optimal frequency for updating the Hessian is once every $d$ iterations, where $d$ is the dimension of the problem. This provably improves the total arithmetic complexity of second-order algorithms by a factor $\sqrt{d}$.'
volume: 202
URL: https://proceedings.mlr.press/v202/doikov23a.html
PDF: https://proceedings.mlr.press/v202/doikov23a/doikov23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-doikov23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Nikita
family: Doikov
- given: El Mahdi
family: Chayti
- given: Martin
family: Jaggi
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8138-8161
id: doikov23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8138
lastpage: 8161
published: 2023-07-03 00:00:00 +0000
- title: 'Polynomial Preconditioning for Gradient Methods'
abstract: 'We study first-order methods with preconditioning for solving structured convex optimization problems. We propose a new family of preconditioners generated by the symmetric polynomials. They provide first-order optimization methods with a provable improvement of the condition number, cutting the gaps between the highest eigenvalues, without explicit knowledge of the actual spectrum. We give a stochastic interpretation of this preconditioning in terms of coordinate volume sampling and compare it with other classical approaches, including the Chebyshev polynomials. We show how to incorporate a polynomial preconditioning into the Gradient and Fast Gradient Methods and establish their better global complexity bounds. Finally, we propose a simple adaptive search procedure that automatically ensures the best polynomial preconditioning for the Gradient Method, minimizing the objective along a low-dimensional Krylov subspace. Numerical experiments confirm the efficiency of our preconditioning strategies for solving various machine learning problems.'
volume: 202
URL: https://proceedings.mlr.press/v202/doikov23b.html
PDF: https://proceedings.mlr.press/v202/doikov23b/doikov23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-doikov23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Nikita
family: Doikov
- given: Anton
family: Rodomanov
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8162-8187
id: doikov23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8162
lastpage: 8187
published: 2023-07-03 00:00:00 +0000
- title: 'On Data Manifolds Entailed by Structural Causal Models'
abstract: 'The geometric structure of data is an important inductive bias in machine learning. In this work, we characterize the data manifolds entailed by structural causal models. The strengths of the proposed framework are twofold: firstly, the geometric structure of the data manifolds is causally informed, and secondly, it enables causal reasoning about the data manifolds in an interventional and a counterfactual sense. We showcase the versatility of the proposed framework by applying it to the generation of causally-grounded counterfactual explanations for machine learning classifiers, measuring distances along the data manifold in a differential geometric-principled manner.'
volume: 202
URL: https://proceedings.mlr.press/v202/dominguez-olmedo23a.html
PDF: https://proceedings.mlr.press/v202/dominguez-olmedo23a/dominguez-olmedo23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-dominguez-olmedo23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ricardo
family: Dominguez-Olmedo
- given: Amir-Hossein
family: Karimi
- given: Georgios
family: Arvanitidis
- given: Bernhard
family: Schölkopf
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8188-8201
id: dominguez-olmedo23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8188
lastpage: 8201
published: 2023-07-03 00:00:00 +0000
- title: 'Towards Understanding and Reducing Graph Structural Noise for GNNs'
abstract: 'Graph neural networks (GNNs) have emerged as a powerful paradigm to learn from relational data mostly through applying the message passing mechanism. However, this approach may exhibit suboptimal performance when applied to graphs possessing various structural issues. In this work, we focus on understanding and alleviating the effect of graph structural noise on GNN performance. To evaluate the graph structural noise in real data, we propose edge signal-to-noise ratio (ESNR), a novel metric evaluating overall edge noise level with respect to data features or labels based on random matrix theory. We have found striking concordance between the proposed ESNR metric and the GNN performance in various simulated and real data. To reduce the effect of the noise, we propose GPS (Graph Propensity Score) graph rewiring, which estimates the edge likelihood for rewiring data graphs based on self-supervised link prediction. We provide a theoretical guarantee for GPS graph rewiring and demonstrate its efficacy by comprehensive benchmarks.'
volume: 202
URL: https://proceedings.mlr.press/v202/dong23a.html
PDF: https://proceedings.mlr.press/v202/dong23a/dong23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-dong23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mingze
family: Dong
- given: Yuval
family: Kluger
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8202-8226
id: dong23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8202
lastpage: 8226
published: 2023-07-03 00:00:00 +0000
- title: 'SpeedDETR: Speed-aware Transformers for End-to-end Object Detection'
abstract: 'Vision Transformers (ViTs) have continuously achieved new milestones in object detection. However, their considerable computation and memory burden compromise efficiency and generalization of deployment on resource-constrained devices. Besides, efficient transformer-based detectors designed in existing works can hardly achieve a realistic speedup, especially on multi-core processors (e.g., GPUs). The main issue is that the current literature solely concentrates on building algorithms with minimal computation, overlooking that practical latency is also affected by memory access cost and degree of parallelism. Therefore, we propose SpeedDETR, a novel speed-aware transformer for end-to-end object detection, achieving high-speed inference on multiple devices. Specifically, we design a latency prediction model which can directly and accurately estimate network latency by analyzing network properties, hardware memory access patterns, and degree of parallelism. Following the effective local-to-global visual modeling process and the guidance of the latency prediction model, we build our hardware-oriented architecture design and develop a new family of SpeedDETR models. Experiments on the MS COCO dataset show SpeedDETR outperforms current DETR-based methods on the Tesla V100. Acceptable inference speed can be achieved even on edge GPUs.'
volume: 202
URL: https://proceedings.mlr.press/v202/dong23b.html
PDF: https://proceedings.mlr.press/v202/dong23b/dong23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-dong23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Peiyan
family: Dong
- given: Zhenglun
family: Kong
- given: Xin
family: Meng
- given: Peng
family: Zhang
- given: Hao
family: Tang
- given: Yanzhi
family: Wang
- given: Chih-Hsien
family: Chou
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8227-8243
id: dong23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8227
lastpage: 8243
published: 2023-07-03 00:00:00 +0000
- title: 'Understand and Modularize Generator Optimization in ELECTRA-style Pretraining'
abstract: 'Despite the effectiveness of ELECTRA-style pre-training, its performance depends on the careful selection of the model size for the auxiliary generator, leading to high trial-and-error costs. In this paper, we present the first systematic study of this problem. Our theoretical investigation highlights the importance of controlling the generator capacity in ELECTRA-style training. Meanwhile, we find it is *not* handled properly in the original ELECTRA design, leading to the sensitivity issue. Specifically, since adaptive optimizers like Adam will cripple the weighting of individual losses in the joint optimization, the original design fails to control the generator training effectively. To regain control over the generator, we modularize the generator optimization by decoupling the generator optimizer and discriminator optimizer completely, instead of simply relying on the weighted objective combination. Our simple technique significantly reduces the sensitivity of ELECTRA training and obtains considerable performance gains compared to the original design.'
volume: 202
URL: https://proceedings.mlr.press/v202/dong23c.html
PDF: https://proceedings.mlr.press/v202/dong23c/dong23c.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-dong23c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Chengyu
family: Dong
- given: Liyuan
family: Liu
- given: Hao
family: Cheng
- given: Jingbo
family: Shang
- given: Jianfeng
family: Gao
- given: Xiaodong
family: Liu
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8244-8259
id: dong23c
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8244
lastpage: 8259
published: 2023-07-03 00:00:00 +0000
- title: 'Diversity-enhancing Generative Network for Few-shot Hypothesis Adaptation'
abstract: 'Generating unlabeled data has been recently shown to help address the few-shot hypothesis adaptation (FHA) problem, where we aim to train a classifier for the target domain with a few labeled target-domain data and a well-trained source-domain classifier (i.e., a source hypothesis), for the additional information of the highly-compatible unlabeled data. However, the generated data of the existing methods are extremely similar or even the same. The strong dependency among the generated data will lead the learning to fail. In this paper, we propose a diversity-enhancing generative network (DEG-Net) for the FHA problem, which can generate diverse unlabeled data with the help of a kernel independence measure: the Hilbert-Schmidt independence criterion (HSIC). Specifically, DEG-Net will generate data via minimizing the HSIC value (i.e., maximizing the independence) among the semantic features of the generated data. By DEG-Net, the generated unlabeled data are more diverse and more effective for addressing the FHA problem. Experimental results show that the DEG-Net outperforms existing FHA baselines and further verifies that generating diverse data plays an important role in addressing the FHA problem.'
volume: 202
URL: https://proceedings.mlr.press/v202/dong23d.html
PDF: https://proceedings.mlr.press/v202/dong23d/dong23d.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-dong23d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ruijiang
family: Dong
- given: Feng
family: Liu
- given: Haoang
family: Chi
- given: Tongliang
family: Liu
- given: Mingming
family: Gong
- given: Gang
family: Niu
- given: Masashi
family: Sugiyama
- given: Bo
family: Han
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8260-8275
id: dong23d
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8260
lastpage: 8275
published: 2023-07-03 00:00:00 +0000
- title: 'PASTA: Pessimistic Assortment Optimization'
abstract: 'We consider a fundamental class of assortment optimization problems in an offline data-driven setting. The firm does not know the underlying customer choice model but has access to an offline dataset consisting of the historically offered assortment set, customer choice, and revenue. The objective is to use the offline dataset to find an optimal assortment. Due to the combinatorial nature of assortment optimization, the problem of insufficient data coverage is likely to occur in the offline dataset. Therefore, designing a provably efficient offline learning algorithm becomes a significant challenge. To this end, based on the principle of pessimism, we propose a novel algorithm called Pessimistic ASsortment opTimizAtion (PASTA for short), which can correctly identify the optimal assortment by only requiring the offline data to cover the optimal assortment under general settings. In particular, we establish the first regret bound for the offline assortment optimization problem under the celebrated multinomial logit model (MNL). We also propose an efficient computational procedure to solve our pessimistic assortment optimization problem. Our numerical studies demonstrate the superiority of the proposed method over the existing baseline method.'
volume: 202
URL: https://proceedings.mlr.press/v202/dong23e.html
PDF: https://proceedings.mlr.press/v202/dong23e/dong23e.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-dong23e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Juncheng
family: Dong
- given: Weibin
family: Mo
- given: Zhengling
family: Qi
- given: Cong
family: Shi
- given: Ethan X
family: Fang
- given: Vahid
family: Tarokh
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8276-8295
id: dong23e
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8276
lastpage: 8295
published: 2023-07-03 00:00:00 +0000
- title: 'Adaptively Weighted Data Augmentation Consistency Regularization for Robust Optimization under Concept Shift'
abstract: 'Concept shift is a prevailing problem in natural tasks like medical image segmentation where samples usually come from different subpopulations with variant correlations between features and labels. One common type of concept shift in medical image segmentation is the "information imbalance" between label-sparse samples with few (if any) segmentation labels and label-dense samples with plentiful labeled pixels. Existing distributionally robust algorithms have focused on adaptively truncating/down-weighting the "less informative" (i.e., label-sparse in our context) samples. To exploit data features of label-sparse samples more efficiently, we propose an adaptively weighted online optimization algorithm — AdaWAC — to incorporate data augmentation consistency regularization in sample reweighting. Our method introduces a set of trainable weights to balance the supervised loss and unsupervised consistency regularization of each sample separately. At the saddle point of the underlying objective, the weights assign label-dense samples to the supervised loss and label-sparse samples to the unsupervised consistency regularization. We provide a convergence guarantee by recasting the optimization as online mirror descent on a saddle point problem. Our empirical results demonstrate that AdaWAC not only enhances the segmentation performance and sample efficiency but also improves the robustness to concept shift on various medical image segmentation tasks with different UNet-style backbones.'
volume: 202
URL: https://proceedings.mlr.press/v202/dong23f.html
PDF: https://proceedings.mlr.press/v202/dong23f/dong23f.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-dong23f.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yijun
family: Dong
- given: Yuege
family: Xie
- given: Rachel
family: Ward
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8296-8316
id: dong23f
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8296
lastpage: 8316
published: 2023-07-03 00:00:00 +0000
- title: 'Does Sparsity Help in Learning Misspecified Linear Bandits?'
abstract: 'Recently, the study of linear misspecified bandits has generated intriguing implications for the hardness of learning in bandits and reinforcement learning (RL). In particular, Du et al. (2020) show that even if a learner is given linear features in $\mathbb{R}^d$ that approximate the rewards in a bandit or RL with a uniform error of $\varepsilon$, searching for an $O(\varepsilon)$-optimal action requires pulling at least $\Omega(\exp(d))$ queries. Furthermore, Lattimore et al. (2020) show that a degraded $O(\varepsilon\sqrt{d})$-optimal solution can be learned within $\operatorname{poly}(d/\varepsilon)$ queries. Yet it is unknown whether a structural assumption on the ground-truth parameter, such as sparsity, could break the $\varepsilon\sqrt{d}$ barrier. In this paper, we address this question by showing that algorithms can obtain $O(\varepsilon)$-optimal actions by querying $\tilde{O}(\exp(m\varepsilon))$ actions, where $m$ is the sparsity parameter, removing the $\exp(d)$-dependence. We further show (with an information-theoretical lower bound) that this is the best possible if one demands an error $m^{\delta}\varepsilon$ for $0<\delta<1$. We further show that $\operatorname{poly}(m/\varepsilon)$ bounds are possible when the linear features are “good”. These results provide a nearly complete picture of how sparsity can help in misspecified bandit learning and provide a deeper understanding of when linear features are “useful” for bandit and reinforcement learning with misspecification.'
volume: 202
URL: https://proceedings.mlr.press/v202/dong23g.html
PDF: https://proceedings.mlr.press/v202/dong23g/dong23g.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-dong23g.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jialin
family: Dong
- given: Lin
family: Yang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8317-8333
id: dong23g
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8317
lastpage: 8333
published: 2023-07-03 00:00:00 +0000
- title: 'Symmetry-Aware Robot Design with Structured Subgroups'
abstract: 'Robot design aims at learning to create robots that can be easily controlled and perform tasks efficiently. Previous works on robot design have proven its ability to generate robots for various tasks. However, these works searched the robots directly from the vast design space and ignored common structures, resulting in abnormal robots and poor performance. To tackle this problem, we propose a Symmetry-Aware Robot Design (SARD) framework that exploits the structure of the design space by incorporating symmetry searching into the robot design process. Specifically, we represent symmetries with the subgroups of the dihedral group and search for the optimal symmetry in structured subgroups. Then robots are designed under the searched symmetry. In this way, SARD can design efficient symmetric robots while covering the original design space, which is theoretically analyzed. We further empirically evaluate SARD on various tasks, and the results show its superior efficiency and generalizability.'
volume: 202
URL: https://proceedings.mlr.press/v202/dong23h.html
PDF: https://proceedings.mlr.press/v202/dong23h/dong23h.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-dong23h.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Heng
family: Dong
- given: Junyu
family: Zhang
- given: Tonghan
family: Wang
- given: Chongjie
family: Zhang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8334-8355
id: dong23h
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8334
lastpage: 8355
published: 2023-07-03 00:00:00 +0000
- title: 'DoCoFL: Downlink Compression for Cross-Device Federated Learning'
abstract: 'Many compression techniques have been proposed to reduce the communication overhead of Federated Learning training procedures. However, these are typically designed for compressing model updates, which are expected to decay throughout training. As a result, such methods are inapplicable to downlink (i.e., from the parameter server to clients) compression in the cross-device setting, where heterogeneous clients *may appear only once* during training and thus must download the model parameters. Accordingly, we propose DoCoFL – a new framework for downlink compression in the cross-device setting. Importantly, DoCoFL can be seamlessly combined with many uplink compression schemes, rendering it suitable for bi-directional compression. Through extensive evaluation, we show that DoCoFL offers significant bi-directional bandwidth reduction while achieving competitive accuracy to that of a baseline without any compression.'
volume: 202
URL: https://proceedings.mlr.press/v202/dorfman23a.html
PDF: https://proceedings.mlr.press/v202/dorfman23a/dorfman23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-dorfman23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ron
family: Dorfman
- given: Shay
family: Vargaftik
- given: Yaniv
family: Ben-Itzhak
- given: Kfir Yehuda
family: Levy
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8356-8388
id: dorfman23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8356
lastpage: 8388
published: 2023-07-03 00:00:00 +0000
- title: 'Meta-Learning the Inductive Bias of Simple Neural Circuits'
abstract: 'Training data is always finite, making it unclear how to generalise to unseen situations. But, animals do generalise, wielding Occam’s razor to select a parsimonious explanation of their observations. How they do this is called their inductive bias, and it is implicitly built into the operation of animals’ neural circuits. This relationship between an observed circuit and its inductive bias is a useful explanatory window for neuroscience, allowing design choices to be understood normatively. However, it is generally very difficult to map circuit structure to inductive bias. Here, we present a neural network tool to bridge this gap. The tool meta-learns the inductive bias by learning functions that a neural circuit finds easy to generalise, since easy-to-generalise functions are exactly those the circuit chooses to explain incomplete data. In systems with analytically known inductive bias, i.e. linear and kernel regression, our tool recovers it. Generally, we show it can flexibly extract inductive biases from supervised learners, including spiking neural networks, and show how it could be applied to real animals. Finally, we use our tool to interpret recent connectomic data illustrating our intended use: understanding the role of circuit features through the resulting inductive bias.'
volume: 202
URL: https://proceedings.mlr.press/v202/dorrell23a.html
PDF: https://proceedings.mlr.press/v202/dorrell23a/dorrell23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-dorrell23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Will
family: Dorrell
- given: Maria
family: Yuffa
- given: Peter E.
family: Latham
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8389-8402
id: dorrell23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8389
lastpage: 8402
published: 2023-07-03 00:00:00 +0000
- title: 'Self-Repellent Random Walks on General Graphs - Achieving Minimal Sampling Variance via Nonlinear Markov Chains'
abstract: 'We consider random walks on discrete state spaces, such as general undirected graphs, where the random walkers are designed to approximate a target quantity over the network topology via sampling and neighborhood exploration in the form of Markov chain Monte Carlo (MCMC) procedures. Given any Markov chain corresponding to a target probability distribution, we design a *self-repellent random walk* (SRRW) which is less likely to transition to nodes that were highly visited in the past, and more likely to transition to seldom visited nodes. For a class of SRRWs parameterized by a positive real $\alpha$, we prove that the empirical distribution of the process converges almost surely to the target (stationary) distribution of the underlying Markov chain kernel. We then provide a central limit theorem and derive the exact form of the arising asymptotic covariance matrix, which allows us to show that the SRRW with a stronger repellence (larger $\alpha$) always achieves a smaller asymptotic covariance, in the sense of Loewner ordering of covariance matrices. Especially for SRRW-driven MCMC algorithms, we show that the decrease in the asymptotic sampling variance is of the order $O(1/\alpha)$, eventually going down to zero. Finally, we provide numerical simulations complementary to our theoretical results, also empirically testing a version of SRRW with $\alpha$ increasing in time to combine the benefits of smaller asymptotic variance due to large $\alpha$, with empirically observed faster mixing properties of SRRW with smaller $\alpha$.'
volume: 202
URL: https://proceedings.mlr.press/v202/doshi23a.html
PDF: https://proceedings.mlr.press/v202/doshi23a/doshi23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-doshi23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Vishwaraj
family: Doshi
- given: Jie
family: Hu
- given: Do Young
family: Eun
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8403-8423
id: doshi23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8403
lastpage: 8423
published: 2023-07-03 00:00:00 +0000
- title: 'Linear Time GPs for Inferring Latent Trajectories from Neural Spike Trains'
abstract: 'Latent Gaussian process (GP) models are widely used in neuroscience to uncover hidden state evolutions from sequential observations, mainly in neural activity recordings. While latent GP models provide a principled and powerful solution in theory, the intractable posterior in non-conjugate settings necessitates approximate inference schemes, which may lack scalability. In this work, we propose cvHM, a general inference framework for latent GP models leveraging Hida-Matérn kernels and conjugate computation variational inference (CVI). With cvHM, we are able to perform variational inference of latent neural trajectories with linear time complexity for arbitrary likelihoods. The reparameterization of stationary kernels using Hida-Matérn GPs helps us connect the latent variable models that encode prior assumptions through dynamical systems to those that encode trajectory assumptions through GPs. In contrast to previous work, we use bidirectional information filtering, leading to a more concise implementation. Furthermore, we employ the Whittle approximate likelihood to achieve highly efficient hyperparameter learning.'
volume: 202
URL: https://proceedings.mlr.press/v202/dowling23a.html
PDF: https://proceedings.mlr.press/v202/dowling23a/dowling23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-dowling23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Matthew
family: Dowling
- given: Yuan
family: Zhao
- given: Il Memming
family: Park
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8424-8448
id: dowling23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8424
lastpage: 8448
published: 2023-07-03 00:00:00 +0000
- title: 'On the Convergence Rate of Gaussianization with Random Rotations'
abstract: 'Gaussianization is a simple generative model that can be trained without backpropagation. It has shown compelling performance on low dimensional data. As the dimension increases, however, it has been observed that the convergence speed slows down. We show analytically that the number of required layers scales linearly with the dimension for Gaussian input. We argue that this is because the model is unable to capture dependencies between dimensions. Empirically, we find the same linear increase in cost for arbitrary input $p(x)$, but observe favorable scaling for some distributions. We explore potential speed-ups and formulate challenges for further research.'
volume: 202
URL: https://proceedings.mlr.press/v202/draxler23a.html
PDF: https://proceedings.mlr.press/v202/draxler23a/draxler23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-draxler23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Felix
family: Draxler
- given: Lars
family: Kühmichel
- given: Armand
family: Rousselot
- given: Jens
family: Müller
- given: Christoph
family: Schnoerr
- given: Ullrich
family: Koethe
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8449-8468
id: draxler23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8449
lastpage: 8468
published: 2023-07-03 00:00:00 +0000
- title: 'PaLM-E: An Embodied Multimodal Language Model'
abstract: 'Large language models excel at a wide range of complex tasks. However, enabling general inference in the real world, e.g. for robotics problems, raises the challenge of grounding. We propose embodied language models to directly incorporate real-world continuous sensor modalities into language models and thereby establish the link between words and percepts. Input to our embodied language model are multimodal sentences that interleave visual, continuous state estimation, and textual input encodings. We train these encodings end-to-end, in conjunction with a pre-trained large language model, for multiple embodied tasks including sequential robotic manipulation planning, visual question answering, and captioning. Our evaluations show that PaLM-E, a single large embodied multimodal model, can address a variety of embodied reasoning tasks, from a variety of observation modalities, on multiple embodiments, and further, exhibits positive transfer: the model benefits from diverse joint training across internet-scale language, vision, and visual-language domains. Our largest model with 562B parameters, in addition to being trained on robotics tasks, is a visual-language generalist with state-of-the-art performance on OK-VQA, and retains generalist language capabilities with increasing scale.'
volume: 202
URL: https://proceedings.mlr.press/v202/driess23a.html
PDF: https://proceedings.mlr.press/v202/driess23a/driess23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-driess23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Danny
family: Driess
- given: Fei
family: Xia
- given: Mehdi S. M.
family: Sajjadi
- given: Corey
family: Lynch
- given: Aakanksha
family: Chowdhery
- given: Brian
family: Ichter
- given: Ayzaan
family: Wahid
- given: Jonathan
family: Tompson
- given: Quan
family: Vuong
- given: Tianhe
family: Yu
- given: Wenlong
family: Huang
- given: Yevgen
family: Chebotar
- given: Pierre
family: Sermanet
- given: Daniel
family: Duckworth
- given: Sergey
family: Levine
- given: Vincent
family: Vanhoucke
- given: Karol
family: Hausman
- given: Marc
family: Toussaint
- given: Klaus
family: Greff
- given: Andy
family: Zeng
- given: Igor
family: Mordatch
- given: Pete
family: Florence
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8469-8488
id: driess23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8469
lastpage: 8488
published: 2023-07-03 00:00:00 +0000
- title: 'Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC'
abstract: 'Since their introduction, diffusion models have quickly become the prevailing approach to generative modeling in many domains. They can be interpreted as learning the gradients of a time-varying sequence of log-probability density functions. This interpretation has motivated classifier-based and classifier-free guidance as methods for post-hoc control of diffusion models. In this work, we build upon these ideas using the score-based interpretation of diffusion models, and explore alternative ways to condition, modify, and reuse diffusion models for tasks involving compositional generation and guidance. In particular, we investigate why certain types of composition fail using current techniques and present a number of solutions. We conclude that the sampler (not the model) is responsible for this failure and propose new samplers, inspired by MCMC, which enable successful compositional generation. Further, we propose an energy-based parameterization of diffusion models which enables the use of new compositional operators and more sophisticated, Metropolis-corrected samplers. Intriguingly we find these samplers lead to notable improvements in compositional generation across a wide variety of problems such as classifier-guided ImageNet modeling and compositional text-to-image generation.'
volume: 202
URL: https://proceedings.mlr.press/v202/du23a.html
PDF: https://proceedings.mlr.press/v202/du23a/du23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-du23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yilun
family: Du
- given: Conor
family: Durkan
- given: Robin
family: Strudel
- given: Joshua B.
family: Tenenbaum
- given: Sander
family: Dieleman
- given: Rob
family: Fergus
- given: Jascha
family: Sohl-Dickstein
- given: Arnaud
family: Doucet
- given: Will Sussman
family: Grathwohl
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8489-8510
id: du23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8489
lastpage: 8510
published: 2023-07-03 00:00:00 +0000
- title: 'Multi-task Representation Learning for Pure Exploration in Linear Bandits'
abstract: 'Despite the recent success of representation learning in sequential decision making, the study of the pure exploration scenario (i.e., identify the best option and minimize the sample complexity) is still limited. In this paper, we study multi-task representation learning for best arm identification in linear bandit (RepBAI-LB) and best policy identification in contextual linear bandit (RepBPI-CLB), two popular pure exploration settings with wide applications, e.g., clinical trials and web content optimization. In these two problems, all tasks share a common low-dimensional linear representation, and our goal is to leverage this feature to accelerate the best arm (policy) identification process for all tasks. For these problems, we design computationally and sample efficient algorithms DouExpDes and C-DouExpDes, which perform double experimental designs to plan optimal sample allocations for learning the global representation. We show that by learning the common representation among tasks, our sample complexity is significantly better than that of the naive approach which solves tasks independently. To the best of our knowledge, this is the first work to demonstrate the benefits of representation learning for multi-task pure exploration.'
volume: 202
URL: https://proceedings.mlr.press/v202/du23b.html
PDF: https://proceedings.mlr.press/v202/du23b/du23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-du23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yihan
family: Du
- given: Longbo
family: Huang
- given: Wen
family: Sun
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8511-8564
id: du23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8511
lastpage: 8564
published: 2023-07-03 00:00:00 +0000
- title: 'Nonparametric Generative Modeling with Conditional Sliced-Wasserstein Flows'
abstract: 'Sliced-Wasserstein Flow (SWF) is a promising approach to nonparametric generative modeling but has not been widely adopted due to its suboptimal generative quality and lack of conditional modeling capabilities. In this work, we make two major contributions to bridging this gap. First, based on a pleasant observation that (under certain conditions) the SWF of joint distributions coincides with those of conditional distributions, we propose Conditional Sliced-Wasserstein Flow (CSWF), a simple yet effective extension of SWF that enables nonparametric conditional modeling. Second, we introduce appropriate inductive biases of images into SWF with two techniques inspired by local connectivity and multiscale representation in vision research, which greatly improve the efficiency and quality of modeling images. With all the improvements, we achieve generative performance comparable with many deep parametric generative models on both conditional and unconditional tasks in a purely nonparametric fashion, demonstrating its great potential.'
volume: 202
URL: https://proceedings.mlr.press/v202/du23c.html
PDF: https://proceedings.mlr.press/v202/du23c/du23c.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-du23c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Chao
family: Du
- given: Tianbo
family: Li
- given: Tianyu
family: Pang
- given: Shuicheng
family: Yan
- given: Min
family: Lin
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8565-8584
id: du23c
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8565
lastpage: 8584
published: 2023-07-03 00:00:00 +0000
- title: 'Subsample Ridge Ensembles: Equivalences and Generalized Cross-Validation'
abstract: 'We study subsampling-based ridge ensembles in the proportional asymptotics regime, where the feature size grows proportionally with the sample size such that their ratio converges to a constant. By analyzing the squared prediction risk of ridge ensembles as a function of the explicit penalty $\lambda$ and the limiting subsample aspect ratio $\phi_s$ (the ratio of the feature size to the subsample size), we characterize contours in the $(\lambda, \phi_s)$-plane at any achievable risk. As a consequence, we prove that the risk of the optimal full ridgeless ensemble (fitted on all possible subsamples) matches that of the optimal ridge predictor. In addition, we prove strong uniform consistency of generalized cross-validation (GCV) over the subsample sizes for estimating the prediction risk of ridge ensembles. This allows for GCV-based tuning of full ridgeless ensembles without sample splitting and yields a predictor whose risk matches optimal ridge risk.'
volume: 202
URL: https://proceedings.mlr.press/v202/du23d.html
PDF: https://proceedings.mlr.press/v202/du23d/du23d.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-du23d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jin-Hong
family: Du
- given: Pratik
family: Patil
- given: Arun K.
family: Kuchibhotla
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8585-8631
id: du23d
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8585
lastpage: 8631
published: 2023-07-03 00:00:00 +0000
- title: 'On Uni-Modal Feature Learning in Supervised Multi-Modal Learning'
abstract: 'We abstract the features (i.e. learned representations) of multi-modal data into 1) uni-modal features, which can be learned from uni-modal training, and 2) paired features, which can only be learned from cross-modal interactions. Multi-modal models are expected to benefit from cross-modal interactions on the basis of ensuring uni-modal feature learning. However, recent supervised multi-modal late-fusion training approaches still suffer from insufficient learning of uni-modal features on each modality. We prove that this phenomenon does hurt the model’s generalization ability. To this end, we propose to choose a targeted late-fusion learning method for the given supervised multi-modal task from Uni-Modal Ensemble (UME) and the proposed Uni-Modal Teacher (UMT), according to the distribution of uni-modal and paired features. We demonstrate that, under a simple guiding strategy, we can achieve comparable results to other complex late-fusion or intermediate-fusion methods on various multi-modal datasets, including VGG-Sound, Kinetics-400, UCF101, and ModelNet40.'
volume: 202
URL: https://proceedings.mlr.press/v202/du23e.html
PDF: https://proceedings.mlr.press/v202/du23e/du23e.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-du23e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Chenzhuang
family: Du
- given: Jiaye
family: Teng
- given: Tingle
family: Li
- given: Yichen
family: Liu
- given: Tianyuan
family: Yuan
- given: Yue
family: Wang
- given: Yang
family: Yuan
- given: Hang
family: Zhao
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8632-8656
id: du23e
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8632
lastpage: 8656
published: 2023-07-03 00:00:00 +0000
- title: 'Guiding Pretraining in Reinforcement Learning with Large Language Models'
abstract: 'Reinforcement learning algorithms typically struggle in the absence of a dense, well-shaped reward function. Intrinsically motivated exploration methods address this limitation by rewarding agents for visiting novel states or transitions, but these methods offer limited benefits in large environments where most discovered novelty is irrelevant for downstream tasks. We describe a method that uses background knowledge from text corpora to shape exploration. This method, called ELLM (Exploring with LLMs), rewards an agent for achieving goals suggested by a language model prompted with a description of the agent’s current state. By leveraging large-scale language model pretraining, ELLM guides agents toward human-meaningful and plausibly useful behaviors without requiring a human in the loop. We evaluate ELLM in the Crafter game environment and the Housekeep robotic simulator, showing that ELLM-trained agents have better coverage of common-sense behaviors during pretraining and usually match or improve performance on a range of downstream tasks.'
volume: 202
URL: https://proceedings.mlr.press/v202/du23f.html
PDF: https://proceedings.mlr.press/v202/du23f/du23f.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-du23f.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yuqing
family: Du
- given: Olivia
family: Watkins
- given: Zihan
family: Wang
- given: Cédric
family: Colas
- given: Trevor
family: Darrell
- given: Pieter
family: Abbeel
- given: Abhishek
family: Gupta
- given: Jacob
family: Andreas
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8657-8677
id: du23f
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8657
lastpage: 8677
published: 2023-07-03 00:00:00 +0000
- title: 'A Flexible Diffusion Model'
abstract: 'Denoising diffusion (score-based) generative models have become a popular choice for modeling complex data. Recently, a deep connection between forward-backward stochastic differential equations (SDEs) and diffusion-based models has been established, leading to the development of new SDE variants such as sub-VP and critically-damped Langevin. Despite the empirical success of some hand-crafted forward SDEs, many potentially promising forward SDEs remain unexplored. In this work, we propose a general framework for parameterizing diffusion models, particularly the spatial part of forward SDEs, by leveraging the symplectic and Riemannian geometry of the data manifold. We introduce a systematic formalism with theoretical guarantees and connect it with previous diffusion models. Finally, we demonstrate the theoretical advantages of our method from a variational optimization perspective. We present numerical experiments on synthetic datasets, MNIST and CIFAR10 to validate the effectiveness of our framework.'
volume: 202
URL: https://proceedings.mlr.press/v202/du23g.html
PDF: https://proceedings.mlr.press/v202/du23g/du23g.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-du23g.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Weitao
family: Du
- given: He
family: Zhang
- given: Tao
family: Yang
- given: Yuanqi
family: Du
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8678-8696
id: du23g
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8678
lastpage: 8696
published: 2023-07-03 00:00:00 +0000
- title: 'Fast Excess Risk Rates via Offset Rademacher Complexity'
abstract: 'Based on the offset Rademacher complexity, this work outlines a systematic framework for deriving sharp excess risk bounds in statistical learning without the Bernstein condition. In addition to recovering fast rates in a unified way for some parametric and nonparametric supervised learning models with minimum identifiability assumptions, we also obtain new and improved results for LAD (sparse) linear regression and deep logistic regression with deep ReLU neural networks, respectively.'
volume: 202
URL: https://proceedings.mlr.press/v202/duan23a.html
PDF: https://proceedings.mlr.press/v202/duan23a/duan23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-duan23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Chenguang
family: Duan
- given: Yuling
family: Jiao
- given: Lican
family: Kang
- given: Xiliang
family: Lu
- given: Jerry Zhijian
family: Yang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8697-8716
id: duan23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8697
lastpage: 8716
published: 2023-07-03 00:00:00 +0000
- title: 'Are Diffusion Models Vulnerable to Membership Inference Attacks?'
abstract: 'Diffusion-based generative models have shown great potential for image synthesis, but there is a lack of research on the security and privacy risks they may pose. In this paper, we investigate the vulnerability of diffusion models to Membership Inference Attacks (MIAs), a common privacy concern. Our results indicate that existing MIAs designed for GANs or VAEs are largely ineffective on diffusion models, either due to inapplicable scenarios (e.g., requiring the discriminator of GANs) or inappropriate assumptions (e.g., closer distances between synthetic samples and member samples). To address this gap, we propose Step-wise Error Comparing Membership Inference (SecMI), a query-based MIA that infers memberships by assessing the matching of forward process posterior estimation at each timestep. SecMI follows the common overfitting assumption in MIA where member samples normally have smaller estimation errors, compared with hold-out samples. We consider both the standard diffusion models, e.g., DDPM, and the text-to-image diffusion models, e.g., Latent Diffusion Models and Stable Diffusion. Experimental results demonstrate that our methods precisely infer the membership with high confidence in both scenarios across multiple different datasets. Code is available at https://github.com/jinhaoduan/SecMI.'
volume: 202
URL: https://proceedings.mlr.press/v202/duan23b.html
PDF: https://proceedings.mlr.press/v202/duan23b/duan23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-duan23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jinhao
family: Duan
- given: Fei
family: Kong
- given: Shiqi
family: Wang
- given: Xiaoshuang
family: Shi
- given: Kaidi
family: Xu
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8717-8730
id: duan23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8717
lastpage: 8730
published: 2023-07-03 00:00:00 +0000
- title: 'Bayesian Progressive Deep Topic Model with Knowledge Informed Textual Data Coarsening Process'
abstract: 'Deep topic models have shown an impressive ability to extract multi-layer document latent representations and discover hierarchical semantically meaningful topics. However, most deep topic models are limited to the single-step generative process, despite the fact that the progressive generative process has achieved impressive performance in modeling image data. To this end, in this paper, we propose a novel progressive deep topic model that consists of a knowledge-informed textual data coarsening process and a corresponding progressive generative model. The former is used to build multi-level observations ranging from concrete to abstract, while the latter is used to generate more concrete observations gradually. Additionally, we incorporate a graph-enhanced decoder to capture the semantic relationships among words at different levels of observation. Furthermore, we perform a theoretical analysis of the proposed model based on the principle of information theory and show how it can alleviate the well-known "latent variable collapse" problem. Finally, extensive experiments demonstrate that our proposed model effectively improves the ability of deep topic models, resulting in higher-quality latent document representations and topics.'
volume: 202
URL: https://proceedings.mlr.press/v202/duan23c.html
PDF: https://proceedings.mlr.press/v202/duan23c/duan23c.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-duan23c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Zhibin
family: Duan
- given: Xinyang
family: Liu
- given: Yudi
family: Su
- given: Yishi
family: Xu
- given: Bo
family: Chen
- given: Mingyuan
family: Zhou
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8731-8746
id: duan23c
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8731
lastpage: 8746
published: 2023-07-03 00:00:00 +0000
- title: 'Are Equivariant Equilibrium Approximators Beneficial?'
abstract: 'Recently, remarkable progress has been made by approximating Nash equilibrium (NE), correlated equilibrium (CE), and coarse correlated equilibrium (CCE) through function approximation that trains a neural network to predict equilibria from game representations. Furthermore, equivariant architectures are widely adopted in designing such equilibrium approximators in normal-form games. In this paper, we theoretically characterize the benefits and limitations of equivariant equilibrium approximators. For the benefits, we show that they enjoy better generalizability than general ones and can achieve better approximations when the payoff distribution is permutation-invariant. For the limitations, we discuss their drawbacks in terms of equilibrium selection and social welfare. Together, our results help to understand the role of equivariance in equilibrium approximators.'
volume: 202
URL: https://proceedings.mlr.press/v202/duan23d.html
PDF: https://proceedings.mlr.press/v202/duan23d/duan23d.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-duan23d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Zhijian
family: Duan
- given: Yunxuan
family: Ma
- given: Xiaotie
family: Deng
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8747-8778
id: duan23d
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8747
lastpage: 8778
published: 2023-07-03 00:00:00 +0000
- title: 'Evaluating Self-Supervised Learning via Risk Decomposition'
abstract: 'Self-supervised learning (SSL) is typically evaluated using a single metric (linear probing on ImageNet), which neither provides insight into tradeoffs between models nor highlights how to improve them. To address this, we propose an SSL risk decomposition, which generalizes the classical approximation-estimation decomposition. Our decomposition consists of four error terms: approximation, representation usability, probe generalization, and encoder generalization. We provide efficient estimators for each term and use them to analyze the effect of 30 design choices on 169 SSL vision models evaluated on ImageNet. Our analysis gives valuable insights for designing and using SSL models. For example, it highlights the main source of errors and shows how to improve SSL in specific settings (full- vs few-shot) by trading off error components.'
volume: 202
URL: https://proceedings.mlr.press/v202/dubois23a.html
PDF: https://proceedings.mlr.press/v202/dubois23a/dubois23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-dubois23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yann
family: Dubois
- given: Tatsunori
family: Hashimoto
- given: Percy
family: Liang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8779-8820
id: dubois23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8779
lastpage: 8820
published: 2023-07-03 00:00:00 +0000
- title: 'Fully Dynamic Submodular Maximization over Matroids'
abstract: 'Maximizing monotone submodular functions under a matroid constraint is a classic algorithmic problem with multiple applications in data mining and machine learning. We study this classic problem in the fully dynamic setting, where elements can be both inserted and deleted in real-time. Our main result is a randomized algorithm that maintains an efficient data structure with an $\tilde{O}(k^2)$ amortized update time (in the number of additions and deletions) and yields a $4$-approximate solution, where $k$ is the rank of the matroid.'
volume: 202
URL: https://proceedings.mlr.press/v202/duetting23a.html
PDF: https://proceedings.mlr.press/v202/duetting23a/duetting23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-duetting23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Paul
family: Duetting
- given: Federico
family: Fusco
- given: Silvio
family: Lattanzi
- given: Ashkan
family: Norouzi-Fard
- given: Morteza
family: Zadimoghaddam
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8821-8835
id: duetting23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8821
lastpage: 8835
published: 2023-07-03 00:00:00 +0000
- title: 'Optimal No-Regret Learning for One-Sided Lipschitz Functions'
abstract: 'Inspired by applications in pricing and contract design, we study the maximization of one-sided Lipschitz functions, which only provide the (weaker) guarantee that they do not grow too quickly in one direction. We show that it is possible to learn a maximizer for such a function while incurring $O(\log \log T)$ total regret (with a universal constant independent of the number of discontinuities / complexity of the function). This regret bound is asymptotically optimal in $T$ due to a lower bound of Kleinberg and Leighton. By applying this algorithm, we show that one can sell digital goods to multiple buyers and learn the optimal linear contract in the principal-agent setting while incurring at most $O(\log \log T)$ regret.'
volume: 202
URL: https://proceedings.mlr.press/v202/duetting23b.html
PDF: https://proceedings.mlr.press/v202/duetting23b/duetting23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-duetting23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Paul
family: Duetting
- given: Guru
family: Guruganesh
- given: Jon
family: Schneider
- given: Joshua Ruizhi
family: Wang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8836-8850
id: duetting23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8836
lastpage: 8850
published: 2023-07-03 00:00:00 +0000
- title: 'Integrating Prior Knowledge in Contrastive Learning with Kernel'
abstract: 'Data augmentation is a crucial component in unsupervised contrastive learning (CL). It determines how positive samples are defined and, ultimately, the quality of the learned representation. In this work, we open the door to new perspectives for CL by integrating prior knowledge, given either by generative models - viewed as prior representations - or weak attributes in the positive and negative sampling. To this end, we use kernel theory to propose a novel loss, called decoupled uniformity, that i) allows the integration of prior knowledge and ii) removes the positive-negative coupling in the original InfoNCE loss. We draw a connection between contrastive learning and the conditional mean embedding theory to derive tight bounds on the downstream classification loss. In an unsupervised setting, we empirically demonstrate that CL benefits from generative models to improve its representation both on natural and medical images. In a weakly supervised scenario, our framework outperforms other unconditional and conditional CL approaches.'
volume: 202
URL: https://proceedings.mlr.press/v202/dufumier23a.html
PDF: https://proceedings.mlr.press/v202/dufumier23a/dufumier23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-dufumier23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Benoit
family: Dufumier
- given: Carlo Alberto
family: Barbano
- given: Robin
family: Louiset
- given: Edouard
family: Duchesnay
- given: Pietro
family: Gori
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8851-8878
id: dufumier23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8851
lastpage: 8878
published: 2023-07-03 00:00:00 +0000
- title: 'Q-Flow: Generative Modeling for Differential Equations of Open Quantum Dynamics with Normalizing Flows'
abstract: 'Studying the dynamics of open quantum systems can enable breakthroughs both in fundamental physics and applications to quantum engineering and quantum computation. Since the density matrix $\rho$, which is the fundamental description for the dynamics of such systems, is high-dimensional, customized deep generative neural networks have been instrumental in modeling $\rho$. However, the complex-valued nature and normalization constraints of $\rho$, as well as its complicated dynamics, prohibit a seamless connection between open quantum systems and the recent advances in deep generative modeling. Here we lift that limitation by utilizing a reformulation of open quantum system dynamics to a partial differential equation (PDE) for a corresponding probability distribution $Q$, the Husimi Q function. Thus, we model the Q function seamlessly with *off-the-shelf* deep generative models such as normalizing flows. Additionally, we develop novel methods for learning normalizing flow evolution governed by high-dimensional PDEs based on the Euler method and the application of the time-dependent variational principle. We name the resulting approach *Q-Flow* and demonstrate the scalability and efficiency of Q-Flow on open quantum system simulations, including the dissipative harmonic oscillator and the dissipative bosonic model. Q-Flow is superior to conventional PDE solvers and state-of-the-art physics-informed neural network solvers, especially in high-dimensional systems.'
volume: 202
URL: https://proceedings.mlr.press/v202/dugan23a.html
PDF: https://proceedings.mlr.press/v202/dugan23a/dugan23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-dugan23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Owen M
family: Dugan
- given: Peter Y.
family: Lu
- given: Rumen
family: Dangovski
- given: Di
family: Luo
- given: Marin
family: Soljacic
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8879-8901
id: dugan23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8879
lastpage: 8901
published: 2023-07-03 00:00:00 +0000
- title: 'Adaptive Whitening in Neural Populations with Gain-modulating Interneurons'
abstract: 'Statistical whitening transformations play a fundamental role in many computational systems, and may also play an important role in biological sensory systems. Existing neural circuit models of adaptive whitening operate by modifying synaptic interactions; however, such modifications would seem both too slow and insufficiently reversible. Motivated by the extensive neuroscience literature on gain modulation, we propose an alternative model that adaptively whitens its responses by modulating the gains of individual neurons. Starting from a novel whitening objective, we derive an online algorithm that whitens its outputs by adjusting the marginal variances of an overcomplete set of projections. We map the algorithm onto a recurrent neural network with fixed synaptic weights and gain-modulating interneurons. We demonstrate numerically that sign-constraining the gains improves robustness of the network to ill-conditioned inputs, and a generalization of the circuit achieves a form of local whitening in convolutional populations, such as those found throughout the visual or auditory systems.'
volume: 202
URL: https://proceedings.mlr.press/v202/duong23a.html
PDF: https://proceedings.mlr.press/v202/duong23a/duong23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-duong23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lyndon
family: Duong
- given: David
family: Lipshutz
- given: David
family: Heeger
- given: Dmitri
family: Chklovskii
- given: Eero P
family: Simoncelli
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8902-8921
id: duong23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8902
lastpage: 8921
published: 2023-07-03 00:00:00 +0000
- title: 'Generalization Bounds using Data-Dependent Fractal Dimensions'
abstract: 'Providing generalization guarantees for modern neural networks has been a crucial task in statistical learning. Recently, several studies have attempted to analyze the generalization error in such settings by using tools from fractal geometry. While these works have successfully introduced new mathematical tools to apprehend generalization, they heavily rely on a Lipschitz continuity assumption, which in general does not hold for neural networks and might make the bounds vacuous. In this work, we address this issue and prove fractal geometry-based generalization bounds without requiring any Lipschitz assumption. To achieve this goal, we build upon a classical covering argument in learning theory and introduce a data-dependent fractal dimension. Despite introducing a significant amount of technical complications, this new notion lets us control the generalization error (over either fixed or random hypothesis spaces) along with certain mutual information (MI) terms. To provide a clearer interpretation to the newly introduced MI terms, as a next step, we introduce a notion of ‘geometric stability’ and link our bounds to the prior art. Finally, we make a rigorous connection between the proposed data-dependent dimension and topological data analysis tools, which then enables us to compute the dimension in a numerically efficient way. We support our theory with experiments conducted on various settings.'
volume: 202
URL: https://proceedings.mlr.press/v202/dupuis23a.html
PDF: https://proceedings.mlr.press/v202/dupuis23a/dupuis23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-dupuis23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Benjamin
family: Dupuis
- given: George
family: Deligiannidis
- given: Umut
family: Simsekli
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8922-8968
id: dupuis23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8922
lastpage: 8968
published: 2023-07-03 00:00:00 +0000
- title: 'Multi-Objective Population Based Training'
abstract: 'Population Based Training (PBT) is an efficient hyperparameter optimization algorithm. PBT is a single-objective algorithm, but many real-world hyperparameter optimization problems involve two or more conflicting objectives. In this work, we therefore introduce a multi-objective version of PBT, MO-PBT. Our experiments on diverse multi-objective hyperparameter optimization problems (Precision/Recall, Accuracy/Fairness, Accuracy/Adversarial Robustness) show that MO-PBT outperforms random search, single-objective PBT, and the state-of-the-art multi-objective hyperparameter optimization algorithm MO-ASHA.'
volume: 202
URL: https://proceedings.mlr.press/v202/dushatskiy23a.html
PDF: https://proceedings.mlr.press/v202/dushatskiy23a/dushatskiy23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-dushatskiy23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Arkadiy
family: Dushatskiy
- given: Alexander
family: Chebykin
- given: Tanja
family: Alderliesten
- given: Peter
family: Bosman
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8969-8989
id: dushatskiy23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8969
lastpage: 8989
published: 2023-07-03 00:00:00 +0000
- title: 'Neural Diffusion Processes'
abstract: 'Neural network approaches for meta-learning distributions over functions have desirable properties such as increased flexibility and a reduced complexity of inference. Building on the successes of denoising diffusion models for generative modelling, we propose Neural Diffusion Processes (NDPs), a novel approach that learns to sample from a rich distribution over functions through its finite marginals. By introducing a custom attention block we are able to incorporate properties of stochastic processes, such as exchangeability, directly into the NDP’s architecture. We empirically show that NDPs can capture functional distributions close to the true Bayesian posterior, demonstrating that they can successfully emulate the behaviour of Gaussian processes and surpass the performance of neural processes. NDPs enable a variety of downstream tasks, including regression, implicit hyperparameter marginalisation, non-Gaussian posterior prediction and global optimisation.'
volume: 202
URL: https://proceedings.mlr.press/v202/dutordoir23a.html
PDF: https://proceedings.mlr.press/v202/dutordoir23a/dutordoir23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-dutordoir23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Vincent
family: Dutordoir
- given: Alan
family: Saul
- given: Zoubin
family: Ghahramani
- given: Fergus
family: Simpson
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 8990-9012
id: dutordoir23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 8990
lastpage: 9012
published: 2023-07-03 00:00:00 +0000
- title: 'FAENet: Frame Averaging Equivariant GNN for Materials Modeling'
abstract: 'Applications of machine learning techniques for materials modeling typically involve functions that are known to be equivariant or invariant to specific symmetries. While graph neural networks (GNNs) have proven successful in such applications, conventional GNN approaches that enforce symmetries via the model architecture often reduce expressivity, scalability or comprehensibility. In this paper, we introduce (1) a flexible, model-agnostic framework based on stochastic frame averaging that enforces E(3) equivariance or invariance, without any architectural constraints; (2) FAENet: a simple, fast and expressive GNN that leverages stochastic frame averaging to process geometric information without constraints. We prove the validity of our method theoretically and demonstrate its superior accuracy and computational scalability in materials modeling on the OC20 dataset (S2EF, IS2RE) as well as common molecular modeling tasks (QM9, QM7-X).'
volume: 202
URL: https://proceedings.mlr.press/v202/duval23a.html
PDF: https://proceedings.mlr.press/v202/duval23a/duval23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-duval23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Alexandre Agm
family: Duval
- given: Victor
family: Schmidt
- given: Alex
  family: Hernández-García
- given: Santiago
family: Miret
- given: Fragkiskos D.
family: Malliaros
- given: Yoshua
family: Bengio
- given: David
family: Rolnick
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9013-9033
id: duval23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9013
lastpage: 9033
published: 2023-07-03 00:00:00 +0000
- title: 'Blackout Diffusion: Generative Diffusion Models in Discrete-State Spaces'
abstract: 'Typical generative diffusion models rely on a Gaussian diffusion process for training the backward transformations, which can then be used to generate samples from Gaussian noise. However, real world data often takes place in discrete-state spaces, including many scientific applications. Here, we develop a theoretical formulation for arbitrary discrete-state Markov processes in the forward diffusion process using exact (as opposed to variational) analysis. We relate the theory to the existing continuous-state Gaussian diffusion as well as other approaches to discrete diffusion, and identify the corresponding reverse-time stochastic process and score function in the continuous-time setting, and the reverse-time mapping in the discrete-time setting. As an example of this framework, we introduce “Blackout Diffusion”, which learns to produce samples from an empty image instead of from noise. Numerical experiments on the CIFAR-10, Binarized MNIST, and CelebA datasets confirm the feasibility of our approach. Generalizing from specific (Gaussian) forward processes to discrete-state processes without a variational approximation sheds light on how to interpret diffusion models, which we discuss.'
volume: 202
URL: https://proceedings.mlr.press/v202/santos23a.html
PDF: https://proceedings.mlr.press/v202/santos23a/santos23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-santos23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Javier E.
family: Santos
- given: Zachary R.
family: Fox
- given: Nicholas
family: Lubbers
- given: Yen Ting
family: Lin
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9034-9059
id: santos23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9034
lastpage: 9059
published: 2023-07-03 00:00:00 +0000
- title: 'The Computational Complexity of Concise Hypersphere Classification'
abstract: 'Hypersphere classification is a classical and foundational method that can provide easy-to-process explanations for the classification of real-valued as well as binary data. However, obtaining an (ideally concise) explanation via hypersphere classification is much more difficult when dealing with binary data as opposed to real-valued data. In this paper, we perform the first complexity-theoretic study of the hypersphere classification problem for binary data. We use the fine-grained parameterized complexity paradigm to analyze the impact of structural properties that may be present in the input data as well as potential conciseness constraints. Our results include not only stronger lower bounds but also a number of new fixed-parameter algorithms for hypersphere classification of binary data, which can find an exact and concise explanation when one exists.'
volume: 202
URL: https://proceedings.mlr.press/v202/eiben23a.html
PDF: https://proceedings.mlr.press/v202/eiben23a/eiben23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-eiben23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Eduard
family: Eiben
- given: Robert
family: Ganian
- given: Iyad A.
family: Kanj
- given: Sebastian
family: Ordyniak
- given: Stefan
family: Szeider
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9060-9070
id: eiben23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9060
lastpage: 9070
published: 2023-07-03 00:00:00 +0000
- title: 'E$(n)$ Equivariant Message Passing Simplicial Networks'
abstract: 'This paper presents $\mathrm{E}(n)$ Equivariant Message Passing Simplicial Networks (EMPSNs), a novel approach to learning on geometric graphs and point clouds that is equivariant to rotations, translations, and reflections. EMPSNs can learn high-dimensional simplex features in graphs (e.g. triangles), and use the increase of geometric information of higher-dimensional simplices in an $\mathrm{E}(n)$ equivariant fashion. EMPSNs simultaneously generalize $\mathrm{E}(n)$ Equivariant Graph Neural Networks to a topologically more elaborate counterpart and provide an approach for including geometric information in Message Passing Simplicial Networks, thereby serving as a proof of concept for combining geometric and topological information in graph learning. The results indicate that EMPSNs can leverage the benefits of both approaches, leading to a general increase in performance when compared to either method individually, being on par with state-of-the-art approaches for learning on geometric graphs. Moreover, the results suggest that incorporating geometric information serves as an effective measure against over-smoothing in message passing networks, especially when operating on high-dimensional simplicial structures.'
volume: 202
URL: https://proceedings.mlr.press/v202/eijkelboom23a.html
PDF: https://proceedings.mlr.press/v202/eijkelboom23a/eijkelboom23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-eijkelboom23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Floor
family: Eijkelboom
- given: Rob
family: Hesselink
- given: Erik J
family: Bekkers
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9071-9081
id: eijkelboom23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9071
lastpage: 9081
published: 2023-07-03 00:00:00 +0000
- title: 'Performative Recommendation: Diversifying Content via Strategic Incentives'
abstract: 'The primary goal in recommendation is to suggest relevant content to users, but optimizing for accuracy often results in recommendations that lack diversity. To remedy this, conventional approaches such as re-ranking improve diversity by *presenting* more diverse items. Here we argue that to promote inherent and prolonged diversity, the system must encourage its *creation*. Towards this, we harness the performative nature of recommendation, and show how learning can incentivize strategic content creators to create diverse content. Our approach relies on a novel form of regularization that anticipates strategic changes to content, and penalizes for content homogeneity. We provide analytic and empirical results that demonstrate when and how diversity can be incentivized, and experimentally demonstrate the utility of our approach on synthetic and semi-synthetic data.'
volume: 202
URL: https://proceedings.mlr.press/v202/eilat23a.html
PDF: https://proceedings.mlr.press/v202/eilat23a/eilat23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-eilat23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Itay
family: Eilat
- given: Nir
family: Rosenfeld
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9082-9103
id: eilat23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9082
lastpage: 9103
published: 2023-07-03 00:00:00 +0000
- title: 'Hyperparameters in Reinforcement Learning and How To Tune Them'
abstract: 'In order to improve reproducibility, deep reinforcement learning (RL) has been adopting better scientific practices such as standardized evaluation metrics and reporting. However, the process of hyperparameter optimization still varies widely across papers, which makes it challenging to compare RL algorithms fairly. In this paper, we show that hyperparameter choices in RL can significantly affect the agent’s final performance and sample efficiency, and that the hyperparameter landscape can strongly depend on the tuning seed which may lead to overfitting. We therefore propose adopting established best practices from AutoML, such as the separation of tuning and testing seeds, as well as principled hyperparameter optimization (HPO) across a broad search space. We support this by comparing multiple state-of-the-art HPO tools on a range of RL algorithms and environments to their hand-tuned counterparts, demonstrating that HPO approaches often have higher performance and lower compute overhead. As a result of our findings, we recommend a set of best practices for the RL community, which should result in stronger empirical results with fewer computational costs, better reproducibility, and thus faster progress. In order to encourage the adoption of these practices, we provide plug-and-play implementations of the tuning algorithms used in this paper at https://github.com/facebookresearch/how-to-autorl.'
volume: 202
URL: https://proceedings.mlr.press/v202/eimer23a.html
PDF: https://proceedings.mlr.press/v202/eimer23a/eimer23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-eimer23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Theresa
family: Eimer
- given: Marius
family: Lindauer
- given: Roberta
family: Raileanu
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9104-9149
id: eimer23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9104
lastpage: 9149
published: 2023-07-03 00:00:00 +0000
- title: 'Fairness in Streaming Submodular Maximization over a Matroid Constraint'
abstract: 'Streaming submodular maximization is a natural model for the task of selecting a representative subset from a large-scale dataset. If datapoints have sensitive attributes such as gender or race, it becomes important to enforce fairness to avoid bias and discrimination. This has spurred significant interest in developing fair machine learning algorithms. Recently, such algorithms have been developed for monotone submodular maximization under a cardinality constraint. In this paper, we study the natural generalization of this problem to a matroid constraint. We give streaming algorithms as well as impossibility results that provide trade-offs between efficiency, quality and fairness. We validate our findings empirically on a range of well-known real-world applications: exemplar-based clustering, movie recommendation, and maximum coverage in social networks.'
volume: 202
URL: https://proceedings.mlr.press/v202/el-halabi23a.html
PDF: https://proceedings.mlr.press/v202/el-halabi23a/el-halabi23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-el-halabi23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Marwa
family: El Halabi
- given: Federico
family: Fusco
- given: Ashkan
family: Norouzi-Fard
- given: Jakab
family: Tardos
- given: Jakub
family: Tarnawski
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9150-9171
id: el-halabi23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9150
lastpage: 9171
published: 2023-07-03 00:00:00 +0000
- title: 'Difference of submodular minimization via DC programming'
abstract: 'Minimizing the difference of two submodular (DS) functions is a problem that naturally occurs in various machine learning problems. Although it is well known that a DS problem can be equivalently formulated as the minimization of the difference of two convex (DC) functions, existing algorithms do not fully exploit this connection. A classical algorithm for DC problems is called the DC algorithm (DCA). We introduce variants of DCA and its complete form (CDCA) that we apply to the DC program corresponding to DS minimization. We extend existing convergence properties of DCA, and connect them to convergence properties on the DS problem. Our results on DCA match the theoretical guarantees satisfied by existing DS algorithms, while providing a more complete characterization of convergence properties. In the case of CDCA, we obtain a stronger local minimality guarantee. Our numerical results show that our proposed algorithms outperform existing baselines on two applications: speech corpus selection and feature selection.'
volume: 202
URL: https://proceedings.mlr.press/v202/el-halabi23b.html
PDF: https://proceedings.mlr.press/v202/el-halabi23b/el-halabi23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-el-halabi23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Marwa
family: El Halabi
- given: George
family: Orfanides
- given: Tim
family: Hoheisel
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9172-9201
id: el-halabi23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9172
lastpage: 9201
published: 2023-07-03 00:00:00 +0000
- title: 'Graph Positional Encoding via Random Feature Propagation'
abstract: 'Two main families of node feature augmentation schemes have been explored for enhancing GNNs: random features and spectral positional encoding. Surprisingly, however, there is still no clear understanding of the relation between these two augmentation schemes. Here we propose a novel family of positional encoding schemes which draws a link between the above two approaches and improves over both. The new approach, named Random Feature Propagation (RFP), is inspired by the power iteration method and its generalizations. It concatenates several intermediate steps of an iterative algorithm for computing the dominant eigenvectors of a propagation matrix, starting from random node features. Notably, these propagation steps are based on graph-dependent propagation operators that can be either predefined or learned. We explore the theoretical and empirical benefits of RFP. First, we provide theoretical justifications for using random features, for incorporating early propagation steps, and for using multiple random initializations. Then, we empirically demonstrate that RFP significantly outperforms both spectral PE and random features in multiple node classification and graph classification benchmarks.'
volume: 202
URL: https://proceedings.mlr.press/v202/eliasof23a.html
PDF: https://proceedings.mlr.press/v202/eliasof23a/eliasof23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-eliasof23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Moshe
family: Eliasof
- given: Fabrizio
family: Frasca
- given: Beatrice
family: Bevilacqua
- given: Eran
family: Treister
- given: Gal
family: Chechik
- given: Haggai
family: Maron
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9202-9223
id: eliasof23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9202
lastpage: 9223
published: 2023-07-03 00:00:00 +0000
- title: 'Improving Graph Neural Networks with Learnable Propagation Operators'
abstract: 'Graph Neural Networks (GNNs) are limited in their propagation operators. These operators often contain only non-negative elements and are shared across channels, limiting the expressiveness of GNNs. Moreover, some GNNs suffer from over-smoothing, limiting their depth. On the other hand, Convolutional Neural Networks (CNNs) can learn diverse propagation filters, and phenomena like over-smoothing are typically not apparent in CNNs. In this paper, we bridge these gaps by incorporating trainable channel-wise weighting factors $\omega$ to learn and mix multiple smoothing and sharpening propagation operators at each layer. Our generic method is called $\omega$GNN, and is easy to implement. We study two variants: $\omega$GCN and $\omega$GAT. For $\omega$GCN, we theoretically analyse its behaviour and the impact of $\omega$ on the obtained node features. Our experiments confirm these findings, demonstrating and explaining how both variants do not over-smooth. Additionally, we experiment with 15 real-world datasets on node- and graph-classification tasks, where our $\omega$GCN and $\omega$GAT perform on par with state-of-the-art methods.'
volume: 202
URL: https://proceedings.mlr.press/v202/eliasof23b.html
PDF: https://proceedings.mlr.press/v202/eliasof23b/eliasof23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-eliasof23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Moshe
family: Eliasof
- given: Lars
family: Ruthotto
- given: Eran
family: Treister
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9224-9245
id: eliasof23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9224
lastpage: 9245
published: 2023-07-03 00:00:00 +0000
- title: 'Phase Transitions in the Detection of Correlated Databases'
abstract: 'We study the problem of detecting the correlation between two Gaussian databases $\mathsf{X}\in\mathbb{R}^{n\times d}$ and $\mathsf{Y}\in\mathbb{R}^{n\times d}$, each composed of $n$ users with $d$ features. This problem is relevant in the analysis of social media, computational biology, etc. We formulate this as a hypothesis testing problem: under the null hypothesis, these two databases are statistically independent. Under the alternative, however, there exists an unknown permutation $\sigma$ over the set of $n$ users (or, row permutation), such that $\mathsf{X}$ is $\rho$-correlated with $\mathsf{Y}^\sigma$, a permuted version of $\mathsf{Y}$. We determine sharp thresholds at which optimal testing exhibits a phase transition, depending on the asymptotic regime of $n$ and $d$. Specifically, we prove that if $\rho^2d\to0$, as $d\to\infty$, then weak detection (performing slightly better than random guessing) is statistically impossible, *irrespective* of the value of $n$. This complements the performance of a simple test that thresholds the sum of all entries of $\mathsf{X}^T\mathsf{Y}$. Furthermore, when $d$ is fixed, we prove that strong detection (vanishing error probability) is impossible for any $\rho<\rho^\star$, where $\rho^\star$ is an explicit function of $d$, while weak detection is again impossible as long as $\rho^2d=o(1)$, as $n\to\infty$. These results close significant gaps in recent related studies.'
volume: 202
URL: https://proceedings.mlr.press/v202/elimelech23a.html
PDF: https://proceedings.mlr.press/v202/elimelech23a/elimelech23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-elimelech23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Dor
family: Elimelech
- given: Wasim
family: Huleihel
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9246-9266
id: elimelech23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9246
lastpage: 9266
published: 2023-07-03 00:00:00 +0000
- title: 'A new near-linear time algorithm for k-nearest neighbor search using a compressed cover tree'
abstract: 'Given a reference set R of n points and a query set Q of m points in a metric space, this paper studies the important problem of finding the k-nearest neighbors of every point q of Q in the set R in near-linear time. In a paper at ICML 2006, Beygelzimer, Kakade, and Langford introduced a cover tree and attempted to prove that this tree can be built in O(n log n) time, while the nearest neighbor search can be done in O(n log m) time with a hidden dimensionality factor. In 2015, section 5.3 of Curtin’s PhD thesis pointed out that the proof of the latter claim has a serious gap in its time complexity estimation. A paper at TopoInVis 2022 reported explicit counterexamples for a key step in the proofs of both claims. These past obstacles are overcome by a simpler compressed cover tree on the reference set R. The first new algorithm constructs a compressed cover tree in O(n log n) time. The second new algorithm finds all k-nearest neighbors of all points from Q using a compressed cover tree in time O(m(k+log n)log k), with a hidden dimensionality factor depending on the point distributions of the sets R,Q but not on their sizes.'
volume: 202
URL: https://proceedings.mlr.press/v202/elkin23a.html
PDF: https://proceedings.mlr.press/v202/elkin23a/elkin23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-elkin23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yury
family: Elkin
- given: Vitaliy
family: Kurlin
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9267-9311
id: elkin23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9267
lastpage: 9311
published: 2023-07-03 00:00:00 +0000
- title: 'Motion Question Answering via Modular Motion Programs'
abstract: 'In order to build artificial intelligence systems that can perceive and reason with human behavior in the real world, we must first design models that conduct complex spatio-temporal reasoning over motion sequences. Moving towards this goal, we propose the HumanMotionQA task to evaluate complex, multi-step reasoning abilities of models on long-form human motion sequences. We generate a dataset of question-answer pairs that require detecting motor cues in small portions of motion sequences, reasoning temporally about when events occur, and querying specific motion attributes. In addition, we propose NSPose, a neuro-symbolic method for this task that uses symbolic reasoning and a modular design to ground motion through learning motion concepts, attribute neural operators, and temporal relations. We demonstrate the suitability of NSPose for the HumanMotionQA task, outperforming all baseline methods.'
volume: 202
URL: https://proceedings.mlr.press/v202/endo23a.html
PDF: https://proceedings.mlr.press/v202/endo23a/endo23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-endo23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mark
family: Endo
- given: Joy
family: Hsu
- given: Jiaman
family: Li
- given: Jiajun
family: Wu
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9312-9328
id: endo23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9312
lastpage: 9328
published: 2023-07-03 00:00:00 +0000
- title: 'Learning Perturbations to Explain Time Series Predictions'
abstract: 'Explaining predictions based on multivariate time series data carries the additional difficulty of handling not only multiple features, but also time dependencies. It matters not only what happened, but also when, and the same feature could have a very different impact on a prediction depending on this time information. Previous work has used perturbation-based saliency methods to tackle this issue, perturbing an input using a trainable mask to discover which features at which times are driving the predictions. However, these methods introduce fixed perturbations, inspired by similar methods on static data, while there seems to be little motivation to do so on temporal data. In this work, we aim to explain predictions by learning not only masks, but also associated perturbations. We empirically show that learning these perturbations significantly improves the quality of these explanations on time series data.'
volume: 202
URL: https://proceedings.mlr.press/v202/enguehard23a.html
PDF: https://proceedings.mlr.press/v202/enguehard23a/enguehard23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-enguehard23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Joseph
family: Enguehard
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9329-9342
id: enguehard23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9329
lastpage: 9342
published: 2023-07-03 00:00:00 +0000
- title: 'Regret Minimization and Convergence to Equilibria in General-sum Markov Games'
abstract: 'An abundance of recent impossibility results establish that regret minimization in Markov games with adversarial opponents is both statistically and computationally intractable. Nevertheless, none of these results preclude the possibility of regret minimization under the assumption that all parties adopt the same learning procedure. In this work, we present the first (to our knowledge) algorithm for learning in general-sum Markov games that provides sublinear regret guarantees when executed by all agents. The bounds we obtain are for $\textit{swap regret}$, and thus, along the way, imply convergence to a $\textit{correlated}$ equilibrium. Our algorithm is decentralized, computationally efficient, and does not require any communication between agents. Our key observation is that online learning via policy optimization in Markov games essentially reduces to a form of $\textit{weighted}$ regret minimization, with $\textit{unknown}$ weights determined by the path length of the agents’ policy sequence. Consequently, controlling the path length leads to weighted regret objectives for which sufficiently adaptive algorithms provide sublinear regret guarantees.'
volume: 202
URL: https://proceedings.mlr.press/v202/erez23a.html
PDF: https://proceedings.mlr.press/v202/erez23a/erez23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-erez23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Liad
family: Erez
- given: Tal
family: Lancewicki
- given: Uri
family: Sherman
- given: Tomer
family: Koren
- given: Yishay
family: Mansour
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9343-9373
id: erez23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9343
lastpage: 9373
published: 2023-07-03 00:00:00 +0000
- title: 'Delayed Bandits: When Do Intermediate Observations Help?'
abstract: 'We study a $K$-armed bandit with delayed feedback and intermediate observations. We consider a model, where intermediate observations have a form of a finite state, which is observed immediately after taking an action, whereas the loss is observed after an adversarially chosen delay. We show that the regime of the mapping of states to losses determines the complexity of the problem, irrespective of whether the mapping of actions to states is stochastic or adversarial. If the mapping of states to losses is adversarial, then the regret rate is of order $\sqrt{(K+d)T}$ (within log factors), where $T$ is the time horizon and $d$ is a fixed delay. This matches the regret rate of a $K$-armed bandit with delayed feedback and without intermediate observations, implying that intermediate observations are not helpful. However, if the mapping of states to losses is stochastic, we show that the regret grows at a rate of $\sqrt{\bigl(K+\min\{|\mathcal{S}|,d\}\bigr)T}$ (within log factors), implying that if the number $|\mathcal{S}|$ of states is smaller than the delay, then intermediate observations help. We also provide refined high-probability regret upper bounds for non-uniform delays, together with experimental validation of our algorithms.'
volume: 202
URL: https://proceedings.mlr.press/v202/esposito23a.html
PDF: https://proceedings.mlr.press/v202/esposito23a/esposito23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-esposito23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Emmanuel
family: Esposito
- given: Saeed
family: Masoudian
- given: Hao
family: Qiu
- given: Dirk
family: Van Der Hoeven
- given: Nicolò
family: Cesa-Bianchi
- given: Yevgeny
family: Seldin
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9374-9395
id: esposito23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9374
lastpage: 9395
published: 2023-07-03 00:00:00 +0000
- title: 'Scaling Spherical CNNs'
abstract: 'Spherical CNNs generalize CNNs to functions on the sphere, by using spherical convolutions as the main linear operation. The most accurate and efficient way to compute spherical convolutions is in the spectral domain (via the convolution theorem), which is still costlier than the usual planar convolutions. For this reason, applications of spherical CNNs have so far been limited to small problems that can be approached with low model capacity. In this work, we show how spherical CNNs can be scaled for much larger problems. To achieve this, we make critical improvements including novel variants of common model components, an implementation of core operations to exploit hardware accelerator characteristics, and application-specific input representations that exploit the properties of our model. Experiments show our larger spherical CNNs reach state-of-the-art on several targets of the QM9 molecular benchmark, which was previously dominated by equivariant graph neural networks, and achieve competitive performance on multiple weather forecasting tasks. Our code is available at https://github.com/google-research/spherical-cnn.'
volume: 202
URL: https://proceedings.mlr.press/v202/esteves23a.html
PDF: https://proceedings.mlr.press/v202/esteves23a/esteves23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-esteves23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Carlos
family: Esteves
- given: Jean-Jacques
family: Slotine
- given: Ameesh
family: Makadia
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9396-9411
id: esteves23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9396
lastpage: 9411
published: 2023-07-03 00:00:00 +0000
- title: 'Stochastic Gradient Descent under Markovian Sampling Schemes'
abstract: 'We study a variation of vanilla stochastic gradient descent where the optimizer only has access to a Markovian sampling scheme. These schemes encompass applications that range from decentralized optimization with a random walker (token algorithms), to RL and online system identification problems. We focus on obtaining rates of convergence under the least restrictive assumptions possible on the underlying Markov chain and on the functions optimized. We first unveil the theoretical lower bound for methods that sample stochastic gradients along the path of a Markov chain, revealing a dependency on the hitting time of the underlying Markov chain. We then study Markov chain SGD (MC-SGD) under much milder regularity assumptions than prior works. We finally introduce MC-SAG, an alternative to MC-SGD with variance reduction, that only depends on the hitting time of the Markov chain, thereby obtaining a communication-efficient token algorithm.'
volume: 202
URL: https://proceedings.mlr.press/v202/even23a.html
PDF: https://proceedings.mlr.press/v202/even23a/even23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-even23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mathieu
family: Even
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9412-9439
id: even23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9412
lastpage: 9439
published: 2023-07-03 00:00:00 +0000
- title: 'Continual Learning in Linear Classification on Separable Data'
abstract: 'We analyze continual learning on a sequence of separable linear classification tasks with binary labels. We show theoretically that learning with weak regularization reduces to solving a sequential max-margin problem, corresponding to a special case of the Projection Onto Convex Sets (POCS) framework. We then develop upper bounds on the forgetting and other quantities of interest under various settings with recurring tasks, including cyclic and random orderings of tasks. We discuss several practical implications to popular training practices like regularization scheduling and weighting. We point out several theoretical differences between our continual classification setting and a recently studied continual regression setting.'
volume: 202
URL: https://proceedings.mlr.press/v202/evron23a.html
PDF: https://proceedings.mlr.press/v202/evron23a/evron23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-evron23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Itay
family: Evron
- given: Edward
family: Moroshko
- given: Gon
family: Buzaglo
- given: Maroun
family: Khriesh
- given: Badea
family: Marjieh
- given: Nathan
family: Srebro
- given: Daniel
family: Soudry
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9440-9484
id: evron23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9440
lastpage: 9484
published: 2023-07-03 00:00:00 +0000
- title: 'A Connection between One-Step RL and Critic Regularization in Reinforcement Learning'
abstract: 'As with any machine learning problem with limited data, effective offline RL algorithms require careful regularization to avoid overfitting. One class of methods, known as one-step RL, perform just one step of policy improvement. These methods, which include advantage-weighted regression and conditional behavioral cloning, are thus simple and stable, but can have limited asymptotic performance. A second class of methods, known as critic regularization, perform many steps of policy improvement with a regularized objective. These methods typically require more compute but have appealing lower-bound guarantees. In this paper, we draw a connection between these methods: applying a multi-step critic regularization method with a regularization coefficient of 1 yields the same policy as one-step RL. While our theoretical results require assumptions (e.g., deterministic dynamics), our experiments nevertheless show that our analysis makes accurate, testable predictions about practical offline RL methods (CQL and one-step RL) with commonly-used hyperparameters.'
volume: 202
URL: https://proceedings.mlr.press/v202/eysenbach23a.html
PDF: https://proceedings.mlr.press/v202/eysenbach23a/eysenbach23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-eysenbach23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Benjamin
family: Eysenbach
- given: Matthieu
family: Geist
- given: Sergey
family: Levine
- given: Ruslan
family: Salakhutdinov
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9485-9507
id: eysenbach23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9485
lastpage: 9507
published: 2023-07-03 00:00:00 +0000
- title: 'Neural Status Registers'
abstract: 'We study the problem of learning comparisons between numbers with neural networks. Despite comparisons being a seemingly simple problem, we find that both general-purpose models such as multilayer perceptrons (MLPs) as well as arithmetic architectures such as the Neural Arithmetic Logic Unit (NALU) struggle with learning comparisons. Neither architecture can extrapolate to much larger numbers than those seen in the training set. We propose a novel differentiable architecture, the Neural Status Register (NSR) to solve this problem. We experimentally validate the NSR in various settings. We can combine the NSR with other neural models to solve interesting problems such as piecewise-defined arithmetic, comparison of digit images, recurrent problems, or finding shortest paths in graphs. The NSR outperforms all baseline architectures, especially when it comes to extrapolating to larger numbers.'
volume: 202
URL: https://proceedings.mlr.press/v202/faber23a.html
PDF: https://proceedings.mlr.press/v202/faber23a/faber23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-faber23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lukas
family: Faber
- given: Roger
family: Wattenhofer
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9508-9522
id: faber23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9508
lastpage: 9522
published: 2023-07-03 00:00:00 +0000
- title: 'Learning Rate Schedules in the Presence of Distribution Shift'
abstract: 'We design learning rate schedules that minimize regret for SGD-based online learning in the presence of a changing data distribution. We fully characterize the optimal learning rate schedule for online linear regression via a novel analysis with stochastic differential equations. For general convex loss functions, we propose new learning rate schedules that are robust to distribution shift, and give upper and lower bounds for the regret that only differ by constants. For non-convex loss functions, we define a notion of regret based on the gradient norm of the estimated models and propose a learning schedule that minimizes an upper bound on the total expected regret. Intuitively, one expects changing loss landscapes to require more exploration, and we confirm that optimal learning rate schedules typically have higher learning rates in the presence of distribution shift. Finally, we provide experiments that illustrate these learning rate schedules and their regret.'
volume: 202
URL: https://proceedings.mlr.press/v202/fahrbach23a.html
PDF: https://proceedings.mlr.press/v202/fahrbach23a/fahrbach23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-fahrbach23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Matthew
family: Fahrbach
- given: Adel
family: Javanmard
- given: Vahab
family: Mirrokni
- given: Pratik
family: Worah
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9523-9546
id: fahrbach23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9523
lastpage: 9546
published: 2023-07-03 00:00:00 +0000
- title: 'Predicting Rare Events by Shrinking Towards Proportional Odds'
abstract: 'Training classifiers is difficult with severe class imbalance, but many rare events are the culmination of a sequence with much more common intermediate outcomes. For example, in online marketing a user first sees an ad, then may click on it, and finally may make a purchase; estimating the probability of purchases is difficult because of their rarity. We show both theoretically and through data experiments that the more abundant data in earlier steps may be leveraged to improve estimation of probabilities of rare events. We present PRESTO, a relaxation of the proportional odds model for ordinal regression. Instead of estimating weights for one separating hyperplane that is shifted by separate intercepts for each of the estimated Bayes decision boundaries between adjacent pairs of categorical responses, we estimate separate weights for each of these transitions. We impose an L1 penalty on the differences between weights for the same feature in adjacent weight vectors in order to shrink towards the proportional odds model. We prove that PRESTO consistently estimates the decision boundary weights under a sparsity assumption. Synthetic and real data experiments show that our method can estimate rare probabilities in this setting better than both logistic regression on the rare category, which fails to borrow strength from more abundant categories, and the proportional odds model, which is too inflexible.'
volume: 202
URL: https://proceedings.mlr.press/v202/faletto23a.html
PDF: https://proceedings.mlr.press/v202/faletto23a/faletto23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-faletto23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Gregory
family: Faletto
- given: Jacob
family: Bien
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9547-9602
id: faletto23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9547
lastpage: 9602
published: 2023-07-03 00:00:00 +0000
- title: 'Free-Form Variational Inference for Gaussian Process State-Space Models'
abstract: 'Gaussian process state-space models (GPSSMs) provide a principled and flexible approach to modeling the dynamics of a latent state, which is observed at discrete-time points via a likelihood model. However, inference in GPSSMs is computationally and statistically challenging due to the large number of latent variables in the model and the strong temporal dependencies between them. In this paper, we propose a new method for inference in Bayesian GPSSMs, which overcomes the drawbacks of previous approaches, namely over-simplified assumptions and high computational requirements. Our method is based on free-form variational inference via stochastic gradient Hamiltonian Monte Carlo within the inducing-variable formalism. Furthermore, by exploiting our proposed variational distribution, we provide a collapsed extension of our method where the inducing variables are marginalized analytically. We also showcase results when combining our framework with particle MCMC methods. We show that, on six real-world datasets, our approach can learn transition dynamics and latent states more accurately than competing methods.'
volume: 202
URL: https://proceedings.mlr.press/v202/fan23a.html
PDF: https://proceedings.mlr.press/v202/fan23a/fan23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-fan23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Xuhui
family: Fan
- given: Edwin V.
family: Bonilla
- given: Terence
family: O’Kane
- given: Scott A
family: Sisson
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9603-9622
id: fan23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9603
lastpage: 9622
published: 2023-07-03 00:00:00 +0000
- title: 'Optimizing DDPM Sampling with Shortcut Fine-Tuning'
abstract: 'In this study, we propose Shortcut Fine-Tuning (SFT), a new approach for addressing the challenge of fast sampling of pretrained Denoising Diffusion Probabilistic Models (DDPMs). SFT advocates for the fine-tuning of DDPM samplers through the direct minimization of Integral Probability Metrics (IPM), instead of learning the backward diffusion process. This enables samplers to discover an alternative and more efficient sampling shortcut, deviating from the backward diffusion process. Inspired by a control perspective, we propose a new algorithm SFT-PG: Shortcut Fine-Tuning with Policy Gradient, and prove that under certain assumptions, gradient descent of diffusion models with respect to IPM is equivalent to performing policy gradient. To the best of our knowledge, this is the first attempt to utilize reinforcement learning (RL) methods to train diffusion models. Through empirical evaluation, we demonstrate that our fine-tuning method can further enhance existing fast DDPM samplers, resulting in sample quality comparable to or even surpassing that of the full-step model across various datasets.'
volume: 202
URL: https://proceedings.mlr.press/v202/fan23b.html
PDF: https://proceedings.mlr.press/v202/fan23b/fan23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-fan23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ying
family: Fan
- given: Kangwook
family: Lee
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9623-9639
id: fan23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9623
lastpage: 9639
published: 2023-07-03 00:00:00 +0000
- title: 'LSDS++ : Dual Sampling for Accelerated k-means++'
abstract: 'k-means clustering is an important problem in machine learning and statistics. The k-means++ initialization algorithm has driven new acceleration strategies and theoretical analysis for solving the k-means clustering problem. The state-of-the-art variant, called LocalSearch++, adds extra local search steps upon k-means++ to achieve constant approximation error in expectation. In this paper, we propose a new variant named LSDS++, which improves the sampling efficiency of LocalSearch++ via a strategy called dual sampling. By defining a new capture graph based on the concept of coreset, we show that the proposed LSDS++ is able to achieve the same expected constant error with reduced complexity. Experiments are conducted to justify the benefit of LSDS++ in practice.'
volume: 202
URL: https://proceedings.mlr.press/v202/fan23c.html
PDF: https://proceedings.mlr.press/v202/fan23c/fan23c.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-fan23c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Chenglin
family: Fan
- given: Ping
family: Li
- given: Xiaoyun
family: Li
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9640-9649
id: fan23c
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9640
lastpage: 9649
published: 2023-07-03 00:00:00 +0000
- title: 'Smart Initial Basis Selection for Linear Programs'
abstract: 'The simplex method, introduced by Dantzig more than half a century ago, is still to date one of the most efficient methods for solving large-scale linear programming (LP) problems. While the simplex method is known to have the finite termination property under mild assumptions, the number of iterations until optimality largely depends on the choice of initial basis. Existing strategies for selecting an advanced initial basis are mostly rule-based. These rules usually require extensive expert knowledge and empirical study to develop. Yet, many of them fail to exhibit consistent improvement, even for LP problems that arise in a single application scenario. In this paper, we propose a learning-based approach for initial basis selection. We employ graph neural networks as a building block and develop a model that attempts to capture the relationship between LP problems and their optimal bases. In addition, during the inference phase, we supplement the learning-based prediction with linear algebra tricks to ensure the validity of the generated initial basis. We validate the effectiveness of our proposed strategy by extensively testing it with state-of-the-art simplex solvers, including the open-source solver HiGHS and the commercial solver OptVerse. Through these rigorous experiments, we demonstrate that our strategy achieves substantial speedup and consistently outperforms existing rule-based methods. Furthermore, we extend the proposed approach to generating restricted master problems for column generation methods and present encouraging numerical results.'
volume: 202
URL: https://proceedings.mlr.press/v202/fan23d.html
PDF: https://proceedings.mlr.press/v202/fan23d/fan23d.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-fan23d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Zhenan
family: Fan
- given: Xinglu
family: Wang
- given: Oleksandr
family: Yakovenko
- given: Abdullah Ali
family: Sivas
- given: Owen
family: Ren
- given: Yong
family: Zhang
- given: Zirui
family: Zhou
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9650-9664
id: fan23d
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9650
lastpage: 9664
published: 2023-07-03 00:00:00 +0000
- title: 'General Covariance Data Augmentation for Neural PDE Solvers'
abstract: 'A growing body of research shows how to replace classical partial differential equation (PDE) integrators with neural networks. The popular strategy is to generate the input-output pairs with a PDE solver, train the neural network in the regression setting, and use the trained model as a cheap surrogate for the solver. The bottleneck in this scheme is the number of expensive queries of a PDE solver needed to generate the dataset. To alleviate the problem, we propose a computationally cheap augmentation strategy based on general covariance and simple random coordinate transformations. Our approach relies on the fact that physical laws are independent of the coordinate choice, so a change in the coordinate system preserves the type of a parametric PDE and only changes the PDE’s data (e.g., initial conditions, diffusion coefficient). For the neural networks and partial differential equations we tried, the proposed augmentation improves test error by 23% on average. The worst observed result is a 17% increase in test error for a multilayer perceptron, and the best case is an 80% decrease for a dilated residual network.'
volume: 202
URL: https://proceedings.mlr.press/v202/fanaskov23a.html
PDF: https://proceedings.mlr.press/v202/fanaskov23a/fanaskov23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-fanaskov23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Vladimir
family: Fanaskov
- given: Tianchi
family: Yu
- given: Alexander
family: Rudikov
- given: Ivan
family: Oseledets
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9665-9688
id: fanaskov23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9665
lastpage: 9688
published: 2023-07-03 00:00:00 +0000
- title: 'The Fast Johnson-Lindenstrauss Transform Is Even Faster'
abstract: 'The Johnson-Lindenstrauss lemma (Johnson & Lindenstrauss, 1984) is a cornerstone result in dimensionality reduction, stating it is possible to embed a set of $n$ points in $d$-dimensional Euclidean space into optimal $k=O(\varepsilon^{-2} \ln n)$ dimensions, while preserving all pairwise distances to within a factor $(1 \pm \varepsilon)$. The seminal Fast Johnson-Lindenstrauss (Fast JL) transform by Ailon and Chazelle (SICOMP’09) supports computing the embedding of a data point in $O(d \ln d +k \ln^2 n)$ time, where the $d \ln d$ term comes from multiplication with a $d \times d$ Hadamard matrix and the $k \ln^2 n$ term comes from multiplication with a sparse $k \times d$ matrix. Despite the Fast JL transform being more than a decade old, it is one of the fastest dimensionality reduction techniques for many tradeoffs between $\varepsilon, d$ and $n$. In this work, we give a surprising new analysis of the Fast JL transform, showing that the $k \ln^2 n$ term in the embedding time can be improved to $(k \ln^2 n)/\alpha$ for an $\alpha = \Omega(\min\{\varepsilon^{-1}\ln(1/\varepsilon), \ln n\})$. The improvement follows by using an even sparser matrix. We complement our improved analysis with a lower bound showing that our new analysis is in fact tight.'
volume: 202
URL: https://proceedings.mlr.press/v202/fandina23a.html
PDF: https://proceedings.mlr.press/v202/fandina23a/fandina23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-fandina23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ora Nova
family: Fandina
- given: Mikael Møller
family: Høgsgaard
- given: Kasper Green
family: Larsen
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9689-9715
id: fandina23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9689
lastpage: 9715
published: 2023-07-03 00:00:00 +0000
- title: 'Regression with Label Permutation in Generalized Linear Model'
abstract: 'The assumption that response and predictor belong to the same statistical unit may be violated in practice. Unbiased estimation and recovery of the true label ordering based on unlabeled data are challenging tasks and have attracted increasing attention in the recent literature. In this paper, we present a relatively complete analysis of the label permutation problem for the generalized linear model with multivariate responses. The theory is established under different scenarios: with knowledge of the true parameters, with partial knowledge of the underlying label permutation matrix, and without any knowledge. Our results remove the stringent conditions required by the current literature and are further extended to the missing-observation setting, which has never been considered in the field of label permutation problems. On the computational side, we propose two methods, a "maximum likelihood estimation" algorithm and a "two-step estimation" algorithm, to accommodate different settings. When the proportion of permuted labels is moderate, both methods work effectively. Multiple numerical experiments are provided and corroborate our theoretical findings.'
volume: 202
URL: https://proceedings.mlr.press/v202/fang23a.html
PDF: https://proceedings.mlr.press/v202/fang23a/fang23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-fang23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Guanhua
family: Fang
- given: Ping
family: Li
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9716-9760
id: fang23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9716
lastpage: 9760
published: 2023-07-03 00:00:00 +0000
- title: 'Robust Collaborative Learning with Linear Gradient Overhead'
abstract: 'Collaborative learning algorithms, such as distributed SGD (or D-SGD), are prone to faulty machines that may deviate from their prescribed algorithm because of software or hardware bugs, poisoned data or malicious behaviors. While many solutions have been proposed to enhance the robustness of D-SGD to such machines, previous works either resort to strong assumptions (trusted server, homogeneous data, specific noise model) or impose a gradient computational cost that is several orders of magnitude higher than that of D-SGD. We present MoNNA, a new algorithm that (a) is provably robust under standard assumptions and (b) has a gradient computation overhead that is linear in the fraction of faulty machines, which is conjectured to be tight. Essentially, MoNNA uses Polyak’s momentum of local gradients for local updates and nearest-neighbor averaging (NNA) for global mixing, respectively. While MoNNA is rather simple to implement, its analysis has been more challenging and relies on two key elements that may be of independent interest. Specifically, we introduce the mixing criterion of $(\alpha, \lambda)$-reduction to analyze the non-linear mixing of non-faulty machines, and present a way to control the tension between the momentum and the model drifts. We validate our theory by experiments on image classification and make our code available at https://github.com/LPD-EPFL/robust-collaborative-learning.'
volume: 202
URL: https://proceedings.mlr.press/v202/farhadkhani23a.html
PDF: https://proceedings.mlr.press/v202/farhadkhani23a/farhadkhani23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-farhadkhani23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Sadegh
family: Farhadkhani
- given: Rachid
family: Guerraoui
- given: Nirupam
family: Gupta
- given: Lê-Nguyên
family: Hoang
- given: Rafael
family: Pinot
- given: John
family: Stephan
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9761-9813
id: farhadkhani23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9761
lastpage: 9813
published: 2023-07-03 00:00:00 +0000
- title: 'Neural FIM for learning Fisher information metrics from point cloud data'
abstract: 'Although data diffusion embeddings are ubiquitous in unsupervised learning and have proven to be a viable technique for uncovering the underlying intrinsic geometry of data, diffusion embeddings are inherently limited due to their discrete nature. To this end, we propose neural FIM, a method for computing the Fisher information metric (FIM) from point cloud data, allowing for a continuous manifold model for the data. Neural FIM creates an extensible metric space from discrete point cloud data such that information from the metric can inform us of manifold characteristics such as volume and geodesics. We demonstrate neural FIM’s utility in selecting parameters for the PHATE visualization method, as well as its ability to obtain information pertaining to local volume, illuminating branching points and cluster centers in embeddings of a toy dataset and two single-cell datasets of IPSC reprogramming and PBMCs (immune cells).'
volume: 202
URL: https://proceedings.mlr.press/v202/fasina23a.html
PDF: https://proceedings.mlr.press/v202/fasina23a/fasina23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-fasina23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Oluwadamilola
family: Fasina
- given: Guillaume
family: Huguet
- given: Alexander
family: Tong
- given: Yanlei
family: Zhang
- given: Guy
family: Wolf
- given: Maximilian
family: Nickel
- given: Ian
family: Adelstein
- given: Smita
family: Krishnaswamy
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9814-9826
id: fasina23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9814
lastpage: 9826
published: 2023-07-03 00:00:00 +0000
- title: 'Stochastic Policy Gradient Methods: Improved Sample Complexity for Fisher-non-degenerate Policies'
abstract: 'Recently, the impressive empirical success of policy gradient (PG) methods has catalyzed the development of their theoretical foundations. Despite the huge efforts directed at the design of efficient stochastic PG-type algorithms, the understanding of their convergence to a globally optimal policy is still limited. In this work, we develop improved global convergence guarantees for a general class of Fisher-non-degenerate parameterized policies, which allows us to address the case of continuous state action spaces. First, we propose a Normalized Policy Gradient method with Implicit Gradient Transport (N-PG-IGT) and derive a $\tilde{\mathcal{O}}(\varepsilon^{-2.5})$ sample complexity of this method for finding a global $\varepsilon$-optimal policy. Improving over the previously known $\tilde{\mathcal{O}}(\varepsilon^{-3})$ complexity, this algorithm does not require the use of importance sampling or second-order information and samples only one trajectory per iteration. Second, we further improve this complexity to $\tilde{\mathcal{O}}(\varepsilon^{-2})$ by considering a Hessian-Aided Recursive Policy Gradient ((N)-HARPG) algorithm enhanced with a correction based on a Hessian-vector product. Interestingly, both algorithms are $(i)$ simple and easy to implement: single-loop, do not require large batches of trajectories and sample at most two trajectories per iteration; $(ii)$ computationally and memory efficient: they do not require expensive subroutines at each iteration and can be implemented with memory linear in the dimension of parameters.'
volume: 202
URL: https://proceedings.mlr.press/v202/fatkhullin23a.html
PDF: https://proceedings.mlr.press/v202/fatkhullin23a/fatkhullin23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-fatkhullin23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ilyas
family: Fatkhullin
- given: Anas
family: Barakat
- given: Anastasia
family: Kireeva
- given: Niao
family: He
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9827-9869
id: fatkhullin23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9827
lastpage: 9869
published: 2023-07-03 00:00:00 +0000
- title: 'Parallel Neurosymbolic Integration with Concordia'
abstract: 'Parallel neurosymbolic architectures have been applied effectively in NLP by distilling knowledge from a logic theory into a deep model. However, prior art faces several limitations including supporting restricted forms of logic theories and relying on the assumption of independence between the logic and the deep network. We present Concordia, a framework overcoming the limitations of prior art. Concordia is agnostic both to the deep network and the logic theory offering support for a wide range of probabilistic theories. Our framework can support supervised training of both components and unsupervised training of the neural component. Concordia has been successfully applied to tasks beyond NLP and data classification, improving the accuracy of state-of-the-art on collective activity detection, entity linking and recommendation tasks.'
volume: 202
URL: https://proceedings.mlr.press/v202/feldstein23a.html
PDF: https://proceedings.mlr.press/v202/feldstein23a/feldstein23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-feldstein23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jonathan
family: Feldstein
- given: Modestas
family: Jurčius
- given: Efthymia
family: Tsamoura
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9870-9885
id: feldstein23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9870
lastpage: 9885
published: 2023-07-03 00:00:00 +0000
- title: 'Why Target Networks Stabilise Temporal Difference Methods'
abstract: 'Integral to recent successes in deep reinforcement learning has been a class of temporal difference methods that use infrequently updated target values for policy evaluation in a Markov Decision Process. Yet a complete theoretical explanation for the effectiveness of target networks remains elusive. In this work, we provide an analysis of this popular class of algorithms, to finally answer the question: “why do target networks stabilise TD learning”? To do so, we formalise the notion of a partially fitted policy evaluation method, which describes the use of target networks and bridges the gap between fitted methods and semigradient temporal difference algorithms. Using this framework we are able to uniquely characterise the so-called deadly triad (the use of TD updates with (nonlinear) function approximation and off-policy data), which often leads to nonconvergent algorithms. This insight leads us to conclude that the use of target networks can mitigate the effects of poor conditioning in the Jacobian of the TD update. Instead, we show that under mild regularity conditions and a well-tuned target network update frequency, convergence can be guaranteed even in the extremely challenging off-policy sampling and nonlinear function approximation setting.'
volume: 202
URL: https://proceedings.mlr.press/v202/fellows23a.html
PDF: https://proceedings.mlr.press/v202/fellows23a/fellows23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-fellows23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mattie
family: Fellows
- given: Matthew J. A.
family: Smith
- given: Shimon
family: Whiteson
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9886-9909
id: fellows23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9886
lastpage: 9909
published: 2023-07-03 00:00:00 +0000
- title: 'Weighted Sampling without Replacement for Deep Top-$k$ Classification'
abstract: 'The top-$k$ classification accuracy is a crucial metric in machine learning and is often used to evaluate the performance of deep neural networks. These networks are typically trained using the cross-entropy loss, which optimizes for top-$1$ classification and is considered optimal in the case of infinite data. However, in real-world scenarios, data is often noisy and limited, leading to the need for more robust losses. In this paper, we propose using the Weighted Sampling Without Replacement (WSWR) method as a learning objective for top-$k$ loss. While traditional methods for evaluating **WSWR-based top-$k$ loss** are computationally impractical, we show a novel connection between WSWR and Reinforcement Learning (RL) and apply well-established RL algorithms to estimate gradients. We compared our method with recently proposed top-$k$ losses in various regimes of noise and data size for the prevalent use case of $k = 5$. Our experimental results reveal that our method consistently outperforms all other methods on the top-$k$ metric for noisy datasets, has more robustness on extreme testing scenarios, and achieves competitive results on training with limited data.'
volume: 202
URL: https://proceedings.mlr.press/v202/feng23a.html
PDF: https://proceedings.mlr.press/v202/feng23a/feng23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-feng23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Dieqiao
family: Feng
- given: Yuanqi
family: Du
- given: Carla P
family: Gomes
- given: Bart
family: Selman
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9910-9920
id: feng23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9910
lastpage: 9920
published: 2023-07-03 00:00:00 +0000
- title: 'Improved Online Learning Algorithms for CTR Prediction in Ad Auctions'
abstract: 'In this work, we investigate the online learning problem of revenue maximization in ad auctions, where the seller needs to learn the click-through rates (CTRs) of each ad candidate and charge the price of the winner through a pay-per-click manner. We focus on two models of the advertisers’ strategic behaviors. First, we assume that the advertiser is completely myopic; i.e. in each round, they aim to maximize their utility only for the current round. In this setting, we develop an online mechanism based on upper-confidence bounds that achieves a tight $O(\sqrt{T})$ regret in the worst-case and negative regret when the values are static across all the auctions and there is a gap between the highest expected value (i.e. value multiplied by their CTR) and second highest expected value ad. Next, we assume that the advertiser is non-myopic and cares about their long term utility. This setting is much more complex since an advertiser is incentivized to influence the mechanism by bidding strategically in earlier rounds. In this setting, we provide an algorithm to achieve negative regret for the static valuation setting (with a positive gap), which is in sharp contrast with the prior work that shows $O(T^{2/3})$ regret when the valuation is generated by adversary.'
volume: 202
URL: https://proceedings.mlr.press/v202/feng23b.html
PDF: https://proceedings.mlr.press/v202/feng23b/feng23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-feng23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Zhe
family: Feng
- given: Christopher
family: Liaw
- given: Zixin
family: Zhou
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9921-9937
id: feng23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9921
lastpage: 9937
published: 2023-07-03 00:00:00 +0000
- title: 'Fractional Denoising for 3D Molecular Pre-training'
abstract: 'Coordinate denoising is a promising 3D molecular pre-training method, which has achieved remarkable performance in various downstream drug discovery tasks. Theoretically, the objective is equivalent to learning the force field, which is revealed helpful for downstream tasks. Nevertheless, there are two challenges for coordinate denoising to learn an effective force field, i.e. low coverage samples and isotropic force field. The underlying reason is that molecular distributions assumed by existing denoising methods fail to capture the anisotropic characteristic of molecules. To tackle these challenges, we propose a novel hybrid noise strategy, including noises on both dihedral angle and coordinate. However, denoising such hybrid noise in a traditional way is no longer equivalent to learning the force field. Through theoretical deductions, we find that the problem is caused by the dependency of the covariance on the input conformation. To this end, we propose to decouple the two types of noise and design a novel fractional denoising method (Frad), which only denoises the latter coordinate part. In this way, Frad enjoys both the merits of sampling more low-energy structures and the force field equivalence. Extensive experiments show the effectiveness of Frad in molecule representation, with a new state-of-the-art on 9 out of 12 tasks of QM9 and on 7 out of 8 targets of MD17.'
volume: 202
URL: https://proceedings.mlr.press/v202/feng23c.html
PDF: https://proceedings.mlr.press/v202/feng23c/feng23c.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-feng23c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Shikun
family: Feng
- given: Yuyan
family: Ni
- given: Yanyan
family: Lan
- given: Zhi-Ming
family: Ma
- given: Wei-Ying
family: Ma
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9938-9961
id: feng23c
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9938
lastpage: 9961
published: 2023-07-03 00:00:00 +0000
- title: 'Improved Algorithms for White-Box Adversarial Streams'
abstract: 'We study streaming algorithms in the white-box adversarial stream model, where the internal state of the streaming algorithm is revealed to an adversary who adaptively generates the stream updates, but the algorithm obtains fresh randomness unknown to the adversary at each time step. We incorporate cryptographic assumptions to construct robust algorithms against such adversaries. We propose efficient algorithms for sparse recovery of vectors, low rank recovery of matrices and tensors, as well as low rank plus sparse recovery of matrices, i.e., robust PCA. Unlike deterministic algorithms, our algorithms can report when the input is not sparse or low rank even in the presence of such an adversary. We use these recovery algorithms to improve upon and solve new problems in numerical linear algebra and combinatorial optimization on white-box adversarial streams. For example, we give the first efficient algorithm for outputting a matching in a graph with insertions and deletions to its edges provided the matching size is small, and otherwise we declare the matching size is large. We also improve the approximation versus memory tradeoff of previous work for estimating the number of non-zero elements in a vector and computing the matrix rank.'
volume: 202
URL: https://proceedings.mlr.press/v202/feng23d.html
PDF: https://proceedings.mlr.press/v202/feng23d/feng23d.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-feng23d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ying
family: Feng
- given: David
family: Woodruff
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9962-9975
id: feng23d
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9962
lastpage: 9975
published: 2023-07-03 00:00:00 +0000
- title: 'Non-stationary Reinforcement Learning under General Function Approximation'
abstract: 'General function approximation is a powerful tool to handle large state and action spaces in a broad range of reinforcement learning (RL) scenarios. However, theoretical understanding of non-stationary MDPs with general function approximation is still limited. In this paper, we make the first such attempt. We first propose a new complexity metric called dynamic Bellman Eluder (DBE) dimension for non-stationary MDPs, which subsumes the majority of existing tractable RL problems in static MDPs as well as non-stationary MDPs. Based on the proposed complexity metric, we propose a novel confidence-set based model-free algorithm called SW-OPEA, which features a sliding window mechanism and a new confidence set design for non-stationary MDPs. We then establish an upper bound on the dynamic regret for the proposed algorithm, and show that SW-OPEA is provably efficient as long as the variation budget is not significantly large. We further demonstrate via examples of non-stationary linear and tabular MDPs that our algorithm performs better in the small variation budget scenario than the existing UCB-type algorithms. To the best of our knowledge, this is the first dynamic regret analysis in non-stationary MDPs with general function approximation.'
volume: 202
URL: https://proceedings.mlr.press/v202/feng23e.html
PDF: https://proceedings.mlr.press/v202/feng23e/feng23e.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-feng23e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Songtao
family: Feng
- given: Ming
family: Yin
- given: Ruiquan
family: Huang
- given: Yu-Xiang
family: Wang
- given: Jing
family: Yang
- given: Yingbin
family: Liang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 9976-10007
id: feng23e
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 9976
lastpage: 10007
published: 2023-07-03 00:00:00 +0000
- title: 'Random Matrix Analysis to Balance between Supervised and Unsupervised Learning under the Low Density Separation Assumption'
abstract: 'We propose a theoretical framework to analyze semi-supervised classification under the low density separation assumption in a high-dimensional regime. In particular, we introduce QLDS, a linear classification model, where the low density separation assumption is implemented via quadratic margin maximization. The algorithm has an explicit solution with rich theoretical properties, and we show that particular cases of our algorithm are the least-square support vector machine in the supervised case, the spectral clustering in the fully unsupervised regime, and a class of semi-supervised graph-based approaches. As such, QLDS establishes a smooth bridge between these supervised and unsupervised learning methods. Using recent advances in random matrix theory, we formally derive a theoretical evaluation of the classification error in the asymptotic regime. As an application, we derive a hyperparameter selection policy that finds the best balance between the supervised and the unsupervised terms of our learning criterion. Finally, we provide extensive illustrations of our framework, as well as an experimental study on several benchmarks to demonstrate that QLDS, while being computationally more efficient, improves over cross-validation for hyperparameter selection, indicating the high promise of random matrix theory for semi-supervised model selection.'
volume: 202
URL: https://proceedings.mlr.press/v202/feofanov23a.html
PDF: https://proceedings.mlr.press/v202/feofanov23a/feofanov23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-feofanov23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Vasilii
family: Feofanov
- given: Malik
family: Tiomoko
- given: Aladin
family: Virmaux
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10008-10033
id: feofanov23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10008
lastpage: 10033
published: 2023-07-03 00:00:00 +0000
- title: 'SurCo: Learning Linear SURrogates for COmbinatorial Nonlinear Optimization Problems'
abstract: 'Optimization problems with nonlinear cost functions and combinatorial constraints appear in many real-world applications but remain challenging to solve efficiently compared to their linear counterparts. To bridge this gap, we propose $\textbf{\emph{\texttt{SurCo}}}$ that learns linear $\underline{\text{Sur}}$rogate costs which can be used in existing $\underline{\text{Co}}$mbinatorial solvers to output good solutions to the original nonlinear combinatorial optimization problem. The surrogate costs are learned end-to-end with nonlinear loss by differentiating through the linear surrogate solver, combining the flexibility of gradient-based methods with the structure of linear combinatorial optimization. We propose three $\texttt{SurCo}$ variants: $\texttt{SurCo}-\texttt{zero}$ for individual nonlinear problems, $\texttt{SurCo}-\texttt{prior}$ for problem distributions, and $\texttt{SurCo}-\texttt{hybrid}$ to combine both distribution and problem-specific information. We give theoretical intuition motivating $\texttt{SurCo}$, and evaluate it empirically. Experiments show that $\texttt{SurCo}$ finds better solutions faster than state-of-the-art and domain expert approaches in real-world optimization problems such as embedding table sharding, inverse photonic design, and nonlinear route planning.'
volume: 202
URL: https://proceedings.mlr.press/v202/ferber23a.html
PDF: https://proceedings.mlr.press/v202/ferber23a/ferber23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-ferber23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Aaron M
family: Ferber
- given: Taoan
family: Huang
- given: Daochen
family: Zha
- given: Martin
family: Schubert
- given: Benoit
family: Steiner
- given: Bistra
family: Dilkina
- given: Yuandong
family: Tian
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10034-10052
id: ferber23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10034
lastpage: 10052
published: 2023-07-03 00:00:00 +0000
- title: 'Scaling Laws for Multilingual Neural Machine Translation'
abstract: 'In this work, we provide a large-scale empirical study of the scaling properties of multilingual neural machine translation models. We examine how increases in the model size affect the model performance and investigate the role of the individual language pair weights on the scaling behavior. We find that these weights only affect the multiplicative factor of the scaling law, and in particular, the scaling exponent is unaffected by them. Through a novel joint scaling law formulation, we compute the effective number of parameters allocated to each language pair and examine the role of language similarity in the scaling behavior of our models. We find little evidence that language similarity has any impact. In contrast, “direction” of the multilinguality plays a significant role, with models translating from multiple languages into English having a larger number of effective parameters per task than their reversed counterparts. Finally, we leverage our observations to predict the performance of multilingual models trained with any language weighting at any scale, greatly reducing efforts required for language balancing in large multilingual models. Our findings apply to both in-domain and out-of-domain test sets and to multiple evaluation metrics, such as ChrF and BLEURT.'
volume: 202
URL: https://proceedings.mlr.press/v202/fernandes23a.html
PDF: https://proceedings.mlr.press/v202/fernandes23a/fernandes23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-fernandes23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Patrick
family: Fernandes
- given: Behrooz
family: Ghorbani
- given: Xavier
family: Garcia
- given: Markus
family: Freitag
- given: Orhan
family: Firat
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10053-10071
id: fernandes23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10053
lastpage: 10071
published: 2023-07-03 00:00:00 +0000
- title: 'Constant Matters: Fine-grained Error Bound on Differentially Private Continual Observation'
abstract: 'We study fine-grained error bounds for differentially private algorithms for counting under continual observation. Our main insight is that the matrix mechanism when using lower-triangular matrices can be used in the continual observation model. More specifically, we give an explicit factorization for the counting matrix $M_\mathsf{count}$ and upper bound the error explicitly. We also give a fine-grained analysis, specifying the exact constant in the upper bound. Our analysis is based on upper and lower bounds of the *completely bounded norm* (cb-norm) of $M_\mathsf{count}$. Along the way, we improve the best-known bound, which had stood for 28 years, by Mathias (SIAM Journal on Matrix Analysis and Applications, 1993) on the cb-norm of $M_\mathsf{count}$ for a large range of the dimension of $M_\mathsf{count}$. Furthermore, we are the first to give concrete error bounds for various problems under continual observation such as binary counting, maintaining a histogram, releasing an approximately cut-preserving synthetic graph, many graph-based statistics, and substring and episode counting. Finally, we note that our result can be used to get a fine-grained error bound for non-interactive local learning and the first lower bounds on the additive error for $(\epsilon,\delta)$-differentially-private counting under continual observation. Subsequent to this work, Henzinger et al. (SODA, 2023) showed that our factorization also achieves fine-grained mean-squared error.'
volume: 202
URL: https://proceedings.mlr.press/v202/fichtenberger23a.html
PDF: https://proceedings.mlr.press/v202/fichtenberger23a/fichtenberger23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-fichtenberger23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Hendrik
family: Fichtenberger
- given: Monika
family: Henzinger
- given: Jalaj
family: Upadhyay
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10072-10092
id: fichtenberger23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10072
lastpage: 10092
published: 2023-07-03 00:00:00 +0000
- title: 'Adapting to game trees in zero-sum imperfect information games'
abstract: 'Imperfect information games (IIG) are games in which each player only partially observes the current game state. We study how to learn $\epsilon$-optimal strategies in a zero-sum IIG through self-play with trajectory feedback. We give a problem-independent lower bound $\widetilde{\mathcal{O}}(H(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ on the required number of realizations to learn these strategies with high probability, where $H$ is the length of the game, $A_{\mathcal{X}}$ and $B_{\mathcal{Y}}$ are the total number of actions for the two players. We also propose two Follow the Regularized leader (FTRL) algorithms for this setting: Balanced FTRL which matches this lower bound, but requires the knowledge of the information set structure beforehand to define the regularization; and Adaptive FTRL which needs $\widetilde{\mathcal{O}}(H^2(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ realizations without this requirement by progressively adapting the regularization to the observations.'
volume: 202
URL: https://proceedings.mlr.press/v202/fiegel23a.html
PDF: https://proceedings.mlr.press/v202/fiegel23a/fiegel23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-fiegel23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Côme
family: Fiegel
- given: Pierre
family: Menard
- given: Tadashi
family: Kozuno
- given: Remi
family: Munos
- given: Vianney
family: Perchet
- given: Michal
family: Valko
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10093-10135
id: fiegel23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10093
lastpage: 10135
published: 2023-07-03 00:00:00 +0000
- title: 'User-defined Event Sampling and Uncertainty Quantification in Diffusion Models for Physical Dynamical Systems'
abstract: 'Diffusion models are a class of probabilistic generative models that have been widely used as a prior for image processing tasks like text conditional generation and inpainting. We demonstrate that these models can be adapted to make predictions and provide uncertainty quantification for chaotic dynamical systems. In these applications, diffusion models can implicitly represent knowledge about outliers and extreme events; however, querying that knowledge through conditional sampling or measuring probabilities is surprisingly difficult. Existing methods for conditional sampling at inference time seek mainly to enforce the constraints, which is insufficient to match the statistics of the distribution or compute the probability of the chosen events. To achieve these ends, optimally one would use the conditional score function, but its computation is typically intractable. In this work, we develop a probabilistic approximation scheme for the conditional score function which provably converges to the true distribution as the noise level decreases. With this scheme we are able to sample conditionally on nonlinear user-defined events at inference time, and match data statistics even when sampling from the tails of the distribution.'
volume: 202
URL: https://proceedings.mlr.press/v202/finzi23a.html
PDF: https://proceedings.mlr.press/v202/finzi23a/finzi23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-finzi23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Marc Anton
family: Finzi
- given: Anudhyan
family: Boral
- given: Andrew Gordon
family: Wilson
- given: Fei
family: Sha
- given: Leonardo
family: Zepeda-Nunez
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10136-10152
id: finzi23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10136
lastpage: 10152
published: 2023-07-03 00:00:00 +0000
- title: 'ACAT: Adversarial Counterfactual Attention for Classification and Detection in Medical Imaging'
abstract: 'In some medical imaging tasks and other settings where only small parts of the image are informative for the classification task, traditional CNNs can sometimes struggle to generalise. Manually annotated Regions of Interest (ROI) are often used to isolate the most informative parts of the image. However, these are expensive to collect and may vary significantly across annotators. To overcome these issues, we propose a framework that employs saliency maps to obtain soft spatial attention masks that modulate the image features at different scales. We refer to our method as *Adversarial Counterfactual Attention* (ACAT). ACAT increases the baseline classification accuracy of lesions in brain CT scans from $71.39\%$ to $72.55\%$ and of COVID-19 related findings in lung CT scans from $67.71\%$ to $70.84\%$ and exceeds the performance of competing methods. We investigate the best way to generate the saliency maps employed in our architecture and propose a way to obtain them from adversarially generated counterfactual images. They are able to isolate the area of interest in brain and lung CT scans without using any manual annotations. In the task of localising the lesion location out of 6 possible regions, they obtain a score of $65.05\%$ on brain CT scans, improving the score of $61.29\%$ obtained with the best competing method.'
volume: 202
URL: https://proceedings.mlr.press/v202/fontanella23a.html
PDF: https://proceedings.mlr.press/v202/fontanella23a/fontanella23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-fontanella23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Alessandro
family: Fontanella
- given: Antreas
family: Antoniou
- given: Wenwen
family: Li
- given: Joanna
family: Wardlaw
- given: Grant
family: Mair
- given: Emanuele
family: Trucco
- given: Amos
family: Storkey
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10153-10169
id: fontanella23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10153
lastpage: 10169
published: 2023-07-03 00:00:00 +0000
- title: 'Explainable Data-Driven Optimization: From Context to Decision and Back Again'
abstract: 'Data-driven optimization uses contextual information and machine learning algorithms to find solutions to decision problems with uncertain parameters. While a vast body of work is dedicated to interpreting machine learning models in the classification setting, explaining decision pipelines involving learning algorithms remains unaddressed. This lack of interpretability can block the adoption of data-driven solutions as practitioners may not understand or trust the recommended decisions. We bridge this gap by introducing a counterfactual explanation methodology tailored to explain solutions to data-driven problems. We introduce two classes of explanations and develop methods to find nearest explanations of random forest and nearest-neighbor predictors. We demonstrate our approach by explaining key problems in operations management such as inventory management and routing.'
volume: 202
URL: https://proceedings.mlr.press/v202/forel23a.html
PDF: https://proceedings.mlr.press/v202/forel23a/forel23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-forel23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Alexandre
family: Forel
- given: Axel
family: Parmentier
- given: Thibaut
family: Vidal
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10170-10187
id: forel23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10170
lastpage: 10187
published: 2023-07-03 00:00:00 +0000
- title: 'Hardness of Independent Learning and Sparse Equilibrium Computation in Markov Games'
abstract: 'We consider the problem of decentralized multi-agent reinforcement learning in Markov games. A fundamental question is whether there exist algorithms that, when run independently by all agents, lead to no-regret for each player, analogous to celebrated convergence results for no-regret learning in normal-form games. While recent work has shown that such algorithms exist for restricted settings (notably, when regret is defined with respect to deviations to Markov policies), the question of whether independent no-regret learning can be achieved in the standard Markov game framework was open. We provide a decisive negative resolution to this problem, both from a computational and statistical perspective. We show that: • Under the complexity-theoretic assumption that PPAD $\neq$ P, there is no polynomial-time algorithm that attains no-regret in two-player general-sum Markov games when executed independently by all players, even when the game is known to the algorithm designer. • When the game is unknown, no algorithm, efficient or otherwise, can achieve no-regret without observing exponentially many episodes in the number of players. These results are proven via lower bounds for a simpler problem we refer to as SparseCCE, in which the goal is to compute a coarse correlated equilibrium that is “sparse” in the sense that it can be represented as a mixture of a small number of product policies.'
volume: 202
URL: https://proceedings.mlr.press/v202/foster23a.html
PDF: https://proceedings.mlr.press/v202/foster23a/foster23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-foster23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Dylan J
family: Foster
- given: Noah
family: Golowich
- given: Sham M.
family: Kakade
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10188-10221
id: foster23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10188
lastpage: 10221
published: 2023-07-03 00:00:00 +0000
- title: 'Disentangled Generative Models for Robust Prediction of System Dynamics'
abstract: 'The use of deep neural networks for modelling system dynamics is increasingly popular, but long-term prediction accuracy and out-of-distribution generalization still present challenges. In this study, we address these challenges by considering the parameters of dynamical systems as factors of variation of the data and leverage their ground-truth values to disentangle the representations learned by generative models. Our experimental results in phase-space and observation-space dynamics demonstrate the effectiveness of latent-space supervision in producing disentangled representations, leading to improved long-term prediction accuracy and out-of-distribution robustness.'
volume: 202
URL: https://proceedings.mlr.press/v202/fotiadis23a.html
PDF: https://proceedings.mlr.press/v202/fotiadis23a/fotiadis23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-fotiadis23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Stathi
family: Fotiadis
- given: Mario
family: Lino Valencia
- given: Shunlong
family: Hu
- given: Stef
family: Garasto
- given: Chris D
family: Cantwell
- given: Anil Anthony
family: Bharath
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10222-10248
id: fotiadis23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10222
lastpage: 10248
published: 2023-07-03 00:00:00 +0000
- title: 'Can Forward Gradient Match Backpropagation?'
abstract: 'Forward Gradients - the idea of using directional derivatives in forward differentiation mode - have recently been shown to be utilizable for neural network training while avoiding problems generally associated with backpropagation gradient computation, such as locking and memorization requirements. The cost is the requirement to guess the step direction, which is hard in high dimensions. While current solutions rely on weighted averages over isotropic guess vector distributions, we propose to strongly bias our gradient guesses in directions that are much more promising, such as feedback obtained from small, local auxiliary networks. For a standard computer vision neural network, we conduct a rigorous study systematically covering a variety of combinations of gradient targets and gradient guesses, including those previously presented in the literature. We find that using gradients obtained from a local loss as a candidate direction drastically improves on random noise in Forward Gradient methods.'
volume: 202
URL: https://proceedings.mlr.press/v202/fournier23a.html
PDF: https://proceedings.mlr.press/v202/fournier23a/fournier23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-fournier23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Louis
family: Fournier
- given: Stephane
family: Rivaud
- given: Eugene
family: Belilovsky
- given: Michael
family: Eickenberg
- given: Edouard
family: Oyallon
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10249-10264
id: fournier23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10249
lastpage: 10264
published: 2023-07-03 00:00:00 +0000
- title: 'Last Switch Dependent Bandits with Monotone Payoff Functions'
abstract: 'In a recent work, Laforgue et al. introduce the model of last switch dependent (LSD) bandits, in an attempt to capture nonstationary phenomena induced by the interaction between the player and the environment. Examples include satiation, where consecutive plays of the same action lead to decreased performance, or deprivation, where the payoff of an action increases after an interval of inactivity. In this work, we take a step towards understanding the approximability of planning LSD bandits, namely, the (NP-hard) problem of computing an optimal arm-pulling strategy under complete knowledge of the model. In particular, we design the first efficient constant approximation algorithm for the problem and show that, under a natural monotonicity assumption on the payoffs, its approximation guarantee (almost) matches the state-of-the-art for the special and well-studied class of recharging bandits (also known as delay-dependent). In this attempt, we develop new tools and insights for this class of problems, including a novel higher-dimensional relaxation and the technique of mirroring the evolution of virtual states. We believe that these novel elements could potentially be used for approaching richer classes of action-induced nonstationary bandits (e.g., special instances of restless bandits). In the case where the model parameters are initially unknown, we develop an online learning adaptation of our algorithm for which we provide sublinear regret guarantees against its full-information counterpart.'
volume: 202
URL: https://proceedings.mlr.press/v202/foussoul23a.html
PDF: https://proceedings.mlr.press/v202/foussoul23a/foussoul23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-foussoul23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ayoub
family: Foussoul
- given: Vineet
family: Goyal
- given: Orestis
family: Papadigenopoulos
- given: Assaf
family: Zeevi
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10265-10284
id: foussoul23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10265
lastpage: 10284
published: 2023-07-03 00:00:00 +0000
- title: 'A Theoretical Analysis of the Learning Dynamics under Class Imbalance'
abstract: 'Data imbalance is a common problem in machine learning that can have a critical effect on the performance of a model. Various solutions exist but their impact on the convergence of the learning dynamics is not understood. Here, we elucidate the significant negative impact of data imbalance on learning, showing that the learning curves for minority and majority classes follow sub-optimal trajectories when training with a gradient-based optimizer. This slowdown is related to the imbalance ratio and can be traced back to a competition between the optimization of different classes. Our main contribution is the analysis of the convergence of full-batch gradient descent (GD) and stochastic gradient descent (SGD), and of variants that renormalize the contribution of each per-class gradient. We find that GD is not guaranteed to decrease the loss for each class but that this problem can be addressed by performing a per-class normalization of the gradient. With SGD, class imbalance has an additional effect on the direction of the gradients: the minority class suffers from a higher directional noise, which reduces the effectiveness of the per-class gradient normalization. Our findings not only allow us to understand the potential and limitations of strategies involving the per-class gradients, but also the reason for the effectiveness of previously used solutions for class imbalance, such as oversampling.'
volume: 202
URL: https://proceedings.mlr.press/v202/francazi23a.html
PDF: https://proceedings.mlr.press/v202/francazi23a/francazi23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-francazi23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Emanuele
family: Francazi
- given: Marco
family: Baity-Jesi
- given: Aurelien
family: Lucchi
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10285-10322
id: francazi23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10285
lastpage: 10322
published: 2023-07-03 00:00:00 +0000
- title: 'SparseGPT: Massive Language Models Can be Accurately Pruned in One-Shot'
abstract: 'We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in *one-shot, without any retraining*, at minimal loss of accuracy. This is achieved via a new pruning method called SparseGPT, specifically designed to work efficiently and accurately on massive GPT-family models. We can execute SparseGPT on the largest available open-source models, OPT-175B and BLOOM-176B, in under 4.5 hours, and can reach 60% unstructured sparsity with negligible increase in perplexity: remarkably, more than 100 billion weights from these models can be ignored at inference time. SparseGPT generalizes to semi-structured (2:4 and 4:8) patterns, and is compatible with weight quantization approaches. The code is available at: https://github.com/IST-DASLab/sparsegpt.'
volume: 202
URL: https://proceedings.mlr.press/v202/frantar23a.html
PDF: https://proceedings.mlr.press/v202/frantar23a/frantar23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-frantar23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Elias
family: Frantar
- given: Dan
family: Alistarh
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10323-10337
id: frantar23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10323
lastpage: 10337
published: 2023-07-03 00:00:00 +0000
- title: 'Learning Temporally Abstract World Models without Online Experimentation'
abstract: 'Agents that can build temporally abstract representations of their environment are better able to understand their world and make plans on extended time scales, with limited computational power and modeling capacity. However, existing methods for automatically learning temporally abstract world models usually require millions of online environmental interactions and incentivize agents to reach every accessible environmental state, which is infeasible for most real-world robots both in terms of data efficiency and hardware safety. In this paper, we present an approach for simultaneously learning sets of skills and temporally abstract, skill-conditioned world models purely from offline data, enabling agents to perform zero-shot online planning of skill sequences for new tasks. We show that our approach performs comparably to or better than a wide array of state-of-the-art offline RL algorithms on a number of simulated robotics locomotion and manipulation benchmarks, while offering a higher degree of adaptability to new goals. Finally, we show that our approach offers a much higher degree of robustness to perturbations in environmental dynamics, compared to policy-based methods.'
volume: 202
URL: https://proceedings.mlr.press/v202/freed23a.html
PDF: https://proceedings.mlr.press/v202/freed23a/freed23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-freed23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Benjamin
family: Freed
- given: Siddarth
family: Venkatraman
- given: Guillaume Adrien
family: Sartoretti
- given: Jeff
family: Schneider
- given: Howie
family: Choset
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10338-10356
id: freed23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10338
lastpage: 10356
published: 2023-07-03 00:00:00 +0000
- title: 'A Coupled Flow Approach to Imitation Learning'
abstract: 'In reinforcement learning and imitation learning, an object of central importance is the state distribution induced by the policy. It plays a crucial role in the policy gradient theorem, and references to it (along with the related state-action distribution) can be found all across the literature. Despite its importance, the state distribution is mostly discussed indirectly and theoretically, rather than being modeled explicitly, owing to the absence of appropriate density estimation tools. In this work, we investigate applications of a normalizing flow based model for the aforementioned distributions. In particular, we use a pair of flows coupled through the optimality point of the Donsker-Varadhan representation of the Kullback-Leibler (KL) divergence, for distribution matching based imitation learning. Our algorithm, Coupled Flow Imitation Learning (CFIL), achieves state-of-the-art performance on benchmark tasks with a single expert trajectory and extends naturally to a variety of other settings, including the subsampled and state-only regimes.'
volume: 202
URL: https://proceedings.mlr.press/v202/freund23a.html
PDF: https://proceedings.mlr.press/v202/freund23a/freund23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-freund23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Gideon Joseph
family: Freund
- given: Elad
family: Sarafian
- given: Sarit
family: Kraus
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10357-10372
id: freund23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10357
lastpage: 10372
published: 2023-07-03 00:00:00 +0000
- title: 'Simple Hardware-Efficient Long Convolutions for Sequence Modeling'
abstract: 'State space models (SSMs) have high performance on long sequence modeling but require sophisticated initialization techniques and specialized implementations for high quality and runtime performance. We study whether a simple alternative can match SSMs in performance and efficiency: directly learning long convolutions over the sequence. We find that a key requirement to achieving high performance is keeping the convolution kernels smooth. We find that simple interventions, such as squashing the kernel weights, result in smooth kernels and recover SSM performance on a range of tasks including the long range arena, image classification, language modeling, and brain data modeling. Next, we develop FlashButterfly, an IO-aware algorithm to improve the runtime performance of long convolutions. FlashButterfly appeals to classic Butterfly decompositions of the convolution to reduce GPU memory IO and increase FLOP utilization. FlashButterfly speeds up convolutions by 2.2$\times$, and allows us to train on Path256, a challenging task with sequence length 64K, where we set state-of-the-art by 29.1 points while training 7.2$\times$ faster than prior work. Lastly, we introduce an extension to FlashButterfly that learns the coefficients of the Butterfly decomposition, increasing expressivity without increasing runtime. Using this extension, we outperform a Transformer on WikiText103 by 0.2 PPL with 30% fewer parameters.'
volume: 202
URL: https://proceedings.mlr.press/v202/fu23a.html
PDF: https://proceedings.mlr.press/v202/fu23a/fu23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-fu23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Daniel Y
family: Fu
- given: Elliot L
family: Epstein
- given: Eric
family: Nguyen
- given: Armin W
family: Thomas
- given: Michael
family: Zhang
- given: Tri
family: Dao
- given: Atri
family: Rudra
- given: Christopher
family: Re
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10373-10391
id: fu23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10373
lastpage: 10391
published: 2023-07-03 00:00:00 +0000
- title: 'MonoNeRF: Learning Generalizable NeRFs from Monocular Videos without Camera Poses'
abstract: 'We propose MonoNeRF, a generalizable neural radiance field that can be trained on large-scale monocular videos captured while moving in static scenes, without any ground-truth annotations of depth and camera poses. MonoNeRF follows an Autoencoder-based architecture, where the encoder estimates the monocular depth and the camera pose, and the decoder constructs a Multiplane NeRF representation based on the depth encoder feature, and renders the input frames with the estimated camera pose. The learning is supervised by the reconstruction error. Once the model is learned, it can be applied to multiple applications including depth estimation, camera pose estimation, and single-image novel view synthesis. More qualitative results are available at: https://oasisyang.github.io/mononerf.'
volume: 202
URL: https://proceedings.mlr.press/v202/fu23b.html
PDF: https://proceedings.mlr.press/v202/fu23b/fu23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-fu23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yang
family: Fu
- given: Ishan
family: Misra
- given: Xiaolong
family: Wang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10392-10404
id: fu23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10392
lastpage: 10404
published: 2023-07-03 00:00:00 +0000
- title: 'Go Beyond Imagination: Maximizing Episodic Reachability with World Models'
abstract: 'Efficient exploration is a challenging topic in reinforcement learning, especially for sparse reward tasks. To deal with the reward sparsity, people commonly apply intrinsic rewards to motivate agents to explore the state space efficiently. In this paper, we introduce a new intrinsic reward design called GoBI - Go Beyond Imagination, which combines the traditional lifelong novelty motivation with an episodic intrinsic reward that is designed to maximize the stepwise reachability expansion. More specifically, we apply learned world models to generate predicted future states with random actions. States with more unique predictions that are not in episodic memory are assigned high intrinsic rewards. Our method greatly outperforms previous state-of-the-art methods on 12 of the most challenging Minigrid navigation tasks and improves the sample efficiency on locomotion tasks from DeepMind Control Suite.'
volume: 202
URL: https://proceedings.mlr.press/v202/fu23c.html
PDF: https://proceedings.mlr.press/v202/fu23c/fu23c.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-fu23c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yao
family: Fu
- given: Run
family: Peng
- given: Honglak
family: Lee
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10405-10420
id: fu23c
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10405
lastpage: 10420
published: 2023-07-03 00:00:00 +0000
- title: 'Specializing Smaller Language Models towards Multi-Step Reasoning'
abstract: 'The surprising ability of Large Language Models (LLMs) to perform well on complex reasoning with only few-shot chain-of-thought prompts is believed to emerge only in very large-scale models. We show that such abilities can, in fact, be distilled down from GPT-3.5 (≥ 175B) to T5 variants (≤ 11B). We propose model specialization, to specialize the model’s ability towards a target task. The hypothesis is that large models (commonly viewed as larger than 100B) have strong modeling power such that they can perform a large spectrum of tasks. Small models (commonly viewed as smaller than 10B) have limited model capacity, but if we specialize their capacity towards a target task, the model can achieve decent performance improvements. We use multi-step math reasoning as our testbed because it is a very typical emergent ability. We show two important aspects of model abilities: (1) balancing language model’s performance on multiple tasks is a delicate matter, as improvements on one task may compromise other tasks; (2) yet by intentionally paying the price of decreased generic ability, we can clearly improve across different model scales smaller than 10B towards a specialized multi-step math reasoning ability. We further give comprehensive discussions about important design choices for better generalization, including the data format mixture and the start model checkpoint. We hope our practice and discoveries can serve as an important attempt towards specialized smaller models in the new research paradigm set by LLMs.'
volume: 202
URL: https://proceedings.mlr.press/v202/fu23d.html
PDF: https://proceedings.mlr.press/v202/fu23d/fu23d.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-fu23d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yao
family: Fu
- given: Hao
family: Peng
- given: Litu
family: Ou
- given: Ashish
family: Sabharwal
- given: Tushar
family: Khot
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10421-10430
id: fu23d
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10421
lastpage: 10430
published: 2023-07-03 00:00:00 +0000
- title: 'Accelerated Stochastic Optimization Methods under Quasar-convexity'
abstract: 'Non-convex optimization plays a key role in a growing number of machine learning applications. This motivates the identification of specialized structure that enables sharper theoretical analysis. One such identified structure is quasar-convexity, a non-convex generalization of convexity that subsumes convex functions. Existing algorithms for minimizing quasar-convex functions in the stochastic setting have either high complexity or slow convergence, which prompts us to derive a new class of stochastic methods for optimizing smooth quasar-convex functions. We demonstrate that our algorithms have fast convergence and outperform existing algorithms on several examples, including the classical problem of learning linear dynamical systems. We also present a unified analysis of our newly proposed algorithms and a previously studied deterministic algorithm.'
volume: 202
URL: https://proceedings.mlr.press/v202/fu23e.html
PDF: https://proceedings.mlr.press/v202/fu23e/fu23e.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-fu23e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Qiang
family: Fu
- given: Dongchu
family: Xu
- given: Ashia Camage
family: Wilson
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10431-10460
id: fu23e
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10431
lastpage: 10460
published: 2023-07-03 00:00:00 +0000
- title: 'Meta-learning Parameterized Skills'
abstract: 'We propose a novel parameterized skill-learning algorithm that aims to learn transferable parameterized skills and synthesize them into a new action space that supports efficient learning in long-horizon tasks. We propose to leverage off-policy Meta-RL combined with a trajectory-centric smoothness term to learn a set of parameterized skills. Our agent can use these learned skills to construct a three-level hierarchical framework that models a Temporally-extended Parameterized Action Markov Decision Process. We empirically demonstrate that the proposed algorithms enable an agent to solve a set of highly difficult long-horizon (obstacle-course and robot manipulation) tasks.'
volume: 202
URL: https://proceedings.mlr.press/v202/fu23f.html
PDF: https://proceedings.mlr.press/v202/fu23f/fu23f.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-fu23f.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Haotian
family: Fu
- given: Shangqun
family: Yu
- given: Saket
family: Tiwari
- given: Michael
family: Littman
- given: George
family: Konidaris
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10461-10481
id: fu23f
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10461
lastpage: 10481
published: 2023-07-03 00:00:00 +0000
- title: 'NeRFool: Uncovering the Vulnerability of Generalizable Neural Radiance Fields against Adversarial Perturbations'
abstract: 'Generalizable Neural Radiance Fields (GNeRF) are one of the most promising real-world solutions for novel view synthesis, thanks to their cross-scene generalization capability and thus the possibility of instant rendering on new scenes. While adversarial robustness is essential for real-world applications, little study has been devoted to understanding its implication on GNeRF. We hypothesize that because GNeRF is implemented by conditioning on the source views from new scenes, which are often acquired from the Internet or third-party providers, there are potential new security concerns regarding its real-world applications. Meanwhile, existing understanding and solutions for neural networks’ adversarial robustness may not be applicable to GNeRF, due to its 3D nature and uniquely diverse operations. To this end, we present NeRFool, which to the best of our knowledge is the first work that sets out to understand the adversarial robustness of GNeRF. Specifically, NeRFool unveils the vulnerability patterns and important insights regarding GNeRF’s adversarial robustness. Built upon the above insights gained from NeRFool, we further develop NeRFool$^+$, which integrates two techniques capable of effectively attacking GNeRF across a wide range of target views, and provide guidelines for defending against our proposed attacks. We believe that our NeRFool/NeRFool$^+$ lays the initial foundation for future innovations in developing robust real-world GNeRF solutions. Our codes are available at: https://github.com/GATECH-EIC/NeRFool.'
volume: 202
URL: https://proceedings.mlr.press/v202/fu23g.html
PDF: https://proceedings.mlr.press/v202/fu23g/fu23g.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-fu23g.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yonggan
family: Fu
- given: Ye
family: Yuan
- given: Souvik
family: Kundu
- given: Shang
family: Wu
- given: Shunyao
family: Zhang
- given: Yingyan Celine
family: Lin
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10482-10493
id: fu23g
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10482
lastpage: 10493
published: 2023-07-03 00:00:00 +0000
- title: 'Hierarchies of Reward Machines'
abstract: 'Reward machines (RMs) are a recent formalism for representing the reward function of a reinforcement learning task through a finite-state machine whose edges encode subgoals of the task using high-level events. The structure of RMs enables the decomposition of a task into simpler and independently solvable subtasks that help tackle long-horizon and/or sparse reward tasks. We propose a formalism for further abstracting the subtask structure by endowing an RM with the ability to call other RMs, thus composing a hierarchy of RMs (HRM). We exploit HRMs by treating each call to an RM as an independently solvable subtask using the options framework, and describe a curriculum-based method to learn HRMs from traces observed by the agent. Our experiments reveal that exploiting a handcrafted HRM leads to faster convergence than with a flat HRM, and that learning an HRM is feasible in cases where its equivalent flat representation is not.'
volume: 202
URL: https://proceedings.mlr.press/v202/furelos-blanco23a.html
PDF: https://proceedings.mlr.press/v202/furelos-blanco23a/furelos-blanco23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-furelos-blanco23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Daniel
family: Furelos-Blanco
- given: Mark
family: Law
- given: Anders
family: Jonsson
- given: Krysia
family: Broda
- given: Alessandra
family: Russo
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10494-10541
id: furelos-blanco23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10494
lastpage: 10541
published: 2023-07-03 00:00:00 +0000
- title: 'Why Random Pruning Is All We Need to Start Sparse'
abstract: 'Random masks define surprisingly effective sparse neural network models, as has been shown empirically. The resulting sparse networks can often compete with dense architectures and state-of-the-art lottery ticket pruning algorithms, even though they do not rely on computationally expensive prune-train iterations and can be drawn initially without significant computational overhead. We offer a theoretical explanation of how random masks can approximate arbitrary target networks if they are wider by a logarithmic factor in the inverse sparsity $1 / \log(1/\text{sparsity})$. This overparameterization factor is necessary at least for 3-layer random networks, which elucidates the observed degrading performance of random networks at higher sparsity. At moderate to high sparsity levels, however, our results imply that sparser networks are contained within random source networks so that any dense-to-sparse training scheme can be turned into a computationally more efficient sparse-to-sparse one by constraining the search to a fixed random mask. We demonstrate the feasibility of this approach in experiments for different pruning methods and propose particularly effective choices of initial layer-wise sparsity ratios of the random source network. As a special case, we show theoretically and experimentally that random source networks also contain strong lottery tickets.'
volume: 202
URL: https://proceedings.mlr.press/v202/gadhikar23a.html
PDF: https://proceedings.mlr.press/v202/gadhikar23a/gadhikar23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-gadhikar23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Advait Harshal
family: Gadhikar
- given: Sohom
family: Mukherjee
- given: Rebekka
family: Burkholz
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10542-10570
id: gadhikar23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10542
lastpage: 10570
published: 2023-07-03 00:00:00 +0000
- title: 'Cell-Free Latent Go-Explore'
abstract: 'In this paper, we introduce Latent Go-Explore (LGE), a simple and general approach based on the Go-Explore paradigm for exploration in reinforcement learning (RL). Go-Explore was initially introduced with a strong domain knowledge constraint for partitioning the state space into cells. However, in most real-world scenarios, drawing domain knowledge from raw observations is complex and tedious. If the cell partitioning is not informative enough, Go-Explore can completely fail to explore the environment. We argue that the Go-Explore approach can be generalized to any environment without domain knowledge and without cells by exploiting a learned latent representation. Thus, we show that LGE can be flexibly combined with any strategy for learning a latent representation. Our results indicate that LGE, although simpler than Go-Explore, is more robust and outperforms state-of-the-art algorithms in terms of pure exploration on multiple hard-exploration environments including Montezuma’s Revenge. The LGE implementation is available as open-source at https://github.com/qgallouedec/lge.'
volume: 202
URL: https://proceedings.mlr.press/v202/gallouedec23a.html
PDF: https://proceedings.mlr.press/v202/gallouedec23a/gallouedec23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-gallouedec23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Quentin
family: Gallouédec
- given: Emmanuel
family: Dellandrea
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10571-10586
id: gallouedec23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10571
lastpage: 10586
published: 2023-07-03 00:00:00 +0000
- title: 'Graph Reinforcement Learning for Network Control via Bi-Level Optimization'
abstract: 'Optimization problems over dynamic networks have been extensively studied and widely used in the past decades to formulate numerous real-world problems. However, (1) traditional optimization-based approaches do not scale to large networks, and (2) the design of good heuristics or approximation algorithms often requires significant manual trial-and-error. In this work, we argue that data-driven strategies can automate this process and learn efficient algorithms without compromising optimality. To do so, we present network control problems through the lens of reinforcement learning and propose a graph network-based framework to handle a broad class of problems. Instead of naively computing actions over high-dimensional graph elements, e.g., edges, we propose a bi-level formulation where we (1) specify a desired next state via RL, and (2) solve a convex program to best achieve it, leading to drastically improved scalability and performance. We further highlight a collection of desirable features to system designers, investigate design decisions, and present experiments on real-world control problems showing the utility, scalability, and flexibility of our framework.'
volume: 202
URL: https://proceedings.mlr.press/v202/gammelli23a.html
PDF: https://proceedings.mlr.press/v202/gammelli23a/gammelli23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-gammelli23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Daniele
family: Gammelli
- given: James
family: Harrison
- given: Kaidi
family: Yang
- given: Marco
family: Pavone
- given: Filipe
family: Rodrigues
- given: Francisco C.
family: Pereira
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10587-10610
id: gammelli23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10587
lastpage: 10610
published: 2023-07-03 00:00:00 +0000
- title: 'Why Is Public Pretraining Necessary for Private Model Training?'
abstract: 'In the privacy-utility tradeoff of a model trained on benchmark language and vision tasks, remarkable improvements have been widely reported when the model is pretrained on public data. Some gain is expected as these models inherit the benefits of transfer learning, which is the standard motivation in non-private settings. However, the stark contrast in the gain of pretraining between non-private and private machine learning suggests that the gain in the latter is rooted in a fundamentally different cause. To explain this phenomenon, we hypothesize that the non-convex loss landscape of model training requires the optimization algorithm to go through two phases. In the first, the algorithm needs to select a good “basin” in the loss landscape. In the second, the algorithm solves an easy optimization within that basin. The former is a harder problem to solve with private data, while the latter is harder to solve with public data due to a distribution shift or data scarcity. Guided by this intuition, we provide theoretical constructions that provably demonstrate the separation between private training with and without public pretraining. Further, systematic experiments on CIFAR10 and Librispeech provide supporting evidence for our hypothesis.'
volume: 202
URL: https://proceedings.mlr.press/v202/ganesh23a.html
PDF: https://proceedings.mlr.press/v202/ganesh23a/ganesh23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-ganesh23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Arun
family: Ganesh
- given: Mahdi
family: Haghifam
- given: Milad
family: Nasr
- given: Sewoong
family: Oh
- given: Thomas
family: Steinke
- given: Om
family: Thakkar
- given: Abhradeep
family: Guha Thakurta
- given: Lun
family: Wang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10611-10627
id: ganesh23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10611
lastpage: 10627
published: 2023-07-03 00:00:00 +0000
- title: 'Do Perceptually Aligned Gradients Imply Robustness?'
abstract: 'Adversarially robust classifiers possess a trait that non-robust models do not: Perceptually Aligned Gradients (PAG). Their gradients with respect to the input align well with human perception. Several works have identified PAG as a byproduct of robust training, but none have considered it as a standalone phenomenon nor studied its own implications. In this work, we focus on this trait and test whether Perceptually Aligned Gradients imply Robustness. To this end, we develop a novel objective to directly promote PAG in training classifiers and examine whether models with such gradients are more robust to adversarial attacks. Extensive experiments on multiple datasets and architectures validate that models with aligned gradients exhibit significant robustness, exposing the surprising bidirectional connection between PAG and robustness. Lastly, we show that better gradient alignment leads to increased robustness and harness this observation to boost the robustness of existing adversarial training techniques.'
volume: 202
URL: https://proceedings.mlr.press/v202/ganz23a.html
PDF: https://proceedings.mlr.press/v202/ganz23a/ganz23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-ganz23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Roy
family: Ganz
- given: Bahjat
family: Kawar
- given: Michael
family: Elad
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10628-10648
id: ganz23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10628
lastpage: 10648
published: 2023-07-03 00:00:00 +0000
- title: 'Solving Linear Programs with Fast Online Learning Algorithms'
abstract: 'This paper presents fast first-order methods for solving linear programs (LPs) approximately. We adapt online linear programming algorithms to offline LPs and obtain algorithms that avoid any matrix multiplication. We also introduce a variable-duplication technique that copies each variable $K$ times and reduces the optimality gap and constraint violation by a factor of $\sqrt{K}$. Furthermore, we show how online algorithms can be effectively integrated into sifting, a column generation scheme for large-scale LPs. Numerical experiments demonstrate that our methods can serve as either an approximate direct solver, or an initialization subroutine for exact LP solving.'
volume: 202
URL: https://proceedings.mlr.press/v202/gao23a.html
PDF: https://proceedings.mlr.press/v202/gao23a/gao23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-gao23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Wenzhi
family: Gao
- given: Dongdong
family: Ge
- given: Chunlin
family: Sun
- given: Yinyu
family: Ye
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10649-10675
id: gao23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10649
lastpage: 10675
published: 2023-07-03 00:00:00 +0000
- title: 'Gradient Descent Finds the Global Optima of Two-Layer Physics-Informed Neural Networks'
abstract: 'The main aim of this paper is to conduct the convergence analysis of the gradient descent for two-layer physics-informed neural networks (PINNs). Here, the loss function involves derivatives of neural network outputs with respect to its inputs, so the interaction between the trainable parameters is more complicated compared with simple regression and classification tasks. We first develop the positive definiteness of Gram matrices and prove that the gradient flow finds the global optima of the empirical loss under over-parameterization. Then, we demonstrate that the standard gradient descent converges to the global optima of the loss with proper choices of learning rates. The framework of our analysis works for various categories of PDEs (e.g., linear second-order PDEs) and common types of network initialization (LecunUniform, etc.). Our theoretical results do not need a very strict hypothesis for training samples and have a looser requirement on the network width compared with some previous works.'
volume: 202
URL: https://proceedings.mlr.press/v202/gao23b.html
PDF: https://proceedings.mlr.press/v202/gao23b/gao23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-gao23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Yihang
family: Gao
- given: Yiqi
family: Gu
- given: Michael
family: Ng
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10676-10707
id: gao23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10676
lastpage: 10707
published: 2023-07-03 00:00:00 +0000
- title: 'Generalizing Neural Wave Functions'
abstract: 'Recent neural network-based wave functions have achieved state-of-the-art accuracies in modeling ab-initio ground-state potential energy surfaces. However, these networks can only solve different spatial arrangements of the same set of atoms. To overcome this limitation, we present Graph-learned orbital embeddings (Globe), a neural network-based reparametrization method that can adapt neural wave functions to different molecules. Globe learns representations of local electronic structures that generalize across molecules via spatial message passing by connecting molecular orbitals to covalent bonds. Further, we propose a size-consistent wave function Ansatz, the Molecular orbital network (Moon), tailored to jointly solve Schrödinger equations of different molecules. In our experiments, we find that Moon converges in 4.5 times fewer steps to similar accuracy as previous methods, or to lower energies given the same time. Further, our analysis shows that Moon’s energy estimate scales additively with increased system sizes, unlike previous work where we observe divergence. In both computational chemistry and machine learning, we are the first to demonstrate that a single wave function can solve the Schrödinger equation of molecules with different atoms jointly.'
volume: 202
URL: https://proceedings.mlr.press/v202/gao23c.html
PDF: https://proceedings.mlr.press/v202/gao23c/gao23c.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-gao23c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Nicholas
family: Gao
- given: Stephan
family: Günnemann
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10708-10726
id: gao23c
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10708
lastpage: 10726
published: 2023-07-03 00:00:00 +0000
- title: 'On the Impact of Algorithmic Recourse on Social Segregation'
abstract: 'As predictive models seep into several real-world applications, it has become critical to ensure that individuals who are negatively impacted by the outcomes of these models are provided with a means for recourse. To this end, there has been a growing body of research on algorithmic recourse in recent years. While recourses can be extremely beneficial to affected individuals, their implementation at a large scale can lead to potential data distribution shifts and other unintended consequences. However, there is little to no research on understanding the impact of algorithmic recourse after implementation. In this work, we address the aforementioned gaps by making one of the first attempts at analyzing the delayed societal impact of algorithmic recourse. To this end, we theoretically and empirically analyze the recourses output by state-of-the-art algorithms. Our analysis demonstrates that large-scale implementation of recourses by end users may exacerbate social segregation. To address this problem, we propose novel algorithms which leverage implicit and explicit conditional generative models to not only minimize the chance of segregation but also provide realistic recourses. Extensive experimentation with real-world datasets demonstrates the efficacy of the proposed approaches.'
volume: 202
URL: https://proceedings.mlr.press/v202/gao23d.html
PDF: https://proceedings.mlr.press/v202/gao23d/gao23d.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-gao23d.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ruijiang
family: Gao
- given: Himabindu
family: Lakkaraju
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10727-10743
id: gao23d
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10727
lastpage: 10743
published: 2023-07-03 00:00:00 +0000
- title: 'DDGR: Continual Learning with Deep Diffusion-based Generative Replay'
abstract: 'Popular deep-learning models in the field of image classification suffer from catastrophic forgetting—models will forget previously acquired skills when learning new ones. Generative replay (GR), which typically consists of a generator and a classifier, is an efficient way to mitigate catastrophic forgetting. However, conventional GR methods only focus on a single instruction relationship (generator-to-classifier), where the generator synthesizes samples for previous tasks to instruct the training of the classifier, while ignoring the ways in which the classifier can benefit the generator. In addition, most generative replay methods typically reuse the generated samples to update the generator, which causes the samples regenerated by the generator to deviate from the distribution of previous tasks. To overcome these two issues, we propose a novel approach, called deep diffusion-based generative replay (DDGR), which adopts a diffusion model as the generator and calculates an instruction-operator through the classifier to instruct the generation of samples. Extensive experiments in class incremental (CI) and class incremental with repetition (CIR) settings demonstrate the advantages of DDGR. Our code is available at https://github.com/xiaocangshengGR/DDGR.'
volume: 202
URL: https://proceedings.mlr.press/v202/gao23e.html
PDF: https://proceedings.mlr.press/v202/gao23e/gao23e.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-gao23e.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Rui
family: Gao
- given: Weiwei
family: Liu
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10744-10763
id: gao23e
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10744
lastpage: 10763
published: 2023-07-03 00:00:00 +0000
- title: 'PAL: Program-aided Language Models'
abstract: 'Large language models (LLMs) have demonstrated an impressive ability to perform arithmetic and symbolic reasoning tasks, when provided with a few examples at test time ("few-shot prompting"). Much of this success can be attributed to prompting methods such as "chain-of-thought", which employ LLMs both for understanding the problem description by decomposing it into steps, and for solving each step of the problem. While LLMs seem to be adept at this sort of step-by-step decomposition, they often make logical and arithmetic mistakes in the solution part, even when the problem is decomposed correctly. In this paper, we present Program-Aided Language models (PAL): a novel approach that uses the LLM to read natural language problems and generate programs as the intermediate reasoning steps, but offloads the solution step to a runtime such as a Python interpreter. With PAL, decomposing the natural language problem into runnable steps remains the only learning task for the LLM, while solving is delegated to the interpreter. We demonstrate this synergy between a neural LLM and a symbolic interpreter across 13 mathematical, symbolic, and algorithmic reasoning tasks from BIG-Bench Hard and others. In all these natural language reasoning tasks, generating code using an LLM and reasoning using a Python interpreter leads to more accurate results than much larger models. For example, PAL using Codex achieves state-of-the-art few-shot accuracy on GSM8K, surpassing PaLM with chain-of-thought by an absolute 15% in top-1 accuracy.'
volume: 202
URL: https://proceedings.mlr.press/v202/gao23f.html
PDF: https://proceedings.mlr.press/v202/gao23f/gao23f.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-gao23f.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Luyu
family: Gao
- given: Aman
family: Madaan
- given: Shuyan
family: Zhou
- given: Uri
family: Alon
- given: Pengfei
family: Liu
- given: Yiming
family: Yang
- given: Jamie
family: Callan
- given: Graham
family: Neubig
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10764-10799
id: gao23f
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10764
lastpage: 10799
published: 2023-07-03 00:00:00 +0000
- title: 'Out-of-Domain Robustness via Targeted Augmentations'
abstract: 'Models trained on one set of domains often suffer performance drops on unseen domains, e.g., when wildlife monitoring models are deployed in new camera locations. In this work, we study principles for designing data augmentations for out-of-domain (OOD) generalization. In particular, we focus on real-world scenarios in which some domain-dependent features are robust, i.e., some features that vary across domains are predictive OOD. For example, in the wildlife monitoring application above, image backgrounds vary across camera locations but indicate habitat type, which helps predict the species of photographed animals. Motivated by theoretical analysis on a linear setting, we propose targeted augmentations, which selectively randomize spurious domain-dependent features while preserving robust ones. We prove that targeted augmentations improve OOD performance, allowing models to generalize better with fewer domains. In contrast, existing approaches such as generic augmentations, which fail to randomize domain-dependent features, and domain-invariant augmentations, which randomize all domain-dependent features, both perform poorly OOD. In experiments on three real-world datasets, we show that targeted augmentations improve OOD performance by 3.2-15.2%, setting a new state of the art.'
volume: 202
URL: https://proceedings.mlr.press/v202/gao23g.html
PDF: https://proceedings.mlr.press/v202/gao23g/gao23g.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-gao23g.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Irena
family: Gao
- given: Shiori
family: Sagawa
- given: Pang Wei
family: Koh
- given: Tatsunori
family: Hashimoto
- given: Percy
family: Liang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10800-10834
id: gao23g
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10800
lastpage: 10834
published: 2023-07-03 00:00:00 +0000
- title: 'Scaling Laws for Reward Model Overoptimization'
abstract: 'In reinforcement learning from human feedback, it is common to optimize against a reward model trained to predict human preferences. Because the reward model is an imperfect proxy, optimizing its value too much can hinder ground truth performance, in accordance with Goodhart’s law. This effect has been frequently observed, but not carefully measured due to the expense of collecting human preference data. In this work, we use a synthetic setup in which a fixed “gold-standard” reward model plays the role of humans, providing labels used to train a proxy reward model. We study how the gold reward model score changes as we optimize against the proxy reward model using either reinforcement learning or best-of-$n$ sampling. We find that this relationship follows a different functional form depending on the method of optimization, and that in both cases its coefficients scale smoothly with the number of reward model parameters. We also study the effect on this relationship of the size of the reward model dataset, the number of reward model and policy parameters, and the coefficient of the KL penalty added to the reward in the reinforcement learning setup. We explore the implications of these empirical results for theoretical considerations in AI alignment.'
volume: 202
URL: https://proceedings.mlr.press/v202/gao23h.html
PDF: https://proceedings.mlr.press/v202/gao23h/gao23h.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-gao23h.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Leo
family: Gao
- given: John
family: Schulman
- given: Jacob
family: Hilton
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10835-10866
id: gao23h
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10835
lastpage: 10866
published: 2023-07-03 00:00:00 +0000
- title: 'The Unreasonable Effectiveness of Few-shot Learning for Machine Translation'
abstract: 'We demonstrate the potential of few-shot translation systems, trained with unpaired language data, for both high and low-resource language pairs. We show that with only 5 examples of high-quality translation data shown at inference, a transformer decoder-only model trained solely with self-supervised learning is able to match specialized supervised state-of-the-art models as well as more general commercial translation systems. In particular, we outperform the best performing system on the WMT’21 English-Chinese news translation task by only using five examples of English-Chinese parallel data at inference. Furthermore, the resulting models are two orders of magnitude smaller than state-of-the-art language models. We then analyze the factors which impact the performance of few-shot translation systems, and highlight that the quality of the few-shot demonstrations heavily determines the quality of the translations generated by our models. Finally, we show that the few-shot paradigm also provides a way to control certain attributes of the translation — we show that we are able to control for regional varieties and formality using only five examples at inference, paving the way towards controllable machine translation systems.'
volume: 202
URL: https://proceedings.mlr.press/v202/garcia23a.html
PDF: https://proceedings.mlr.press/v202/garcia23a/garcia23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-garcia23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Xavier
family: Garcia
- given: Yamini
family: Bansal
- given: Colin
family: Cherry
- given: George
family: Foster
- given: Maxim
family: Krikun
- given: Melvin
family: Johnson
- given: Orhan
family: Firat
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10867-10878
id: garcia23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10867
lastpage: 10878
published: 2023-07-03 00:00:00 +0000
- title: 'RLSbench: Domain Adaptation Under Relaxed Label Shift'
abstract: 'Despite the emergence of principled methods for domain adaptation under label shift, their sensitivity to shifts in class conditional distributions is precariously underexplored. Meanwhile, popular deep domain adaptation heuristics tend to falter when faced with label proportion shifts. While several papers modify these heuristics in attempts to handle label proportion shifts, inconsistencies in evaluation standards, datasets, and baselines make it difficult to gauge the current best practices. In this paper, we introduce RLSbench, a large-scale benchmark for *relaxed label shift*, consisting of $>$500 distribution shift pairs spanning vision, tabular, and language modalities, with varying label proportions. Unlike existing benchmarks, which primarily focus on shifts in class-conditional $p(x|y)$, our benchmark also focuses on label marginal shifts. First, we assess 13 popular domain adaptation methods, demonstrating more widespread failures under label proportion shifts than were previously known. Next, we develop an effective two-step meta-algorithm that is compatible with most domain adaptation heuristics: (i) *pseudo-balance* the data at each epoch; and (ii) adjust the final classifier with an estimate of the target label distribution. The meta-algorithm improves existing domain adaptation heuristics under large label proportion shifts, often by 2–10% accuracy points, while conferring minimal effect ($<$0.5%) when label proportions do not shift. We hope that these findings and the availability of RLSbench will encourage researchers to rigorously evaluate proposed methods in relaxed label shift settings. Code is publicly available at https://github.com/acmi-lab/RLSbench.'
volume: 202
URL: https://proceedings.mlr.press/v202/garg23a.html
PDF: https://proceedings.mlr.press/v202/garg23a/garg23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-garg23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Saurabh
family: Garg
- given: Nick
family: Erickson
- given: James
family: Sharpnack
- given: Alex
family: Smola
- given: Sivaraman
family: Balakrishnan
- given: Zachary Chase
family: Lipton
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10879-10928
id: garg23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10879
lastpage: 10928
published: 2023-07-03 00:00:00 +0000
- title: 'RankMe: Assessing the Downstream Performance of Pretrained Self-Supervised Representations by Their Rank'
abstract: 'Joint-Embedding Self Supervised Learning (JE-SSL) has seen rapid development, with the emergence of many method variations but only a few principled guidelines that would help practitioners successfully deploy them. The main reason for that pitfall comes from JE-SSL’s core principle of not employing any input reconstruction, therefore lacking visual cues of unsuccessful training. Adding non-informative loss values to that, it becomes difficult to deploy SSL on a new dataset for which no labels can help to judge the quality of the learned representation. In this study, we develop a simple unsupervised criterion that is indicative of the quality of the learned JE-SSL representations: their effective rank. Albeit simple and computationally friendly, this method, coined RankMe, allows one to assess the performance of JE-SSL representations, even on different downstream datasets, without requiring any labels. A further benefit of RankMe is that it does not have any training or hyper-parameters to tune. Through thorough empirical experiments involving hundreds of training episodes, we demonstrate how RankMe can be used for hyperparameter selection with nearly no reduction in final performance compared to the current selection methods that involve a dataset’s labels. We hope that RankMe will facilitate the deployment of JE-SSL towards domains that do not have the opportunity to rely on labels for representations’ quality assessment.'
volume: 202
URL: https://proceedings.mlr.press/v202/garrido23a.html
PDF: https://proceedings.mlr.press/v202/garrido23a/garrido23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-garrido23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Quentin
family: Garrido
- given: Randall
family: Balestriero
- given: Laurent
family: Najman
- given: Yann
family: Lecun
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10929-10974
id: garrido23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10929
lastpage: 10974
published: 2023-07-03 00:00:00 +0000
- title: 'Self-supervised learning of Split Invariant Equivariant representations'
abstract: 'Recent progress has been made towards learning invariant or equivariant representations with self-supervised learning. While invariant methods are evaluated on large scale datasets, equivariant ones are evaluated in smaller, more controlled, settings. We aim at bridging the gap between the two in order to learn more diverse representations that are suitable for a wide range of tasks. We start by introducing a dataset called 3DIEBench, consisting of renderings from 3D models over 55 classes and more than 2.5 million images where we have full control over the transformations applied to the objects. We further introduce a predictor architecture based on hypernetworks to learn equivariant representations with no possible collapse to invariance. We introduce SIE (Split Invariant-Equivariant) which combines the hypernetwork-based predictor with representations split in two parts, one invariant, the other equivariant, to learn richer representations. We demonstrate significant performance gains over existing methods on equivariance related tasks from both a qualitative and quantitative point of view. We further analyze our introduced predictor and show how it steers the learned latent space. We hope that both our introduced dataset and approach will enable learning richer representations without supervision in more complex scenarios. Code and data are available at https://github.com/garridoq/SIE.'
volume: 202
URL: https://proceedings.mlr.press/v202/garrido23b.html
PDF: https://proceedings.mlr.press/v202/garrido23b/garrido23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-garrido23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Quentin
family: Garrido
- given: Laurent
family: Najman
- given: Yann
family: Lecun
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10975-10996
id: garrido23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10975
lastpage: 10996
published: 2023-07-03 00:00:00 +0000
- title: 'Federated Heavy Hitter Recovery under Linear Sketching'
abstract: 'Motivated by real-life deployments of multi-round federated analytics with secure aggregation, we investigate the fundamental communication-accuracy tradeoffs of the heavy hitter discovery and approximate (open-domain) histogram problems under a linear sketching constraint. We propose efficient algorithms based on local subsampling and invertible bloom look-up tables (IBLTs). We also show that our algorithms are information-theoretically optimal for a broad class of interactive schemes. The results show that the linear sketching constraint does increase the communication cost for both tasks by introducing an extra linear dependence on the number of users in a round. Moreover, our results also establish a separation between the communication cost for heavy hitter discovery and approximate histogram in the multi-round setting. The dependence on the number of rounds $R$ is at most logarithmic for heavy hitter discovery whereas that of approximate histogram is $\Theta(\sqrt{R})$. We also empirically demonstrate our findings.'
volume: 202
URL: https://proceedings.mlr.press/v202/gascon23a.html
PDF: https://proceedings.mlr.press/v202/gascon23a/gascon23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-gascon23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Adria
family: Gascon
- given: Peter
family: Kairouz
- given: Ziteng
family: Sun
- given: Ananda Theertha
family: Suresh
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 10997-11012
id: gascon23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 10997
lastpage: 11012
published: 2023-07-03 00:00:00 +0000
- title: 'On the Global Convergence of Fitted Q-Iteration with Two-layer Neural Network Parametrization'
abstract: 'Deep Q-learning based algorithms have been applied successfully in many decision making problems, while their theoretical foundations are not as well understood. In this paper, we study a Fitted Q-Iteration with two-layer ReLU neural network parameterization, and find the sample complexity guarantees for the algorithm. Our approach estimates the Q-function in each iteration using a convex optimization problem. We show that this approach achieves a sample complexity of $\tilde{\mathcal{O}}(1/\epsilon^{2})$, which is order-optimal. This result holds for countable state-spaces and does not require any assumptions such as a linear or low rank structure on the MDP.'
volume: 202
URL: https://proceedings.mlr.press/v202/gaur23a.html
PDF: https://proceedings.mlr.press/v202/gaur23a/gaur23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-gaur23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mudit
family: Gaur
- given: Vaneet
family: Aggarwal
- given: Mridul
family: Agarwal
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 11013-11049
id: gaur23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 11013
lastpage: 11049
published: 2023-07-03 00:00:00 +0000
- title: 'A Reinforcement Learning Framework for Dynamic Mediation Analysis'
abstract: 'Mediation analysis learns the causal effect transmitted via mediator variables between treatments and outcomes, and receives increasing attention in various scientific domains to elucidate causal relations. Most existing works focus on point-exposure studies where each subject only receives one treatment at a single time point. However, there are a number of applications (e.g., mobile health) where the treatments are sequentially assigned over time and the dynamic mediation effects are of primary interest. Proposing a reinforcement learning (RL) framework, we are the first to evaluate dynamic mediation effects in settings with infinite horizons. We decompose the average treatment effect into an immediate direct effect, an immediate mediation effect, a delayed direct effect, and a delayed mediation effect. Upon the identification of each effect component, we further develop robust and semi-parametrically efficient estimators under the RL framework to infer these causal effects. The superior performance of the proposed method is demonstrated through extensive numerical studies, theoretical results, and an analysis of a mobile health dataset. A Python implementation of the proposed procedure is available at https://github.com/linlinlin97/MediationRL.'
volume: 202
URL: https://proceedings.mlr.press/v202/ge23a.html
PDF: https://proceedings.mlr.press/v202/ge23a/ge23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-ge23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Lin
family: Ge
- given: Jitao
family: Wang
- given: Chengchun
family: Shi
- given: Zhenke
family: Wu
- given: Rui
family: Song
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 11050-11097
id: ge23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 11050
lastpage: 11097
published: 2023-07-03 00:00:00 +0000
- title: 'Compositional Score Modeling for Simulation-Based Inference'
abstract: 'Neural Posterior Estimation methods for simulation-based inference can be ill-suited for dealing with posterior distributions obtained by conditioning on multiple observations, as they tend to require a large number of simulator calls to learn accurate approximations. In contrast, Neural Likelihood Estimation methods can handle multiple observations at inference time after learning from individual observations, but they rely on standard inference methods, such as MCMC or variational inference, which come with certain performance drawbacks. We introduce a new method based on conditional score modeling that enjoys the benefits of both approaches. We model the scores of the (diffused) posterior distributions induced by individual observations, and introduce a way of combining the learned scores to approximately sample from the target posterior distribution. Our approach is sample-efficient, can naturally aggregate multiple observations at inference time, and avoids the drawbacks of standard inference methods.'
volume: 202
URL: https://proceedings.mlr.press/v202/geffner23a.html
PDF: https://proceedings.mlr.press/v202/geffner23a/geffner23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-geffner23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Tomas
family: Geffner
- given: George
family: Papamakarios
- given: Andriy
family: Mnih
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 11098-11116
id: geffner23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 11098
lastpage: 11116
published: 2023-07-03 00:00:00 +0000
- title: 'Cramming: Training a Language Model on a single GPU in one day.'
abstract: 'Recent trends in language modeling have focused on increasing performance through scaling, and have resulted in an environment where training language models is out of reach for most researchers and practitioners. While most in the community are asking how to push the limits of extreme computation, we ask the opposite question: How far can we get with a single GPU in just one day? We investigate the downstream performance achievable with a transformer-based language model trained completely from scratch with masked language modeling for a single day on a single consumer GPU. Aside from re-analyzing nearly all components of the pretraining pipeline for this scenario and providing a modified pipeline with performance close to BERT, we investigate why scaling down is hard, and which modifications actually improve performance in this scenario. We provide evidence that even in this constrained setting, performance closely follows scaling laws observed in large-compute settings. Through the lens of scaling laws, we categorize a range of recent improvements to training and architecture and discuss their merit and practical applicability (or lack thereof) for the limited compute setting. We provide code to reproduce all experiments at github.com/JonasGeiping/cramming .'
volume: 202
URL: https://proceedings.mlr.press/v202/geiping23a.html
PDF: https://proceedings.mlr.press/v202/geiping23a/geiping23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-geiping23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jonas
family: Geiping
- given: Tom
family: Goldstein
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 11117-11143
id: geiping23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 11117
lastpage: 11143
published: 2023-07-03 00:00:00 +0000
- title: 'Transformers Meet Directed Graphs'
abstract: 'Transformers were originally proposed as a sequence-to-sequence model for text but have become vital for a wide range of modalities, including images, audio, video, and undirected graphs. However, transformers for directed graphs are a surprisingly underexplored topic, despite their applicability to ubiquitous domains, including source code and logic circuits. In this work, we propose two direction- and structure-aware positional encodings for directed graphs: (1) the eigenvectors of the Magnetic Laplacian — a direction-aware generalization of the combinatorial Laplacian; (2) directional random walk encodings. Empirically, we show that the extra directionality information is useful in various downstream tasks, including correctness testing of sorting networks and source code understanding. Together with a data-flow-centric graph construction, our model outperforms the prior state of the art on the Open Graph Benchmark Code2 relatively by 14.7%.'
volume: 202
URL: https://proceedings.mlr.press/v202/geisler23a.html
PDF: https://proceedings.mlr.press/v202/geisler23a/geisler23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-geisler23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Simon
family: Geisler
- given: Yujia
family: Li
- given: Daniel J
family: Mankowitz
- given: Ali Taylan
family: Cemgil
- given: Stephan
family: Günnemann
- given: Cosmin
family: Paduraru
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 11144-11172
id: geisler23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 11144
lastpage: 11172
published: 2023-07-03 00:00:00 +0000
- title: 'Memory-Based Meta-Learning on Non-Stationary Distributions'
abstract: 'Memory-based meta-learning is a technique for approximating Bayes-optimal predictors. Under fairly general conditions, minimizing sequential prediction error, measured by the log loss, leads to implicit meta-learning. The goal of this work is to investigate how far this interpretation can be realized by current sequence prediction models and training regimes. The focus is on piecewise stationary sources with unobserved switching-points, which arguably capture an important characteristic of natural language and action-observation sequences in partially observable environments. We show that various types of memory-based neural models, including Transformers, LSTMs, and RNNs can learn to accurately approximate known Bayes-optimal algorithms and behave as if performing Bayesian inference over the latent switching-points and the latent parameters governing the data distribution within each segment.'
volume: 202
URL: https://proceedings.mlr.press/v202/genewein23a.html
PDF: https://proceedings.mlr.press/v202/genewein23a/genewein23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-genewein23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Tim
family: Genewein
- given: Gregoire
family: Deletang
- given: Anian
family: Ruoss
- given: Li Kevin
family: Wenliang
- given: Elliot
family: Catt
- given: Vincent
family: Dutordoir
- given: Jordi
family: Grau-Moya
- given: Laurent
family: Orseau
- given: Marcus
family: Hutter
- given: Joel
family: Veness
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 11173-11195
id: genewein23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 11173
lastpage: 11195
published: 2023-07-03 00:00:00 +0000
- title: 'Towards Reliable Neural Specifications'
abstract: 'Having reliable specifications is an unavoidable challenge in achieving verifiable correctness, robustness, and interpretability of AI systems. Existing specifications for neural networks are in the paradigm of data as specification. That is, the local neighborhood centering around a reference input is considered to be correct (or robust). While existing specifications contribute to verifying adversarial robustness, a significant problem in many research domains, our empirical study shows that those verified regions are somewhat tight, and thus fail to allow verification of test set inputs, making them impractical for some real-world applications. To this end, we propose a new family of specifications called neural representation as specification. This form of specifications uses the intrinsic information of neural networks, specifically neural activation patterns (NAPs), rather than input data to specify the correctness and/or robustness of neural network predictions. We present a simple statistical approach to mining neural activation patterns. To show the effectiveness of discovered NAPs, we formally verify several important properties, such as various types of misclassifications will never happen for a given NAP, and there is no ambiguity between different NAPs. We show that by using NAP, we can verify a significant region of the input space, while still recalling 84% of the data on MNIST. Moreover, we can push the verifiable bound to 10 times larger on the CIFAR10 benchmark. Thus, we argue that NAPs can potentially be used as a more reliable and extensible specification for neural network verification.'
volume: 202
URL: https://proceedings.mlr.press/v202/geng23a.html
PDF: https://proceedings.mlr.press/v202/geng23a/geng23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-geng23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Chuqin
family: Geng
- given: Nham
family: Le
- given: Xiaojie
family: Xu
- given: Zhaoyue
family: Wang
- given: Arie
family: Gurfinkel
- given: Xujie
family: Si
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 11196-11212
id: geng23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 11196
lastpage: 11212
published: 2023-07-03 00:00:00 +0000
- title: 'Oracles & Followers: Stackelberg Equilibria in Deep Multi-Agent Reinforcement Learning'
abstract: 'Stackelberg equilibria arise naturally in a range of popular learning problems, such as in security games or indirect mechanism design, and have received increasing attention in the reinforcement learning literature. We present a general framework for implementing Stackelberg equilibria search as a multi-agent RL problem, allowing a wide range of algorithmic design choices. We discuss how previous approaches can be seen as specific instantiations of this framework. As a key insight, we note that the design space allows for approaches not previously seen in the literature, for instance by leveraging multitask and meta-RL techniques for follower convergence. We propose one such approach using contextual policies, and evaluate it experimentally on both standard and novel benchmark domains, showing greatly improved sample efficiency compared to previous approaches. Finally, we explore the effect of adopting algorithm designs outside the borders of our framework.'
volume: 202
URL: https://proceedings.mlr.press/v202/gerstgrasser23a.html
PDF: https://proceedings.mlr.press/v202/gerstgrasser23a/gerstgrasser23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-gerstgrasser23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Matthias
family: Gerstgrasser
- given: David C.
family: Parkes
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 11213-11236
id: gerstgrasser23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 11213
lastpage: 11236
published: 2023-07-03 00:00:00 +0000
- title: 'Approximately Optimal Core Shapes for Tensor Decompositions'
abstract: 'This work studies the combinatorial optimization problem of finding an optimal core tensor shape, also called multilinear rank, for a size-constrained Tucker decomposition. We give an algorithm with provable approximation guarantees for its reconstruction error via connections to higher-order singular values. Specifically, we introduce a novel Tucker packing problem, which we prove is NP-hard, and give a polynomial-time approximation scheme based on a reduction to the 2-dimensional knapsack problem with a matroid constraint. We also generalize our techniques to tree tensor network decompositions. We implement our algorithm using an integer programming solver, and show that its solution quality is competitive with (and sometimes better than) the greedy algorithm that uses the true Tucker decomposition loss at each step, while also running up to 1000x faster.'
volume: 202
URL: https://proceedings.mlr.press/v202/ghadiri23a.html
PDF: https://proceedings.mlr.press/v202/ghadiri23a/ghadiri23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-ghadiri23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Mehrdad
family: Ghadiri
- given: Matthew
family: Fahrbach
- given: Gang
family: Fu
- given: Vahab
family: Mirrokni
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 11237-11254
id: ghadiri23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 11237
lastpage: 11254
published: 2023-07-03 00:00:00 +0000
- title: 'GAT: Guided Adversarial Training with Pareto-optimal Auxiliary Tasks'
abstract: 'While leveraging additional training data is well established to improve adversarial robustness, it incurs the unavoidable cost of data collection and the heavy computation to train models. To mitigate the costs, we propose *Guided Adversarial Training* (GAT), a novel adversarial training technique that exploits auxiliary tasks under a limited set of training data. Our approach extends single-task models into multi-task models during the min-max optimization of adversarial training, and drives the loss optimization with a regularization of the gradient curvature across multiple tasks. GAT leverages two types of auxiliary tasks: self-supervised tasks, where the labels are generated automatically, and domain-knowledge tasks, where human experts provide additional labels. Experimentally, under limited data, GAT increases the robust accuracy on CIFAR-10 up to four times (from 11% to 42% robust accuracy) and the robust AUC of the CheXpert medical imaging dataset from 50% to 83%. On the full CIFAR-10 dataset, GAT outperforms eight state-of-the-art adversarial training strategies. Our large study across five datasets and six tasks demonstrates that task augmentation is an efficient alternative to data augmentation, and can be key to achieving both clean and robust performance.'
volume: 202
URL: https://proceedings.mlr.press/v202/ghamizi23a.html
PDF: https://proceedings.mlr.press/v202/ghamizi23a/ghamizi23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-ghamizi23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Salah
family: Ghamizi
- given: Jingfeng
family: Zhang
- given: Maxime
family: Cordy
- given: Mike
family: Papadakis
- given: Masashi
family: Sugiyama
- given: Yves
family: Le Traon
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 11255-11282
id: ghamizi23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 11255
lastpage: 11282
published: 2023-07-03 00:00:00 +0000
- title: 'On User-Level Private Convex Optimization'
abstract: 'We introduce a new mechanism for stochastic convex optimization (SCO) with user-level differential privacy guarantees. The convergence rates of this mechanism are similar to those in the prior work of Levy et al. 2021 and Narayanan et al. 2022, but with two important improvements. Our mechanism does not require any smoothness assumptions on the loss. Furthermore, our bounds are also the first where the minimum number of users needed for user-level privacy has no dependence on the dimension and only a logarithmic dependence on the desired excess error. The main idea underlying the new mechanism is to show that the optimizers of strongly convex losses have low local deletion sensitivity, along with a new output perturbation method for functions with low local deletion sensitivity, which could be of independent interest.'
volume: 202
URL: https://proceedings.mlr.press/v202/ghazi23a.html
PDF: https://proceedings.mlr.press/v202/ghazi23a/ghazi23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-ghazi23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Badih
family: Ghazi
- given: Pritish
family: Kamath
- given: Ravi
family: Kumar
- given: Pasin
family: Manurangsi
- given: Raghu
family: Meka
- given: Chiyuan
family: Zhang
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 11283-11299
id: ghazi23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 11283
lastpage: 11299
published: 2023-07-03 00:00:00 +0000
- title: 'Contextual Reliability: When Different Features Matter in Different Contexts'
abstract: 'Deep neural networks often fail catastrophically by relying on spurious correlations. Most prior work assumes a clear dichotomy into spurious and reliable features; however, this is often unrealistic. For example, most of the time we do not want an autonomous car to simply copy the speed of surrounding cars—we don’t want our car to run a red light if a neighboring car does so. However, we cannot simply enforce invariance to next-lane speed, since it could provide valuable information about an unobservable pedestrian at a crosswalk. Thus, universally ignoring features that are sometimes (but not always) reliable can lead to non-robust performance. We formalize a new setting called contextual reliability which accounts for the fact that the "right" features to use may vary depending on the context. We propose and analyze a two-stage framework called Explicit Non-spurious feature Prediction (ENP) which first identifies the relevant features to use for a given context, then trains a model to rely exclusively on these features. Our work theoretically and empirically demonstrates the advantages of ENP over existing methods and provides new benchmarks for contextual reliability.'
volume: 202
URL: https://proceedings.mlr.press/v202/ghosal23a.html
PDF: https://proceedings.mlr.press/v202/ghosal23a/ghosal23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-ghosal23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Gaurav Rohit
family: Ghosal
- given: Amrith
family: Setlur
- given: Daniel S.
family: Brown
- given: Anca
family: Dragan
- given: Aditi
family: Raghunathan
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 11300-11320
id: ghosal23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 11300
lastpage: 11320
published: 2023-07-03 00:00:00 +0000
- title: 'Reinforcement Learning from Passive Data via Latent Intentions'
abstract: 'Passive observational data, such as human videos, is abundant and rich in information, yet remains largely untapped by current RL methods. Perhaps surprisingly, we show that passive data, despite not having reward or action labels, can still be used to learn features that accelerate downstream RL. Our approach learns from passive data by modeling intentions: measuring how the likelihood of future outcomes changes when the agent acts to achieve a particular task. We propose a temporal difference learning objective to learn about intentions, resulting in an algorithm similar to conventional RL, but which learns entirely from passive data. When optimizing this objective, our agent simultaneously learns representations of states, of policies, and of possible outcomes in an environment, all from raw observational data. Both theoretically and empirically, this scheme learns features amenable for value prediction for downstream tasks, and our experiments demonstrate the ability to learn from many forms of passive data, including cross-embodiment video data and YouTube videos.'
volume: 202
URL: https://proceedings.mlr.press/v202/ghosh23a.html
PDF: https://proceedings.mlr.press/v202/ghosh23a/ghosh23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-ghosh23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Dibya
family: Ghosh
- given: Chethan Anand
family: Bhateja
- given: Sergey
family: Levine
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 11321-11339
id: ghosh23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 11321
lastpage: 11339
published: 2023-07-03 00:00:00 +0000
- title: 'Harmonic Neural Networks'
abstract: 'Harmonic functions are abundant in nature, appearing in limiting cases of Maxwell’s and the Navier-Stokes equations, as well as the heat and the wave equations. Consequently, there are many applications of harmonic functions, from industrial process optimisation to robotic path planning and the calculation of first exit times of random walks. Despite their ubiquity and relevance, there have been few attempts to incorporate inductive biases towards harmonic functions in machine learning contexts. In this work, we demonstrate effective means of representing harmonic functions in neural networks and extend such results also to quantum neural networks to demonstrate the generality of our approach. We benchmark our approaches against (quantum) physics-informed neural networks, where we show favourable performance.'
volume: 202
URL: https://proceedings.mlr.press/v202/ghosh23b.html
PDF: https://proceedings.mlr.press/v202/ghosh23b/ghosh23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-ghosh23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Atiyo
family: Ghosh
- given: Antonio Andrea
family: Gentile
- given: Mario
family: Dagrada
- given: Chul
family: Lee
- given: Seong-Hyok Sean
family: Kim
- given: Hyukgeun
family: Cha
- given: Yunjun
family: Choi
- given: Dongho
family: Kim
- given: Jeong-Il
family: Kye
- given: Vincent Emanuel
family: Elfving
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 11340-11359
id: ghosh23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 11340
lastpage: 11359
published: 2023-07-03 00:00:00 +0000
- title: 'Dividing and Conquering a BlackBox to a Mixture of Interpretable Models: Route, Interpret, Repeat'
abstract: 'ML model design either starts with an interpretable model or a Blackbox and explains it post hoc. Blackbox models are flexible but difficult to explain, while interpretable models are inherently explainable. Yet, interpretable models require extensive ML knowledge and tend to be less flexible, potentially underperforming relative to their Blackbox equivalents. This paper aims to blur the distinction between a post hoc explanation of a Blackbox and constructing interpretable models. Beginning with a Blackbox, we iteratively *carve out* a mixture of interpretable models and a *residual network*. The interpretable models identify a subset of samples and explain them using First Order Logic (FOL), providing basic reasoning on concepts from the Blackbox. We route the remaining samples through a flexible residual. We repeat the method on the residual network until all the interpretable models explain the desired proportion of data. Our extensive experiments show that our *route, interpret, and repeat* approach (1) identifies a richer diverse set of instance-specific concepts with high concept completeness via interpretable models by specializing in various subsets of data without compromising performance, (2) identifies the relatively “harder” samples to explain via residuals, (3) outperforms the interpretable by-design models by significant margins during test-time interventions, and (4) can be used to fix the shortcut learned by the original Blackbox.'
volume: 202
URL: https://proceedings.mlr.press/v202/ghosh23c.html
PDF: https://proceedings.mlr.press/v202/ghosh23c/ghosh23c.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-ghosh23c.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Shantanu
family: Ghosh
- given: Ke
family: Yu
- given: Forough
family: Arabshahi
- given: Kayhan
family: Batmanghelich
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 11360-11397
id: ghosh23c
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 11360
lastpage: 11397
published: 2023-07-03 00:00:00 +0000
- title: 'Looped Transformers as Programmable Computers'
abstract: 'We present a framework for using transformer networks as universal computers by programming them with specific weights and placing them in a loop. Our input sequence acts as a punchcard, consisting of instructions and memory for data read/writes. We demonstrate that a constant number of encoder layers can emulate basic computing blocks, including lexicographic operations, non-linear functions, function calls, program counters, and conditional branches. Using this framework, we emulate a computer using a simple instruction-set architecture, which allows us to map iterative algorithms to programs that can be executed by a constant depth looped transformer network. We show how a single frozen transformer, instructed by its input, can emulate a basic calculator, a basic linear algebra library, and even a full backpropagation, in-context learning algorithm. Our findings reveal the potential of transformer networks as programmable compute units and offer insight into the mechanics of attention.'
volume: 202
URL: https://proceedings.mlr.press/v202/giannou23a.html
PDF: https://proceedings.mlr.press/v202/giannou23a/giannou23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-giannou23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Angeliki
family: Giannou
- given: Shashank
family: Rajput
- given: Jy-Yong
family: Sohn
- given: Kangwook
family: Lee
- given: Jason D.
family: Lee
- given: Dimitris
family: Papailiopoulos
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 11398-11442
id: giannou23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 11398
lastpage: 11442
published: 2023-07-03 00:00:00 +0000
- title: 'Generalized Disparate Impact for Configurable Fairness Solutions in ML'
abstract: 'We make two contributions in the field of AI fairness over continuous protected attributes. First, we show that the Hirschfeld-Gebelein-Renyi (HGR) indicator (the only one currently available for such a case) is valuable but subject to a few crucial limitations regarding semantics, interpretability, and robustness. Second, we introduce a family of indicators that are: 1) complementary to HGR in terms of semantics; 2) fully interpretable and transparent; 3) robust over finite samples; 4) configurable to suit specific applications. Our approach also allows us to define fine-grained constraints to permit certain types of dependence and forbid others selectively. By expanding the available options for continuous protected attributes, our approach represents a significant contribution to the area of fair artificial intelligence.'
volume: 202
URL: https://proceedings.mlr.press/v202/giuliani23a.html
PDF: https://proceedings.mlr.press/v202/giuliani23a/giuliani23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-giuliani23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Luca
family: Giuliani
- given: Eleonora
family: Misino
- given: Michele
family: Lombardi
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 11443-11458
id: giuliani23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 11443
lastpage: 11458
published: 2023-07-03 00:00:00 +0000
- title: 'Multicalibration as Boosting for Regression'
abstract: 'We study the connection between multicalibration and boosting for squared error regression. First we prove a useful characterization of multicalibration in terms of a “swap regret” like condition on squared error. Using this characterization, we give an exceedingly simple algorithm that can be analyzed both as a boosting algorithm for regression and as a multicalibration algorithm for a class $\mathcal{H}$ that makes use only of a standard squared error regression oracle for $\mathcal{H}$. We give a weak learning assumption on $\mathcal{H}$ that ensures convergence to Bayes optimality without the need to make any realizability assumptions — giving us an agnostic boosting algorithm for regression. We then show that our weak learning assumption on $\mathcal{H}$ is both necessary and sufficient for multicalibration with respect to $\mathcal{H}$ to imply Bayes optimality, answering an open question. We also show that if $\mathcal{H}$ satisfies our weak learning condition relative to another class $\mathcal{C}$ then multicalibration with respect to $\mathcal{H}$ implies multicalibration with respect to $\mathcal{C}$. Finally we investigate the empirical performance of our algorithm experimentally.'
volume: 202
URL: https://proceedings.mlr.press/v202/globus-harris23a.html
PDF: https://proceedings.mlr.press/v202/globus-harris23a/globus-harris23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-globus-harris23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Ira
family: Globus-Harris
- given: Declan
family: Harrison
- given: Michael
family: Kearns
- given: Aaron
family: Roth
- given: Jessica
family: Sorrell
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 11459-11492
id: globus-harris23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 11459
lastpage: 11492
published: 2023-07-03 00:00:00 +0000
- title: 'Adversarial robustness of amortized Bayesian inference'
abstract: 'Bayesian inference usually requires running potentially costly inference procedures separately for every new observation. In contrast, the idea of amortized Bayesian inference is to initially invest computational cost in training an inference network on simulated data, which can subsequently be used to rapidly perform inference (i.e., to return estimates of posterior distributions) for new observations. This approach has been applied to many real-world models in the sciences and engineering, but it is unclear how robust the approach is to adversarial perturbations in the observed data. Here, we study the adversarial robustness of amortized Bayesian inference, focusing on simulation-based estimation of multi-dimensional posterior distributions. We show that almost unrecognizable, targeted perturbations of the observations can lead to drastic changes in the predicted posterior and highly unrealistic posterior predictive samples, across several benchmark tasks and a real-world example from neuroscience. We propose a computationally efficient regularization scheme based on penalizing the Fisher information of the conditional density estimator, and show how it improves the adversarial robustness of amortized Bayesian inference.'
volume: 202
URL: https://proceedings.mlr.press/v202/gloeckler23a.html
PDF: https://proceedings.mlr.press/v202/gloeckler23a/gloeckler23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-gloeckler23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Manuel
family: Gloeckler
- given: Michael
family: Deistler
- given: Jakob H.
family: Macke
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 11493-11524
id: gloeckler23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 11493
lastpage: 11524
published: 2023-07-03 00:00:00 +0000
- title: 'Efficient RL via Disentangled Environment and Agent Representations'
abstract: 'Agents that are aware of the separation between the environments and themselves can leverage this understanding to form effective representations of visual input. We propose an approach for learning such structured representations for RL algorithms, using visual knowledge of the agent, which is often inexpensive to obtain, such as its shape or mask. This is incorporated into the RL objective using a simple auxiliary loss. We show that our method, SEAR (Structured Environment-Agent Representations), outperforms state-of-the-art model-free approaches over 18 different challenging visual simulation environments spanning 5 different robots.'
volume: 202
URL: https://proceedings.mlr.press/v202/gmelin23a.html
PDF: https://proceedings.mlr.press/v202/gmelin23a/gmelin23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-gmelin23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Kevin
family: Gmelin
- given: Shikhar
family: Bahl
- given: Russell
family: Mendonca
- given: Deepak
family: Pathak
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 11525-11545
id: gmelin23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 11525
lastpage: 11545
published: 2023-07-03 00:00:00 +0000
- title: 'Aligning Language Models with Preferences through $f$-divergence Minimization'
abstract: 'Aligning language models with preferences can be posed as approximating a target distribution representing some desired behavior. Existing approaches differ both in the functional form of the target distribution and the algorithm used to approximate it. For instance, Reinforcement Learning from Human Feedback (RLHF) corresponds to minimizing a reverse KL from an implicit target distribution arising from a KL penalty in the objective. On the other hand, Generative Distributional Control (GDC) has an explicit target distribution and minimizes a forward KL from it using the Distributional Policy Gradient (DPG) algorithm. In this paper, we propose a new approach, $f$-DPG, which allows the use of any $f$-divergence to approximate any target distribution that can be evaluated. $f$-DPG unifies both frameworks (RLHF, GDC) and the approximation methods (DPG, RL with KL penalties). We show the practical benefits of various choices of divergence objectives and demonstrate that there is no universally optimal objective but that different divergences present different alignment and diversity trade-offs. We show that Jensen-Shannon divergence strikes a good balance between these objectives, and frequently outperforms forward KL divergence by a wide margin, leading to significant improvements over prior work. These distinguishing characteristics between divergences persist as the model size increases, highlighting the importance of selecting appropriate divergence objectives.'
volume: 202
URL: https://proceedings.mlr.press/v202/go23a.html
PDF: https://proceedings.mlr.press/v202/go23a/go23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-go23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Dongyoung
family: Go
- given: Tomasz
family: Korbak
- given: Germán
family: Kruszewski
- given: Jos
family: Rozen
- given: Nahyeon
family: Ryu
- given: Marc
family: Dymetman
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 11546-11583
id: go23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 11546
lastpage: 11583
published: 2023-07-03 00:00:00 +0000
- title: 'Robust Consensus in Ranking Data Analysis: Definitions, Properties and Computational Issues'
abstract: 'As the issue of robustness in AI systems becomes vital, statistical learning techniques that are reliable even in presence of partly contaminated data have to be developed. Preference data, in the form of (complete) rankings in the simplest situations, are no exception and the demand for appropriate concepts and tools is all the more pressing given that technologies fed by or producing this type of data ($\textit{e.g.}$ search engines, recommending systems) are now massively deployed. However, the lack of vector space structure for the set of rankings ($\textit{i.e.}$ the symmetric group $\mathfrak{S}_n$) and the complex nature of statistics considered in ranking data analysis make the formulation of robustness objectives in this domain challenging. In this paper, we introduce notions of robustness, together with dedicated statistical methods, for $\textit{Consensus Ranking}$ the flagship problem in ranking data analysis, aiming at summarizing a probability distribution on $\mathfrak{S}_n$ by a $\textit{median}$ ranking. Precisely, we propose specific extensions of the popular concept of *breakdown point*, tailored to consensus ranking, and address the related computational issues. Beyond the theoretical contributions, the relevance of the approach proposed is supported by an experimental study.'
volume: 202
URL: https://proceedings.mlr.press/v202/goibert23a.html
PDF: https://proceedings.mlr.press/v202/goibert23a/goibert23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-goibert23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Morgane
family: Goibert
- given: Clément
family: Calauzènes
- given: Ekhine
family: Irurozki
- given: Stephan
family: Clémençon
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 11584-11597
id: goibert23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 11584
lastpage: 11597
published: 2023-07-03 00:00:00 +0000
- title: 'Learning Distributions over Quantum Measurement Outcomes'
abstract: 'Shadow tomography for quantum states provides a sample efficient approach for predicting the measurement outcomes of quantum systems. However, these shadow tomography procedures yield poor bounds if there are more than two outcomes per measurement. In this paper, we consider a general problem of learning properties from quantum states: given an unknown $d$-dimensional quantum state $\rho$ and $M$ unknown quantum measurements $\mathcal{M}_1,...,\mathcal{M}_M$ with $K\geq 2$ outcomes, estimating the probability distribution for applying $\mathcal{M}_i$ on $\rho$ to within total variation distance $\epsilon$. Compared to the special case when $K=2$, we have to learn unknown distributions instead of values. Here, we propose an online shadow tomography procedure that solves this problem with high success probability requiring $\tilde{O}(K\log^2M\log d/\epsilon^4)$ copies of $\rho$. We further prove an information-theoretic lower bound showing that at least $\Omega(\min\{d^2,K+\log M\}/\epsilon^2)$ copies of $\rho$ are required to solve this problem with high success probability. Our shadow tomography procedure requires sample complexity with only logarithmic dependence on $M$ and $d$ and is sample-optimal concerning the dependence on $K$.'
volume: 202
URL: https://proceedings.mlr.press/v202/gong23a.html
PDF: https://proceedings.mlr.press/v202/gong23a/gong23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-gong23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Weiyuan
family: Gong
- given: Scott
family: Aaronson
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 11598-11613
id: gong23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 11598
lastpage: 11613
published: 2023-07-03 00:00:00 +0000
- title: 'Convergence of Proximal Point and Extragradient-Based Methods Beyond Monotonicity: the Case of Negative Comonotonicity'
abstract: 'Algorithms for min-max optimization and variational inequalities are often studied under monotonicity assumptions. Motivated by non-monotone machine learning applications, we follow the line of works (Diakonikolas et al., 2021; Lee & Kim, 2021; Pethick et al., 2022; Böhm, 2022) aiming at going beyond monotonicity by considering the weaker *negative comonotonicity* assumption. In this work, we provide tight complexity analyses for the Proximal Point (PP), Extragradient (EG), and Optimistic Gradient (OG) methods in this setup, closing several questions on their working guarantees beyond monotonicity. In particular, we derive the first non-asymptotic convergence rates for PP under negative comonotonicity and star-negative comonotonicity and show their tightness via constructing worst-case examples; we also relax the assumptions for the last-iterate convergence guarantees for EG and OG and prove the tightness of the existing best-iterate guarantees for EG and OG via constructing counter-examples.'
volume: 202
URL: https://proceedings.mlr.press/v202/gorbunov23a.html
PDF: https://proceedings.mlr.press/v202/gorbunov23a/gorbunov23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-gorbunov23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Eduard
family: Gorbunov
- given: Adrien
family: Taylor
- given: Samuel
family: Horváth
- given: Gauthier
family: Gidel
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 11614-11641
id: gorbunov23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 11614
lastpage: 11641
published: 2023-07-03 00:00:00 +0000
- title: 'Adaptive Annealed Importance Sampling with Constant Rate Progress'
abstract: 'Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution given its unnormalized density function. This algorithm relies on a sequence of interpolating distributions bridging the target to an initial tractable distribution such as the well-known geometric mean path of unnormalized distributions which is assumed to be suboptimal in general. In this paper, we prove that the geometric annealing corresponds to the distribution path that minimizes the KL divergence between the current particle distribution and the desired target when the feasible change in the particle distribution is constrained. Following this observation, we derive the constant rate discretization schedule for this annealing sequence, which adjusts the schedule to the difficulty of moving samples between the initial and the target distributions. We further extend our results to $f$-divergences and present the respective dynamics of annealing sequences based on which we propose the Constant Rate AIS (CR-AIS) algorithm and its efficient implementation for $\alpha$-divergences. We empirically show that CR-AIS performs well on multiple benchmark distributions while avoiding the computationally expensive tuning loop in existing Adaptive AIS.'
volume: 202
URL: https://proceedings.mlr.press/v202/goshtasbpour23a.html
PDF: https://proceedings.mlr.press/v202/goshtasbpour23a/goshtasbpour23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-goshtasbpour23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Shirin
family: Goshtasbpour
- given: Victor
family: Cohen
- given: Fernando
family: Perez-Cruz
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 11642-11658
id: goshtasbpour23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 11642
lastpage: 11658
published: 2023-07-03 00:00:00 +0000
- title: 'Formalizing Preferences Over Runtime Distributions'
abstract: 'When trying to solve a computational problem, we are often faced with a choice between algorithms that are guaranteed to return the right answer but differ in their runtime distributions (e.g., SAT solvers, sorting algorithms). This paper aims to lay theoretical foundations for such choices by formalizing preferences over runtime distributions. It might seem that we should simply prefer the algorithm that minimizes expected runtime. However, such preferences would be driven by exactly how slow our algorithm is on bad inputs, whereas in practice we are typically willing to cut off occasional, sufficiently long runs before they finish. We propose a principled alternative, taking a utility-theoretic approach to characterize the scoring functions that describe preferences over algorithms. These functions depend on the way our value for solving our problem decreases with time and on the distribution from which captimes are drawn. We describe examples of realistic utility functions and show how to leverage a maximum-entropy approach for modeling underspecified captime distributions. Finally, we show how to efficiently estimate an algorithm’s expected utility from runtime samples.'
volume: 202
URL: https://proceedings.mlr.press/v202/graham23a.html
PDF: https://proceedings.mlr.press/v202/graham23a/graham23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-graham23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Devon R.
family: Graham
- given: Kevin
family: Leyton-Brown
- given: Tim
family: Roughgarden
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 11659-11682
id: graham23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 11659
lastpage: 11682
published: 2023-07-03 00:00:00 +0000
- title: 'Topological Point Cloud Clustering'
abstract: 'We present Topological Point Cloud Clustering (TPCC), a new method to cluster points in an arbitrary point cloud based on their contribution to global topological features. TPCC synthesizes desirable features from spectral clustering and topological data analysis and is based on considering the spectral properties of a simplicial complex associated to the considered point cloud. As it is based on considering sparse eigenvector computations, TPCC is similarly easy to interpret and implement as spectral clustering. However, by focusing not just on a single matrix associated to a graph created from the point cloud data, but on a whole set of Hodge-Laplacians associated to an appropriately constructed simplicial complex, we can leverage a far richer set of topological features to characterize the data points within the point cloud and benefit from the relative robustness of topological techniques against noise. We test the performance of TPCC on both synthetic and real-world data and compare it with classical spectral clustering.'
volume: 202
URL: https://proceedings.mlr.press/v202/grande23a.html
PDF: https://proceedings.mlr.press/v202/grande23a/grande23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-grande23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Vincent Peter
family: Grande
- given: Michael T
family: Schaub
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 11683-11697
id: grande23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 11683
lastpage: 11697
published: 2023-07-03 00:00:00 +0000
- title: 'On Sampling with Approximate Transport Maps'
abstract: 'Transport maps can ease the sampling of distributions with non-trivial geometries by transforming them into distributions that are easier to handle. The potential of this approach has risen with the development of Normalizing Flows (NF) which are maps parameterized with deep neural networks trained to push a reference distribution towards a target. NF-enhanced samplers recently proposed blend (Markov chain) Monte Carlo methods with either (i) proposal draws from the flow or (ii) a flow-based reparametrization. In both cases, the quality of the learned transport conditions performance. The present work clarifies for the first time the relative strengths and weaknesses of these two approaches. Our study concludes that multimodal targets can be reliably handled with flow-based proposals up to moderately high dimensions. In contrast, methods relying on reparametrization struggle with multimodality but are more robust otherwise in high-dimensional settings and under poor training. To further illustrate the influence of target-proposal adequacy, we also derive a new quantitative bound for the mixing time of the Independent Metropolis-Hastings sampler.'
volume: 202
URL: https://proceedings.mlr.press/v202/grenioux23a.html
PDF: https://proceedings.mlr.press/v202/grenioux23a/grenioux23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-grenioux23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Louis
family: Grenioux
- given: Alain
family: Oliviero Durmus
- given: Eric
family: Moulines
- given: Marylou
family: Gabrié
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 11698-11733
id: grenioux23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 11698
lastpage: 11733
published: 2023-07-03 00:00:00 +0000
- title: 'Hidden Symmetries of ReLU Networks'
abstract: 'The parameter space for any fixed architecture of feedforward ReLU neural networks serves as a proxy during training for the associated class of functions - but how faithful is this representation? It is known that many different parameter settings $\theta$ can determine the same function $f$. Moreover, the degree of this redundancy is inhomogeneous: for some networks, the only symmetries are permutation of neurons in a layer and positive scaling of parameters at a neuron, while other networks admit additional hidden symmetries. In this work, we prove that, for any network architecture where no layer is narrower than the input, there exist parameter settings with no hidden symmetries. We also describe a number of mechanisms through which hidden symmetries can arise, and empirically approximate the functional dimension of different network architectures at initialization. These experiments indicate that the probability that a network has no hidden symmetries decreases towards 0 as depth increases, while increasing towards 1 as width and input dimension increase.'
volume: 202
URL: https://proceedings.mlr.press/v202/grigsby23a.html
PDF: https://proceedings.mlr.press/v202/grigsby23a/grigsby23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-grigsby23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Elisenda
family: Grigsby
- given: Kathryn
family: Lindsey
- given: David
family: Rolnick
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 11734-11760
id: grigsby23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 11734
lastpage: 11760
published: 2023-07-03 00:00:00 +0000
- title: 'EF21-P and Friends: Improved Theoretical Communication Complexity for Distributed Optimization with Bidirectional Compression'
abstract: 'In this work we focus our attention on distributed optimization problems in the context where the communication time between the server and the workers is non-negligible. We obtain novel methods supporting bidirectional compression (both from the server to the workers and vice versa) that enjoy new state-of-the-art theoretical communication complexity for convex and nonconvex problems. Our bounds are the first that manage to decouple the variance/error coming from the workers-to-server and server-to-workers compression, transforming a multiplicative dependence to an additive one. Moreover, in the convex regime, we obtain the first bounds that match the theoretical communication complexity of gradient descent. Even in this convex regime, our algorithms work with biased gradient estimators, which is non-standard and requires new proof techniques that may be of independent interest. Finally, our theoretical results are corroborated through suitable experiments.'
volume: 202
URL: https://proceedings.mlr.press/v202/gruntkowska23a.html
PDF: https://proceedings.mlr.press/v202/gruntkowska23a/gruntkowska23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-gruntkowska23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Kaja
family: Gruntkowska
- given: Alexander
family: Tyurin
- given: Peter
family: Richtárik
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 11761-11807
id: gruntkowska23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 11761
lastpage: 11807
published: 2023-07-03 00:00:00 +0000
- title: 'NerfDiff: Single-image View Synthesis with NeRF-guided Distillation from 3D-aware Diffusion'
abstract: 'Novel view synthesis from a single image requires inferring occluded regions of objects and scenes whilst simultaneously maintaining semantic and physical consistency with the input. Existing approaches condition neural radiance fields (NeRF) on local image features, projecting points to the input image plane, and aggregating 2D features to perform volume rendering. However, under severe occlusion, this projection fails to resolve uncertainty, resulting in blurry renderings that lack details. In this work, we propose NerfDiff, which addresses this issue by distilling the knowledge of a 3D-aware conditional diffusion model (CDM) into NeRF through synthesizing and refining a set of virtual views at test-time. We further propose a novel NeRF-guided distillation algorithm that simultaneously generates 3D consistent virtual views from the CDM samples, and finetunes the NeRF based on the improved virtual views. Our approach significantly outperforms existing NeRF-based and geometry-free approaches on challenging datasets including ShapeNet, ABO, and Clevr3D.'
volume: 202
URL: https://proceedings.mlr.press/v202/gu23a.html
PDF: https://proceedings.mlr.press/v202/gu23a/gu23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-gu23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jiatao
family: Gu
- given: Alex
family: Trevithick
- given: Kai-En
family: Lin
- given: Joshua M.
family: Susskind
- given: Christian
family: Theobalt
- given: Lingjie
family: Liu
- given: Ravi
family: Ramamoorthi
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 11808-11826
id: gu23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 11808
lastpage: 11826
published: 2023-07-03 00:00:00 +0000
- title: 'DecompDiff: Diffusion Models with Decomposed Priors for Structure-Based Drug Design'
abstract: 'Designing 3D ligands within a target binding site is a fundamental task in drug discovery. Existing structure-based drug design methods treat all ligand atoms equally, which ignores the different roles atoms play in the ligand and can be less efficient for exploring the large drug-like molecule space. In this paper, inspired by the convention in pharmaceutical practice, we decompose the ligand molecule into two parts, namely arms and scaffold, and propose a new diffusion model, DecompDiff, with decomposed priors over arms and scaffold. In order to facilitate the decomposed generation and improve the properties of the generated molecules, we incorporate both bond diffusion in the model and additional validity guidance in the sampling phase. Extensive experiments on CrossDocked2020 show that our approach achieves state-of-the-art performance in generating high-affinity molecules while maintaining proper molecular properties and conformational stability, with up to $-8.39$ Avg. Vina Dock score and $24.5\%$ Success Rate. The code is provided at https://github.com/bytedance/DecompDiff'
volume: 202
URL: https://proceedings.mlr.press/v202/guan23a.html
PDF: https://proceedings.mlr.press/v202/guan23a/guan23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-guan23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Jiaqi
family: Guan
- given: Xiangxin
family: Zhou
- given: Yuwei
family: Yang
- given: Yu
family: Bao
- given: Jian
family: Peng
- given: Jianzhu
family: Ma
- given: Qiang
family: Liu
- given: Liang
family: Wang
- given: Quanquan
family: Gu
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 11827-11846
id: guan23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 11827
lastpage: 11846
published: 2023-07-03 00:00:00 +0000
- title: 'On Excess Mass Behavior in Gaussian Mixture Models with Orlicz-Wasserstein Distances'
abstract: 'Dirichlet Process mixture models (DPMM) in combination with Gaussian kernels have been an important modeling tool for numerous data domains arising from biological, physical, and social sciences. However, this versatility in applications does not extend to strong theoretical guarantees for the underlying parameter estimates, for which only a logarithmic rate is achieved. In this work, we (re)introduce and investigate a metric, named Orlicz-Wasserstein distance, in the study of the Bayesian contraction behavior for the parameters. We show that despite the overall slow convergence guarantees for all the parameters, posterior contraction for parameters happens at almost polynomial rates in outlier regions of the parameter space. Our theoretical results provide new insight into the convergence behavior of parameters arising from various settings of hierarchical Bayesian nonparametric models. In addition, we provide an algorithm to compute the metric by leveraging Sinkhorn divergences and validate our findings through a simulation study.'
volume: 202
URL: https://proceedings.mlr.press/v202/guha23a.html
PDF: https://proceedings.mlr.press/v202/guha23a/guha23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-guha23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Aritra
family: Guha
- given: Nhat
family: Ho
- given: Xuanlong
family: Nguyen
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 11847-11870
id: guha23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 11847
lastpage: 11870
published: 2023-07-03 00:00:00 +0000
- title: 'Conformalization of Sparse Generalized Linear Models'
abstract: 'Given a sequence of observable variables $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, the conformal prediction method estimates a confidence set for $y_{n+1}$ given $x_{n+1}$ that is valid for any finite sample size by merely assuming that the joint distribution of the data is permutation invariant. Although attractive, computing such a set is computationally infeasible in most regression problems. Indeed, in these cases, the unknown variable $y_{n+1}$ can take an infinite number of possible candidate values, and generating conformal sets requires retraining a predictive model for each candidate. In this paper, we focus on a sparse linear model with only a subset of variables for prediction and use numerical continuation techniques to approximate the solution path efficiently. The critical property we exploit is that the set of selected variables is invariant under a small perturbation of the input data. Therefore, it is sufficient to enumerate and refit the model only at the change points of the set of active features and smoothly interpolate the rest of the solution via a Predictor-Corrector mechanism. We show how our path-following algorithm accurately approximates conformal prediction sets and illustrate its performance using synthetic and real data examples.'
volume: 202
URL: https://proceedings.mlr.press/v202/guha23b.html
PDF: https://proceedings.mlr.press/v202/guha23b/guha23b.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-guha23b.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Etash Kumar
family: Guha
- given: Eugene
family: Ndiaye
- given: Xiaoming
family: Huo
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 11871-11887
id: guha23b
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 11871
lastpage: 11887
published: 2023-07-03 00:00:00 +0000
- title: 'Privacy-Aware Compression for Federated Learning Through Numerical Mechanism Design'
abstract: 'In private federated learning (FL), a server aggregates differentially private updates from a large number of clients in order to train a machine learning model. The main challenge in this setting is balancing privacy with both classification accuracy of the learnt model as well as the number of bits communicated between the clients and server. Prior work has achieved a good trade-off by designing a privacy-aware compression mechanism, called the minimum variance unbiased (MVU) mechanism, that numerically solves an optimization problem to determine the parameters of the mechanism. This paper builds upon it by introducing a new interpolation procedure in the numerical design process that allows for a far more efficient privacy analysis. The result is the new Interpolated MVU mechanism that is more scalable, has a better privacy-utility trade-off, and provides SOTA results on communication-efficient private FL on a variety of datasets.'
volume: 202
URL: https://proceedings.mlr.press/v202/guo23a.html
PDF: https://proceedings.mlr.press/v202/guo23a/guo23a.pdf
edit: https://github.com/mlresearch//v202/edit/gh-pages/_posts/2023-07-03-guo23a.md
series: 'Proceedings of Machine Learning Research'
container-title: 'Proceedings of the 40th International Conference on Machine Learning'
publisher: 'PMLR'
author:
- given: Chuan
family: Guo
- given: Kamalika
family: Chaudhuri
- given: Pierre
family: Stock
- given: Michael
family: Rabbat
editor:
- given: Andreas
family: Krause
- given: Emma
family: Brunskill
- given: Kyunghyun
family: Cho
- given: Barbara
family: Engelhardt
- given: Sivan
family: Sabato
- given: Jonathan
family: Scarlett
page: 11888-11904
id: guo23a
issued:
date-parts:
- 2023
- 7
- 3
firstpage: 11888
lastpage: 11904
published: 2023-07-03 00:00:00 +0000
- title: 'Out-of-Distribution Generalization of Federated Learning via Implicit Invariant Relationships'
abstract: 'Out-of-distribution generalization is challenging for non-participating clients of federated learning under distribution shifts. A proven strategy is to explore the invariant relationships between input and target variables, which hold equally well for non-participating clients. However, invariant relationships are often learned explicitly from data, representations, and distributions, which violates the federated principles of privacy preservation and limited communication. In this paper, we propose FedIIR, which implicitly learns invariant relationships from parameters for out-of-distribution generalization, adhering to the above principles. Specifically, we utilize the prediction disagreement to quantify invariant relationships and implicitly reduce it through inter-cl